Intermediate Representations (IRs) and the Query System in rustc

So, as you continue your journey through the rest of the guide, keep these things in mind. They will often inform decisions that we make. ### Intermediate representations As with most compilers, `rustc` uses some intermediate representations (IRs) to facilitate computations. In general, working directly with the source code is extremely inconvenient and error-prone. Source code is designed to be human-friendly while at the same time being unambiguous, but it's less convenient for doing something like, say, type checking. Instead most compilers, including `rustc`, build some sort of IR out of the source code which is easier to analyze. `rustc` has a few IRs, each optimized for different purposes: - Token stream: the lexer produces a stream of tokens directly from the source code. This stream of tokens is easier for the parser to deal with than raw text. - Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream of tokens produced by the lexer. It represents pretty much exactly what the user wrote. It helps to do some syntactic sanity checking (e.g. checking that a type is expected where the user wrote one). - High-level IR (HIR): This is a sort of desugared `AST`. It's still close to what the user wrote syntactically, but it includes some implicit things such as some elided lifetimes, etc. This IR is amenable to type checking. - Typed `HIR` (THIR) _formerly High-level Abstract IR (HAIR)_: This is an intermediate between `HIR` and MIR. It is like the `HIR` but it is fully typed and a bit more desugared (e.g. method calls and implicit dereferences are made fully explicit). As a result, it is easier to lower to `MIR` from `THIR` than from HIR. - Middle-level IR (`MIR`): This IR is basically a Control-Flow Graph (CFG). A CFG is a type of diagram that shows the basic blocks of a program and how control flow can go between them. Likewise, `MIR` also has a bunch of basic blocks with simple typed statements inside them (e.g. assignment, simple computations, etc) and control flow edges to other basic blocks (e.g., calls, dropping values). `MIR` is used for borrow checking and other important dataflow-based checks, such as checking for uninitialized values. It is also used for a series of optimizations and for constant evaluation (via `MIRI`). Because `MIR` is still generic, we can do a lot of analyses here more efficiently than after monomorphization. - `LLVM-IR`: This is the standard form of all input to the LLVM compiler. `LLVM-IR` is a sort of typed assembly language with lots of annotations. It's a standard format that is used by all compilers that use LLVM (e.g. the clang C compiler also outputs `LLVM-IR`). `LLVM-IR` is designed to be easy for other compilers to emit and also rich enough for LLVM to run a bunch of optimizations on it. One other thing to note is that many values in the compiler are _interned_. This is a performance and memory optimization in which we allocate the values in a special allocator called an _[arena]_. Then, we pass around references to the values allocated in the arena. This allows us to make sure that identical values (e.g. types in your program) are only allocated once and can be compared cheaply by comparing pointers. Many of the intermediate representations are interned. ### Queries The first big implementation choice is Rust's use of the _query_ system in its compiler. The Rust compiler _is not_ organized as a series of passes over the code which execute sequentially. The Rust compiler does this to make incremental compilation possible -- that is, if the user makes a change to their program and recompiles, we want to do as little redundant work as possible to output the new binary. In `rustc`, all the major steps above are organized as a bunch of queries that call each other. For example, there is a query to ask for the type of something and another to ask for the optimized `MIR` of a function. These queries can call

This section details the intermediate representations (IRs) used by the Rust compiler (`rustc`) for analysis and optimization. It describes the token stream, Abstract Syntax Tree (AST), High-level IR (HIR), Typed HIR (THIR), Middle-level IR (MIR), and LLVM-IR. It also mentions interning, a performance optimization technique. The section then introduces the query system in rustc, which avoids sequential passes over code to enable incremental compilation.