Parsing is organized by semantic construct. Separate
`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir]
directory. The source file name follows the construct name. For example, the
following files are found in the `parser`:
- [`expr.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/expr.rs)
- [`pat.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/pat.rs)
- [`ty.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/ty.rs)
- [`stmt.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/stmt.rs)
This naming scheme is used across many compiler stages. You will find either a
file or directory with the same name across the parsing, lowering, type
checking, [Typed High-level Intermediate Representation (`THIR`)][thir] lowering, and
[Mid-level Intermediate Representation (`MIR`)][mir] building sources.
Macro-expansion, `AST`-validation, name-resolution, and early linting also take
place during the lexing and parsing stage.
returned from the parser while the standard [`Diag`] API is used
for error handling. Generally Rust's compiler will try to recover from errors
by parsing a superset of Rust's grammar, while also emitting an error type.
### `AST` lowering
Next the `AST` is converted into [High-Level Intermediate Representation
(`HIR`)][hir], a more compiler-friendly representation of the `AST`. This process
is called "lowering" and involves a lot of desugaring (the expansion and
formalizing of shortened or abbreviated syntax constructs) of things like loops
and `async fn`.
We then use the `HIR` to do [*type inference*] (the process of automatic
detection of the type of an expression), [*trait solving*] (the process of
pairing up an impl with each reference to a `trait`), and [*type checking*]. Type
checking is the process of converting the types found in the `HIR` ([`hir::Ty`]),
which represent what the user wrote, into the internal representation used by
the compiler ([`Ty<'tcx>`]). It's called type checking because the information
is used to verify the type safety, correctness and coherence of the types used
in the program.
### `MIR` lowering
The `HIR` is further lowered to `MIR`
(used for [borrow checking]) by constructing the `THIR` (an even more desugared `HIR` used for
pattern and exhaustiveness checking) to convert into `MIR`.
We do [many optimizations on the MIR][mir-opt] because it is generic and that
improves later code generation and compilation speed. It is easier to do some
optimizations at `MIR` level than at `LLVM-IR` level. For example LLVM doesn't seem
to be able to optimize the pattern the [`simplify_try`] `MIR`-opt looks for.
Rust code is also [_monomorphized_] during code generation, which means making
copies of all the generic code with the type parameters replaced by concrete
types. To do this, we need to collect a list of what concrete types to generate
code for. This is called _monomorphization collection_ and it happens at the
`MIR` level.
### Code generation
We then begin what is simply called _code generation_ or _codegen_. The [code
generation stage][codegen] is when higher-level representations of source are
turned into an executable binary. Since `rustc` uses LLVM for code generation,
the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is
actually monomorphized. The `LLVM-IR` is passed to LLVM, which does a lot more
optimizations on it, emitting machine code which is basically assembly code
with additional low-level types and annotations added (e.g. an ELF object or
`WASM`). The different libraries/binaries are then linked together to produce
the final binary.
## How it does it
Now that we have a high-level view of what the compiler does to your code,
let's take a high-level view of _how_ it does all that stuff. There are a lot
of constraints and conflicting goals that the compiler needs to
satisfy/optimize for. For example,