Code Generation, Compiler Constraints, and Intermediate Representations

code for. This is called _monomorphization collection_ and it happens at the `MIR` level.

### Code generation

We then begin what is simply called _code generation_ or _codegen_. The [code generation stage][codegen] is when higher-level representations of source are turned into an executable binary. Since `rustc` uses LLVM for code generation, the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is actually monomorphized. The `LLVM-IR` is passed to LLVM, which does a lot more optimizations on it. LLVM then emits machine code, which is basically assembly code with additional low-level types and annotations added (e.g. an ELF object or `WASM`). The different libraries/binaries are then linked together to produce the final binary.

## How it does it

Now that we have a high-level view of what the compiler does to your code, let's take a high-level view of _how_ it does all that stuff. There are a lot of constraints and conflicting goals that the compiler needs to satisfy/optimize for. For example,

- Compilation speed: how fast is it to compile a program? More/better compile-time analyses often mean compilation is slower.
  - Also, we want to support incremental compilation, so we need to take that into account. How can we keep track of what work needs to be redone and what can be reused if the user modifies their program?
    - Also, we can't store too much stuff in the incremental cache because it would take a long time to load from disk and it could take a lot of space on the user's system...
- Compiler memory usage: while compiling a program, we don't want to use more memory than we need.
- Program speed: how fast is your compiled program? More/better compile-time analyses often mean the compiler can do better optimizations.
- Program size: how large is the compiled binary? Similar to the previous point.
- Compiler compilation speed: how long does it take to compile the compiler? This impacts contributors and compiler maintenance.
- Implementation complexity: building a compiler is one of the hardest things a person/group can do, and Rust is not a very simple language, so how do we make the compiler's code base manageable?
- Compiler correctness: the binaries produced by the compiler should do what the input programs say they do, and should continue to do so despite the tremendous amount of change constantly going on.
- Integration: a number of other tools need to use the compiler in various ways (e.g. `cargo`, `clippy`, `MIRI`), and those uses must be supported.
- Compiler stability: the compiler should not crash or fail ungracefully on the stable channel.
- Rust stability: the compiler must respect Rust's stability guarantees by not breaking programs that previously compiled, despite the many changes that are always being made to its implementation.
- Limitations of other tools: `rustc` uses LLVM in its backend, and LLVM has some strengths we leverage and some aspects we need to work around.

So, as you continue your journey through the rest of the guide, keep these things in mind. They will often inform decisions that we make.

### Intermediate representations

As with most compilers, `rustc` uses some intermediate representations (IRs) to facilitate computations. In general, working directly with the source code is extremely inconvenient and error-prone. Source code is designed to be human-friendly while at the same time being unambiguous, but it's less convenient for doing something like, say, type checking.

Instead, most compilers, including `rustc`, build some sort of IR out of the source code which is easier to analyze. `rustc` has a few IRs, each optimized for different purposes:

- Token stream: the lexer produces a stream of tokens directly from the source code. This stream of tokens is easier for the parser to deal with than raw text.
- Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream of tokens produced by the lexer. It represents

This section discusses code generation in the Rust compiler, involving conversion from MIR to LLVM-IR and linking to produce the final binary. It then explores the various constraints and goals that the compiler must optimize for, such as compilation speed, memory usage, program speed/size, and stability. Finally, it introduces intermediate representations (IRs) used by `rustc` to facilitate computations, including the token stream and Abstract Syntax Tree (AST).
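
To make the token stream idea a little more concrete, here is a minimal lexer sketch showing why a parser would rather consume tokens than raw text. The `Token` enum and the `lex` function are hypothetical names invented for this illustration; they do not mirror `rustc`'s actual lexer, which handles far more (comments, string literals, spans, error recovery, and so on).

```rust
/// A hypothetical token type for this sketch; not rustc's real token type.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(u64),
    Punct(char),
}

/// Turn raw source text into a flat list of tokens, skipping whitespace.
/// This is a sketch: there are no spans and no error or overflow handling.
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates tokens; it produces none itself.
            chars.next();
        } else if c.is_ascii_digit() {
            // Accumulate a run of decimal digits into one number token.
            let mut value = 0u64;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                value = value * 10 + u64::from(d);
                chars.next();
            }
            tokens.push(Token::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            // Accumulate letters, digits and underscores into one identifier.
            let mut name = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_alphanumeric() || ch == '_' {
                    name.push(ch);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(name));
        } else {
            // Everything else becomes a single-character punctuation token.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }

    tokens
}
```

With this sketch, `lex("let x = 42;")` yields five tokens: `Ident("let")`, `Ident("x")`, `Punct('=')`, `Number(42)`, and `Punct(';')`. A parser working on that list no longer has to think about whitespace or where one identifier ends and the next begins, which is the convenience the token stream provides.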