code for. This is called _monomorphization collection_ and it happens at the
`MIR` level.


### Code generation

We then begin what is simply called _code generation_ or _codegen_. The [code
generation stage][codegen] is when higher-level representations of source are
turned into an executable binary. Since `rustc` uses LLVM for code generation,
the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is
actually monomorphized. The `LLVM-IR` is passed to LLVM, which performs many
more optimizations on it and then emits machine code. The output is basically
assembly code with additional low-level types and annotations added (e.g. an
ELF object or `WASM`). The different libraries/binaries are then linked
together to produce the final binary.
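
As a rough illustration of what monomorphization means for user code, consider
a generic function called with two different types. The function name
`describe` is made up for this sketch; the point is only that, conceptually,
the compiler needs one concrete copy of the function per type it is
instantiated with.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// A generic function. Conceptually, one concrete copy is generated for each
// type `T` that the program actually instantiates it with.
// (`describe` is a hypothetical name used only for this illustration.)
fn describe<T: Debug>(value: T) -> String {
    format!("{:?} ({} bytes)", value, size_of::<T>())
}

fn main() {
    // Two instantiations are required here: `describe::<u8>` and
    // `describe::<u64>`. Each one is specialized to its own type.
    println!("{}", describe(1u8)); // prints "1 (1 bytes)"
    println!("{}", describe(1u64)); // prints "1 (8 bytes)"
}
```

Whether both copies survive into the final binary depends on later
optimizations such as inlining, so treat this as a mental model rather than a
guarantee about the emitted code.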


## How it does it

Now that we have a high-level view of what the compiler does to your code,
let's take a high-level view of _how_ it does all that stuff. There are a lot
of constraints and conflicting goals that the compiler needs to
satisfy/optimize for. For example,

- Compilation speed: how fast is it to compile a program? More/better
  compile-time analyses often mean compilation is slower.
  - Also, we want to support incremental compilation, so we need to take that
    into account. How can we keep track of what work needs to be redone and
    what can be reused if the user modifies their program?
    - Also, we can't store too much stuff in the incremental cache because
      it would take a long time to load from disk and it could take a lot
      of space on the user's system...
- Compiler memory usage: while compiling a program, we don't want to use more
  memory than we need.
- Program speed: how fast is your compiled program? More/better compile-time
  analyses often mean the compiler can do better optimizations.
- Program size: how large is the compiled binary? Similar to the previous
  point.
- Compiler compilation speed: how long does it take to compile the compiler?
  This impacts contributors and compiler maintenance.
- Implementation complexity: building a compiler is one of the hardest
  things a person/group can do, and Rust is not a very simple language, so how
  do we make the compiler's code base manageable?
- Compiler correctness: the binaries produced by the compiler should do what
  the input programs say they do, and should continue to do so despite the
  tremendous amount of change constantly going on.
- Integration: a number of other tools need to use the compiler in
  various ways (e.g. `cargo`, `clippy`, `miri`) that must be supported.
- Compiler stability: the compiler should not crash or fail ungracefully on the
  stable channel.
- Rust stability: the compiler must respect Rust's stability guarantees by not
  breaking programs that previously compiled despite the many changes that are
  always going on to its implementation.
- Limitations of other tools: `rustc` uses LLVM in its backend, and LLVM has some
  strengths we leverage and some aspects we need to work around.

So, as you continue your journey through the rest of the guide, keep these
things in mind. They will often inform decisions that we make.

### Intermediate representations

As with most compilers, `rustc` uses some intermediate representations (IRs) to
facilitate computations. In general, working directly with the source code is
extremely inconvenient and error-prone. Source code is designed to be human-friendly while at
the same time being unambiguous, but it's less convenient for doing something
like, say, type checking.
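
To make that inconvenience a little more concrete, here is a toy tokenizer
that turns raw text into a list of structured tokens. The `Token` type and the
`tokenize` function are hypothetical and far simpler than anything in `rustc`;
they only sketch the idea of replacing raw text with a form that is easier to
inspect.

```rust
// A toy token type; hypothetical and much simpler than a real lexer's.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(u64),
    Symbol(char),
}

// Split source text into tokens, skipping whitespace.
// Note: very large numeric literals would overflow `u64` in this sketch.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Accumulate consecutive decimal digits into one number.
            let mut n = 0u64;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                n = n * 10 + u64::from(d);
                chars.next();
            }
            tokens.push(Token::Number(n));
        } else if c.is_alphabetic() || c == '_' {
            // Accumulate an identifier-like run of characters.
            let mut ident = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_alphanumeric() || ch == '_' {
                    ident.push(ch);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            // Everything else is treated as a single-character symbol.
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("let x = 42;"));
}
```

Even this toy version shows how much bookkeeping raw text requires before any
real analysis can start.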

Instead, most compilers, including `rustc`, build some sort of IR out of the
source code which is easier to analyze. `rustc` has a few IRs, each optimized
for different purposes:

- Token stream: the lexer produces a stream of tokens directly from the source
  code. This stream of tokens is easier for the parser to deal with than raw
  text.
- Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream
  of tokens produced by the lexer. It represents
