# Overview of the compiler
<!-- toc -->
This chapter is about the overall process of compiling a program -- how
everything fits together.
The Rust compiler is special in two ways: it does things to your code that
other compilers don't do (e.g. borrow-checking) and it has a lot of
unconventional implementation choices (e.g. queries). We will talk about these
in turn in this chapter, and in the rest of the guide, we will look at the
individual pieces in more detail.
## What the compiler does to your code
First, let's look at what the compiler does to your code. For now, we will
avoid mentioning how the compiler implements these steps except as needed.
### Invocation
Compilation begins when a user writes a Rust source program in text and invokes
the `rustc` compiler on it. The work that the compiler needs to perform is
defined by command-line options. For example, it is possible to enable nightly
features (`-Z` flags), perform `check`-only builds, or emit the LLVM
Intermediate Representation (`LLVM-IR`) rather than executable machine code.
`rustc` may also be invoked indirectly through `cargo`.
Command-line argument parsing occurs in [`rustc_driver`]. This crate
defines the compile configuration that is requested by the user and passes it
to the rest of the compilation process as a [`rustc_interface::Config`].
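
To make the idea of "options in, configuration out" concrete, here is a minimal sketch of turning command-line arguments into a configuration value. Everything in it (`ToyConfig`, `Emit`, `parse_args`) is hypothetical and invented for illustration; it is not how `rustc_driver` is implemented, it does not mirror the fields of [`rustc_interface::Config`], and it handles only the `--emit=` and attached `-Z` spellings.

```rust
/// What kind of output was requested (hypothetical, only two kinds modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Emit {
    Link,
    LlvmIr,
}

/// A toy stand-in for a compile configuration. Not the real `Config`.
#[derive(Debug, PartialEq, Eq)]
pub struct ToyConfig {
    pub input: Option<String>,
    pub emit: Emit,
    pub unstable_opts: Vec<String>,
}

/// Builds a `ToyConfig` from arguments such as `--emit=llvm-ir` or `-Zfoo`.
pub fn parse_args<I, S>(args: I) -> Result<ToyConfig, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut cfg = ToyConfig {
        input: None,
        emit: Emit::Link,
        unstable_opts: Vec::new(),
    };
    for arg in args {
        let arg = arg.as_ref();
        if let Some(kind) = arg.strip_prefix("--emit=") {
            cfg.emit = match kind {
                "link" => Emit::Link,
                "llvm-ir" => Emit::LlvmIr,
                other => return Err(format!("unknown emit kind: {}", other)),
            };
        } else if let Some(opt) = arg.strip_prefix("-Z") {
            // Nightly-only options are simply collected as strings here.
            cfg.unstable_opts.push(opt.to_owned());
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {}", arg));
        } else if cfg.input.is_none() {
            cfg.input = Some(arg.to_owned());
        } else {
            return Err(format!("unexpected extra input: {}", arg));
        }
    }
    Ok(cfg)
}
```

With this sketch, `parse_args(vec!["--emit=llvm-ir", "main.rs"])` yields a configuration whose `emit` field is `Emit::LlvmIr`, which later stages could consult instead of re-reading the raw arguments.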
### Lexing and parsing
The raw Rust source text is analyzed by a low-level *lexer* located in
[`rustc_lexer`]. At this stage, the source text is turned into a stream of
atomic source code units known as _tokens_. The lexer supports
Unicode source text.
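
As a rough illustration of what "turning text into tokens" means, the following toy lexer splits a string into tokens that record only a kind and a byte length. It is a sketch under simplifying assumptions: the names `TokenKind`, `Token`, `tokenize`, and `prefix_len` are made up for this example, and the classification rules are far cruder than those in [`rustc_lexer`].

```rust
/// Kinds of token produced by this toy lexer (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Whitespace,
    Punct,
}

/// A token stores its kind and its length in bytes, not the text itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub len: usize,
}

/// Splits `src` into tokens. It iterates over `char`s, so non-ASCII
/// identifiers are handled.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_alphabetic() || first == '_' {
            let len = prefix_len(rest, |c| c.is_alphanumeric() || c == '_');
            (TokenKind::Ident, len)
        } else if first.is_ascii_digit() {
            (TokenKind::Number, prefix_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_whitespace() {
            (TokenKind::Whitespace, prefix_len(rest, char::is_whitespace))
        } else {
            // Any other character becomes a one-character punctuation token.
            (TokenKind::Punct, first.len_utf8())
        };
        tokens.push(Token { kind, len });
        rest = &rest[len..];
    }
    tokens
}

/// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.chars().take_while(|&c| pred(c)).map(char::len_utf8).sum()
}
```

Because every token carries its length, the token lengths always add up to the length of the input, so the original text for each token can be recovered by slicing the source.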
The token stream passes through a higher-level lexer located in
[`rustc_parse`] to prepare for the next stage of the compile process. The
[`Lexer`] `struct` is used at this stage to perform a set of validations
and turn strings into interned symbols (_interning_ is discussed later).
[String interning] is a way of storing only one immutable
copy of each distinct string value.
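
A minimal sketch of string interning follows. The `Interner` type and its `intern` and `resolve` methods are hypothetical names chosen for this example, and the design (a hash map plus a vector sharing one `Rc<str>` allocation per string) is just one simple way to do it, not a description of how `rustc` stores its symbols.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// A small, cheap-to-copy handle standing in for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// A toy interner: each distinct string is allocated once and shared
/// between the lookup map and the index-to-string table.
#[derive(Default)]
pub struct Interner {
    indices: HashMap<Rc<str>, Symbol>,
    strings: Vec<Rc<str>>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.indices.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        let shared: Rc<str> = Rc::from(s);
        self.strings.push(Rc::clone(&shared));
        self.indices.insert(shared, sym);
        sym
    }

    /// Looks up the text behind a symbol produced by this interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning the same text twice returns the same `Symbol`, so comparing two identifiers becomes an integer comparison rather than a string comparison.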
The lexer has a small interface and doesn't depend directly on the diagnostic
infrastructure in `rustc`. Instead it provides diagnostics as plain data which
are emitted in [`rustc_parse::lexer`] as real diagnostics. The `lexer`
preserves full fidelity information for both IDEs and procedural macros
(sometimes referred to as "proc-macros").
The *parser* [translates the token stream from the `lexer` into an Abstract Syntax
Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
analysis. The crate entry points for the `parser` are the
[`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
methods found in [`rustc_parse::parser::Parser`]. The external module parsing
entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
The macro-`parser` entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
Parsing is performed with a set of [`parser`] utility methods, including [`bump`],
[`check`], [`eat`], [`expect`], and [`look_ahead`].
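
To show how a recursive descent parser can be built on helpers of this kind, here is a toy parser for arithmetic expressions. The helper names mirror the ones listed above, but their signatures and behavior here are assumptions made for the example and will differ from the real methods on [`rustc_parse::parser::Parser`]; `Tok` and `Expr` are likewise hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
    Eof,
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

pub struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(toks: Vec<Tok>) -> Self {
        Parser { toks, pos: 0 }
    }

    /// Peeks `n` tokens ahead without consuming anything.
    fn look_ahead(&self, n: usize) -> &Tok {
        self.toks.get(self.pos + n).unwrap_or(&Tok::Eof)
    }

    /// Is the current token equal to `t`?
    fn check(&self, t: &Tok) -> bool {
        self.look_ahead(0) == t
    }

    /// Consumes and returns the current token (`Eof` once input runs out).
    fn bump(&mut self) -> Tok {
        let t = self.look_ahead(0).clone();
        if self.pos < self.toks.len() {
            self.pos += 1;
        }
        t
    }

    /// Consumes the current token only if it equals `t`.
    fn eat(&mut self, t: &Tok) -> bool {
        if self.check(t) {
            self.bump();
            true
        } else {
            false
        }
    }

    /// Like `eat`, but a mismatch is an error.
    fn expect(&mut self, t: &Tok) -> Result<(), String> {
        if self.eat(t) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", t, self.look_ahead(0)))
        }
    }

    /// expr := term ('+' term)*
    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_term()?;
        while self.eat(&Tok::Plus) {
            let rhs = self.parse_term()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_atom()?;
        while self.eat(&Tok::Star) {
            let rhs = self.parse_atom()?;
            lhs = Expr::Mul(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Tok::Num(n) => Ok(Expr::Num(n)),
            Tok::LParen => {
                let inner = self.parse_expr()?;
                self.expect(&Tok::RParen)?;
                Ok(inner)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Each grammar rule becomes one `parse_*` method, and the nesting of the method calls encodes operator precedence: `parse_expr` calls `parse_term`, so `*` binds tighter than `+`.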
Parsing is organized by semantic construct. Separate
`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir]
directory. The source file name follows the construct name. For example, the
following files are found in the `parser`:
- [`expr.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/expr.rs)
- [`pat.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/pat.rs)
- [`ty.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/ty.rs)
- [`stmt.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/stmt.rs)
This naming scheme is used across many compiler stages. You will find either a
file or directory with the same name across the parsing, lowering, type
checking, [Typed High-level Intermediate Representation (`THIR`)][thir] lowering, and
[Mid-level Intermediate Representation (`MIR`)][mir] building sources.
Macro-expansion, `AST`-validation, name-resolution, and early linting also take