# Parallel compilation
<div class="warning">
As of <!-- date-check --> November 2024,
the parallel front-end is undergoing significant changes,
so this page contains quite a bit of outdated information.
Tracking issue: <https://github.com/rust-lang/rust/issues/113349>
</div>
As of <!-- date-check --> November 2024, most of the Rust compiler is now
parallelized.
- The codegen part is executed concurrently by default. You can use the `-C
codegen-units=n` option to control the number of concurrent tasks.
- The parts between HIR lowering and codegen, such as type checking, borrow
  checking, and MIR optimization, are parallelized in the nightly version.
  Currently, they are executed serially by default, and parallelization is
  enabled manually by the user with the `-Z threads=n` option (see the
  example after this list).
- Other parts, such as lexing and parsing, HIR lowering, and macro expansion,
  are still executed serially.
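For example, assuming a rustup-managed nightly toolchain is available, the
options above can be passed to `rustc` directly (the counts chosen here are
arbitrary):

```text
# Let codegen run as 16 parallel codegen units.
rustc -C codegen-units=16 main.rs

# Nightly only: run the parallelized front-end passes on 8 threads.
rustc +nightly -Z threads=8 main.rs
```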
<div class="warning">
The following sections are kept for now but are quite outdated.
</div>
---
## Code generation
During monomorphization the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the [`rustc_codegen_ssa::base`] module.
## Data structures
The underlying thread-safe data-structures used in the parallel compiler
can be found in the [`rustc_data_structures::sync`] module. These data structures
are implemented differently depending on whether `parallel-compiler` is true.
| data structure     | parallel                              | non-parallel            |
| ------------------ | ------------------------------------- | ----------------------- |
| `Lock<T>`          | `parking_lot::Mutex<T>`               | `std::cell::RefCell<T>` |
| `RwLock<T>`        | `parking_lot::RwLock<T>`              | `std::cell::RefCell<T>` |
| `MTLock<T>`        | `Lock<T>`                             | `T`                     |
| `ReadGuard`        | `parking_lot::RwLockReadGuard`        | `std::cell::Ref`        |
| `MappedReadGuard`  | `parking_lot::MappedRwLockReadGuard`  | `std::cell::Ref`        |
| `WriteGuard`       | `parking_lot::RwLockWriteGuard`       | `std::cell::RefMut`     |
| `MappedWriteGuard` | `parking_lot::MappedRwLockWriteGuard` | `std::cell::RefMut`     |
| `LockGuard`        | `parking_lot::MutexGuard`             | `std::cell::RefMut`     |
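As a rough sketch of that pattern (not the actual source; the real `Lock<T>`
lives in [`rustc_data_structures::sync`] and has a richer API, and the
`parallel_compiler` cfg and `with_lock` helper below are illustrative stand-ins),
the cfg-switching looks roughly like this:

```rust
// Hypothetical sketch only: a lock that is a real mutex in the parallel
// build and a cheap dynamic borrow check otherwise. The parallel branch
// assumes the `parking_lot` crate is available as a dependency.
#[cfg(parallel_compiler)]
pub struct Lock<T>(parking_lot::Mutex<T>);

#[cfg(not(parallel_compiler))]
pub struct Lock<T>(std::cell::RefCell<T>);

impl<T> Lock<T> {
    pub fn new(inner: T) -> Self {
        Lock(inner.into())
    }

    /// Run `f` with exclusive access to the inner value.
    #[cfg(parallel_compiler)]
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.0.lock())
    }

    #[cfg(not(parallel_compiler))]
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.0.borrow_mut())
    }
}
```

The point of the switch is that single-threaded builds pay almost nothing for
the abstraction, while parallel builds get real synchronization.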
- These thread-safe data structures are used throughout compilation, which can
  cause lock contention and degrade performance as the number of threads
  increases beyond 4. We therefore audit the use of these data structures,
  which leads either to refactorings that reduce the use of shared state, or
  to persistent documentation covering the specifics of the invariants, the
  atomicity, and the lock orderings.
- On the other hand, we still need to figure out which other invariants that
  hold during sequential compilation might not hold during parallel
  compilation.
### WorkerLocal
[`WorkerLocal`] is a special data structure implemented for parallel compilers.
It holds a worker-local value for each thread in a thread pool. The value can
only be accessed through the `Deref` impl on the thread pool it was
constructed on; accessing it from anywhere else panics.

`WorkerLocal` is used to implement the `Arena` allocator in the parallel
environment, which is critical for parallel queries. Its implementation is
located in the [`rustc_data_structures::sync::worker_local`] module. In the
non-parallel compiler, however, it is implemented as `OneThread<T>`, whose
inner `T` can be accessed directly through `Deref::deref`.
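To illustrate the concept (this is not rustc's implementation; the
`WorkerLocal` type and `current` method below are a simplified stand-in built
on upstream `rayon`), per-worker values let each thread mutate its own slot
without any locking:

```rust
// Conceptual sketch of worker-local storage: one value per worker thread,
// indexed by the rayon worker index.
use std::sync::atomic::{AtomicUsize, Ordering};

struct WorkerLocal<T> {
    per_worker: Vec<T>,
}

impl<T> WorkerLocal<T> {
    // Build one value per worker thread in the current rayon pool.
    fn new(mut init: impl FnMut(usize) -> T) -> Self {
        let n = rayon::current_num_threads();
        WorkerLocal { per_worker: (0..n).map(|i| init(i)).collect() }
    }

    // Access this worker's value; panics outside a rayon worker thread.
    fn current(&self) -> &T {
        let idx = rayon::current_thread_index()
            .expect("must be called from a rayon worker thread");
        &self.per_worker[idx]
    }
}

fn main() {
    use rayon::prelude::*;

    // One counter per worker: each worker only touches its own slot.
    let counters = WorkerLocal::new(|_| AtomicUsize::new(0));
    (0..1000).into_par_iter().for_each(|_| {
        counters.current().fetch_add(1, Ordering::Relaxed);
    });

    let total: usize = counters
        .per_worker
        .iter()
        .map(|c| c.load(Ordering::Relaxed))
        .sum();
    assert_eq!(total, 1000);
}
```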
## Parallel iterator
The parallel iterators provided by the [`rayon`] crate are an easy way to
implement parallelism. The current implementation of the parallel compiler
uses a custom [fork][rustc-rayon] of `rayon` to run tasks in parallel.
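For background, this is what parallel iteration with the upstream `rayon`
crate looks like (a standalone example, not rustc code):

```rust
use rayon::prelude::*;

fn main() {
    let inputs: Vec<u64> = (1..=10_000).collect();

    // `par_iter` splits the slice across the threads of the rayon pool and
    // runs the closure on the pieces in parallel.
    let parallel: u64 = inputs.par_iter().map(|x| x * x).sum();

    // The sequential equivalent, for comparison.
    let sequential: u64 = inputs.iter().map(|x| x * x).sum();
    assert_eq!(parallel, sequential);
}
```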
Some iterator functions are implemented to run loops in parallel
when `parallel-compiler` is true.
| Function (`Send` and `Sync` bounds omitted) | Description | Owning module |