# Backend Agnostic Codegen
<!-- toc -->
[`rustc_codegen_ssa`]
provides an abstract interface for all backends to implement,
namely LLVM, [Cranelift], and [GCC].
Below is some background information on the refactoring that created this
abstract interface.
## Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018
### State of the code before the refactoring
All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
different object files and archives through LLVM, as well as the communication
mechanisms for parallel code generation;
* the `debuginfo` (3,200 LOC) folder contains all code that passes debug
information down to LLVM;
* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with
LLVM using the C++ API;
* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM
IR;
* the `base.rs` (1,300 LOC) file contains some helper functions but also the
high-level code that launches the code generation and distributes the work;
* the `builder.rs` (1,200 LOC) file contains all the functions generating
individual LLVM IR instructions inside a basic block;
* the `common.rs` (450 LOC) file contains various helper functions and all the
functions generating LLVM static values;
* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR.
The goal of this refactoring is to separate, inside this crate, the code that
is specific to LLVM from the code that can be reused by other rustc backends.
For instance, the `mir` folder is almost entirely backend-agnostic, but it
relies heavily on other parts of the crate. The separation must affect neither
the logic of the code nor its performance.
For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:
1. replace all the LLVM-specific types with generics inside function signatures
and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
will define the interface between backend-agnostic code and the backend.
While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name
suggestion by @eddyb).
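The two transformations can be illustrated with a minimal, self-contained sketch. All names below (`ToyBuilderMethods`, `ToyBackend`, `emit_sum_of_squares`) are hypothetical and are not the actual `rustc_codegen_ssa` interface; the toy backend only records textual instructions instead of calling the LLVM FFI.

```rust
/// Hypothetical trait standing in for the interface between
/// backend-agnostic code and a backend (transformation 2).
trait ToyBuilderMethods {
    /// Backend-specific value handle, kept abstract (transformation 1).
    type Value: Copy;

    fn const_i32(&mut self, n: i32) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn mul(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code: it only talks to the backend through the trait
/// and never names a concrete value type.
fn emit_sum_of_squares<B: ToyBuilderMethods>(bx: &mut B, a: i32, b: i32) -> B::Value {
    let a = bx.const_i32(a);
    let b = bx.const_i32(b);
    let a2 = bx.mul(a, a);
    let b2 = bx.mul(b, b);
    bx.add(a2, b2)
}

/// Toy backend (an assumption for illustration): a value is the index of
/// the instruction that produced it.
#[derive(Default)]
struct ToyBackend {
    insts: Vec<String>,
}

impl ToyBackend {
    /// Records an instruction and returns its index as the value handle.
    fn push(&mut self, inst: String) -> usize {
        self.insts.push(inst);
        self.insts.len() - 1
    }
}

impl ToyBuilderMethods for ToyBackend {
    type Value = usize;

    fn const_i32(&mut self, n: i32) -> usize {
        self.push(format!("const {}", n))
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.push(format!("add %{}, %{}", lhs, rhs))
    }

    fn mul(&mut self, lhs: usize, rhs: usize) -> usize {
        self.push(format!("mul %{}, %{}", lhs, rhs))
    }
}
```

Calling `emit_sum_of_squares(&mut ToyBackend::default(), 2, 3)` records five instructions in the toy backend. Supporting another backend would only require a new implementation of the trait, leaving the generic function untouched.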
### Generic types and structures
@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as to LLVM's `BasicBlock` and `Type` types.
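As a rough illustration of this parametrization, the sketch below defines a data structure that is generic over the value type `V` and then instantiates it with a reference type, in the spirit of `&'ll Value`. `ToyOperand`, `ToyLlvmValue`, and `first_id` are hypothetical names chosen for this example and do not reproduce the real compiler types.

```rust
/// Hypothetical operand type, generic over the backend's value handle `V`.
#[derive(Clone, Copy)]
enum ToyOperand<V> {
    /// A single value.
    Immediate(V),
    /// Two values that travel together.
    Pair(V, V),
}

impl<V: Copy> ToyOperand<V> {
    /// Backend-agnostic helper: it works for any `V` and knows nothing
    /// about how a backend represents its values.
    fn first(self) -> V {
        match self {
            ToyOperand::Immediate(v) | ToyOperand::Pair(v, _) => v,
        }
    }
}

/// Stand-in for an object owned by the backend (assumption for illustration).
struct ToyLlvmValue {
    id: u32,
}

/// Backend-side code instantiates `V` with a reference whose lifetime is `'ll`.
fn first_id<'ll>(op: ToyOperand<&'ll ToyLlvmValue>) -> u32 {
    op.first().id
}
```

The same `ToyOperand::first` method serves every instantiation, which is the property the refactoring relies on: generic structures are written once, and each backend picks its own concrete `V`.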
The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.
```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```
`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.
The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters, which correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
containing the program's information;
* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
`Type`.
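The roles of the three lifetimes can be sketched with toy types. Everything prefixed with `Toy` below is a hypothetical stand-in, not the real `TyCtxt`, `CodegenCx`, or `Builder`; the sketch only shows how the lifetimes relate to each other.

```rust
/// Stand-in for `TyCtxt`: program-wide information that outlives everything.
struct ToyTcx {
    crate_name: String,
}

/// Stand-in for an LLVM object such as `Value` or `Type`.
struct ToyLlvmType {
    name: &'static str,
}

/// One context per codegen unit; it holds `'tcx` and `'ll` references.
struct ToyCodegenCx<'ll, 'tcx> {
    tcx: &'tcx ToyTcx,
    i32_ty: &'ll ToyLlvmType,
}

/// One builder per basic block; `'a` is the short borrow of the context.
struct ToyBuilder<'a, 'll, 'tcx> {
    cx: &'a ToyCodegenCx<'ll, 'tcx>,
    block_name: String,
}

impl<'a, 'll, 'tcx> ToyBuilder<'a, 'll, 'tcx> {
    fn new(cx: &'a ToyCodegenCx<'ll, 'tcx>, block_name: &str) -> Self {
        ToyBuilder {
            cx,
            block_name: block_name.to_string(),
        }
    }

    /// The result is tied to `'ll`, not to the borrow of the builder or to
    /// `'a`, so it stays usable after the builder is gone.
    fn i32_type(&self) -> &'ll ToyLlvmType {
        self.cx.i32_ty
    }

    /// Reads program-wide data through the `'tcx` reference.
    fn describe(&self) -> String {
        format!("{}::{}", self.cx.tcx.crate_name, self.block_name)
    }
}
```

Because `i32_type` returns a `&'ll` reference, a caller can keep the returned type after dropping the builder, while the builder itself cannot outlive the `'a` borrow of its context.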
Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow checker accepted the code only
because of the special nature of the LLVM objects being manipulated (they are
extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyser` in `analyse.rs`, leading to the definition: