Monomorphization in Rust

# Monomorphization  As you probably know, Rust has a very expressive type system that has extensive support for generic types. But of course, assembly is not generic, so we need to figure out the concrete types of all the generics before the code can execute. Different languages handle this problem differently. For example, in some languages, such as Java, we may not know the most precise type of value until runtime. In the case of Java, this is ok because (almost) all variables are reference values anyway (i.e. pointers to a heap allocated object). This flexibility comes at the cost of performance, since all accesses to an object must dereference a pointer. Rust takes a different approach: it _monomorphizes_ all generic types. This means that compiler stamps out a different copy of the code of a generic function for each concrete type needed. For example, if I use a `Vec<u64>` and a `Vec<String>` in my code, then the generated binary will have two copies of the generated code for `Vec`: one for `Vec<u64>` and another for `Vec<String>`. The result is fast programs, but it comes at the cost of compile time (creating all those copies can take a while) and binary size (all those copies might take a lot of space). Monomorphization is the first step in the backend of the Rust compiler. ## Collection First, we need to figure out what concrete types we need for all the generic things in our program. This is called _collection_, and the code that does this is called the _monomorphization collector_. Take this example: ```rust fn banana() { peach::<u64>(); } fn main() { banana(); } ``` The monomorphization collector will give you a list of `[main, banana, peach::<u64>]`. These are the functions that will have machine code generated for them. Collector will also add things like statics to that list. See [the collector rustdocs][collect] for more info. The monomorphization collector is run just before MIR lowering and codegen. [`rustc_codegen_ssa::base::codegen_crate`][codegen1] calls the [`collect_and_partition_mono_items`][mono] query, which does monomorphization collection and then partitions them into [codegen units](../appendix/glossary.md#codegen-unit). ## Codegen Unit (CGU) partitioning For better incremental build times, the CGU partitioner creates two CGU for each source level modules. One is for "stable" i.e. non-generic code and the other is more volatile code i.e. monomorphized/specialized instances. For dependencies, consider Crate A and Crate B, such that Crate B depends on Crate A. The following table lists different scenarios for a function in Crate A that might be used by one or more modules in Crate B. | Crate A function | Behavior | | - | - | | Non-generic function | Crate A function doesn't appear in any codegen units of Crate B | | Non-generic `#[inline]` function | Crate A function appears within a single CGU of Crate B, and exists even after post-inlining stage| | Generic function | Regardless of inlining, all monomorphized (specialized) functions <br> from Crate A appear within a single codegen unit for Crate B. <br> The codegen unit exists even after the post inlining stage.| | Generic `#[inline]` function | - same - | For more details about the partitioner read the module level [documentation].

Rust employs monomorphization to handle generic types, creating specialized code for each concrete type used. This process, involving collection and codegen unit (CGU) partitioning, optimizes for performance at the cost of compile time and binary size. The monomorphization collector identifies necessary concrete types, while CGU partitioning divides code into stable and volatile units for improved incremental builds, particularly concerning dependencies between crates.