Canonicalization in Trait Solving

# Canonicalization

> **NOTE**: FIXME: The content of this chapter has some overlap with
> [Next-gen trait solving Canonicalization chapter](../solve/canonicalization.html).
> It is suggested to reorganize these contents in the future.

Canonicalization is the process of **isolating** an inference value
from its context. It is a key part of implementing
[canonical queries][cq], and you may wish to read the parent chapter
to get more context.

Canonicalization is really based on a very simple concept: every
[inference variable](../type-inference.html#vars) is always in one of
two states: either it is **unbound**, in which case we don't know yet
what type it is, or it is **bound**, in which case we do. So to
isolate some data-structure T that contains types/regions from its
environment, we just walk down and find the unbound variables that
appear in T; those variables get replaced with "canonical variables",
starting from zero and numbered in a fixed order (left to right, for
the most part, but really it doesn't matter as long as it is
consistent).

So, for example, if we have the type `X = (?T, ?U)`, where `?T` and
`?U` are distinct, unbound inference variables, then the canonical
form of `X` would be `(?0, ?1)`, where `?0` and `?1` represent these
**canonical placeholders**. Note that the type `Y = (?U, ?T)` also
canonicalizes to `(?0, ?1)`. But the type `Z = (?T, ?T)` would
canonicalize to `(?0, ?0)` (as would `(?U, ?U)`). In other words, the
exact identity of the inference variables is not important – unless
they are repeated.

We use this to improve caching as well as to detect cycles and other
things during trait resolution. Roughly speaking, the idea is that if
two trait queries have the same canonical form, then they will get
the same answer. That answer will be expressed in terms of the
canonical variables (`?0`, `?1`), which we can then map back to the
original variables (`?T`, `?U`).
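The numbering rule can be illustrated with a small, self-contained sketch. It is not the compiler's implementation: the `Ty` enum and the `canonicalize` and `canonical_form` functions are hypothetical names invented for this example, and inference variables are identified by a plain string.

```rust
use std::collections::HashMap;

/// Toy type language (hypothetical; this is not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Ty {
    /// An unbound inference variable such as `?T`, identified by name.
    Infer(&'static str),
    /// A canonical variable such as `?0`.
    Canonical(usize),
    /// A tuple of types.
    Tuple(Vec<Ty>),
}

/// Walks `ty` left to right and replaces each distinct inference variable
/// with a canonical variable, numbered in order of first appearance.
fn canonicalize(ty: &Ty, seen: &mut HashMap<&'static str, usize>) -> Ty {
    match ty {
        Ty::Infer(name) => {
            // A repeated variable reuses the index it was given the first time.
            let next = seen.len();
            Ty::Canonical(*seen.entry(*name).or_insert(next))
        }
        Ty::Canonical(index) => Ty::Canonical(*index),
        Ty::Tuple(elems) => {
            Ty::Tuple(elems.iter().map(|elem| canonicalize(elem, seen)).collect())
        }
    }
}

/// Canonicalizes `ty` in isolation, starting the numbering from zero.
fn canonical_form(ty: &Ty) -> Ty {
    canonicalize(ty, &mut HashMap::new())
}
```

Under these assumptions, `(?T, ?U)` and `(?U, ?T)` both map to `(?0, ?1)`, while `(?T, ?T)` maps to `(?0, ?0)`, so a cache keyed on the canonical form would treat the first two as the same query.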
## Canonicalizing the query

To see how it works, imagine that we are asking to solve the following
trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound.
This query contains two unbound variables, but it also contains the
lifetime `'static`. The trait system generally ignores all lifetimes
and treats them equally, so when canonicalizing, we will *also*
replace any [free lifetime](../appendix/background.html#free-vs-bound) with a
canonical variable (Note that `'static` is actually a _free_ lifetime
variable here. We are not considering it in the typing context of the whole
program but only in the context of this trait reference. Mathematically, we
are not quantifying over the whole program, but only this obligation).
Therefore, we get the following result:

```text
?0: Foo<'?1, ?2>
```

Sometimes we write this differently, like so:

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

This `for<>` gives some information about each of the canonical
variables within. In this case, each `T` indicates a type variable,
so `?0` and `?2` are types; the `L` indicates a lifetime variable, so
`?1` is a lifetime. The `canonicalize` method *also* gives back a
`CanonicalVarValues` array OV with the "original values" for each
canonicalized variable:

```text
[?A, 'static, ?B]
```

We'll need this vector OV later, when we process the query response.

## Executing the query

Once we've constructed the canonical query, we can try to solve it.
To do so, we will wind up creating a fresh inference context and
**instantiating** the canonical query in that context. The idea is that
we create a substitution S from the canonical form containing a fresh
inference variable (of suitable kind) for each canonical variable.
So, for our example query:

```text
for<T,L,T> { ?0: Foo<'?1, ?2> }
```

the substitution S might be:

```text
S = [?A, '?B, ?C]
```

We can then replace the bound canonical variables (`?0`, etc) with
these inference variables, yielding the following fully instantiated
query:

```text
?A: Foo<'?B, ?C>
```

Remember that substitution S though! We're going to need it later.
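The two steps above, canonicalizing the query and then instantiating it with a substitution S, can be sketched in a toy model. Everything here is an assumption made for illustration: `Arg`, `Kind`, `Canonical`, `canonicalize_query`, and `instantiate` are hypothetical names and not the compiler's API, the query is reduced to a flat list of arguments (the self type followed by the trait arguments), and repeated values are assumed to share one canonical variable.

```rust
/// One argument of the toy query (hypothetical; not rustc's `GenericArg`).
#[derive(Clone, Debug, PartialEq)]
enum Arg {
    /// An unbound type inference variable, e.g. `?A`.
    InferTy(String),
    /// A lifetime, either free (`'static`) or a fresh variable (`'?B`).
    Lifetime(String),
    /// A canonical variable `?N`.
    Canonical(usize),
}

/// The kind recorded for each canonical variable: the `T` and `L` in `for<T,L,T>`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Ty,
    Lifetime,
}

/// A canonical query: the `for<...>` binder plus the rewritten arguments.
#[derive(Clone, Debug, PartialEq)]
struct Canonical {
    kinds: Vec<Kind>,
    args: Vec<Arg>,
}

/// Replaces unbound type variables and lifetimes with canonical variables and
/// returns the canonical query together with the original values (OV).
fn canonicalize_query(args: &[Arg]) -> (Canonical, Vec<Arg>) {
    let mut kinds = Vec::new();
    let mut original_values: Vec<Arg> = Vec::new();
    let mut out = Vec::new();
    for arg in args {
        let kind = match arg {
            Arg::InferTy(_) => Kind::Ty,
            Arg::Lifetime(_) => Kind::Lifetime,
            // Already canonical: pass through unchanged.
            Arg::Canonical(index) => {
                out.push(Arg::Canonical(*index));
                continue;
            }
        };
        // Simplifying assumption: a repeated value reuses its earlier index.
        let index = match original_values.iter().position(|value| value == arg) {
            Some(existing) => existing,
            None => {
                original_values.push(arg.clone());
                kinds.push(kind);
                original_values.len() - 1
            }
        };
        out.push(Arg::Canonical(index));
    }
    (Canonical { kinds, args: out }, original_values)
}

/// Builds a substitution S with one fresh variable of the right kind per
/// canonical variable, then applies it to the canonical query.
fn instantiate(canonical: &Canonical) -> (Vec<Arg>, Vec<Arg>) {
    let subst: Vec<Arg> = canonical
        .kinds
        .iter()
        .enumerate()
        .map(|(index, kind)| match kind {
            Kind::Ty => Arg::InferTy(format!("?fresh{}", index)),
            Kind::Lifetime => Arg::Lifetime(format!("'?fresh{}", index)),
        })
        .collect();
    let instantiated = canonical
        .args
        .iter()
        .map(|arg| match arg {
            Arg::Canonical(index) => subst[*index].clone(),
            other => other.clone(),
        })
        .collect();
    (subst, instantiated)
}
```

For the query `?A: Foo<'static, ?B>`, passed as `[?A, 'static, ?B]`, this sketch yields kinds `[Ty, Lifetime, Ty]`, canonical arguments `[?0, ?1, ?2]`, and original values equal to the input. Calling `instantiate` on the result produces three fresh variables whose kinds match the binder, mirroring `S = [?A, '?B, ?C]`.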

Canonicalization isolates inference values by replacing unbound inference variables and free lifetimes within a data structure with canonical variables (e.g., ?0, ?1). This process is crucial for implementing canonical queries, improving caching, and detecting cycles during trait resolution. The goal is to transform queries into a standard form that can be efficiently solved and whose results can be mapped back to the original variables.