Persistence Challenges and Stable Identifiers in Incremental Compilation

invoke the query provider to re-compute the result. Re-computing the query might then itself involve recursively invoking more queries, which can mean we come back to the `try_mark_green()` algorithm for the dependencies recursively. ## The Real World: How Persistence Makes Everything Complicated The sections above described the underlying algorithm for incremental compilation but because the compiler process exits after being finished and takes the query context with its result cache with it into oblivion, we have to persist data to disk, so the next compilation session can make use of it. This comes with a whole new set of implementation challenges: - The query result cache is stored to disk, so they are not readily available for change comparison. - A subsequent compilation session will start off with new version of the code that has arbitrary changes applied to it. All kinds of IDs and indices that are generated from a global, sequential counter (e.g. `NodeId`, `DefId`, etc) might have shifted, making the persisted results on disk not immediately usable anymore because the same numeric IDs and indices might refer to completely new things in the new compilation session. - Persisting things to disk comes at a cost, so not every tiny piece of information should be actually cached in between compilation sessions. Fixed-sized, plain-old-data is preferred to complex things that need to run through an expensive (de-)serialization step. The following sections describe how the compiler solves these issues. ### A Question Of Stability: Bridging The Gap Between Compilation Sessions As noted before, various IDs (like `DefId`) are generated by the compiler in a way that depends on the contents of the source code being compiled. ID assignment is usually deterministic, that is, if the exact same code is compiled twice, the same things will end up with the same IDs. However, if something changes, e.g. a function is added in the middle of a file, there is no guarantee that anything will have the same ID as it had before. As a consequence we cannot represent the data in our on-disk cache the same way it is represented in memory. For example, if we just stored a piece of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do in memory) and then the contained `DefId` points to a different function in a new compilation session we'd be in trouble. The solution to this problem is to find "stable" forms for IDs which remain valid in between compilation sessions. For the most important case, `DefId`s, these are the so-called `DefPath`s. Each `DefId` has a corresponding `DefPath` but in place of a numeric ID, a `DefPath` is based on the path to the identified item, e.g. `std::collections::HashMap`. The advantage of an ID like this is that it is not affected by unrelated changes. For example, one can add a new function to `std::collections` but `std::collections::HashMap` would still be `std::collections::HashMap`. A `DefPath` is "stable" across changes made to the source code while a `DefId` isn't. There is also the `DefPathHash` which is just a 128-bit hash value of the `DefPath`. The two contain the same information and we mostly use the `DefPathHash` because it simpler to handle, being `Copy` and self-contained. This principle of stable identifiers is used to make the data in the on-disk cache resilient to source code changes. Instead of storing a `DefId`, we store the `DefPathHash` and when we deserialize something from the cache, we map the `DefPathHash` to the corresponding `DefId` in the *current* compilation session (which is just a simple hash table lookup). The `HirId`, used for identifying HIR components that don't have their own `DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath` and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`) locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around, the `LocalId`s within it are still the same.

Incremental compilation faces challenges in persisting data between compilation sessions due to the compiler process exiting and IDs shifting. The solution involves using stable identifiers like `DefPath` and `DefPathHash` instead of numeric IDs to maintain validity across code changes. These stable forms, based on paths (e.g., `std::collections::HashMap`), are resilient to unrelated changes. The `HirId` is another stable ID, consisting of a `DefPath` and a `LocalId`.