invoke the query provider to re-compute the result. Re-computing the query might
then itself involve recursively invoking more queries, which can mean we come back
to the `try_mark_green()` algorithm for the dependencies recursively.
## The Real World: How Persistence Makes Everything Complicated
The sections above described the underlying algorithm for incremental
compilation but because the compiler process exits after being finished and
takes the query context with its result cache with it into oblivion, we have to
persist data to disk, so the next compilation session can make use of it.
This comes with a whole new set of implementation challenges:
- The query result cache is stored to disk, so they are not readily available
for change comparison.
- A subsequent compilation session will start off with new version of the code
that has arbitrary changes applied to it. All kinds of IDs and indices that
are generated from a global, sequential counter (e.g. `NodeId`, `DefId`, etc)
might have shifted, making the persisted results on disk not immediately
usable anymore because the same numeric IDs and indices might refer to
completely new things in the new compilation session.
- Persisting things to disk comes at a cost, so not every tiny piece of
information should be actually cached in between compilation sessions.
Fixed-sized, plain-old-data is preferred to complex things that need to run
through an expensive (de-)serialization step.
The following sections describe how the compiler solves these issues.
### A Question Of Stability: Bridging The Gap Between Compilation Sessions
As noted before, various IDs (like `DefId`) are generated by the compiler in a
way that depends on the contents of the source code being compiled. ID assignment
is usually deterministic, that is, if the exact same code is compiled twice,
the same things will end up with the same IDs. However, if something
changes, e.g. a function is added in the middle of a file, there is no
guarantee that anything will have the same ID as it had before.
As a consequence we cannot represent the data in our on-disk cache the same
way it is represented in memory. For example, if we just stored a piece
of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do
in memory) and then the contained `DefId` points to a different function in
a new compilation session we'd be in trouble.
The solution to this problem is to find "stable" forms for IDs which remain
valid in between compilation sessions. For the most important case, `DefId`s,
these are the so-called `DefPath`s. Each `DefId` has a
corresponding `DefPath` but in place of a numeric ID, a `DefPath` is based on
the path to the identified item, e.g. `std::collections::HashMap`. The
advantage of an ID like this is that it is not affected by unrelated changes.
For example, one can add a new function to `std::collections` but
`std::collections::HashMap` would still be `std::collections::HashMap`. A
`DefPath` is "stable" across changes made to the source code while a `DefId`
isn't.
There is also the `DefPathHash` which is just a 128-bit hash value of the
`DefPath`. The two contain the same information and we mostly use the
`DefPathHash` because it simpler to handle, being `Copy` and self-contained.
This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a `DefId`, we store
the `DefPathHash` and when we deserialize something from the cache, we map the
`DefPathHash` to the corresponding `DefId` in the *current* compilation session
(which is just a simple hash table lookup).
The `HirId`, used for identifying HIR components that don't have their own
`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath`
and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`)
locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around,
the `LocalId`s within it are still the same.