`DefPath` is "stable" across changes made to the source code while a `DefId`
isn't.
There is also the `DefPathHash` which is just a 128-bit hash value of the
`DefPath`. The two contain the same information and we mostly use the
`DefPathHash` because it simpler to handle, being `Copy` and self-contained.
This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a `DefId`, we store
the `DefPathHash` and when we deserialize something from the cache, we map the
`DefPathHash` to the corresponding `DefId` in the *current* compilation session
(which is just a simple hash table lookup).
The `HirId`, used for identifying HIR components that don't have their own
`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath`
and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`)
locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around,
the `LocalId`s within it are still the same.
### Checking Query Results For Changes: HashStable And Fingerprints
In order to do red-green-marking we often need to check if the result of a
query has changed compared to the result it had during the previous
compilation session. There are two performance problems with this though:
- We'd like to avoid having to load the previous result from disk just for
doing the comparison. We already computed the new result and will use that.
Also loading a result from disk will "pollute" the interners with data that
is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For
example, it would be wasted effort to persist things to disk that are
already available in upstream crates.
The compiler avoids these problems by using so-called `Fingerprint`s. Each time
a new query result is computed, the query engine will compute a 128 bit hash
value of the result. We call this hash value "the `Fingerprint` of the query
result". The hashing is (and has to be) done "in a stable way". This means
that whenever something is hashed that might change in between compilation
sessions (e.g. a `DefId`), we instead hash its stable equivalent
(e.g. the corresponding `DefPath`). That's what the whole `HashStable`
infrastructure is for. This way `Fingerprint`s computed in two
different compilation sessions are still comparable.
The next step is to store these fingerprints along with the dependency graph.
This is cheap since fingerprints are just bytes to be copied. It's also cheap to
load the entire set of fingerprints together with the dependency graph.
Now, when red-green-marking reaches the point where it needs to check if a
result has changed, it can just compare the (already loaded) previous
fingerprint to the fingerprint of the new result.
This approach works rather well but it's not without flaws:
- There is a small possibility of hash collisions. That is, two different
results could have the same fingerprint and the system would erroneously
assume that the result hasn't changed, leading to a missed update.
We mitigate this risk by using a high-quality hash function and a 128 bit
wide hash value. Due to these measures the practical risk of a hash
collision is negligible.
- Computing fingerprints is quite costly. It is the main reason why incremental
compilation can be slower than non-incremental compilation. We are forced to
use a good and thus expensive hash function, and we have to map things to
their stable equivalents while doing the hashing.
### A Tale Of Two DepGraphs: The Old And The New
The initial description of dependency tracking glosses over a few details
that quickly become a head scratcher when actually trying to implement things.
In particular it's easy to overlook that we are actually dealing with *two*
dependency graphs: The one we built during the previous compilation session and
the one that we are building for the current compilation session.
When a compilation session starts, the compiler loads the previous dependency