Home Explore Blog CI



rustc

5th chunk of `src/queries/incremental-compilation-in-detail.md`
ce4ccd731a7efba1dee3af28bcd1a7f281defa9b2e3907ba0000000100000fea
`DefPath` is "stable" across changes made to the source code while a `DefId`
isn't.

There is also the `DefPathHash` which is just a 128-bit hash value of the
`DefPath`. The two contain the same information and we mostly use the
`DefPathHash` because it simpler to handle, being `Copy` and self-contained.

This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a `DefId`, we store
the `DefPathHash` and when we deserialize something from the cache, we map the
`DefPathHash` to the corresponding `DefId` in the *current* compilation session
(which is just a simple hash table lookup).

The `HirId`, used for identifying HIR components that don't have their own
`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath`
and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`)
locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around,
the `LocalId`s within it are still the same.



### Checking Query Results For Changes: HashStable And Fingerprints

In order to do red-green-marking we often need to check if the result of a
query has changed compared to the result it had during the previous
compilation session. There are two performance problems with this though:

- We'd like to avoid having to load the previous result from disk just for
  doing the comparison. We already computed the new result and will use that.
  Also loading a result from disk will "pollute" the interners with data that
  is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For
  example, it would be wasted effort to persist things to disk that are
  already available in upstream crates.

The compiler avoids these problems by using so-called `Fingerprint`s. Each time
a new query result is computed, the query engine will compute a 128 bit hash
value of the result. We call this hash value "the `Fingerprint` of the query
result". The hashing is (and has to be) done "in a stable way". This means
that whenever something is hashed that might change in between compilation
sessions (e.g. a `DefId`), we instead hash its stable equivalent
(e.g. the corresponding `DefPath`). That's what the whole `HashStable`
infrastructure is for. This way `Fingerprint`s computed in two
different compilation sessions are still comparable.

The next step is to store these fingerprints along with the dependency graph.
This is cheap since fingerprints are just bytes to be copied. It's also cheap to
load the entire set of fingerprints together with the dependency graph.

Now, when red-green-marking reaches the point where it needs to check if a
result has changed, it can just compare the (already loaded) previous
fingerprint to the fingerprint of the new result.

This approach works rather well but it's not without flaws:

- There is a small possibility of hash collisions. That is, two different
  results could have the same fingerprint and the system would erroneously
  assume that the result hasn't changed, leading to a missed update.

  We mitigate this risk by using a high-quality hash function and a 128 bit
  wide hash value. Due to these measures the practical risk of a hash
  collision is negligible.

- Computing fingerprints is quite costly. It is the main reason why incremental
  compilation can be slower than non-incremental compilation. We are forced to
  use a good and thus expensive hash function, and we have to map things to
  their stable equivalents while doing the hashing.


### A Tale Of Two DepGraphs: The Old And The New

The initial description of dependency tracking glosses over a few details
that quickly become a head scratcher when actually trying to implement things.
In particular it's easy to overlook that we are actually dealing with *two*
dependency graphs: The one we built during the previous compilation session and
the one that we are building for the current compilation session.

When a compilation session starts, the compiler loads the previous dependency

Title: Stable Identifiers, Fingerprints, and Dual Dependency Graphs in Incremental Compilation
Summary
Incremental compilation uses stable identifiers like `DefPathHash` and `HirId` to maintain data validity across compilation sessions. To check for changes in query results, it uses `Fingerprint`s (128-bit hash values) computed using `HashStable` infrastructure, avoiding the need to load previous results from disk. These fingerprints are stored alongside the dependency graph for efficient comparison during red-green marking. The process involves dealing with two dependency graphs: the one from the previous compilation and the one being built for the current session.