The system described so far has a somewhat subtle property: If all inputs of a
dep-node are green then the dep-node itself can be marked as green without
computing or loading the corresponding query result. Applying this property
transitively often leads to the situation that some intermediate results are
never actually loaded from disk, as in the following example:

```ignore
   input(A) <-- intermediate_query(B) <-- leaf_query(C)
```

The compiler might need the value of `leaf_query(C)` in order to generate some
output artifact. If it can mark `leaf_query(C)` as green, it will load the
result from the on-disk cache. The result of `intermediate_query(B)` is never
loaded though. As a consequence, when the compiler persists the *new* result
cache by writing all in-memory query results to disk, `intermediate_query(B)`
will not be in memory and thus will be missing from the new result cache.

If a subsequent compilation session then actually needs the result of
`intermediate_query(B)`, it will have to be re-computed even though we had a
perfectly valid result for it in the cache just before.

In order to prevent this from happening, the compiler does something called
"cache promotion": Before emitting the new result cache it will walk all green
dep-nodes and make sure that their query result is loaded into memory. That way
the result cache doesn't unnecessarily shrink again.
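
The following sketch shows the general shape of this pass. The names used here
(`green_nodes`, `force_query_result_into_memory`,
`serialize_query_result_cache`) are illustrative, not the actual identifiers
in rustc:

```rust,ignore
// Illustrative pseudocode for cache promotion; the real logic lives in
// rustc's query system and uses different names and types.
fn promote_cache(tcx: TyCtxt<'_>) {
    // Walk every dep-node that was marked green in this session...
    for dep_node in tcx.dep_graph.green_nodes() {
        // ...and force its query result to be loaded from the old
        // on-disk cache into memory, if it is not already there.
        tcx.force_query_result_into_memory(dep_node);
    }

    // Now every green result is in memory, so it will be included when
    // all in-memory results are written out as the *new* on-disk cache.
    tcx.serialize_query_result_cache();
}
```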



# Incremental Compilation and the Compiler Backend

The compiler backend, the part involving LLVM, uses the query system but is
not itself implemented in terms of queries. As a consequence, it does not
automatically partake in dependency tracking. However, the manual integration
with the tracking system is pretty straightforward. The compiler simply tracks
what queries get invoked when generating the initial LLVM version of each
codegen unit (CGU), which results in a dep-node for each CGU. In subsequent
compilation sessions it then tries to mark the dep-node for a CGU as green. If
it succeeds, it knows that the corresponding object and bitcode files on disk
are still valid. If it doesn't succeed, the entire CGU has to be recompiled.
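
In pseudocode, the per-CGU decision looks roughly like this. The helper names
(`codegen_dep_node`, `reuse_object_and_bitcode_files`, `compile_cgu_with_llvm`)
are made up for illustration:

```rust,ignore
// Illustrative pseudocode for the per-CGU reuse check.
fn codegen_crate(tcx: TyCtxt<'_>, cgus: &[CodegenUnit<'_>]) {
    for cgu in cgus {
        // The dep-node that was recorded for this CGU when it was
        // compiled in an earlier session.
        let dep_node = codegen_dep_node(cgu);

        if tcx.dep_graph.try_mark_green(&dep_node).is_some() {
            // Everything this CGU depends on is unchanged, so the object
            // and bitcode files from the previous session are still valid.
            reuse_object_and_bitcode_files(cgu);
        } else {
            // Some input changed; the whole CGU has to be recompiled.
            compile_cgu_with_llvm(tcx, cgu);
        }
    }
}
```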

This is the same approach that is used for regular queries. The main differences
are:

 - that we cannot easily compute a fingerprint for LLVM modules (because
   they are opaque C++ objects),

 - that the logic for dealing with cached values is rather different from
   regular queries because here we have bitcode and object files instead of
   serialized Rust values in the common result cache file, and

 - that the operations around LLVM are so expensive in terms of computation
   time and memory consumption that we need tight control over what is
   executed when and what stays in memory for how long.

The query system could probably be extended with general-purpose mechanisms to
deal with all of the above, but so far that has seemed like more trouble than
it would save.



## Query Modifiers

The query system allows for applying [modifiers][mod] to queries. These
modifiers affect certain aspects of how the system treats the query with
respect to incremental compilation:

 - `eval_always` - A query with the `eval_always` attribute is re-executed
   unconditionally during incremental compilation, i.e. the system will not
   even try to mark the query's dep-node as green. This attribute has two use
   cases:

    - `eval_always` queries can read inputs (from files, global state, etc).
      They can also produce side effects like writing to files and changing global state.

    - Some queries are very likely to be re-evaluated because their result
      depends on the entire source code. In this case `eval_always` can be used
      as an optimization because the system can skip recording dependencies in
      the first place.

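   For illustration, this is roughly how a query definition can opt into the
   modifier. The query name, return type, and description below are made up,
   and the exact syntax of the query-definition macro may differ between
   compiler versions:

   ```rust,ignore
   // Illustrative only: a made-up query that uses `eval_always`, written
   // in the style of the query definitions in `rustc_middle`.
   rustc_queries! {
       query crate_wide_summary(_: ()) -> Summary {
           // Re-execute this query in every incremental session instead
           // of trying to mark its dep-node green.
           eval_always
           desc { "computing a summary of the whole crate" }
       }
   }
   ```
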
 - `no_hash` - Applying `no_hash` to a query tells the system to not compute
   the fingerprint of the query's result. This has two consequences:

    - Not computing the fingerprint can save quite a bit of time because
