The Projection Query Pattern and Shortcomings of the Current Incremental Compilation System

allows the red-green system to do its change detection even if there is no query key available for a given dep-node, something which is needed for handling trait selection because trait selection is not based on queries.

## The Projection Query Pattern

It's interesting to note that `eval_always` and `no_hash` can be used together in the so-called "projection query" pattern. It is often the case that there is one query that depends on the entirety of the compiler's input (e.g. the indexed HIR) and another query that projects individual values out of this monolithic value (e.g. a HIR item with a certain `DefId`). These projection queries allow for building change propagation "firewalls": even if the result of the monolithic query changes (which is very likely), the small projections can still mostly be marked as green.

```ignore
  +------------+
  |            |           +---------------+           +--------+
  |            | <---------| projection(x) | <---------| foo(a) |
  |            |           +---------------+           +--------+
  |            |
  | monolithic |           +---------------+           +--------+
  |   query    | <---------| projection(y) | <---------| bar(b) |
  |            |           +---------------+           +--------+
  |            |
  |            |           +---------------+           +--------+
  |            | <---------| projection(z) | <---------| baz(c) |
  |            |           +---------------+           +--------+
  +------------+
```

Let's assume that the result of `monolithic_query` changes in such a way that the result of `projection(x)` also changes, i.e. both of their dep-nodes are marked red. As a consequence, `foo(a)` needs to be re-executed, but `bar(b)` and `baz(c)` can be marked as green. However, if `foo`, `bar`, and `baz` had depended directly on `monolithic_query`, all of them would have had to be re-evaluated.

This pattern works even without `eval_always` and `no_hash`, but the two modifiers can be used to avoid unnecessary overhead. If the monolithic query is likely to change with any minor modification of the compiler's input, it makes sense to mark it as `eval_always`, thus getting rid of its dependency-tracking cost.
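The firewall effect described above can be illustrated with a small, self-contained simulation. This is a sketch under simplifying assumptions, not rustc's query system: `Monolithic`, `projection`, `fingerprint`, and `reexecuted_consumers` are hypothetical names, `DefaultHasher` stands in for rustc's stable hasher, and the fingerprints stored by the previous session are simulated by hashing the old projection results. The monolithic query is treated as if it were `no_hash` (it is never fingerprinted, so re-executing it always counts as red), while each projection is fingerprinted and compared against its previous result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a monolithic query result (e.g. an indexed HIR):
/// item name -> item source text.
type Monolithic = HashMap<&'static str, String>;

/// Hypothetical fingerprint helper; rustc uses its own stable hasher instead.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Projection query: extracts one item from the monolithic value.
fn projection(mono: &Monolithic, key: &str) -> String {
    mono.get(key).cloned().unwrap_or_default()
}

/// Given the monolithic result of the previous and the current session,
/// returns the downstream queries (`foo`, `bar`, `baz`) that must re-execute.
fn reexecuted_consumers(old: &Monolithic, new: &Monolithic) -> Vec<&'static str> {
    // `no_hash`-like: the monolithic result is never fingerprinted, so once it
    // is re-executed its dep-node is simply treated as red. Every projection
    // therefore has a red input and is re-executed as well.
    let edges = [("x", "foo"), ("y", "bar"), ("z", "baz")];
    let mut red = Vec::new();
    for &(key, consumer) in edges.iter() {
        // Projections ARE fingerprinted. Hashing the old projection result
        // simulates the fingerprint stored by the previous session.
        let old_fp = fingerprint(&projection(old, key));
        let new_fp = fingerprint(&projection(new, key));
        if old_fp != new_fp {
            // Red projection: its consumer has to be re-executed.
            red.push(consumer);
        }
        // Otherwise the projection is green and its consumer stays green too.
    }
    red
}

fn main() {
    let old: Monolithic = HashMap::from([
        ("x", "fn x() -> u32 { 1 }".to_string()),
        ("y", "fn y() {}".to_string()),
        ("z", "fn z() {}".to_string()),
    ]);
    // Edit `x` and add a new item `w`: the monolithic result changes as a
    // whole, but only `projection(x)` changes.
    let mut new = old.clone();
    new.insert("x", "fn x() -> u32 { 2 }".to_string());
    new.insert("w", "fn w() {}".to_string());

    let red = reexecuted_consumers(&old, &new);
    println!("re-executed: {:?}", red); // prints `re-executed: ["foo"]`
}
```

Only `foo` is re-executed: adding `w` and editing `x` changes the monolithic value, but `projection(y)` and `projection(z)` hash the same as before, so `bar` and `baz` can stay green.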
Marking the monolithic query as `no_hash` always makes sense as well: computing a fingerprint for such a large, frequently changing result is costly, and the projections already take care of keeping as much as possible green.

# Shortcomings of the Current System

There are many things that can still be improved.

## Incrementality of on-disk data structures

The current system is not able to update on-disk caches and the dependency graph in place. Instead, it has to rewrite each file entirely in each compilation session. The overhead of doing so is a few percent of total compilation time.

## Unnecessary data dependencies

Data structures used as query results could be factored in a way that removes edges from the dependency graph. Span information in particular is very volatile, so including it in a query result increases the chance that the result won't be reusable. See <https://github.com/rust-lang/rust/issues/47389> for more information.
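The span problem can be made concrete by comparing fingerprints of two versions of a query result. The types `Span`, `SignatureWithSpan`, and `Signature` and the helpers `fingerprint` and `reusable` are hypothetical simplifications, not rustc's actual data structures, and `DefaultHasher` stands in for rustc's stable hashing. In this sketch, moving a function down by one line changes the fingerprint of a result that embeds its span, while a result with the span factored out (assumed to be served by a separate query) stays identical and could be reused.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical source position, standing in for rustc's `Span`.
#[derive(Clone, Copy, Hash)]
struct Span {
    line: u32,
    col: u32,
}

/// Query result that embeds its span: moving the item changes the result.
#[derive(Hash)]
struct SignatureWithSpan {
    name: String,
    arity: u32,
    span: Span,
}

/// Factored alternative: the span is assumed to live in a separate query,
/// so this result only changes when the signature itself changes.
#[derive(Hash)]
struct Signature {
    name: String,
    arity: u32,
}

/// Hash a value; `DefaultHasher` stands in for rustc's stable hasher.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A result is reusable (its dep-node can stay green) if its fingerprint is unchanged.
fn reusable<T: Hash>(previous: &T, current: &T) -> bool {
    fingerprint(previous) == fingerprint(current)
}

fn main() {
    // Session 1: `fn add(a, b)` starts on line 10.
    // Session 2: a comment was added above it, so it now starts on line 11.
    let span_before = SignatureWithSpan {
        name: "add".to_string(),
        arity: 2,
        span: Span { line: 10, col: 1 },
    };
    let span_after = SignatureWithSpan {
        name: "add".to_string(),
        arity: 2,
        span: Span { line: 11, col: 1 },
    };
    println!("with span reusable: {}", reusable(&span_before, &span_after)); // false

    let sig_before = Signature { name: "add".to_string(), arity: 2 };
    let sig_after = Signature { name: "add".to_string(), arity: 2 };
    println!("factored reusable: {}", reusable(&sig_before, &sig_after)); // true
}
```

The trade-off is an extra query (and dep-node) for the span, but consumers that only need the signature no longer get invalidated by edits that merely shift code around.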

The projection query pattern involves a monolithic query that depends on the entire compiler input and projection queries that extract individual values. This allows for change propagation firewalls, minimizing the impact of changes in the monolithic query. The pattern can be optimized using `eval_always` and `no_hash`. The current system has shortcomings, including the inability to update on-disk caches in-place and unnecessary data dependencies that can reduce reusability.