Home Explore Blog CI



rustc

1st chunk of `src/queries/incremental-compilation-in-detail.md`
cd3fd0f8111455322de653d34cf0ce700f34e9c7c1afdea80000000100000fce
# Incremental Compilation in detail

<!-- toc -->

The incremental compilation scheme is, in essence, a surprisingly
simple extension to the overall query system. It relies on the fact that:

  1. queries are pure functions -- given the same inputs, a query will always
     yield the same result, and
  2. the query model structures compilation in an acyclic graph that makes
     dependencies between individual computations explicit.

This chapter will explain how we can use these properties for making things
incremental and then goes on to discuss version implementation issues.

## A Basic Algorithm For Incremental Query Evaluation

As explained in the [query evaluation model primer][query-model], query
invocations form a directed-acyclic graph. Here's the example from the
previous chapter again:

```ignore
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
                    v                                          |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
```

Since every access from one query to another has to go through the query
context, we can record these accesses and thus actually build this dependency
graph in memory. With dependency tracking enabled, when compilation is done,
we know which queries were invoked (the nodes of the graph) and for each
invocation, which other queries or input has gone into computing the query's
result (the edges of the graph).

Now suppose we change the source code of our program so that
HIR of `bar` looks different than before. Our goal is to only recompute
those queries that are actually affected by the change while re-using
the cached results of all the other queries. Given the dependency graph we can
do exactly that. For a given query invocation, the graph tells us exactly
what data has gone into computing its results, we just have to follow the
edges until we reach something that has changed. If we don't encounter
anything that has changed, we know that the query still would evaluate to
the same result we already have in our cache.

Taking the `type_of(foo)` invocation from above as an example, we can check
whether the cached result is still valid by following the edges to its
inputs. The only edge leads to `Hir(foo)`, an input that has not been affected
by the change. So we know that the cached result for `type_of(foo)` is still
valid.

The story is a bit different for `type_check_item(foo)`: We again walk the
edges and already know that `type_of(foo)` is fine. Then we get to
`type_of(bar)` which we have not checked yet, so we walk the edges of
`type_of(bar)` and encounter `Hir(bar)` which *has* changed. Consequently
the result of `type_of(bar)` might yield a different result than what we
have in the cache and, transitively, the result of `type_check_item(foo)`
might have changed too. We thus re-run `type_check_item(foo)`, which in
turn will re-run `type_of(bar)`, which will yield an up-to-date result
because it reads the up-to-date version of `Hir(bar)`. Also, we re-run
`type_check_item(bar)` because result of `type_of(bar)` might have changed.


## The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully you'll notice that it says that
`type_of(bar)` *might* have changed because one of its inputs has changed.
There's also the possibility that it might still yield exactly the same
result *even though* its input has changed. Consider an example with a
simple query that just computes the sign of an integer:

```ignore
  IntValue(x) <---- sign_of(x) <--- some_other_query(x)
```

Let's say that `IntValue(x)` starts out as `1000` and then is set to `2000`.

Title: Incremental Compilation Algorithm and False Positives
Summary
This section details the incremental compilation scheme, which leverages the purity of queries and the acyclic dependency graph to recompute only affected queries after code changes. The basic algorithm involves tracking query dependencies and re-evaluating queries only if their inputs have changed. However, it acknowledges the issue of false positives, where queries might be recomputed unnecessarily even if their results remain the same despite input changes.