Incremental Compilation Algorithm and False Positives

# Incremental Compilation in detail

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. It relies on the fact that:

1. queries are pure functions -- given the same inputs, a query will always yield the same result, and
2. the query model structures compilation in an acyclic graph that makes dependencies between individual computations explicit.

This chapter explains how we can use these properties to make things incremental and then goes on to discuss various implementation issues.

## A Basic Algorithm For Incremental Query Evaluation

As explained in the [query evaluation model primer][query-model], query invocations form a directed-acyclic graph. Here's the example from the previous chapter again:

```ignore
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
                    v                                          |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
```

Since every access from one query to another has to go through the query context, we can record these accesses and thus actually build this dependency graph in memory. With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph).

Now suppose we change the source code of our program so that the HIR of `bar` looks different than before. Our goal is to only recompute those queries that are actually affected by the change while re-using the cached results of all the other queries. Given the dependency graph we can do exactly that. For a given query invocation, the graph tells us exactly what data has gone into computing its results; we just have to follow the edges until we reach something that has changed.
If we don't encounter anything that has changed, we know that the query still would evaluate to the same result we already have in our cache.

Taking the `type_of(foo)` invocation from above as an example, we can check whether the cached result is still valid by following the edges to its inputs. The only edge leads to `Hir(foo)`, an input that has not been affected by the change. So we know that the cached result for `type_of(foo)` is still valid.

The story is a bit different for `type_check_item(foo)`: We again walk the edges and already know that `type_of(foo)` is fine. Then we get to `type_of(bar)` which we have not checked yet, so we walk the edges of `type_of(bar)` and encounter `Hir(bar)` which *has* changed. Consequently `type_of(bar)` might yield a different result than what we have in the cache and, transitively, the result of `type_check_item(foo)` might have changed too. We thus re-run `type_check_item(foo)`, which in turn will re-run `type_of(bar)`, which will yield an up-to-date result because it reads the up-to-date version of `Hir(bar)`. Also, we re-run `type_check_item(bar)` because the result of `type_of(bar)` might have changed.

## The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully you'll notice that it says that `type_of(bar)` *might* have changed because one of its inputs has changed. There's also the possibility that it might still yield exactly the same result *even though* its input has changed. Consider an example with a simple query that just computes the sign of an integer:

```ignore
  IntValue(x) <---- sign_of(x) <--- some_other_query(x)
```

Let's say that `IntValue(x)` starts out as `1000` and then is set to `2000`.
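To make the basic algorithm and its false positives concrete, here is a minimal sketch in Rust. It is only an illustration under stated assumptions, not how the compiler implements dependency tracking: the `DepGraph` struct, the `is_still_valid` method, the helper functions, and the use of plain strings as node names are all hypothetical and chosen to mirror the two diagrams above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical dependency graph recorded during the previous session:
/// each query maps to the queries and inputs it read (its outgoing edges).
struct DepGraph {
    edges: HashMap<&'static str, Vec<&'static str>>,
}

impl DepGraph {
    /// Basic algorithm: a cached result is reusable only if no changed
    /// input is reachable by following the edges from `node`.
    fn is_still_valid(&self, node: &str, changed_inputs: &HashSet<&str>) -> bool {
        if changed_inputs.contains(node) {
            return false;
        }
        match self.edges.get(node) {
            // A query is valid if everything it read is still valid.
            Some(deps) => deps.iter().all(|dep| self.is_still_valid(dep, changed_inputs)),
            // No recorded edges: an input, and it was not reported as changed.
            None => true,
        }
    }
}

/// The type-checking graph from the first diagram (hypothetical encoding).
fn type_check_graph() -> DepGraph {
    DepGraph {
        edges: HashMap::from([
            (
                "type_check_crate()",
                vec!["list_of_all_hir_items", "type_check_item(foo)", "type_check_item(bar)"],
            ),
            ("type_check_item(foo)", vec!["type_of(foo)", "type_of(bar)"]),
            ("type_check_item(bar)", vec!["type_of(bar)"]),
            ("type_of(foo)", vec!["Hir(foo)"]),
            ("type_of(bar)", vec!["Hir(bar)"]),
        ]),
    }
}

/// The `sign_of` graph from the second diagram (hypothetical encoding).
fn sign_graph() -> DepGraph {
    DepGraph {
        edges: HashMap::from([
            ("some_other_query(x)", vec!["sign_of(x)"]),
            ("sign_of(x)", vec!["IntValue(x)"]),
        ]),
    }
}

/// Stand-in for the `sign_of` query: -1, 0, or 1 depending on the sign.
fn sign_of(int_value: i64) -> i64 {
    int_value.signum()
}
```

With `Hir(bar)` as the only changed input, `is_still_valid` accepts the cached result of `type_of(foo)` but rejects `type_check_item(foo)` and `type_check_item(bar)`, matching the walk-through above. With `IntValue(x)` changed in the second graph, `some_other_query(x)` is rejected as well, even though `sign_of(1000)` and `sign_of(2000)` are equal. That is exactly the kind of false positive this section describes: the check only looks at whether inputs changed, never at whether intermediate results did.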

This section details the incremental compilation scheme, which leverages the purity of queries and the acyclic dependency graph to recompute only affected queries after code changes. The basic algorithm tracks query dependencies and re-evaluates a query only if something it transitively depends on has changed. However, the basic algorithm produces false positives: a query may be recomputed because an input changed even though its result turns out to be the same.