assume that the result hasn't changed, leading to a missed update.
We mitigate this risk by using a high-quality hash function and a 128 bit
wide hash value. Due to these measures the practical risk of a hash
collision is negligible.
- Computing fingerprints is quite costly. It is the main reason why incremental
compilation can be slower than non-incremental compilation. We are forced to
use a good and thus expensive hash function, and we have to map things to
their stable equivalents while doing the hashing.
### A Tale Of Two DepGraphs: The Old And The New
The initial description of dependency tracking glosses over a few details
that quickly become a head scratcher when actually trying to implement things.
In particular it's easy to overlook that we are actually dealing with *two*
dependency graphs: The one we built during the previous compilation session and
the one that we are building for the current compilation session.
When a compilation session starts, the compiler loads the previous dependency
graph into memory as an immutable piece of data. Then, when a query is invoked,
it will first try to mark the corresponding node in the graph as green. This
means really that we are trying to mark the node in the *previous* dep-graph
as green that corresponds to the query key in the *current* session. How do we
do this mapping between current query key and previous `DepNode`? The answer
is again `Fingerprint`s: Nodes in the dependency graph are identified by a
fingerprint of the query key. Since fingerprints are stable across compilation
sessions, computing one in the current session allows us to find a node
in the dependency graph from the previous session. If we don't find a node with
the given fingerprint, it means that the query key refers to something that
did not yet exist in the previous session.
So, having found the dep-node in the previous dependency graph, we can look
up its dependencies (i.e. also dep-nodes in the previous graph) and continue with
the rest of the try-mark-green algorithm. The next interesting thing happens
when we successfully marked the node as green. At that point we copy the node
and the edges to its dependencies from the old graph into the new graph. We
have to do this because the new dep-graph cannot acquire the
node and edges via the regular dependency tracking. The tracking system can
only record edges while actually running a query -- but running the query,
although we have the result already cached, is exactly what we want to avoid.
Once the compilation session has finished, all the unchanged parts have been
copied over from the old into the new dependency graph, while the changed parts
have been added to the new graph by the tracking system. At this point, the
new graph is serialized out to disk, alongside the query result cache, and can
act as the previous dep-graph in a subsequent compilation session.
### Didn't You Forget Something?: Cache Promotion
The system described so far has a somewhat subtle property: If all inputs of a
dep-node are green then the dep-node itself can be marked as green without
computing or loading the corresponding query result. Applying this property
transitively often leads to the situation that some intermediate results are
never actually loaded from disk, as in the following example:
```ignore
input(A) <-- intermediate_query(B) <-- leaf_query(C)
```
The compiler might need the value of `leaf_query(C)` in order to generate some
output artifact. If it can mark `leaf_query(C)` as green, it will load the
result from the on-disk cache. The result of `intermediate_query(B)` is never
loaded though. As a consequence, when the compiler persists the *new* result
cache by writing all in-memory query results to disk, `intermediate_query(B)`
will not be in memory and thus will be missing from the new result cache.
If there subsequently is another compilation session that actually needs the
result of `intermediate_query(B)` it will have to be re-computed even though we