Query DAG Ordering and Algorithm Improvements

query DAG. The "reads" is the set of queries that Q executed during its execution. - For each query R in `reads(Q)`, we recursively demand the color of R using try-mark-green. - Note: it is important that we visit each node in `reads(Q)` in same order as they occurred in the original compilation. See [the section on the query DAG below](#dag). - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty. - We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation. - If the hash has not changed, we can mark Q as **green** and return. - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case, we can color Q as **green** and return. <a id="dag"></a> ### The query DAG The query DAG code is stored in [`compiler/rustc_middle/src/dep_graph`][dep_graph]. Construction of the DAG is done by instrumenting the query execution. One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the **order** in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this: ```rust,ignore fn main_query(tcx) { if tcx.subquery1() { tcx.subquery2() } else { tcx.subquery3() } } ``` Now imagine that in the first compilation, `main_query` starts by executing `subquery1`, and this returns true. In that case, the next query `main_query` executes will be `subquery2`, and `subquery3` will not be executed at all. But now imagine that in the **next** compilation, the input has changed such that `subquery1` returns **false**. In this case, `subquery2` would never execute. If try-mark-green were to visit `reads(main_query)` out of order, however, it might visit `subquery2` before `subquery1`, and hence execute it. This can lead to ICEs and other problems in the compiler. ## Improvements to the basic algorithm In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. In practice, what we would do is to save **the hashes** of all the subqueries that we performed. Then, in select cases, we **also** save the results. This is why the incremental algorithm separates computing the **color** of a node, which often does not require its value, from computing the **result** of a node. Computing the result is done via a simple algorithm like so: - Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result. - Otherwise, execute Q. - We can then compare the hash of the result and color Q as green if it did not change. ## Resources The initial design document can be found [here][initial-design], which expands on the memoization details, provides more high-level overview and motivation for this system. # Footnotes

The query DAG tracks the order in which queries are read, which is important for the try-mark-green algorithm to walk queries back in the same order. This is because once a subquery returns red, the execution path may change. To improve efficiency, the algorithm saves hashes of subqueries and only saves results in select cases. The algorithm separates computing the color of a node from computing its result. Resources, including the initial design document, are provided.