Query Evaluation Model: Deep Dive

# The Query Evaluation Model in detail

This chapter provides a deeper dive into the abstract model queries are built on. It does not go into implementation details but tries to explain the underlying logic. The examples here, therefore, have been stripped down and simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database" and queries are the way of asking the compiler questions about it, i.e. we "query" the compiler's "database" for facts.

However, there's something special to this compiler database: It starts out empty and is filled on-demand when queries are executed. Consequently, a query must know how to compute its result if the database does not contain it yet. For doing so, it can access other queries and certain input values that the database is pre-filled with on creation.

A query thus consists of the following things:

- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider" which is a function that specifies how the result is to be computed if it isn't already present in the database.

As an example, the name of the `type_of` query is `type_of`, its query key is a `DefId` identifying the item we want to know the type of, the result type is `Ty<'tcx>`, and the provider is a function that, given the query key and access to the rest of the database, can compute the type of the item identified by the key.

So in some sense a query is just a function that maps the query key to the corresponding result. However, we have to apply some restrictions in order for this to be sound:

- The key and result must be immutable values.
- The provider function must be a pure function in the sense that for the same key it must always yield the same result.
- The only parameters a provider function takes are the key and a reference to the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will invoke other queries, for which the result is either already cached or computed by calling another query provider. These query provider invocations conceptually form a directed acyclic graph (DAG) at the leaves of which are input values that are already known when the query context is created.

## Caching/Memoization

Results of query invocations are "memoized" which means that the query context will cache the result in an internal table and, when the query is invoked with the same query key again, will return the result from the cache instead of running the provider again.

This caching is crucial for making the query engine efficient. Without memoization the system would still be sound (that is, it would yield the same results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure functions. If calling a provider function could yield different results for each invocation (because it accesses some global mutable state) then we could not memoize the result.

## Input data

When the query context is created, it is still empty: No queries have been executed, no results are cached. But the context already provides access to "input" data, i.e. pieces of immutable data that were computed before the context was created and that queries can access to do their computations.

As of January 2021, this input data consists mainly of the HIR map, upstream crate metadata, and the command-line options the compiler was invoked with; but in the future inputs will just consist of command-line options and a list of source files -- the HIR map will itself be provided by a query which processes these source files.

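
The pieces described so far (a keyed query, a pure provider, memoization, and pre-filled inputs) can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the compiler's actual API: `QueryContext`, `Decl`, `type_of_provider`, and the `provider_runs` counter are hypothetical names invented for this illustration, plain `u32` keys stand in for `DefId`, and `String` results stand in for `Ty<'tcx>`. The sketch also assumes the inputs contain no alias cycles, matching the DAG assumption above.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Toy input data (hypothetical): each item is either a primitive
/// type or an alias for another item.
enum Decl {
    Prim(&'static str),
    Alias(u32),
}

/// Hypothetical stand-in for the query context.
struct QueryContext {
    /// Immutable inputs, known when the context is created.
    inputs: HashMap<u32, Decl>,
    /// Memoization table for the toy `type_of` query (key -> result).
    type_of_cache: RefCell<HashMap<u32, String>>,
    /// Counts provider invocations so that caching is observable.
    provider_runs: Cell<u32>,
}

impl QueryContext {
    fn new(inputs: HashMap<u32, Decl>) -> Self {
        QueryContext {
            inputs,
            type_of_cache: RefCell::new(HashMap::new()),
            provider_runs: Cell::new(0),
        }
    }

    /// Invokes the query: returns the cached result if present,
    /// otherwise runs the provider and stores its result.
    fn type_of(&self, key: u32) -> String {
        let cached = self.type_of_cache.borrow().get(&key).cloned();
        if let Some(result) = cached {
            return result;
        }
        self.provider_runs.set(self.provider_runs.get() + 1);
        let result = type_of_provider(self, key);
        self.type_of_cache.borrow_mut().insert(key, result.clone());
        result
    }
}

/// The provider: a pure function of the key and the query context.
/// It reads inputs and invokes other queries, nothing else.
fn type_of_provider(cx: &QueryContext, key: u32) -> String {
    match &cx.inputs[&key] {
        Decl::Prim(name) => name.to_string(),
        Decl::Alias(target) => cx.type_of(*target),
    }
}

fn main() {
    let inputs = HashMap::from([
        (1, Decl::Prim("u32")),
        (2, Decl::Alias(1)),
        (3, Decl::Alias(2)),
    ]);
    let cx = QueryContext::new(inputs);

    let first = cx.type_of(3); // runs the provider for keys 3, 2, and 1
    let second = cx.type_of(3); // served from the cache
    assert_eq!(first, second);
    println!("type_of(3) = {first}, provider runs = {}", cx.provider_runs.get());
}
```
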
Without inputs, queries would live in a void without anything to compute their result from (remember, query providers only have access to other queries and

This chapter details the abstract model for query evaluation in the compiler. It defines a query as having a name, key, result type, and a 'provider' function to compute the result if it's not already cached. The system relies on caching (memoization) of query results for efficiency, necessitating that provider functions be pure. The query context starts empty but has access to pre-computed 'input' data like the HIR map, crate metadata, and compiler options, which queries use to compute results.