Common Table Expression (CTE) Materialization in SQL

the <literal>CYCLE</literal> clause and order by the path column. If breadth-first ordering is wanted, then specifying both <literal>SEARCH</literal> and <literal>CYCLE</literal> can be useful. </para> </tip> <para> A helpful trick for testing queries when you are not certain if they might loop is to place a <literal>LIMIT</literal> in the parent query. For example, this query would loop forever without the <literal>LIMIT</literal>: <programlisting> WITH RECURSIVE t(n) AS ( SELECT 1 UNION ALL SELECT n+1 FROM t ) SELECT n FROM t <emphasis>LIMIT 100</emphasis>; </programlisting> This works because <productname>PostgreSQL</productname>'s implementation evaluates only as many rows of a <literal>WITH</literal> query as are actually fetched by the parent query. Using this trick in production is not recommended, because other systems might work differently. Also, it usually won't work if you make the outer query sort the recursive query's results or join them to some other table, because in such cases the outer query will usually try to fetch all of the <literal>WITH</literal> query's output anyway. </para> </sect3> </sect2> <sect2 id="queries-with-cte-materialization"> <title>Common Table Expression Materialization</title> <para> A useful property of <literal>WITH</literal> queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling <literal>WITH</literal> queries. Thus, expensive calculations that are needed in multiple places can be placed within a <literal>WITH</literal> query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced <literal>WITH</literal> query, since that might affect all uses of the <literal>WITH</literal> query's output when it should affect only one. The multiply-referenced <literal>WITH</literal> query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.) </para> <para> However, if a <literal>WITH</literal> query is non-recursive and side-effect-free (that is, it is a <literal>SELECT</literal> containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the <literal>WITH</literal> query just once, but not if it references the <literal>WITH</literal> query more than once. You can override that decision by specifying <literal>MATERIALIZED</literal> to force separate calculation of the <literal>WITH</literal> query, or by specifying <literal>NOT MATERIALIZED</literal> to force it to be merged into the parent query. The latter choice risks duplicate computation of the <literal>WITH</literal> query, but it can still give a net savings if each usage of the <literal>WITH</literal> query needs only a small part of the <literal>WITH</literal> query's full output. </para> <para> A simple example of these rules is <programlisting> WITH w AS ( SELECT * FROM big_table ) SELECT * FROM w WHERE key = 123; </programlisting> This <literal>WITH</literal> query will be folded, producing the same execution plan as <programlisting> SELECT * FROM big_table WHERE key = 123; </programlisting> In particular, if there's an index on <structfield>key</structfield>, it will probably be used to fetch just the rows having <literal>key = 123</literal>. On the other hand, in <programlisting> WITH w AS ( SELECT

This section discusses the materialization of Common Table Expressions (CTEs) in SQL queries. It explains that CTEs are typically evaluated once per execution of the parent query, which can be beneficial for avoiding redundant calculations or preventing unwanted multiple evaluations of functions with side effects. However, this can also limit the optimizer's ability to push restrictions from the parent query into a multiply-referenced CTE. The text introduces the concepts of MATERIALIZED and NOT MATERIALIZED clauses, which can be used to override the default behavior of CTE evaluation. It provides examples to illustrate how CTEs are typically folded into the parent query when referenced only once, and how this behavior changes with multiple references or when using the MATERIALIZED clause. The section emphasizes the trade-offs between query optimization and the prevention of duplicate computations when deciding whether to materialize a CTE.