Advanced Cycle Detection in SQL Recursive Queries

<structfield>path</structfield> to the loop-prone query: <programlisting> WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS ( SELECT g.id, g.link, g.data, 0, <emphasis>false, ARRAY[g.id]</emphasis> FROM graph g UNION ALL SELECT g.id, g.link, g.data, sg.depth + 1, <emphasis>g.id = ANY(path), path || g.id</emphasis> FROM graph g, search_graph sg WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis> ) SELECT * FROM search_graph; </programlisting> Aside from preventing cycles, the array value is often useful in its own right as representing the <quote>path</quote> taken to reach any particular row. </para> <para> In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields <structfield>f1</structfield> and <structfield>f2</structfield>: <programlisting> WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS ( SELECT g.id, g.link, g.data, 0, <emphasis>false, ARRAY[ROW(g.f1, g.f2)]</emphasis> FROM graph g UNION ALL SELECT g.id, g.link, g.data, sg.depth + 1, <emphasis>ROW(g.f1, g.f2) = ANY(path), path || ROW(g.f1, g.f2)</emphasis> FROM graph g, search_graph sg WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis> ) SELECT * FROM search_graph; </programlisting> </para> <tip> <para> Omit the <literal>ROW()</literal> syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency. </para> </tip> <para> There is built-in syntax to simplify cycle detection. The above query can also be written like this: <programlisting> WITH RECURSIVE search_graph(id, link, data, depth) AS ( SELECT g.id, g.link, g.data, 1 FROM graph g UNION ALL SELECT g.id, g.link, g.data, sg.depth + 1 FROM graph g, search_graph sg WHERE g.id = sg.link ) <emphasis>CYCLE id SET is_cycle USING path</emphasis> SELECT * FROM search_graph; </programlisting> and it will be internally rewritten to the above form. The <literal>CYCLE</literal> clause specifies first the list of columns to track for cycle detection, then a column name that will show whether a cycle has been detected, and finally the name of another column that will track the path. The cycle and path columns will implicitly be added to the output rows of the CTE. </para> <tip> <para> The cycle path column is computed in the same way as the depth-first ordering column show in the previous section. A query can have both a <literal>SEARCH</literal> and a <literal>CYCLE</literal> clause, but a depth-first search specification and a cycle detection specification would create redundant computations, so it's more efficient to just use the <literal>CYCLE</literal> clause and order by the path column. If breadth-first ordering is wanted, then specifying both <literal>SEARCH</literal> and <literal>CYCLE</literal> can be useful. </para> </tip> <para> A helpful trick for testing queries when you are not certain if they might loop is to place a <literal>LIMIT</literal> in the parent query. For example, this query would loop forever without the <literal>LIMIT</literal>: <programlisting> WITH RECURSIVE t(n) AS ( SELECT 1 UNION ALL SELECT n+1 FROM t ) SELECT n FROM t <emphasis>LIMIT 100</emphasis>; </programlisting> This works because <productname>PostgreSQL</productname>'s implementation evaluates only as many rows of a <literal>WITH</literal> query as are actually fetched by the parent query. Using this trick in production is not recommended, because other systems might work differently. Also, it usually won't work if you make the outer query sort the recursive

This section delves deeper into cycle detection techniques for SQL recursive queries. It expands on the previous example by demonstrating how to handle cases where multiple fields need to be checked for cycle detection. The text introduces a method using arrays of rows to track multiple fields, showing how to modify the query structure to accommodate this. It also presents a built-in syntax for simplifying cycle detection, using the CYCLE clause. This clause automatically adds cycle detection and path tracking to the query without manually modifying the CTE structure. The section concludes with tips on efficiently combining SEARCH and CYCLE clauses, and a practical trick for testing potentially infinite recursive queries using LIMIT. The content emphasizes the importance of proper cycle detection in preventing infinite loops and providing useful path information in recursive query results.