Cycle Detection in SQL Recursive Queries

it is perhaps unsound to rely on it. The order of the rows within each level is certainly undefined, so some explicit ordering might be desired in any case. </para> </tip> <para> There is built-in syntax to compute a depth- or breadth-first sort column. For example: <programlisting> WITH RECURSIVE search_tree(id, link, data) AS ( SELECT t.id, t.link, t.data FROM tree t UNION ALL SELECT t.id, t.link, t.data FROM tree t, search_tree st WHERE t.id = st.link ) <emphasis>SEARCH DEPTH FIRST BY id SET ordercol</emphasis> SELECT * FROM search_tree ORDER BY ordercol; WITH RECURSIVE search_tree(id, link, data) AS ( SELECT t.id, t.link, t.data FROM tree t UNION ALL SELECT t.id, t.link, t.data FROM tree t, search_tree st WHERE t.id = st.link ) <emphasis>SEARCH BREADTH FIRST BY id SET ordercol</emphasis> SELECT * FROM search_tree ORDER BY ordercol; </programlisting> This syntax is internally expanded to something similar to the above hand-written forms. The <literal>SEARCH</literal> clause specifies whether depth- or breadth first search is wanted, the list of columns to track for sorting, and a column name that will contain the result data that can be used for sorting. That column will implicitly be added to the output rows of the CTE. </para> </sect3> <sect3 id="queries-with-cycle"> <title>Cycle Detection</title> <para> When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using <literal>UNION</literal> instead of <literal>UNION ALL</literal> can accomplish this by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the already-visited values. For example, consider again the following query that searches a table <structname>graph</structname> using a <structfield>link</structfield> field: <programlisting> WITH RECURSIVE search_graph(id, link, data, depth) AS ( SELECT g.id, g.link, g.data, 0 FROM graph g UNION ALL SELECT g.id, g.link, g.data, sg.depth + 1 FROM graph g, search_graph sg WHERE g.id = sg.link ) SELECT * FROM search_graph; </programlisting> This query will loop if the <structfield>link</structfield> relationships contain cycles. Because we require a <quote>depth</quote> output, just changing <literal>UNION ALL</literal> to <literal>UNION</literal> would not eliminate the looping. Instead we need to recognize whether we have reached the same row again while following a particular path of links. We add two columns <structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query: <programlisting> WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS ( SELECT g.id, g.link, g.data, 0, <emphasis>false, ARRAY[g.id]</emphasis> FROM graph g UNION ALL SELECT g.id, g.link, g.data, sg.depth + 1, <emphasis>g.id = ANY(path), path || g.id</emphasis> FROM graph g, search_graph sg WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis> ) SELECT * FROM search_graph; </programlisting> Aside from preventing cycles, the array value is often useful in its own right as representing the <quote>path</quote> taken to reach any particular row. </para> <para> In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows. For example, if we needed to compare fields <structfield>f1</structfield> and <structfield>f2</structfield>: <programlisting> WITH RECURSIVE search_graph(id, link, data,

This section discusses the importance of cycle detection in recursive SQL queries to prevent infinite loops. It introduces a method using arrays to track visited values, demonstrating how to modify a recursive query to include cycle detection. The technique involves adding columns for 'is_cycle' and 'path' to the query structure. The 'path' column stores an array of previously visited node IDs, while 'is_cycle' checks if the current node has been visited before. The text provides SQL examples for implementing this approach, including cases where multiple fields need to be checked for cycle detection. It emphasizes that this method not only prevents cycles but also provides useful information about the path taken to reach each row in the result set.