Join Optimization and Query Execution

left relation is scanned and the appropriate values of every row found are used as hash keys to locate the matching rows in the table.
</para>
</listitem>
</itemizedlist>
</para>

<para>
When the query involves more than two relations, the final result must be built up by a tree of join steps, each with two inputs. The planner examines different possible join sequences to find the cheapest one.
</para>

<para>
If the query uses fewer than <xref linkend="guc-geqo-threshold"/> relations, a near-exhaustive search is conducted to find the best join sequence. The planner preferentially considers joins between any two relations for which there exists a corresponding join clause in the <literal>WHERE</literal> qualification (i.e., for which a restriction like <literal>where rel1.attr1=rel2.attr2</literal> exists). Join pairs with no join clause are considered only when there is no other choice, that is, a particular relation has no available join clauses to any other relation. All possible plans are generated for every join pair considered by the planner, and the one that is (estimated to be) the cheapest is chosen.
</para>

<para>
When <varname>geqo_threshold</varname> is exceeded, the join sequences considered are determined by heuristics, as described in <xref linkend="geqo"/>. Otherwise the process is the same.
</para>

<para>
The finished plan tree consists of sequential or index scans of the base relations, plus nested-loop, merge, or hash join nodes as needed, plus any auxiliary steps needed, such as sort nodes or aggregate-function calculation nodes. Most of these plan node types have the additional ability to do <firstterm>selection</firstterm> (discarding rows that do not meet a specified Boolean condition) and <firstterm>projection</firstterm> (computation of a derived column set based on given column values, that is, evaluation of scalar expressions where needed).
One of the responsibilities of the planner is to attach selection conditions from the <literal>WHERE</literal> clause and computation of required output expressions to the most appropriate nodes of the plan tree.
</para>
</sect2>
</sect1>

<sect1 id="executor">
<title>Executor</title>

<para>
The <firstterm>executor</firstterm> takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows. This is essentially a demand-pull pipeline mechanism. Each time a plan node is called, it must deliver one more row, or report that it is done delivering rows.
</para>

<para>
To provide a concrete example, assume that the top node is a <literal>MergeJoin</literal> node. Before any merge can be done, two rows have to be fetched (one from each subplan). So the executor recursively calls itself to process the subplans (it starts with the subplan attached to <literal>lefttree</literal>). The new top node (the top node of the left subplan) is, let's say, a <literal>Sort</literal> node, and again recursion is needed to obtain an input row. The child node of the <literal>Sort</literal> might be a <literal>SeqScan</literal> node, representing actual reading of a table. Execution of this node causes the executor to fetch a row from the table and return it up to the calling node. The <literal>Sort</literal>

When a query involves more than two relations, the planner builds the result from a tree of two-input join steps and examines different join sequences to find the cheapest one. For queries with fewer relations than `geqo_threshold`, it conducts a near-exhaustive search, preferring pairs of relations linked by a join clause in the `WHERE` qualification and considering pairs without one only when a relation has no join clause to any other relation. When the threshold is exceeded, heuristics determine which join sequences are considered. The finished plan tree consists of sequential or index scans, nested-loop, merge, or hash join nodes, and auxiliary steps such as sorting or aggregation. The planner attaches selection conditions and output expressions to the most appropriate nodes of the tree. The executor processes the plan recursively as a demand-pull pipeline: each time a node is called, it delivers one more row or reports that it has no more rows. For example, a `MergeJoin` node first fetches one row from each of its subplans by calling the executor recursively; the left subplan might be a `Sort` node whose child is a `SeqScan` node that reads rows from a table.
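
The demand-pull behavior described above can be illustrated with a small Python sketch. It is not the actual executor code: the class names mirror the plan node names in the text, while the `next_row` method, the `run_plan` helper, the dictionary row format, and the sample tables are all assumptions made for this illustration. Each node either returns one more row or returns `None` to report that it is done.

```python
class SeqScan:
    """Leaf node: returns the rows of an in-memory table one at a time."""

    def __init__(self, table):
        self._rows = iter(table)

    def next_row(self):
        # Deliver one more row, or None to report that the scan is done.
        return next(self._rows, None)


class Sort:
    """Drains its child on the first call, then returns rows in key order."""

    def __init__(self, child, key):
        self._child, self._key, self._sorted = child, key, None

    def next_row(self):
        if self._sorted is None:
            rows = []
            while (row := self._child.next_row()) is not None:
                rows.append(row)
            self._sorted = iter(sorted(rows, key=lambda r: r[self._key]))
        return next(self._sorted, None)


class MergeJoin:
    """Inner equi-join of two inputs that are already sorted on their keys."""

    def __init__(self, left, right, left_key, right_key):
        self._left, self._right = left, right
        self._lkey, self._rkey = left_key, right_key
        self._started = False
        self._group, self._group_key, self._pos = [], None, 0

    def next_row(self):
        if not self._started:
            # Before any merging, fetch one row from each subplan (left first).
            self._lrow = self._left.next_row()
            self._rrow = self._right.next_row()
            self._started = True
        while self._lrow is not None:
            lk = self._lrow[self._lkey]
            if self._group and self._group_key == lk:
                if self._pos < len(self._group):
                    # On a column-name clash the right row's value wins.
                    joined = {**self._lrow, **self._group[self._pos]}
                    self._pos += 1
                    return joined
                # This left row has met every buffered match; advance left.
                self._lrow, self._pos = self._left.next_row(), 0
                continue
            # Skip right rows whose key is too small to match.
            while self._rrow is not None and self._rrow[self._rkey] < lk:
                self._rrow = self._right.next_row()
            if self._rrow is None:
                return None  # right input exhausted: no further matches
            if self._rrow[self._rkey] > lk:
                self._lrow = self._left.next_row()  # left row has no partner
                continue
            # Keys are equal: buffer all right rows sharing this key.
            self._group, self._group_key, self._pos = [], lk, 0
            while self._rrow is not None and self._rrow[self._rkey] == lk:
                self._group.append(self._rrow)
                self._rrow = self._right.next_row()
        return None  # left input exhausted


def run_plan(node):
    """Pull rows from the top node until it reports completion."""
    rows = []
    while (row := node.next_row()) is not None:
        rows.append(row)
    return rows


# Hypothetical sample data; neither table is sorted on its join key.
employees = [{"emp": "b", "dept_id": 2}, {"emp": "a", "dept_id": 1},
             {"emp": "c", "dept_id": 2}]
departments = [{"id": 2, "dept": "ops"}, {"id": 1, "dept": "dev"},
               {"id": 3, "dept": "hr"}]

plan = MergeJoin(Sort(SeqScan(employees), "dept_id"),
                 Sort(SeqScan(departments), "id"),
                 "dept_id", "id")
result = run_plan(plan)
```

Only the top node is ever called directly; every other node runs because its parent asked it for a row. In this sketch the `Sort` node has to consume its entire input before it can return its first row, whereas `SeqScan` and `MergeJoin` produce rows incrementally.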