Parallel Query Execution Techniques

id="parallel-scans">
  <title>Parallel Scans</title>

  <para>
    The following types of parallel-aware table scans are currently supported.

  <itemizedlist>
    <listitem>
      <para>
        In a <emphasis>parallel sequential scan</emphasis>, the table's blocks
        will be divided into ranges and shared among the cooperating processes.
        Each worker process will complete the scanning of its given range of
        blocks before requesting an additional range of blocks.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>parallel bitmap heap scan</emphasis>, one process is
        chosen as the leader. That process performs a scan of one or more
        indexes and builds a bitmap indicating which table blocks need to be
        visited. These blocks are then divided among the cooperating processes
        as in a parallel sequential scan. In other words, the heap scan is
        performed in parallel, but the underlying index scan is not.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>parallel index scan</emphasis> or <emphasis>parallel
        index-only scan</emphasis>, the cooperating processes take turns
        reading data from the index. Currently, parallel index scans are
        supported only for btree indexes. Each process will claim a single
        index block and will scan and return all tuples referenced by that
        block; other processes can at the same time be returning tuples from a
        different index block. The results of a parallel btree scan are
        returned in sorted order within each worker process.
      </para>
    </listitem>
  </itemizedlist>

    Other scan types, such as scans of non-btree indexes, may support
    parallel scans in the future.
  </para>
</sect2>

<sect2 id="parallel-joins">
  <title>Parallel Joins</title>

  <para>
    Just as in a non-parallel plan, the driving table may be joined to one or
    more other tables using a nested loop, hash join, or merge join. The inner
    side of the join may be any kind of non-parallel plan that is otherwise
    supported by the planner provided that it is safe to run within a parallel
    worker. Depending on the join type, the inner side may also be a parallel
    plan.
  </para>

  <itemizedlist>
    <listitem>
      <para>
        In a <emphasis>nested loop join</emphasis>, the inner side is always
        non-parallel. Although it is executed in full, this is efficient if
        the inner side is an index scan, because the outer tuples and thus the
        loops that look up values in the index are divided over the
        cooperating processes.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>merge join</emphasis>, the inner side is always a
        non-parallel plan and therefore executed in full. This may be
        inefficient, especially if a sort must be performed, because the work
        and resulting data are duplicated in every cooperating process.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
        the inner side is executed in full by every cooperating process to
        build identical copies of the hash table. This may be inefficient if
        the hash table is large or the plan is expensive. In a
        <emphasis>parallel hash join</emphasis>, the inner side is a
        <emphasis>parallel hash</emphasis> that divides the work of building a
        shared hash table over the cooperating processes.
      </para>
    </listitem>
  </itemizedlist>
</sect2>

<sect2 id="parallel-aggregation">
  <title>Parallel Aggregation</title>

  <para>
    <productname>PostgreSQL</productname> supports parallel aggregation by
    aggregating in two stages. First, each process participating in the
    parallel portion of the query performs an aggregation step, producing a
    partial result for each group of which that

PostgreSQL supports several parallel query execution techniques. Parallel scans cover parallel sequential scans, parallel bitmap heap scans, and parallel index and index-only scans (currently btree only). Parallel joins cover nested loop joins, merge joins, and hash joins, including parallel hash joins that build a shared hash table. Parallel aggregation works in two stages: each process performs an initial aggregation step, and the partial results are then combined.
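
The two-stage aggregation summarized above, fed by workers that claim block ranges in the way described for parallel sequential scans, can be sketched roughly as follows. This is a toy Python model under stated assumptions, not PostgreSQL's implementation: it uses threads in place of worker processes, models a table as a list of in-memory blocks, and computes an average per group. The names BlockRangeAllocator, partial_aggregate, combine, and parallel_avg, along with the block and range sizes, are hypothetical and chosen only for illustration.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_table(rows, rows_per_block):
    """Toy table: a list of blocks, each block a list of (group_key, value) rows."""
    return [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]


class BlockRangeAllocator:
    """Hands out consecutive block ranges to whichever worker asks next."""

    def __init__(self, n_blocks, range_size):
        self._next = 0
        self._n_blocks = n_blocks
        self._range_size = range_size
        self._lock = threading.Lock()

    def next_range(self):
        # The lock makes each claim atomic, so no block is scanned twice.
        with self._lock:
            if self._next >= self._n_blocks:
                return None
            start = self._next
            end = min(start + self._range_size, self._n_blocks)
            self._next = end
            return range(start, end)


def partial_aggregate(table, allocator):
    """Stage 1: one worker's partial [sum, count] per group it happens to see."""
    partial = defaultdict(lambda: [0, 0])
    # A worker finishes its current range before requesting another one.
    while (blocks := allocator.next_range()) is not None:
        for block_no in blocks:
            for key, value in table[block_no]:
                state = partial[key]
                state[0] += value
                state[1] += 1
    return dict(partial)


def combine(partials):
    """Stage 2: merge the partial states per group, then finalize to an average."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


def parallel_avg(rows, n_workers=3, rows_per_block=4, range_size=2):
    """Run stage 1 in n_workers threads, then combine their partial results."""
    table = make_table(rows, rows_per_block)
    allocator = BlockRangeAllocator(len(table), range_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(partial_aggregate, table, allocator)
                   for _ in range(n_workers)]
        partials = [future.result() for future in futures]
    return combine(partials)
```

Which worker sees which blocks varies from run to run, but the combined result does not, because the partial state (a running sum and count) can be merged in any order. The sketch carries sums and counts rather than per-worker averages for that reason: averages of averages would be wrong whenever workers see different numbers of rows for a group.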