Home Explore Blog CI



postgresql

6th chunk of `doc/src/sgml/parallel.sgml`
16420435e3c72056cf7b3bdf08ff74f5d61fcea2559abbd80000000100000fa1
 in every cooperating process.
      </para>
    </listitem>
    <listitem>
      <para>
        In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
        the inner side is executed in full by every cooperating process
        to build identical copies of the hash table.  This may be inefficient
        if the hash table is large or the plan is expensive.  In a
        <emphasis>parallel hash join</emphasis>, the inner side is a
        <emphasis>parallel hash</emphasis> that divides the work of building
        a shared hash table over the cooperating processes.
      </para>
    </listitem>
  </itemizedlist>
 </sect2>

 <sect2 id="parallel-aggregation">
  <title>Parallel Aggregation</title>
  <para>
    <productname>PostgreSQL</productname> supports parallel aggregation by aggregating in
    two stages.  First, each process participating in the parallel portion of
    the query performs an aggregation step, producing a partial result for
    each group of which that process is aware.  This is reflected in the plan
    as a <literal>Partial Aggregate</literal> node.  Second, the partial results are
    transferred to the leader via <literal>Gather</literal> or <literal>Gather
    Merge</literal>.  Finally, the leader re-aggregates the results across all
    workers in order to produce the final result.  This is reflected in the
    plan as a <literal>Finalize Aggregate</literal> node.
  </para>

  <para>
    Because the <literal>Finalize Aggregate</literal> node runs on the leader
    process, queries that produce a relatively large number of groups in
    comparison to the number of input rows will appear less favorable to the
    query planner. For example, in the worst-case scenario the number of
    groups seen by the <literal>Finalize Aggregate</literal> node could be as many as
    the number of input rows that were seen by all worker processes in the
    <literal>Partial Aggregate</literal> stage. For such cases, there is clearly
    going to be no performance benefit to using parallel aggregation. The
    query planner takes this into account during the planning process and is
    unlikely to choose parallel aggregate in this scenario.
  </para>

  <para>
    Parallel aggregation is not supported in all situations.  Each aggregate
    must be <link linkend="parallel-safety">safe</link> for parallelism and must
    have a combine function.  If the aggregate has a transition state of type
    <literal>internal</literal>, it must have serialization and deserialization
    functions.  See <xref linkend="sql-createaggregate"/> for more details.
    Parallel aggregation is not supported if any aggregate function call
    contains <literal>DISTINCT</literal> or <literal>ORDER BY</literal> clause and is also
    not supported for ordered set aggregates or when  the query involves
    <literal>GROUPING SETS</literal>.  It can only be used when all joins involved in
    the query are also part of the parallel portion of the plan.
  </para>

 </sect2>

 <sect2 id="parallel-append">
  <title>Parallel Append</title>

  <para>
    Whenever <productname>PostgreSQL</productname> needs to combine rows
    from multiple sources into a single result set, it uses an
    <literal>Append</literal> or <literal>MergeAppend</literal> plan node.
    This commonly happens when implementing <literal>UNION ALL</literal> or
    when scanning a partitioned table.  Such nodes can be used in parallel
    plans just as they can in any other plan.  However, in a parallel plan,
    the planner may instead use a <literal>Parallel Append</literal> node.
  </para>

  <para>
    When an <literal>Append</literal> node is used in a parallel plan, each
    process will execute the child plans in the order in which they appear,
    so that all participating processes cooperate to execute the first child
    plan until it is complete and then move to the second plan at around the
    same time.  When a <literal>Parallel Append</literal> is used

Title: Parallel Query Execution in PostgreSQL
Summary
PostgreSQL supports parallel aggregation by dividing the work into two stages: partial aggregation in each process and final aggregation in the leader process. However, parallel aggregation is not supported in all situations, such as when using DISTINCT or ORDER BY clauses, or with ordered set aggregates. The planner also uses Parallel Append nodes to combine rows from multiple sources in parallel plans, allowing for more efficient execution of queries involving UNION ALL or partitioned tables.