Home Explore Blog CI



postgresql

1st chunk of `doc/src/sgml/parallel.sgml`
5199a98c8a811928e72d28c0831d7acdcedcae47f723e2240000000100000fa8
<!-- doc/src/sgml/parallel.sgml -->

 <chapter id="parallel-query">
  <title>Parallel Query</title>

  <indexterm zone="parallel-query">
   <primary>parallel query</primary>
  </indexterm>

  <para>
   <productname>PostgreSQL</productname> can devise query plans that can leverage
   multiple CPUs in order to answer queries faster.  This feature is known
   as parallel query.  Many queries cannot benefit from parallel query, either
   due to limitations of the current implementation or because there is no
   imaginable query plan that is any faster than the serial query plan.
   However, for queries that can benefit, the speedup from parallel query
   is often very significant.  Many queries can run more than twice as fast
   when using parallel query, and some queries can run four times faster or
   even more.  Queries that touch a large amount of data but return only a
   few rows to the user will typically benefit most.  This chapter explains
   some details of how parallel query works and in which situations it can be
   used so that users who wish to make use of it can understand what to expect.
  </para>

 <sect1 id="how-parallel-query-works">
  <title>How Parallel Query Works</title>

   <para>
    When the optimizer determines that parallel query is the fastest execution
    strategy for a particular query, it will create a query plan that includes
    a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
    node.  Here is a simple example:

<screen>
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------&zwsp;------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)
</screen>
   </para>

   <para>
    In all cases, the <literal>Gather</literal> or
    <literal>Gather Merge</literal> node will have exactly one
    child plan, which is the portion of the plan that will be executed in
    parallel.  If the <literal>Gather</literal> or <literal>Gather Merge</literal> node is
    at the very top of the plan tree, then the entire query will execute in
    parallel.  If it is somewhere else in the plan tree, then only the portion
    of the plan below it will run in parallel.  In the example above, the
    query accesses only one table, so there is only one plan node other than
    the <literal>Gather</literal> node itself; since that plan node is a child of the
    <literal>Gather</literal> node, it will run in parallel.
   </para>

   <para>
    <link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
    workers chosen by the planner.  When the <literal>Gather</literal> node is reached
    during query execution, the process that is implementing the user's
    session will request a number of <link linkend="bgworker">background
    worker processes</link> equal to the number
    of workers chosen by the planner.  The number of background workers that
    the planner will consider using is limited to at most
    <xref linkend="guc-max-parallel-workers-per-gather"/>.  The total number
    of background workers that can exist at any one time is limited by both
    <xref linkend="guc-max-worker-processes"/> and
    <xref linkend="guc-max-parallel-workers"/>.  Therefore, it is possible for a
    parallel query to run with fewer workers than planned, or even with
    no workers at all.  The optimal plan may depend on the number of workers
    that are available, so this can result in poor query performance.  If this
    occurrence is frequent, consider increasing
    <varname>max_worker_processes</varname> and <varname>max_parallel_workers</varname>
    so that more workers can be run simultaneously or alternatively reducing
    <varname>max_parallel_workers_per_gather</varname> so that the planner
    requests

Title: Parallel Query in PostgreSQL
Summary
PostgreSQL's parallel query feature allows certain queries to utilize multiple CPUs, significantly improving query speeds for tasks that involve large amounts of data, with some queries running up to four times faster than serial queries.