<!-- doc/src/sgml/parallel.sgml -->
<chapter id="parallel-query">
<title>Parallel Query</title>
<indexterm zone="parallel-query">
<primary>parallel query</primary>
</indexterm>
<para>
<productname>PostgreSQL</productname> can devise query plans that can leverage
multiple CPUs in order to answer queries faster. This feature is known
as parallel query. Many queries cannot benefit from parallel query, either
due to limitations of the current implementation or because there is no
imaginable query plan that is any faster than the serial query plan.
However, for queries that can benefit, the speedup from parallel query
is often very significant. Many queries can run more than twice as fast
when using parallel query, and some queries can run four times faster or
even more. Queries that touch a large amount of data but return only a
few rows to the user will typically benefit most. This chapter explains
some details of how parallel query works and in which situations it can be
used so that users who wish to make use of it can understand what to expect.
</para>
<sect1 id="how-parallel-query-works">
<title>How Parallel Query Works</title>
<para>
When the optimizer determines that parallel query is the fastest execution
strategy for a particular query, it will create a query plan that includes
a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
node. Here is a simple example:
<screen>
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------&zwsp;------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)
</screen>
</para>
<para>
In all cases, the <literal>Gather</literal> or
<literal>Gather Merge</literal> node will have exactly one
child plan, which is the portion of the plan that will be executed in
parallel. If the <literal>Gather</literal> or <literal>Gather Merge</literal> node is
at the very top of the plan tree, then the entire query will execute in
parallel. If it is somewhere else in the plan tree, then only the portion
of the plan below it will run in parallel. In the example above, the
query accesses only one table, so there is only one plan node other than
the <literal>Gather</literal> node itself; since that plan node is a child of the
<literal>Gather</literal> node, it will run in parallel.
</para>
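<para>
For instance, if an aggregate is computed over the same table, the
<literal>Gather</literal> node will typically appear below a
<literal>Finalize Aggregate</literal> node, so that only the partial
aggregation and the scan run in parallel. Assuming the planner again chooses
a parallel plan, a plan of roughly the following shape might be produced
(costs are omitted here, and the exact plan depends on the configuration and
on the table's size and statistics):
<screen>
EXPLAIN (COSTS OFF) SELECT count(*) FROM pgbench_accounts;
                       QUERY PLAN
---------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on pgbench_accounts
(5 rows)
</screen>
</para>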
<para>
<link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
workers chosen by the planner. When the <literal>Gather</literal> node is reached
during query execution, the process that is implementing the user's
session will request a number of <link linkend="bgworker">background
worker processes</link> equal to the number
of workers chosen by the planner. The number of background workers that
the planner will consider using is limited to at most
<xref linkend="guc-max-parallel-workers-per-gather"/>. The total number
of background workers that can exist at any one time is limited by both
<xref linkend="guc-max-worker-processes"/> and
<xref linkend="guc-max-parallel-workers"/>. Therefore, it is possible for a
parallel query to run with fewer workers than planned, or even with
no workers at all. The optimal plan may depend on the number of workers
that are available, so this can result in poor query performance. If this
occurrence is frequent, consider increasing
<varname>max_worker_processes</varname> and <varname>max_parallel_workers</varname>
so that more workers can be run simultaneously, or alternatively reducing
<varname>max_parallel_workers_per_gather</varname> so that the planner
requests fewer workers.
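For example, the relevant settings can be inspected and adjusted as follows
(the values shown are only illustrative):
<programlisting>
-- Inspect the current limits.
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Allow the planner to consider more workers per Gather node in the
-- current session only.
SET max_parallel_workers_per_gather = 4;

-- Raise the cluster-wide limits persistently; note that a change to
-- max_worker_processes takes effect only after a server restart.
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET max_parallel_workers = 12;
</programlisting>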