Home Explore Blog CI



postgresql

26th chunk of `doc/src/sgml/perform.sgml`
2c6301323ff9528e92f3b5e42d57cd8954ed1263f58b16da0000000100000fa3
  0.000113
     4 | {New York, NY}         | {f,f} |  0.001967 |       0.000114
     5 | {Atlanta, GA}          | {f,f} |  0.001633 |        3.3e-05
     6 | {Sacramento, CA}       | {f,f} |  0.001433 |        7.8e-05
     7 | {Miami, FL}            | {f,f} |    0.0014 |          6e-05
     8 | {Dallas, TX}           | {f,f} |  0.001367 |        8.8e-05
     9 | {Chicago, IL}          | {f,f} |  0.001333 |        5.1e-05
   ...
(99 rows)
</programlisting>
     This indicates that the most common combination of city and state is
     Washington in DC, with actual frequency (in the sample) about 0.35%.
     The base frequency of the combination (as computed from the simple
     per-column frequencies) is only 0.0027%, resulting in two orders of
     magnitude under-estimates.
    </para>

    <para>
     It's advisable to create <acronym>MCV</acronym> statistics objects only
     on combinations of columns that are actually used in conditions together,
     and for which misestimation of the number of groups is resulting in bad
     plans.  Otherwise, the <command>ANALYZE</command> and planning cycles
     are just wasted.
    </para>
   </sect3>

  </sect2>
 </sect1>

 <sect1 id="explicit-joins">
  <title>Controlling the Planner with Explicit <literal>JOIN</literal> Clauses</title>

  <indexterm zone="explicit-joins">
   <primary>join</primary>
   <secondary>controlling the order</secondary>
  </indexterm>

  <para>
   It is possible
   to control the query planner to some extent by using the explicit <literal>JOIN</literal>
   syntax.  To see why this matters, we first need some background.
  </para>

  <para>
   In a simple join query, such as:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</programlisting>
   the planner is free to join the given tables in any order.  For
   example, it could generate a query plan that joins A to B, using
   the <literal>WHERE</literal> condition <literal>a.id = b.id</literal>, and then
   joins C to this joined table, using the other <literal>WHERE</literal>
   condition.  Or it could join B to C and then join A to that result.
   Or it could join A to C and then join them with B &mdash; but that
   would be inefficient, since the full Cartesian product of A and C
   would have to be formed, there being no applicable condition in the
   <literal>WHERE</literal> clause to allow optimization of the join.  (All
   joins in the <productname>PostgreSQL</productname> executor happen
   between two input tables, so it's necessary to build up the result
   in one or another of these fashions.)  The important point is that
   these different join possibilities give semantically equivalent
   results but might have hugely different execution costs.  Therefore,
   the planner will explore all of them to try to find the most
   efficient query plan.
  </para>

  <para>
   When a query only involves two or three tables, there aren't many join
   orders to worry about.  But the number of possible join orders grows
   exponentially as the number of tables expands.  Beyond ten or so input
   tables it's no longer practical to do an exhaustive search of all the
   possibilities, and even for six or seven tables planning might take an
   annoyingly long time.  When there are too many input tables, the
   <productname>PostgreSQL</productname> planner will switch from exhaustive
   search to a <firstterm>genetic</firstterm> probabilistic search
   through a limited number of possibilities.  (The switch-over threshold is
   set by the <xref linkend="guc-geqo-threshold"/> run-time
   parameter.)
   The genetic search takes less time, but it won't
   necessarily find the best possible plan.
  </para>

  <para>
   When the query involves outer joins, the planner has less freedom
   than it does for plain (inner) joins. For example, consider:
<programlisting>
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   Although this query's restrictions are

Title: Controlling the Query Planner with Explicit JOIN Clauses
Summary
The PostgreSQL query planner can be controlled to some extent by using explicit JOIN syntax, which allows specifying the order of joins and can improve performance by reducing the number of possible join orders, especially for queries involving multiple tables, and can also handle outer joins with less freedom than inner joins.