Query Planner Statistics

21th chunk of `doc/src/sgml/perform.sgml`

da38372276f1a235f2aae5a8e18a89cc2179bd34904dcda80000000100000fad

 880                        Ramp+
         |           |            | Sp Railroad                       +
         |           |            | I- 580                            +
         |           |            | I- 680                        Ramp+
         |           |            | I- 80                         Ramp+
         |           |            | 14th                          St  +
         |           |            | I- 880                            +
         |           |            | Mac Arthur                    Blvd+
         |           |            | Mission                       Blvd+
...
 name    | t         |    -0.5125 | I- 580                        Ramp+
         |           |            | I- 880                        Ramp+
         |           |            | I- 580                            +
         |           |            | I- 680                        Ramp+
         |           |            | I- 80                         Ramp+
         |           |            | Sp Railroad                       +
         |           |            | I- 880                            +
         |           |            | State Hwy 13                  Ramp+
         |           |            | I- 80                             +
         |           |            | State Hwy 24                  Ramp+
...
 thepath | f         |          0 |
 thepath | t         |          0 |
(4 rows)
</screen>

   Note that two rows are displayed for the same column, one corresponding
   to the complete inheritance hierarchy starting at the
   <literal>road</literal> table (<literal>inherited</literal>=<literal>t</literal>),
   and another one including only the <literal>road</literal> table itself
   (<literal>inherited</literal>=<literal>f</literal>).
   (For brevity, we have only shown the first ten most-common values for
   the <literal>name</literal> column.)
  </para>

  <para>
   The amount of information stored in <structname>pg_statistic</structname>
   by <command>ANALYZE</command>, in particular the maximum number of entries in the
   <structfield>most_common_vals</structfield> and <structfield>histogram_bounds</structfield>
   arrays for each column, can be set on a
   column-by-column basis using the <command>ALTER TABLE SET STATISTICS</command>
   command, or globally by setting the
   <xref linkend="guc-default-statistics-target"/> configuration variable.
   The default limit is presently 100 entries.  Raising the limit
   might allow more accurate planner estimates to be made, particularly for
   columns with irregular data distributions, at the price of consuming
   more space in <structname>pg_statistic</structname> and slightly more
   time to compute the estimates.  Conversely, a lower limit might be
   sufficient for columns with simple data distributions.
  </para>

  <para>
   Further details about the planner's use of statistics can be found in
   <xref linkend="planner-stats-details"/>.
  </para>
  </sect2>

  <sect2 id="planner-stats-extended">
   <title>Extended Statistics</title>

   <indexterm zone="planner-stats-extended">
    <primary>statistics</primary>
    <secondary>of the planner</secondary>
   </indexterm>

   <indexterm>
    <primary>correlation</primary>
    <secondary>in the query planner</secondary>
   </indexterm>

   <indexterm>
    <primary>pg_statistic_ext</primary>
   </indexterm>

   <indexterm>
    <primary>pg_statistic_ext_data</primary>
   </indexterm>

   <para>
    It is common to see slow queries running bad execution plans because
    multiple columns used in the query clauses are correlated.
    The planner normally assumes that multiple conditions
    are independent of each other,
    an assumption that does not hold when column values are correlated.
    Regular statistics, because of their per-individual-column nature,
    cannot capture any knowledge about cross-column correlation.
    However, <productname>PostgreSQL</productname> has the ability to compute
    <firstterm>multivariate

Title: Query Planner Statistics

Summary

The query planner uses statistics from the pg_statistic system catalog to estimate the selectivity of WHERE clauses, which can be viewed through the pg_stats view, and can be customized using the ALTER TABLE SET STATISTICS command or the default_statistics_target configuration variable, and also supports extended statistics to capture cross-column correlation using the pg_statistic_ext and pg_statistic_ext_data catalogs.