Home Explore Blog CI



postgresql

2nd chunk of `doc/src/sgml/planstats.sgml`
6ed31b5e67eef50d4e76953ff354db92e8b9e9300167dfb40000000100000fa1
 1000)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
         Index Cond: (unique1 < 1000)
</programlisting>

   The planner examines the <literal>WHERE</literal> clause condition
   and looks up the selectivity function for the operator
   <literal>&lt;</literal> in <structname>pg_operator</structname>.
   This is held in the column <structfield>oprrest</structfield>,
   and the entry in this case is <function>scalarltsel</function>.
   The <function>scalarltsel</function> function retrieves the histogram for
   <structfield>unique1</structfield> from
   <structname>pg_statistic</structname>.  For manual queries it is more
   convenient to look in the simpler <structname>pg_stats</structname>
   view:

<programlisting>
SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='unique1';

                   histogram_bounds
------------------------------------------------------
 {0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}
</programlisting>

   Next the fraction of the histogram occupied by <quote>&lt; 1000</quote>
   is worked out. This is the selectivity. The histogram divides the range
   into equal frequency buckets, so all we have to do is locate the bucket
   that our value is in and count <emphasis>part</emphasis> of it and
   <emphasis>all</emphasis> of the ones before. The value 1000 is clearly in
   the second bucket (993&ndash;1997).  Assuming a linear distribution of
   values inside each bucket, we can calculate the selectivity as:

<programlisting>
selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets
            = (1 + (1000 - 993)/(1997 - 993))/10
            = 0.100697
</programlisting>

   that is, one whole bucket plus a linear fraction of the second, divided by
   the number of buckets. The estimated number of rows can now be calculated as
   the product of the selectivity and the cardinality of
   <structname>tenk1</structname>:

<programlisting>
rows = rel_cardinality * selectivity
     = 10000 * 0.100697
     = 1007  (rounding off)
</programlisting>
  </para>

  <para>
   Next let's consider an example with an equality condition in its
   <literal>WHERE</literal> clause:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA';

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=30 width=244)
   Filter: (stringu1 = 'CRAAAA'::name)
</programlisting>

   Again the planner examines the <literal>WHERE</literal> clause condition
   and looks up the selectivity function for <literal>=</literal>, which is
   <function>eqsel</function>.  For equality estimation the histogram is
   not useful; instead the list of <firstterm>most
   common values</firstterm> (<acronym>MCV</acronym>s) is used to determine the
   selectivity. Let's have a look at the MCVs, with some additional columns
   that will be useful later:

<programlisting>
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';

null_frac         | 0
n_distinct        | 676
most_common_vals  | {EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,&zwsp;JOAAAA,MCAAAA,NAAAAA,WGAAAA}
most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,&zwsp;0.003,0.003,0.003,0.003}

</programlisting>

   Since <literal>CRAAAA</literal> appears in the list of MCVs, the selectivity is
   merely the corresponding entry in the list of most common frequencies
   (<acronym>MCF</acronym>s):

<programlisting>
selectivity = mcf[3]
            = 0.003
</programlisting>

   As before, the estimated number of rows is just the product of this with the
   cardinality of <structname>tenk1</structname>:

<programlisting>
rows = 10000 * 0.003
     = 30
</programlisting>
  </para>

  <para>
   Now consider the same query, but with a constant that is not in the
   <acronym>MCV</acronym> list:

<programlisting>
EXPLAIN SELECT * FROM

Title: Row Estimation Examples
Summary
This section provides examples of how the PostgreSQL planner estimates the number of rows returned by a query, including queries with range conditions, equality conditions, and constants that are or are not in the list of most common values, and how the planner uses histograms and most common values to determine selectivity and estimate row counts.