Row Estimation Examples

1000) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0) Index Cond: (unique1 < 1000) </programlisting> The planner examines the <literal>WHERE</literal> clause condition and looks up the selectivity function for the operator <literal><</literal> in <structname>pg_operator</structname>. This is held in the column <structfield>oprrest</structfield>, and the entry in this case is <function>scalarltsel</function>. The <function>scalarltsel</function> function retrieves the histogram for <structfield>unique1</structfield> from <structname>pg_statistic</structname>. For manual queries it is more convenient to look in the simpler <structname>pg_stats</structname> view: <programlisting> SELECT histogram_bounds FROM pg_stats WHERE tablename='tenk1' AND attname='unique1'; histogram_bounds ------------------------------------------------------ {0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995} </programlisting> Next the fraction of the histogram occupied by <quote>< 1000</quote> is worked out. This is the selectivity. The histogram divides the range into equal frequency buckets, so all we have to do is locate the bucket that our value is in and count <emphasis>part</emphasis> of it and <emphasis>all</emphasis> of the ones before. The value 1000 is clearly in the second bucket (993–1997). Assuming a linear distribution of values inside each bucket, we can calculate the selectivity as: <programlisting> selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets = (1 + (1000 - 993)/(1997 - 993))/10 = 0.100697 </programlisting> that is, one whole bucket plus a linear fraction of the second, divided by the number of buckets. The estimated number of rows can now be calculated as the product of the selectivity and the cardinality of <structname>tenk1</structname>: <programlisting> rows = rel_cardinality * selectivity = 10000 * 0.100697 = 1007 (rounding off) </programlisting> </para> <para> Next let's consider an example with an equality condition in its <literal>WHERE</literal> clause: <programlisting> EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA'; QUERY PLAN ---------------------------------------------------------- Seq Scan on tenk1 (cost=0.00..483.00 rows=30 width=244) Filter: (stringu1 = 'CRAAAA'::name) </programlisting> Again the planner examines the <literal>WHERE</literal> clause condition and looks up the selectivity function for <literal>=</literal>, which is <function>eqsel</function>. For equality estimation the histogram is not useful; instead the list of <firstterm>most common values</firstterm> (<acronym>MCV</acronym>s) is used to determine the selectivity. Let's have a look at the MCVs, with some additional columns that will be useful later: <programlisting> SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats WHERE tablename='tenk1' AND attname='stringu1'; null_frac | 0 n_distinct | 676 most_common_vals | {EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,&zwsp;JOAAAA,MCAAAA,NAAAAA,WGAAAA} most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,&zwsp;0.003,0.003,0.003,0.003} </programlisting> Since <literal>CRAAAA</literal> appears in the list of MCVs, the selectivity is merely the corresponding entry in the list of most common frequencies (<acronym>MCF</acronym>s): <programlisting> selectivity = mcf[3] = 0.003 </programlisting> As before, the estimated number of rows is just the product of this with the cardinality of <structname>tenk1</structname>: <programlisting> rows = 10000 * 0.003 = 30 </programlisting> </para> <para> Now consider the same query, but with a constant that is not in the <acronym>MCV</acronym> list: <programlisting> EXPLAIN SELECT * FROM

This section provides examples of how the PostgreSQL planner estimates the number of rows returned by a query, including queries with range conditions, equality conditions, and constants that are or are not in the list of most common values, and how the planner uses histograms and most common values to determine selectivity and estimate row counts.