<emphasis>the histogram does not include the portion of the column
population represented by the MCVs</emphasis>. We do things this way because
it allows more precise estimation. In this situation
<function>scalarltsel</function> directly applies the condition (e.g.,
<quote>< 1000</quote>) to each value of the MCV list, and adds up the
frequencies of the MCVs for which the condition is true. This gives
an exact estimate of the selectivity within the portion of the table
that is MCVs. The histogram is then used in the same way as above
to estimate the selectivity in the portion of the table that is not
MCVs, and then the two numbers are combined to estimate the overall
selectivity. For example, consider
<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE stringu1 < 'IAAAAA';
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..483.00 rows=3077 width=244)
Filter: (stringu1 < 'IAAAAA'::name)
</programlisting>
We already saw the MCV information for <structfield>stringu1</structfield>,
and here is its histogram:
<programlisting>
SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';
histogram_bounds
-------------------------------------------------------------------&zwsp;-------------
{AAAAAA,CQAAAA,FRAAAA,IBAAAA,KRAAAA,NFAAAA,PSAAAA,SGAAAA,VAAAAA,&zwsp;XLAAAA,ZZAAAA}
</programlisting>
Checking the MCV list, we find that the condition <literal>stringu1 <
'IAAAAA'</literal> is satisfied by the first six entries and not the last four,
so the selectivity within the MCV part of the population is
<programlisting>
selectivity = sum(relevant mvfs)
= 0.00333333 + 0.003 + 0.003 + 0.003 + 0.003 + 0.003
= 0.01833333
</programlisting>
Summing all the MCFs also tells us that the total fraction of the
population represented by MCVs is 0.03033333, and therefore the
fraction represented by the histogram is 0.96966667 (again, there
are no nulls, else we'd have to exclude them here). We can see
that the value <literal>IAAAAA</literal> falls nearly at the end of the
third histogram bucket. Using some rather cheesy assumptions
about the frequency of different characters, the planner arrives
at the estimate 0.298387 for the portion of the histogram population
that is less than <literal>IAAAAA</literal>. We then combine the estimates
for the MCV and non-MCV populations:
<programlisting>
selectivity = mcv_selectivity + histogram_selectivity * histogram_fraction
= 0.01833333 + 0.298387 * 0.96966667
= 0.307669
rows = 10000 * 0.307669
= 3077 (rounding off)
</programlisting>
In this particular example, the correction from the MCV list is fairly
small, because the column distribution is actually quite flat (the
statistics showing these particular values as being more common than
others are mostly due to sampling error). In a more typical case where
some values are significantly more common than others, this complicated
process gives a useful improvement in accuracy because the selectivity
for the most common values is found exactly.
</para>
<para>
Now let's consider a case with more than one
condition in the <literal>WHERE</literal> clause:
<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';
QUERY PLAN
-------------------------------------------------------------------&zwsp;-------------
Bitmap Heap Scan on tenk1 (cost=23.80..396.91 rows=1 width=244)
Recheck Cond: (unique1 < 1000)
Filter: (stringu1 = 'xxx'::name)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0)
Index Cond: (unique1 < 1000)
</programlisting>
The planner assumes that the two conditions are independent, so that
the