being more common than
others are mostly due to sampling error). In a more typical case where
some values are significantly more common than others, this complicated
process gives a useful improvement in accuracy because the selectivity
for the most common values is found exactly.
</para>
<para>
Now let's consider a case with more than one
condition in the <literal>WHERE</literal> clause:
<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';
QUERY PLAN
-------------------------------------------------------------------&zwsp;-------------
Bitmap Heap Scan on tenk1 (cost=23.80..396.91 rows=1 width=244)
Recheck Cond: (unique1 < 1000)
Filter: (stringu1 = 'xxx'::name)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0)
Index Cond: (unique1 < 1000)
</programlisting>
The planner assumes that the two conditions are independent, so that
the individual selectivities of the clauses can be multiplied together:
<programlisting>
selectivity = selectivity(unique1 < 1000) * selectivity(stringu1 = 'xxx')
= 0.100697 * 0.0014559
= 0.0001466
rows = 10000 * 0.0001466
= 1 (rounding off)
</programlisting>
Notice that the number of rows estimated to be returned from the bitmap
index scan reflects only the condition used with the index; this is
important since it affects the cost estimate for the subsequent heap
fetches.
</para>
<para>
Finally we will examine a query that involves a join:
<programlisting>
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
QUERY PLAN
-------------------------------------------------------------------&zwsp;-------------------
Nested Loop (cost=4.64..456.23 rows=50 width=488)
-> Bitmap Heap Scan on tenk1 t1 (cost=4.64..142.17 rows=50 width=244)
Recheck Cond: (unique1 < 50)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.63 rows=50 width=0)
Index Cond: (unique1 < 50)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..6.27 rows=1 width=244)
Index Cond: (unique2 = t1.unique2)
</programlisting>
The restriction on <structname>tenk1</structname>,
<literal>unique1 < 50</literal>,
is evaluated before the nested-loop join.
This is handled analogously to the previous range example. This time the
value 50 falls into the first bucket of the
<structfield>unique1</structfield> histogram:
<programlisting>
selectivity = (0 + (50 - bucket[1].min)/(bucket[1].max - bucket[1].min))/num_buckets
= (0 + (50 - 0)/(993 - 0))/10
= 0.005035
rows = 10000 * 0.005035
= 50 (rounding off)
</programlisting>
The restriction for the join is <literal>t2.unique2 = t1.unique2</literal>.
The operator is just
our familiar <literal>=</literal>, however the selectivity function is
obtained from the <structfield>oprjoin</structfield> column of
<structname>pg_operator</structname>, and is <function>eqjoinsel</function>.
<function>eqjoinsel</function> looks up the statistical information for both
<structname>tenk2</structname> and <structname>tenk1</structname>:
<programlisting>
SELECT tablename, null_frac,n_distinct, most_common_vals FROM pg_stats
WHERE tablename IN ('tenk1', 'tenk2') AND attname='unique2';
tablename | null_frac | n_distinct | most_common_vals
-----------+-----------+------------+------------------
tenk1 | 0 | -1 |
tenk2 | 0 | -1 |
</programlisting>
In this case there is no <acronym>MCV</acronym> information for
<structname>unique2</structname> and all the values appear to be
unique (n_distinct = -1), so we use an algorithm that relies on the row
count estimates for both relations (num_rows, not shown, but "tenk")