<!-- doc/src/sgml/planstats.sgml -->
<chapter id="planner-stats-details">
<title>How the Planner Uses Statistics</title>
<para>
This chapter builds on the material covered in <xref
linkend="using-explain"/> and <xref linkend="planner-stats"/> to show some
additional details about how the planner uses the
system statistics to estimate the number of rows each part of a query might
return. This is a significant part of the planning process,
providing much of the raw material for cost calculation.
</para>
<para>
The intent of this chapter is not to document the code in detail,
but to present an overview of how it works.
This will perhaps ease the learning curve for someone who subsequently
wishes to read the code.
</para>
<sect1 id="row-estimation-examples">
<title>Row Estimation Examples</title>
<indexterm zone="row-estimation-examples">
<primary>row estimation</primary>
<secondary>planner</secondary>
</indexterm>
<para>
The examples shown below use tables in the <productname>PostgreSQL</productname>
regression test database.
Note also that since <command>ANALYZE</command> uses random sampling
while producing statistics, the results will change slightly after
any new <command>ANALYZE</command>.
</para>
<para>
Let's start with a very simple query:
<programlisting>
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
</programlisting>
How the planner determines the cardinality of <structname>tenk1</structname>
is covered in <xref linkend="planner-stats"/>, but is repeated here for
completeness. The number of pages and rows is looked up in
<structname>pg_class</structname>:
<programlisting>
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
relpages | reltuples
----------+-----------
358 | 10000
</programlisting>
These numbers are current as of the last <command>VACUUM</command> or
<command>ANALYZE</command> on the table. The planner then fetches the
actual current number of pages in the table (this is a cheap operation,
not requiring a table scan). If that is different from
<structfield>relpages</structfield> then
<structfield>reltuples</structfield> is scaled accordingly to
arrive at a current number-of-rows estimate. In the example above, the value of
<structfield>relpages</structfield> is up-to-date so the rows estimate is
the same as <structfield>reltuples</structfield>.
</para>
<para>
Let's move on to an example with a range condition in its
<literal>WHERE</literal> clause:
<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;
QUERY PLAN
-------------------------------------------------------------------&zwsp;-------------
Bitmap Heap Scan on tenk1 (cost=24.06..394.64 rows=1007 width=244)
Recheck Cond: (unique1 < 1000)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0)
Index Cond: (unique1 < 1000)
</programlisting>
The planner examines the <literal>WHERE</literal> clause condition
and looks up the selectivity function for the operator
<literal><</literal> in <structname>pg_operator</structname>.
This is held in the column <structfield>oprrest</structfield>,
and the entry in this case is <function>scalarltsel</function>.
The <function>scalarltsel</function> function retrieves the histogram for
<structfield>unique1</structfield> from
<structname>pg_statistic</structname>. For manual queries it is more
convenient to look in the simpler <structname>pg_stats</structname>
view:
<programlisting>
SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='unique1';
histogram_bounds
------------------------------------------------------
{0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}