transmit them to the client, which
could be important factors in the real elapsed time; but the planner
ignores those costs because it cannot change them by altering the
plan. (Every correct plan will output the same row set, we trust.)
</para>
<para>
The <literal>rows</literal> value is a little tricky because it is
not the number of rows processed or scanned by the
plan node, but rather the number emitted by the node. This is often
less than the number scanned, as a result of filtering by any
<literal>WHERE</literal>-clause conditions that are being applied at the node.
Ideally the top-level rows estimate will approximate the number of rows
actually returned, updated, or deleted by the query.
</para>
<para>
Returning to our example:
<screen>
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)
</screen>
</para>
<para>
These numbers are derived very straightforwardly. If you do:
<programlisting>
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
</programlisting>
you will find that <classname>tenk1</classname> has 345 disk
pages and 10000 rows. The estimated cost is computed as (disk pages read *
<xref linkend="guc-seq-page-cost"/>) + (rows scanned *
<xref linkend="guc-cpu-tuple-cost"/>). By default,
<varname>seq_page_cost</varname> is 1.0 and <varname>cpu_tuple_cost</varname> is 0.01,
so the estimated cost is (345 * 1.0) + (10000 * 0.01) = 445.
</para>
<para>
Now let's modify the query to add a <literal>WHERE</literal> condition:
<screen>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..470.00 rows=7000 width=244)
Filter: (unique1 < 7000)
</screen>
Notice that the <command>EXPLAIN</command> output shows the <literal>WHERE</literal>
clause being applied as a <quote>filter</quote> condition attached to the Seq
Scan plan node. This means that
the plan node checks the condition for each row it scans, and outputs
only the ones that pass the condition.
The estimate of output rows has been reduced because of the
<literal>WHERE</literal> clause.
However, the scan will still have to visit all 10000 rows, so the cost
hasn't decreased; in fact it has gone up a bit (by 10000 * <xref
linkend="guc-cpu-operator-cost"/>, to be exact) to reflect the extra CPU
time spent checking the <literal>WHERE</literal> condition.
</para>
<para>
The actual number of rows this query would select is 7000, but the <literal>rows</literal>
estimate is only approximate. If you try to duplicate this experiment,
you may well get a slightly different estimate; moreover, it can
change after each <command>ANALYZE</command> command, because the
statistics produced by <command>ANALYZE</command> are taken from a
randomized sample of the table.
</para>
<para>
Now, let's make the condition more restrictive:
<screen>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100;
QUERY PLAN
-------------------------------------------------------------------&zwsp;-----------
Bitmap Heap Scan on tenk1 (cost=5.06..224.98 rows=100 width=244)
Recheck Cond: (unique1 < 100)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=100 width=0)
Index Cond: (unique1 < 100)
</screen>
Here the planner has decided to use a two-step plan: the child plan
node visits an index to find the locations of rows matching the index
condition, and then the upper plan node actually fetches those rows
from the table itself. Fetching rows separately is much more
expensive than reading them sequentially, but because not all the pages
of