Understanding Query Plans with EXPLAIN

transmit them to the client, which could be important factors in the real elapsed time; but the planner ignores those costs because it cannot change them by altering the plan. (Every correct plan will output the same row set, we trust.) </para> <para> The <literal>rows</literal> value is a little tricky because it is not the number of rows processed or scanned by the plan node, but rather the number emitted by the node. This is often less than the number scanned, as a result of filtering by any <literal>WHERE</literal>-clause conditions that are being applied at the node. Ideally the top-level rows estimate will approximate the number of rows actually returned, updated, or deleted by the query. </para> <para> Returning to our example: <screen> EXPLAIN SELECT * FROM tenk1; QUERY PLAN ------------------------------------------------------------- Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244) </screen> </para> <para> These numbers are derived very straightforwardly. If you do: <programlisting> SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; </programlisting> you will find that <classname>tenk1</classname> has 345 disk pages and 10000 rows. The estimated cost is computed as (disk pages read * <xref linkend="guc-seq-page-cost"/>) + (rows scanned * <xref linkend="guc-cpu-tuple-cost"/>). By default, <varname>seq_page_cost</varname> is 1.0 and <varname>cpu_tuple_cost</varname> is 0.01, so the estimated cost is (345 * 1.0) + (10000 * 0.01) = 445. </para> <para> Now let's modify the query to add a <literal>WHERE</literal> condition: <screen> EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; QUERY PLAN ------------------------------------------------------------ Seq Scan on tenk1 (cost=0.00..470.00 rows=7000 width=244) Filter: (unique1 < 7000) </screen> Notice that the <command>EXPLAIN</command> output shows the <literal>WHERE</literal> clause being applied as a <quote>filter</quote> condition attached to the Seq Scan plan node. This means that the plan node checks the condition for each row it scans, and outputs only the ones that pass the condition. The estimate of output rows has been reduced because of the <literal>WHERE</literal> clause. However, the scan will still have to visit all 10000 rows, so the cost hasn't decreased; in fact it has gone up a bit (by 10000 * <xref linkend="guc-cpu-operator-cost"/>, to be exact) to reflect the extra CPU time spent checking the <literal>WHERE</literal> condition. </para> <para> The actual number of rows this query would select is 7000, but the <literal>rows</literal> estimate is only approximate. If you try to duplicate this experiment, you may well get a slightly different estimate; moreover, it can change after each <command>ANALYZE</command> command, because the statistics produced by <command>ANALYZE</command> are taken from a randomized sample of the table. </para> <para> Now, let's make the condition more restrictive: <screen> EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100; QUERY PLAN -------------------------------------------------------------------&zwsp;----------- Bitmap Heap Scan on tenk1 (cost=5.06..224.98 rows=100 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=100 width=0) Index Cond: (unique1 < 100) </screen> Here the planner has decided to use a two-step plan: the child plan node visits an index to find the locations of rows matching the index condition, and then the upper plan node actually fetches those rows from the table itself. Fetching rows separately is much more expensive than reading them sequentially, but because not all the pages of

This section explains how to use the EXPLAIN command to understand query plans, including how costs are estimated, how the planner chooses between different plan nodes, and how to interpret the output of EXPLAIN, using examples to illustrate the effects of adding WHERE clauses and index conditions on query plans.