Statistics Used by the Planner

mentioned for <literal>LIMIT</literal>. Also, if the outer (first) child contains rows with duplicate key values, the inner (second) child is backed up and rescanned for the portion of its rows matching that key value. <command>EXPLAIN ANALYZE</command> counts these repeated emissions of the same inner rows as if they were real additional rows. When there are many outer duplicates, the reported actual row count for the inner child plan node can be significantly larger than the number of rows that are actually in the inner relation. </para> <para> BitmapAnd and BitmapOr nodes always report their actual row counts as zero, due to implementation limitations. </para> <para> Normally, <command>EXPLAIN</command> will display every plan node created by the planner. However, there are cases where the executor can determine that certain nodes need not be executed because they cannot produce any rows, based on parameter values that were not available at planning time. (Currently this can only happen for child nodes of an Append or MergeAppend node that is scanning a partitioned table.) When this happens, those plan nodes are omitted from the <command>EXPLAIN</command> output and a <literal>Subplans Removed: <replaceable>N</replaceable></literal> annotation appears instead. </para> </sect2> </sect1> <sect1 id="planner-stats"> <title>Statistics Used by the Planner</title> <indexterm zone="planner-stats"> <primary>statistics</primary> <secondary>of the planner</secondary> </indexterm> <sect2 id="planner-stats-single-column"> <title>Single-Column Statistics</title> <para> As we saw in the previous section, the query planner needs to estimate the number of rows retrieved by a query in order to make good choices of query plans. This section provides a quick look at the statistics that the system uses for these estimates. </para> <para> One component of the statistics is the total number of entries in each table and index, as well as the number of disk blocks occupied by each table and index. This information is kept in the table <link linkend="catalog-pg-class"><structname>pg_class</structname></link>, in the columns <structfield>reltuples</structfield> and <structfield>relpages</structfield>. We can look at it with queries similar to this one: <screen> SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%'; relname | relkind | reltuples | relpages ----------------------+---------+-----------+---------- tenk1 | r | 10000 | 345 tenk1_hundred | i | 10000 | 11 tenk1_thous_tenthous | i | 10000 | 30 tenk1_unique1 | i | 10000 | 30 tenk1_unique2 | i | 10000 | 30 (5 rows) </screen> Here we can see that <structname>tenk1</structname> contains 10000 rows, as do its indexes, but the indexes are (unsurprisingly) much smaller than the table. </para> <para> For efficiency reasons, <structfield>reltuples</structfield> and <structfield>relpages</structfield> are not updated on-the-fly, and so they usually contain somewhat out-of-date values. They are updated by <command>VACUUM</command>, <command>ANALYZE</command>, and a few DDL commands such as <command>CREATE INDEX</command>. A <command>VACUUM</command> or <command>ANALYZE</command> operation that does not scan the entire table (which is commonly the case) will incrementally update the <structfield>reltuples</structfield> count on the basis of the part of the table it did scan, resulting in an approximate value. In any case, the planner will scale the values it finds in <structname>pg_class</structname> to match the current physical table size, thus obtaining a closer approximation. </para> <indexterm> <primary>pg_statistic</primary>

The query planner uses statistics to estimate the number of rows retrieved by a query, including total entries in tables and indexes, and disk block occupancy, which are stored in the pg_class catalog and updated by VACUUM, ANALYZE, and DDL commands, providing a basis for the planner to make informed decisions about query plans.