<primary>ANALYZE</primary>
</indexterm>
<para>
The <productname>PostgreSQL</productname> query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
which can be invoked by itself or
as an optional step in <command>VACUUM</command>. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.
</para>
<para>
The autovacuum daemon, if enabled, will automatically issue
<command>ANALYZE</command> commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled <command>ANALYZE</command> operations, particularly
if it is known that update activity on a table will not affect the
statistics of <quote>interesting</quote> columns. The daemon schedules
<command>ANALYZE</command> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
</para>
<para>
Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to
run <command>ANALYZE</command> on the parent table manually in order to
keep the statistics up to date.
</para>
<para>
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a <type>timestamp</type> column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
</para>
<para>
It is possible to run <command>ANALYZE</command> on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. <command>ANALYZE</command> uses a
statistically random sampling of the rows of a table rather than reading
every single row.
</para>
<tip>
<para>
Although per-column tweaking of <command>ANALYZE</command> frequency might not be
very productive, you might find it worthwhile to do per-column
adjustment of the level of detail of the statistics collected by
<command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
clauses and have highly irregular data distributions might require a
finer-grain data histogram than other columns. See <command>ALTER TABLE
SET STATISTICS</command>, or change the database-wide default using the <xref
linkend="guc-default-statistics-target"/> configuration parameter.
</para>
<para>
Also, by default there is limited information available about
the selectivity of functions. However, if you create a statistics
object or an expression
index that uses a function call, useful statistics will be
gathered about