Updating Planner Statistics in PostgreSQL

<primary>ANALYZE</primary> </indexterm> <para> The <productname>PostgreSQL</productname> query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the <link linkend="sql-analyze"><command>ANALYZE</command></link> command, which can be invoked by itself or as an optional step in <command>VACUUM</command>. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance. </para> <para> The autovacuum daemon, if enabled, will automatically issue <command>ANALYZE</command> commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually-scheduled <command>ANALYZE</command> operations, particularly if it is known that update activity on a table will not affect the statistics of <quote>interesting</quote> columns. The daemon schedules <command>ANALYZE</command> strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes. </para> <para> Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run <command>ANALYZE</command> on the parent table manually in order to keep the statistics up to date. </para> <para> As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there might be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a <type>timestamp</type> column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column might receive changes just as often, but the statistical distribution of its values probably changes relatively slowly. </para> <para> It is possible to run <command>ANALYZE</command> on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. <command>ANALYZE</command> uses a statistically random sampling of the rows of a table rather than reading every single row. </para> <tip> <para> Although per-column tweaking of <command>ANALYZE</command> frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal> clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See <command>ALTER TABLE SET STATISTICS</command>, or change the database-wide default using the <xref linkend="guc-default-statistics-target"/> configuration parameter. </para> <para> Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about

This section describes the importance of updating planner statistics in PostgreSQL using the ANALYZE command, which gathers statistical information about table contents to generate good query plans, and discusses how the autovacuum daemon can automatically issue ANALYZE commands, as well as considerations for manual scheduling, column-specific analysis, and adjusting the level of detail for collected statistics.