Home Explore Blog CI



postgresql

5th chunk of `doc/src/sgml/maintenance.sgml`
626c476702d02d17e08439624dfd3734a797a57ad2fd90660000000100000fa5

    <primary>ANALYZE</primary>
   </indexterm>

   <para>
    The <productname>PostgreSQL</productname> query planner relies on
    statistical information about the contents of tables in order to
    generate good plans for queries.  These statistics are gathered by
    the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
    which can be invoked by itself or
    as an optional step in <command>VACUUM</command>.  It is important to have
    reasonably accurate statistics, otherwise poor choices of plans might
    degrade database performance.
   </para>

   <para>
    The autovacuum daemon, if enabled, will automatically issue
    <command>ANALYZE</command> commands whenever the content of a table has
    changed sufficiently.  However, administrators might prefer to rely
    on manually-scheduled <command>ANALYZE</command> operations, particularly
    if it is known that update activity on a table will not affect the
    statistics of <quote>interesting</quote> columns.  The daemon schedules
    <command>ANALYZE</command> strictly as a function of the number of rows
    inserted or updated; it has no knowledge of whether that will lead
    to meaningful statistical changes.
   </para>

   <para>
    Tuples changed in partitions and inheritance children do not trigger
    analyze on the parent table.  If the parent table is empty or rarely
    changed, it may never be processed by autovacuum, and the statistics for
    the inheritance tree as a whole won't be collected. It is necessary to
    run <command>ANALYZE</command> on the parent table manually in order to
    keep the statistics up to date.
   </para>

   <para>
    As with vacuuming for space recovery, frequent updates of statistics
    are more useful for heavily-updated tables than for seldom-updated
    ones. But even for a heavily-updated table, there might be no need for
    statistics updates if the statistical distribution of the data is
    not changing much. A simple rule of thumb is to think about how much
    the minimum and maximum values of the columns in the table change.
    For example, a <type>timestamp</type> column that contains the time
    of row update will have a constantly-increasing maximum value as
    rows are added and updated; such a column will probably need more
    frequent statistics updates than, say, a column containing URLs for
    pages accessed on a website. The URL column might receive changes just
    as often, but the statistical distribution of its values probably
    changes relatively slowly.
   </para>

   <para>
    It is possible to run <command>ANALYZE</command> on specific tables and even
    just specific columns of a table, so the flexibility exists to update some
    statistics more frequently than others if your application requires it.
    In practice, however, it is usually best to just analyze the entire
    database, because it is a fast operation.  <command>ANALYZE</command> uses a
    statistically random sampling of the rows of a table rather than reading
    every single row.
   </para>

   <tip>
    <para>
     Although per-column tweaking of <command>ANALYZE</command> frequency might not be
     very productive, you might find it worthwhile to do per-column
     adjustment of the level of detail of the statistics collected by
     <command>ANALYZE</command>.  Columns that are heavily used in <literal>WHERE</literal>
     clauses and have highly irregular data distributions might require a
     finer-grain data histogram than other columns.  See <command>ALTER TABLE
     SET STATISTICS</command>, or change the database-wide default using the <xref
     linkend="guc-default-statistics-target"/> configuration parameter.
    </para>

    <para>
     Also, by default there is limited information available about
     the selectivity of functions.  However, if you create a statistics
     object or an expression
     index that uses a function call, useful statistics will be
     gathered about

Title: Updating Planner Statistics in PostgreSQL
Summary
This section describes the importance of updating planner statistics in PostgreSQL using the ANALYZE command, which gathers statistical information about table contents to generate good query plans, and discusses how the autovacuum daemon can automatically issue ANALYZE commands, as well as considerations for manual scheduling, column-specific analysis, and adjusting the level of detail for collected statistics.