Additional Considerations for Vacuum and Analyze

sampling of the rows of a table rather than reading every single row. </para> <tip> <para> Although per-column tweaking of <command>ANALYZE</command> frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal> clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See <command>ALTER TABLE SET STATISTICS</command>, or change the database-wide default using the <xref linkend="guc-default-statistics-target"/> configuration parameter. </para> <para> Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index. </para> </tip> <tip> <para> The autovacuum daemon does not issue <command>ANALYZE</command> commands for foreign tables, since it has no means of determining how often that might be useful. If your queries require statistics on foreign tables for proper planning, it's a good idea to run manually-managed <command>ANALYZE</command> commands on those tables on a suitable schedule. </para> </tip> <tip> <para> The autovacuum daemon does not issue <command>ANALYZE</command> commands for partitioned tables. Inheritance parents will only be analyzed if the parent itself is changed - changes to child tables do not trigger autoanalyze on the parent table. If your queries require statistics on parent tables for proper planning, it is necessary to periodically run a manual <command>ANALYZE</command> on those tables to keep the statistics up to date. </para> </tip> </sect2> <sect2 id="vacuum-for-visibility-map"> <title>Updating the Visibility Map</title> <para> Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). This has two purposes. First, vacuum itself can skip such pages on the next run, since there is nothing to clean up. </para> <para> Second, it allows <productname>PostgreSQL</productname> to answer some queries using only the index, without reference to the underlying table. Since <productname>PostgreSQL</productname> indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An <link linkend="indexes-index-only-scans"><firstterm>index-only scan</firstterm></link>, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large. </para> </sect2> <sect2 id="vacuum-for-wraparound"> <title>Preventing Transaction ID Wraparound Failures</title> <indexterm zone="vacuum-for-wraparound"> <primary>transaction ID</primary> <secondary>wraparound</secondary> </indexterm> <indexterm> <primary>wraparound</primary> <secondary>of transaction IDs</secondary> </indexterm> <para> <productname>PostgreSQL</productname>'s <link linkend="mvcc-intro">MVCC</link> transaction semantics depend on being able to compare transaction ID (<acronym>XID</acronym>)

This section discusses additional considerations for using the VACUUM and ANALYZE commands in PostgreSQL, including updating the visibility map to improve query performance, preventing transaction ID wraparound failures, and special considerations for foreign tables, partitioned tables, and index-only scans, as well as adjusting the level of detail for collected statistics and gathering statistics on functions.