Bottom-Up Index Deletion in Detail and Simple Index Tuple Deletion

statements, where concentrated build up of obsolete versions in particular pages would occur otherwise. A page split will usually be avoided, though it's possible that certain implementation-level heuristics will fail to identify and delete even one garbage index tuple (in which case a page split or deduplication pass resolves the issue of an incoming new tuple not fitting on a leaf page). The worst-case number of versions that any index scan must traverse (for any single logical row) is an important contributor to overall system responsiveness and throughput. A bottom-up index deletion pass targets suspected garbage tuples in a single leaf page based on <emphasis>qualitative</emphasis> distinctions involving logical rows and versions. This contrasts with the <quote>top-down</quote> index cleanup performed by autovacuum workers, which is triggered when certain <emphasis>quantitative</emphasis> table-level thresholds are exceeded (see <xref linkend="autovacuum"/>). </para> <note> <para> Not all deletion operations that are performed within B-Tree indexes are bottom-up deletion operations. There is a distinct category of index tuple deletion: <firstterm>simple index tuple deletion</firstterm>. This is a deferred maintenance operation that deletes index tuples that are known to be safe to delete (those whose item identifier's <literal>LP_DEAD</literal> bit is already set). Like bottom-up index deletion, simple index deletion takes place at the point that a page split is anticipated as a way of avoiding the split. </para> <para> Simple deletion is opportunistic in the sense that it can only take place when recent index scans set the <literal>LP_DEAD</literal> bits of affected items in passing. Prior to <productname>PostgreSQL</productname> 14, the only category of B-Tree deletion was simple deletion. The main differences between it and bottom-up deletion are that only the former is opportunistically driven by the activity of passing index scans, while only the latter specifically targets version churn from <command>UPDATE</command>s that do not logically modify indexed columns. </para> </note> <para> Bottom-up index deletion performs the vast majority of all garbage index tuple cleanup for particular indexes with certain workloads. This is expected with any B-Tree index that is subject to significant version churn from <command>UPDATE</command>s that rarely or never logically modify the columns that the index covers. The average and worst-case number of versions per logical row can be kept low purely through targeted incremental deletion passes. It's quite possible that the on-disk size of certain indexes will never increase by even one single page/block despite <emphasis>constant</emphasis> version churn from <command>UPDATE</command>s. Even then, an exhaustive <quote>clean sweep</quote> by a <command>VACUUM</command> operation (typically run in an autovacuum worker process) will eventually be required as a part of <emphasis>collective</emphasis> cleanup of the table and each of its indexes. </para> <para> Unlike <command>VACUUM</command>, bottom-up index deletion does not provide any strong guarantees about how old the oldest garbage index tuple may be. No index can be permitted to retain <quote>floating garbage</quote> index tuples that became dead prior to a conservative cutoff point shared by the table and all of its indexes collectively. This fundamental table-level invariant makes it safe to recycle table <acronym>TID</acronym>s. This is how it is possible for distinct logical rows to reuse the same table <acronym>TID</acronym> over time (though this can never happen with two logical rows whose lifetimes span the same <command>VACUUM</command> cycle). </para> </sect3> <sect3 id="btree-deduplication">

This section details bottom-up index deletion, which targets garbage tuples on leaf pages based on qualitative distinctions. This contrasts with autovacuum's top-down approach, which is triggered by quantitative thresholds. Simple index tuple deletion is also described, which is a deferred maintenance operation that deletes index tuples marked as safe to delete. Bottom-up deletion is particularly useful for indexes experiencing version churn from UPDATEs that don't modify indexed columns, potentially maintaining a stable index size despite constant updates. However, unlike VACUUM, it doesn't guarantee the age of the oldest garbage tuple; a collective cleanup by VACUUM is eventually required to maintain table-level invariants.