Home Explore Blog CI



postgresql

11th chunk of `doc/src/sgml/btree.sgml`
a89172afdc3fef78590c018e05998c4ecbacc1ccfabb4d6c0000000100000faf
 statements, where concentrated build up
   of obsolete versions in particular pages would occur otherwise.  A
   page split will usually be avoided, though it's possible that
   certain implementation-level heuristics will fail to identify and
   delete even one garbage index tuple (in which case a page split or
   deduplication pass resolves the issue of an incoming new tuple not
   fitting on a leaf page).  The worst-case number of versions that
   any index scan must traverse (for any single logical row) is an
   important contributor to overall system responsiveness and
   throughput.  A bottom-up index deletion pass targets suspected
   garbage tuples in a single leaf page based on
   <emphasis>qualitative</emphasis> distinctions involving logical
   rows and versions.  This contrasts with the <quote>top-down</quote>
   index cleanup performed by autovacuum workers, which is triggered
   when certain <emphasis>quantitative</emphasis> table-level
   thresholds are exceeded (see <xref linkend="autovacuum"/>).
  </para>
  <note>
   <para>
    Not all deletion operations that are performed within B-Tree
    indexes are bottom-up deletion operations.  There is a distinct
    category of index tuple deletion: <firstterm>simple index tuple
     deletion</firstterm>.  This is a deferred maintenance operation
    that deletes index tuples that are known to be safe to delete
    (those whose item identifier's <literal>LP_DEAD</literal> bit is
    already set).  Like bottom-up index deletion, simple index
    deletion takes place at the point that a page split is anticipated
    as a way of avoiding the split.
   </para>
   <para>
    Simple deletion is opportunistic in the sense that it can only
    take place when recent index scans set the
    <literal>LP_DEAD</literal> bits of affected items in passing.
    Prior to <productname>PostgreSQL</productname> 14, the only
    category of B-Tree deletion was simple deletion.  The main
    differences between it and bottom-up deletion are that only the
    former is opportunistically driven by the activity of passing
    index scans, while only the latter specifically targets version
    churn from <command>UPDATE</command>s that do not logically modify
    indexed columns.
   </para>
  </note>
  <para>
   Bottom-up index deletion performs the vast majority of all garbage
   index tuple cleanup for particular indexes with certain workloads.
   This is expected with any B-Tree index that is subject to
   significant version churn from <command>UPDATE</command>s that
   rarely or never logically modify the columns that the index covers.
   The average and worst-case number of versions per logical row can
   be kept low purely through targeted incremental deletion passes.
   It's quite possible that the on-disk size of certain indexes will
   never increase by even one single page/block despite
   <emphasis>constant</emphasis> version churn from
   <command>UPDATE</command>s.  Even then, an exhaustive <quote>clean
    sweep</quote> by a <command>VACUUM</command> operation (typically
   run in an autovacuum worker process) will eventually be required as
   a part of <emphasis>collective</emphasis> cleanup of the table and
   each of its indexes.
  </para>
  <para>
   Unlike <command>VACUUM</command>, bottom-up index deletion does not
   provide any strong guarantees about how old the oldest garbage
   index tuple may be.  No index can be permitted to retain
   <quote>floating garbage</quote> index tuples that became dead prior
   to a conservative cutoff point shared by the table and all of its
   indexes collectively.  This fundamental table-level invariant makes
   it safe to recycle table <acronym>TID</acronym>s.  This is how it
   is possible for distinct logical rows to reuse the same table
   <acronym>TID</acronym> over time (though this can never happen with
   two logical rows whose lifetimes span the same
   <command>VACUUM</command> cycle).
  </para>
 </sect3>

 <sect3 id="btree-deduplication">

Title: Bottom-Up Index Deletion in Detail and Simple Index Tuple Deletion
Summary
This section details bottom-up index deletion, which targets garbage tuples on leaf pages based on qualitative distinctions. This contrasts with autovacuum's top-down approach, which is triggered by quantitative thresholds. Simple index tuple deletion is also described, which is a deferred maintenance operation that deletes index tuples marked as safe to delete. Bottom-up deletion is particularly useful for indexes experiencing version churn from UPDATEs that don't modify indexed columns, potentially maintaining a stable index size despite constant updates. However, unlike VACUUM, it doesn't guarantee the age of the oldest garbage tuple; a collective cleanup by VACUUM is eventually required to maintain table-level invariants.