statements, where concentrated build up
of obsolete versions in particular pages would occur otherwise. A
page split will usually be avoided, though it's possible that
certain implementation-level heuristics will fail to identify and
delete even one garbage index tuple (in which case a page split or
deduplication pass resolves the issue of an incoming new tuple not
fitting on a leaf page). The worst-case number of versions that
any index scan must traverse (for any single logical row) is an
important contributor to overall system responsiveness and
throughput. A bottom-up index deletion pass targets suspected
garbage tuples in a single leaf page based on
<emphasis>qualitative</emphasis> distinctions involving logical
rows and versions. This contrasts with the <quote>top-down</quote>
index cleanup performed by autovacuum workers, which is triggered
when certain <emphasis>quantitative</emphasis> table-level
thresholds are exceeded (see <xref linkend="autovacuum"/>).
</para>
<note>
<para>
Not all deletion operations that are performed within B-Tree
indexes are bottom-up deletion operations. There is a distinct
category of index tuple deletion: <firstterm>simple index tuple
deletion</firstterm>. This is a deferred maintenance operation
that deletes index tuples that are known to be safe to delete
(those whose item identifier's <literal>LP_DEAD</literal> bit is
already set). Like bottom-up index deletion, simple index
deletion takes place at the point that a page split is anticipated
as a way of avoiding the split.
</para>
<para>
Simple deletion is opportunistic in the sense that it can only
take place when recent index scans set the
<literal>LP_DEAD</literal> bits of affected items in passing.
Prior to <productname>PostgreSQL</productname> 14, the only
category of B-Tree deletion was simple deletion. The main
differences between it and bottom-up deletion are that only the
former is opportunistically driven by the activity of passing
index scans, while only the latter specifically targets version
churn from <command>UPDATE</command>s that do not logically modify
indexed columns.
</para>
</note>
<para>
Bottom-up index deletion performs the vast majority of all garbage
index tuple cleanup for particular indexes with certain workloads.
This is expected with any B-Tree index that is subject to
significant version churn from <command>UPDATE</command>s that
rarely or never logically modify the columns that the index covers.
The average and worst-case number of versions per logical row can
be kept low purely through targeted incremental deletion passes.
It's quite possible that the on-disk size of certain indexes will
never increase by even one single page/block despite
<emphasis>constant</emphasis> version churn from
<command>UPDATE</command>s. Even then, an exhaustive <quote>clean
sweep</quote> by a <command>VACUUM</command> operation (typically
run in an autovacuum worker process) will eventually be required as
a part of <emphasis>collective</emphasis> cleanup of the table and
each of its indexes.
</para>
<para>
Unlike <command>VACUUM</command>, bottom-up index deletion does not
provide any strong guarantees about how old the oldest garbage
index tuple may be. No index can be permitted to retain
<quote>floating garbage</quote> index tuples that became dead prior
to a conservative cutoff point shared by the table and all of its
indexes collectively. This fundamental table-level invariant makes
it safe to recycle table <acronym>TID</acronym>s. This is how it
is possible for distinct logical rows to reuse the same table
<acronym>TID</acronym> over time (though this can never happen with
two logical rows whose lifetimes span the same
<command>VACUUM</command> cycle).
</para>
</sect3>
<sect3 id="btree-deduplication">