Home Explore Blog CI



postgresql

12th chunk of `doc/src/sgml/btree.sgml`
8462f364b53257357dc348af4581a8a825e9fbb815ee40f40000000100000fa3
 sweep</quote> by a <command>VACUUM</command> operation (typically
   run in an autovacuum worker process) will eventually be required as
   a part of <emphasis>collective</emphasis> cleanup of the table and
   each of its indexes.
  </para>
  <para>
   Unlike <command>VACUUM</command>, bottom-up index deletion does not
   provide any strong guarantees about how old the oldest garbage
   index tuple may be.  No index can be permitted to retain
   <quote>floating garbage</quote> index tuples that became dead prior
   to a conservative cutoff point shared by the table and all of its
   indexes collectively.  This fundamental table-level invariant makes
   it safe to recycle table <acronym>TID</acronym>s.  This is how it
   is possible for distinct logical rows to reuse the same table
   <acronym>TID</acronym> over time (though this can never happen with
   two logical rows whose lifetimes span the same
   <command>VACUUM</command> cycle).
  </para>
 </sect3>

 <sect3 id="btree-deduplication">
  <title>Deduplication</title>
  <para>
   A duplicate is a leaf page tuple (a tuple that points to a table
   row) where <emphasis>all</emphasis> indexed key columns have values
   that match corresponding column values from at least one other leaf
   page tuple in the same index.  Duplicate tuples are quite common in
   practice.  B-Tree indexes can use a special, space-efficient
   representation for duplicates when an optional technique is
   enabled: <firstterm>deduplication</firstterm>.
  </para>
  <para>
   Deduplication works by periodically merging groups of duplicate
   tuples together, forming a single <firstterm>posting list</firstterm> tuple for each
   group.  The column key value(s) only appear once in this
   representation.  This is followed by a sorted array of
   <acronym>TID</acronym>s that point to rows in the table.  This
   significantly reduces the storage size of indexes where each value
   (or each distinct combination of column values) appears several
   times on average.  The latency of queries can be reduced
   significantly.  Overall query throughput may increase
   significantly.  The overhead of routine index vacuuming may also be
   reduced significantly.
  </para>
  <note>
   <para>
    B-Tree deduplication is just as effective with
    <quote>duplicates</quote> that contain a NULL value, even though
    NULL values are never equal to each other according to the
    <literal>=</literal> member of any B-Tree operator class.  As far
    as any part of the implementation that understands the on-disk
    B-Tree structure is concerned, NULL is just another value from the
    domain of indexed values.
   </para>
  </note>
  <para>
   The deduplication process occurs lazily, when a new item is
   inserted that cannot fit on an existing leaf page, though only when
   index tuple deletion could not free sufficient space for the new
   item (typically deletion is briefly considered and then skipped
   over).  Unlike GIN posting list tuples, B-Tree posting list tuples
   do not need to expand every time a new duplicate is inserted; they
   are merely an alternative physical representation of the original
   logical contents of the leaf page.  This design prioritizes
   consistent performance with mixed read-write workloads.  Most
   client applications will at least see a moderate performance
   benefit from using deduplication.  Deduplication is enabled by
   default.
  </para>
  <para>
   <command>CREATE INDEX</command> and <command>REINDEX</command>
   apply deduplication to create posting list tuples, though the
   strategy they use is slightly different.  Each group of duplicate
   ordinary tuples encountered in the sorted input taken from the
   table is merged into a posting list tuple
   <emphasis>before</emphasis> being added to the current pending leaf
   page.  Individual posting list tuples are packed with as many
   <acronym>TID</acronym>s as possible.  Leaf pages are written out in
   the usual way, without

Title: B-Tree Deduplication
Summary
This section discusses B-Tree deduplication, a technique to optimize storage and performance by merging duplicate leaf page tuples into a single posting list tuple containing the key value(s) and an array of TIDs. Deduplication is triggered when a new item cannot fit on a leaf page after index tuple deletion. It's effective with NULL values and prioritizes consistent performance with mixed read-write workloads, generally providing a performance benefit. CREATE INDEX and REINDEX also use deduplication when building indexes.