Home Explore Blog CI



postgresql

14th chunk of `doc/src/sgml/btree.sgml`
e53a5eb2efc743ce64f93cd26002f87cdec8cbea2870ddd00000000100000e0b
 read-only
   workloads, since reading posting list tuples is at least as
   efficient as reading the standard tuple representation.  Disabling
   deduplication isn't usually helpful.
  </para>
  <para>
   It is sometimes possible for unique indexes (as well as unique
   constraints) to use deduplication.  This allows leaf pages to
   temporarily <quote>absorb</quote> extra version churn duplicates.
   Deduplication in unique indexes augments bottom-up index deletion,
   especially in cases where a long-running transaction holds a
   snapshot that blocks garbage collection.  The goal is to buy time
   for the bottom-up index deletion strategy to become effective
   again.  Delaying page splits until a single long-running
   transaction naturally goes away can allow a bottom-up deletion pass
   to succeed where an earlier deletion pass failed.
  </para>
  <tip>
   <para>
    A special heuristic is applied to determine whether a
    deduplication pass in a unique index should take place.  It can
    often skip straight to splitting a leaf page, avoiding a
    performance penalty from wasting cycles on unhelpful deduplication
    passes.  If you're concerned about the overhead of deduplication,
    consider setting <literal>deduplicate_items = off</literal>
    selectively.  Leaving deduplication enabled in unique indexes has
    little downside.
   </para>
  </tip>
  <para>
   Deduplication cannot be used in all cases due to
   implementation-level restrictions.  Deduplication safety is
   determined when <command>CREATE INDEX</command> or
   <command>REINDEX</command> is run.
  </para>
  <para>
   Note that deduplication is deemed unsafe and cannot be used in the
   following cases involving semantically significant differences
   among equal datums:
  </para>
  <para>
   <itemizedlist>
    <listitem>
     <para>
      <type>text</type>, <type>varchar</type>, and <type>char</type>
      cannot use deduplication when a
      <emphasis>nondeterministic</emphasis> collation is used.  Case
      and accent differences must be preserved among equal datums.
     </para>
    </listitem>

    <listitem>
     <para>
      <type>numeric</type> cannot use deduplication.  Numeric display
      scale must be preserved among equal datums.
     </para>
    </listitem>

    <listitem>
     <para>
      <type>jsonb</type> cannot use deduplication, since the
      <type>jsonb</type> B-Tree operator class uses
      <type>numeric</type> internally.
     </para>
    </listitem>

    <listitem>
     <para>
      <type>float4</type> and <type>float8</type> cannot use
      deduplication.  These types have distinct representations for
      <literal>-0</literal> and <literal>0</literal>, which are
      nevertheless considered equal.  This difference must be
      preserved.
     </para>
    </listitem>
   </itemizedlist>
  </para>
  <para>
   There is one further implementation-level restriction that may be
   lifted in a future version of
   <productname>PostgreSQL</productname>:
  </para>
  <para>
   <itemizedlist>
    <listitem>
     <para>
      Container types (such as composite types, arrays, or range
      types) cannot use deduplication.
     </para>
    </listitem>
   </itemizedlist>
  </para>
  <para>
   There is one further implementation-level restriction that applies
   regardless of the operator class or collation used:
  </para>
  <para>
   <itemizedlist>
    <listitem>
     <para>
      <literal>INCLUDE</literal> indexes can never use deduplication.
     </para>
    </listitem>
   </itemizedlist>
  </para>

 </sect3>
</sect2>

</sect1>

Title: Deduplication in Unique Indexes and Implementation Restrictions
Summary
This section discusses the use of deduplication in unique indexes, emphasizing its role in handling version churn and aiding bottom-up index deletion. It advises selectively disabling deduplication if overhead is a concern, but generally recommends leaving it enabled in unique indexes. Furthermore, it outlines specific cases where deduplication is deemed unsafe and cannot be used due to implementation-level restrictions, including nondeterministic collations for text types, numeric types, jsonb types, float types, container types, and INCLUDE indexes.