Deduplication in Unique Indexes and Implementation Restrictions

    read-only workloads, since reading posting list tuples is at least as
    efficient as reading the standard tuple representation.  Disabling
    deduplication isn't usually helpful.
   </para>
   <para>
    It is sometimes possible for unique indexes (as well as unique
    constraints) to use deduplication.  This allows leaf pages to
    temporarily <quote>absorb</quote> extra version churn duplicates.
    Deduplication in unique indexes augments bottom-up index deletion,
    especially in cases where a long-running transaction holds a snapshot
    that blocks garbage collection.  The goal is to buy time for the
    bottom-up index deletion strategy to become effective again.  Delaying
    page splits until a single long-running transaction naturally goes
    away can allow a bottom-up deletion pass to succeed where an earlier
    deletion pass failed.
   </para>
   <tip>
    <para>
     A special heuristic is applied to determine whether a deduplication
     pass in a unique index should take place.  It can often skip straight
     to splitting a leaf page, avoiding a performance penalty from wasting
     cycles on unhelpful deduplication passes.  If you're concerned about
     the overhead of deduplication, consider setting
     <literal>deduplicate_items = off</literal> selectively.  Leaving
     deduplication enabled in unique indexes has little downside.
    </para>
   </tip>
   <para>
    Deduplication cannot be used in all cases due to implementation-level
    restrictions.  Deduplication safety is determined when
    <command>CREATE INDEX</command> or <command>REINDEX</command> is run.
   </para>
   <para>
    Note that deduplication is deemed unsafe and cannot be used in the
    following cases involving semantically significant differences among
    equal datums:
   </para>
   <para>
    <itemizedlist>
     <listitem>
      <para>
       <type>text</type>, <type>varchar</type>, and <type>char</type>
       cannot use deduplication when a
       <emphasis>nondeterministic</emphasis> collation is used.  Case and
       accent differences must be preserved among equal datums.
      </para>
     </listitem>
     <listitem>
      <para>
       <type>numeric</type> cannot use deduplication.  Numeric display
       scale must be preserved among equal datums.
      </para>
     </listitem>
     <listitem>
      <para>
       <type>jsonb</type> cannot use deduplication, since the
       <type>jsonb</type> B-Tree operator class uses
       <type>numeric</type> internally.
      </para>
     </listitem>
     <listitem>
      <para>
       <type>float4</type> and <type>float8</type> cannot use
       deduplication.  These types have distinct representations for
       <literal>-0</literal> and <literal>0</literal>, which are
       nevertheless considered equal.  This difference must be preserved.
      </para>
     </listitem>
    </itemizedlist>
   </para>
   <para>
    There is one further implementation-level restriction that may be
    lifted in a future version of <productname>PostgreSQL</productname>:
   </para>
   <para>
    <itemizedlist>
     <listitem>
      <para>
       Container types (such as composite types, arrays, or range types)
       cannot use deduplication.
      </para>
     </listitem>
    </itemizedlist>
   </para>
   <para>
    There is one further implementation-level restriction that applies
    regardless of the operator class or collation used:
   </para>
   <para>
    <itemizedlist>
     <listitem>
      <para>
       <literal>INCLUDE</literal> indexes can never use deduplication.
      </para>
     </listitem>
    </itemizedlist>
   </para>
  </sect3>
 </sect2>
</sect1>
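
The phrase "semantically significant differences among equal datums" in the restrictions above can be made concrete with a small sketch. It is written in Python and is only an analogy: Python's float and decimal.Decimal stand in for float8 and numeric, and the helper names float8_bytes and equal_but_distinguishable are hypothetical, not part of PostgreSQL. The sketch shows pairs of values that compare equal yet remain distinguishable, so storing a single shared copy of the key would presumably lose information.

```python
import struct
from decimal import Decimal


def float8_bytes(value: float) -> bytes:
    """Return the IEEE 754 double-precision encoding of value (big-endian)."""
    return struct.pack(">d", value)


def equal_but_distinguishable(a, b, representation) -> bool:
    """True when a == b yet their representations differ.

    Hypothetical helper: such a pair cannot be merged into one stored key
    without losing information about one of the two values.
    """
    return a == b and representation(a) != representation(b)


# float8 analogue: -0 and 0 compare equal but have different bit patterns.
neg_zero, pos_zero = -0.0, 0.0
float_case = equal_but_distinguishable(neg_zero, pos_zero, float8_bytes)

# numeric analogue: equal values whose display scale differs.
one_dp, two_dp = Decimal("1.0"), Decimal("1.00")
numeric_case = equal_but_distinguishable(one_dp, two_dp, str)

# Control: identical integers are equal and indistinguishable.
int_case = equal_but_distinguishable(42, 42, str)

print(float_case, numeric_case, int_case)
```

The first two cases correspond to the float and numeric restrictions listed above; the integer control shows a type where equal values carry no hidden difference, which is the property that merging duplicate keys relies on.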

This section discusses the use of deduplication in unique indexes, emphasizing its role in handling version churn and aiding bottom-up index deletion. It advises selectively disabling deduplication if overhead is a concern, but generally recommends leaving it enabled in unique indexes. Furthermore, it outlines specific cases where deduplication is deemed unsafe and cannot be used due to implementation-level restrictions, including nondeterministic collations for text types, numeric types, jsonb types, float types, container types, and INCLUDE indexes.