Skip Support Function, B-Tree Implementation, and B-Tree Structure

options. The options can be accessed from other support functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros. </para> <para> Currently, no B-Tree operator class has an <function>options</function> support function. B-tree doesn't allow flexible representation of keys like GiST, SP-GiST, GIN and BRIN do. So, <function>options</function> probably doesn't have much application in the current B-tree index access method. Nevertheless, this support function was added to B-tree for uniformity, and will probably find uses during further evolution of B-tree in <productname>PostgreSQL</productname>. </para> </listitem> </varlistentry> <varlistentry> <term><function>skipsupport</function></term> <listitem> <para> Optionally, a btree operator family may provide a <firstterm>skip support</firstterm> function, registered under support function number 6. These functions give the B-tree code a way to iterate through every possible value that can be represented by an operator class's underlying input type, in key space order. This is used by the core code when it applies the skip scan optimization. The APIs involved in this are defined in <filename>src/include/utils/skipsupport.h</filename>. </para> <para> Operator classes that do not provide a skip support function are still eligible to use skip scan. The core code can still use its fallback strategy, though that might be suboptimal for some discrete types. It usually doesn't make sense (and may not even be feasible) for operator classes on continuous types to provide a skip support function. </para> <para> It is not sensible for an operator family to register a cross-type <function>skipsupport</function> function, and attempting to do so will result in an error. This is because determining the next indexable value must happen by incrementing a value copied from an index tuple. The values generated must all be of the same underlying data type (the <quote>skipped</quote> index column's opclass input type). </para> </listitem> </varlistentry> </variablelist> </sect2> <sect2 id="btree-implementation"> <title>Implementation</title> <para> This section covers B-Tree index implementation details that may be of use to advanced users. See <filename>src/backend/access/nbtree/README</filename> in the source distribution for a much more detailed, internals-focused description of the B-Tree implementation. </para> <sect3 id="btree-structure"> <title>B-Tree Structure</title> <para> <productname>PostgreSQL</productname> B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly-linked list of pages. A single metapage is stored in a fixed position at the start of the first segment file of the index. All other pages are either leaf pages or internal pages. Leaf pages are the pages on the lowest level of the tree. All other levels consist of internal pages. Each leaf page contains tuples that point to table rows. Each internal page contains tuples that point to the next level down in the tree. Typically, over 99% of all pages are leaf pages. Both internal pages and leaf pages use the standard page format described in <xref linkend="storage-page-layout"/>. </para> <para> New leaf pages are added to a B-Tree index when an existing leaf page cannot fit an incoming tuple. A <firstterm>page split</firstterm> operation makes room for items that originally belonged on the overflowing page by moving a portion of the items to a new page. Page splits must also insert a new <firstterm>downlink</firstterm> to the new page in the parent page, which may cause the parent to split in turn. Page splits <quote>cascade upwards</quote>

This section discusses the optional 'skip support' function in B-tree operator families, used for iterating through all possible values of an operator class's input type in key space order, aiding the skip scan optimization. It notes that operator classes without this function can still use skip scan via a fallback strategy. The section emphasizes that cross-type skipsupport functions are not sensible and will result in an error. The section then transitions to the implementation details of B-Tree indexes in PostgreSQL, referencing the README file for internals-focused descriptions, and explains the multi-level tree structure, including the metapage, leaf pages, and internal pages. It describes leaf pages containing tuples that point to table rows, and internal pages containing tuples that point to the next level down in the tree. The section concludes by explaining how new leaf pages are added and how page splits cause new downlinks in the parent page.