GiST Index Implementation and Build Methods

function is provided by <productname>PostgreSQL</productname>:
<literal>gist_translate_cmptype_common</literal> is for operator classes that
use the <literal>RT*StrategyNumber</literal> constants.
The <literal>btree_gist</literal> extension defines a second translation
function, <literal>gist_translate_cmptype_btree</literal>, for operator classes
that use the <literal>BT*StrategyNumber</literal> constants.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>

   <para>
    All the GiST support methods are normally called in short-lived memory
    contexts; that is, <varname>CurrentMemoryContext</varname> will get
    reset after each tuple is processed. It is therefore not very important
    to worry about pfree'ing everything you palloc. However, in some cases
    it's useful for a support method to cache data across repeated calls.
    To do that, allocate the longer-lived data in
    <literal>fcinfo->flinfo->fn_mcxt</literal>, and keep a pointer to it in
    <literal>fcinfo->flinfo->fn_extra</literal>. Such data will survive for
    the life of the index operation (e.g., a single GiST index scan, index
    build, or index tuple insertion). Be careful to pfree the previous value
    when replacing a <literal>fn_extra</literal> value, or the leak will
    accumulate for the duration of the operation.
   </para>

 </sect2>

 <sect2 id="gist-implementation">
  <title>Implementation</title>

  <sect3 id="gist-buffering-build">
   <title>GiST Index Build Methods</title>

   <para>
    The simplest way to build a GiST index is just to insert all the entries,
    one by one. This tends to be slow for large indexes, because if the
    index tuples are scattered across the index and the index is large enough
    to not fit in cache, a lot of random I/O will be needed.
    <productname>PostgreSQL</productname> supports two alternative methods
    for initial build of a GiST index: <firstterm>sorted</firstterm> and
    <firstterm>buffered</firstterm> modes.
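    As a rough illustration, the choice between the two modes can be
    influenced with the <literal>buffering</literal> storage parameter of
    <command>CREATE INDEX</command>. The sketch below assumes a hypothetical
    table <literal>places</literal> with a <type>point</type> column
    <literal>location</literal>; neither name comes from this section.
<programlisting>
-- Hypothetical table and column names, for illustration only.

-- Request the buffered build method explicitly:
CREATE INDEX places_location_buffered_idx ON places
    USING gist (location) WITH (buffering = on);

-- Leave the choice to the default setting (buffering = auto):
CREATE INDEX places_location_idx ON places
    USING gist (location);
</programlisting>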
   </para>

   <para>
    The sorted method is only available if each of the opclasses used by the
    index provides a <function>sortsupport</function> function, as described
    in <xref linkend="gist-extensibility"/>. If they do, this method is
    usually the best, so it is used by default.
   </para>

   <para>
    The buffered method works by not inserting tuples directly into the index
    right away. It can dramatically reduce the amount of random I/O needed
    for non-ordered data sets. For well-ordered data sets the benefit is
    smaller or non-existent, because only a small number of pages receive new
    tuples at a time, and those pages fit in cache even if the index as a
    whole does not.
   </para>

   <para>
    The buffered method needs to call
