GIN Index Implementation and Optimization

three support functions use the opclass's indexed data type for the <literal>query</literal> argument, even though the actual type might be something else depending on the operator. </para> </sect2> <sect2 id="gin-implementation"> <title>Implementation</title> <para> Internally, a <acronym>GIN</acronym> index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items (a member of an array, for example) and where each tuple in a leaf page contains either a pointer to a B-tree of heap pointers (a <quote>posting tree</quote>), or a simple list of heap pointers (a <quote>posting list</quote>) when the list is small enough to fit into a single index tuple along with the key value. <xref linkend="gin-internals-figure"/> illustrates these components of a GIN index. </para> <para> As of <productname>PostgreSQL</productname> 9.1, null key values can be included in the index. Also, placeholder nulls are included in the index for indexed items that are null or contain no keys according to <function>extractValue</function>. This allows searches that should find empty items to do so. </para> <para> Multicolumn <acronym>GIN</acronym> indexes are implemented by building a single B-tree over composite values (column number, key value). The key values for different columns can be of different types. </para> <figure id="gin-internals-figure"> <title>GIN Internals</title> <mediaobject> <imageobject> <imagedata fileref="images/gin.svg" format="SVG" width="100%"/> </imageobject> </mediaobject> </figure> <sect3 id="gin-fast-update"> <title>GIN Fast Update Technique</title> <para> Updating a <acronym>GIN</acronym> index tends to be slow because of the intrinsic nature of inverted indexes: inserting or updating one heap row can cause many inserts into the index (one for each key extracted from the indexed item). <acronym>GIN</acronym> is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed or autoanalyzed, or when <function>gin_clean_pending_list</function> function is called, or if the pending list becomes larger than <xref linkend="guc-gin-pending-list-limit"/>, the entries are moved to the main <acronym>GIN</acronym> data structure using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional vacuum overhead. Moreover the overhead work can be done by a background process instead of in foreground query processing. </para> <para> The main disadvantage of this approach is that searches must scan the list of pending entries in addition to searching the regular index, and so a large list of pending entries will slow searches significantly. Another disadvantage is that, while most updates are fast, an update that causes the pending list to become <quote>too large</quote> will incur an immediate cleanup cycle and thus be much slower than other updates. Proper use of autovacuum can minimize both of these problems. </para> <para> If consistent response time is more important than update speed, use of pending entries can be disabled by turning off the <literal>fastupdate</literal> storage parameter for a <acronym>GIN</acronym> index. See <xref linkend="sql-createindex"/> for details. </para> </sect3> <sect3 id="gin-partial-match"> <title>Partial Match Algorithm</title> <para> GIN can support <quote>partial match</quote> queries, in which the query does not determine an exact match for one or more keys, but the possible matches fall within a reasonably narrow range of key values (within the key sorting order determined by the <function>compare</function> support method). The <function>extractQuery</function> method, instead of

A GIN index uses a B-tree structure with posting trees or lists to store key values and heap pointers, and supports null key values, multicolumn indexes, and fast update techniques by postponing work to a temporary pending list, which can be configured and optimized for trade-offs between update speed and search performance