Home Explore Blog CI



postgresql

8th chunk of `doc/src/sgml/gin.sgml`
89becfc3767db70c190f81faedf9d9ff7eae3af8609f9d9e0000000100000fa1
 three support functions use
  the opclass's indexed data type for the <literal>query</literal> argument, even
  though the actual type might be something else depending on the operator.
 </para>

</sect2>

<sect2 id="gin-implementation">
 <title>Implementation</title>

 <para>
  Internally, a <acronym>GIN</acronym> index contains a B-tree index
  constructed over keys, where each key is an element of one or more indexed
  items (a member of an array, for example) and where each tuple in a leaf
  page contains either a pointer to a B-tree of heap pointers (a
  <quote>posting tree</quote>), or a simple list of heap pointers (a <quote>posting
  list</quote>) when the list is small enough to fit into a single index tuple along
  with the key value.  <xref linkend="gin-internals-figure"/> illustrates
  these components of a GIN index.
 </para>

 <para>
  As of <productname>PostgreSQL</productname> 9.1, null key values can be
  included in the index.  Also, placeholder nulls are included in the index
  for indexed items that are null or contain no keys according to
  <function>extractValue</function>.  This allows searches that should find empty
  items to do so.
 </para>

 <para>
  Multicolumn <acronym>GIN</acronym> indexes are implemented by building
  a single B-tree over composite values (column number, key value).  The
  key values for different columns can be of different types.
 </para>

 <figure id="gin-internals-figure">
  <title>GIN Internals</title>
  <mediaobject>
   <imageobject>
    <imagedata fileref="images/gin.svg" format="SVG" width="100%"/>
   </imageobject>
  </mediaobject>
 </figure>

 <sect3 id="gin-fast-update">
  <title>GIN Fast Update Technique</title>

  <para>
   Updating a <acronym>GIN</acronym> index tends to be slow because of the
   intrinsic nature of inverted indexes: inserting or updating one heap row
   can cause many inserts into the index (one for each key extracted
   from the indexed item).
   <acronym>GIN</acronym> is capable of postponing much of this work by inserting
   new tuples into a temporary, unsorted list of pending entries.
   When the table is vacuumed or autoanalyzed, or when
   <function>gin_clean_pending_list</function> function is called, or if the
   pending list becomes larger than
   <xref linkend="guc-gin-pending-list-limit"/>, the entries are moved to the
   main <acronym>GIN</acronym> data structure using the same bulk insert
   techniques used during initial index creation.  This greatly improves
   <acronym>GIN</acronym> index update speed, even counting the additional
   vacuum overhead.  Moreover the overhead work can be done by a background
   process instead of in foreground query processing.
  </para>

  <para>
   The main disadvantage of this approach is that searches must scan the list
   of pending entries in addition to searching the regular index, and so
   a large list of pending entries will slow searches significantly.
   Another disadvantage is that, while most updates are fast, an update
   that causes the pending list to become <quote>too large</quote> will incur an
   immediate cleanup cycle and thus be much slower than other updates.
   Proper use of autovacuum can minimize both of these problems.
  </para>

  <para>
   If consistent response time is more important than update speed,
   use of pending entries can be disabled by turning off the
   <literal>fastupdate</literal> storage parameter for a
   <acronym>GIN</acronym> index.  See <xref linkend="sql-createindex"/>
   for details.
  </para>
 </sect3>

 <sect3 id="gin-partial-match">
  <title>Partial Match Algorithm</title>

  <para>
   GIN can support <quote>partial match</quote> queries, in which the query
   does not determine an exact match for one or more keys, but the possible
   matches fall within a reasonably narrow range of key values (within the
   key sorting order determined by the <function>compare</function> support method).
   The <function>extractQuery</function> method, instead of

Title: GIN Index Implementation and Optimization
Summary
A GIN index uses a B-tree structure with posting trees or lists to store key values and heap pointers, and supports null key values, multicolumn indexes, and fast update techniques by postponing work to a temporary pending list, which can be configured and optimized for trade-offs between update speed and search performance