Home Explore Blog CI



postgresql

46th chunk of `doc/src/sgml/textsearch.sgml`
4e48f2a1002f1805cb48679e25ce7af5abbf654eaff9412d0000000100000fa5
 dictionary <literal>thesaurus_astro</literal> does know the
     phrase <literal>supernovae stars</literal>, but <function>ts_lexize</function>
     fails since it does not parse the input text but treats it as a single
     token. Use <function>plainto_tsquery</function> or <function>to_tsvector</function> to
     test thesaurus dictionaries, for example:

<screen>
SELECT plainto_tsquery('supernovae stars');
 plainto_tsquery
-----------------
 'sn'
</screen>
    </para>
   </note>

  </sect2>

 </sect1>

 <sect1 id="textsearch-indexes">
  <title>Preferred Index Types for Text Search</title>

  <indexterm zone="textsearch-indexes">
   <primary>text search</primary>
   <secondary>indexes</secondary>
  </indexterm>

  <para>
   There are two kinds of indexes that can be used to speed up full text
   searches:
   <link linkend="gin"><acronym>GIN</acronym></link> and
   <link linkend="gist"><acronym>GiST</acronym></link>.
   Note that indexes are not mandatory for full text searching, but in
   cases where a column is searched on a regular basis, an index is
   usually desirable.
  </para>

  <para>
   To create such an index, do one of:

   <variablelist>

    <varlistentry>

     <term>
     <indexterm zone="textsearch-indexes">
      <primary>index</primary>
      <secondary>GIN</secondary>
      <tertiary>text search</tertiary>
     </indexterm>

      <literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
     </term>

     <listitem>
      <para>
       Creates a GIN (Generalized Inverted Index)-based index.
       The <replaceable>column</replaceable> must be of <type>tsvector</type> type.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>

     <term>
     <indexterm zone="textsearch-indexes">
      <primary>index</primary>
      <secondary>GiST</secondary>
      <tertiary>text search</tertiary>
     </indexterm>

      <literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable> [ { DEFAULT | tsvector_ops } (siglen = <replaceable>number</replaceable>) ] );</literal>
     </term>

     <listitem>
      <para>
       Creates a GiST (Generalized Search Tree)-based index.
       The <replaceable>column</replaceable> can be of <type>tsvector</type> or
       <type>tsquery</type> type.
       Optional integer parameter <literal>siglen</literal> determines
       signature length in bytes (see below for details).
      </para>
     </listitem>
    </varlistentry>

   </variablelist>
  </para>

  <para>
   GIN indexes are the preferred text search index type.  As inverted
   indexes, they contain an index entry for each word (lexeme), with a
   compressed list of matching locations.  Multi-word searches can find
   the first match, then use the index to remove rows that are lacking
   additional words.  GIN indexes store only the words (lexemes) of
   <type>tsvector</type> values, and not their weight labels.  Thus a table
   row recheck is needed when using a query that involves weights.
  </para>

  <para>
   A GiST index is <firstterm>lossy</firstterm>, meaning that the index
   might produce false matches, and it is necessary
   to check the actual table row to eliminate such false matches.
   (<productname>PostgreSQL</productname> does this automatically when needed.)
   GiST indexes are lossy because each document is represented in the
   index by a fixed-length signature.  The signature length in bytes is determined
   by the value of the optional integer parameter <literal>siglen</literal>.
   The default signature length (when <literal>siglen</literal> is not specified) is
   124 bytes, the maximum signature length is 2024 bytes. The signature is generated by hashing
   each word into a single bit in an n-bit string, with all these bits OR-ed
   together to produce an n-bit document signature.  When two words hash to
   the same bit position

Title: Preferred Index Types for Text Search: GIN and GiST
Summary
This section discusses the use of GIN and GiST indexes to speed up full text searches in PostgreSQL. It describes how to create these indexes using `CREATE INDEX` statements, noting that GIN indexes are generally preferred. GIN indexes store each word with a compressed list of locations, requiring a table row recheck when weights are involved. GiST indexes, on the other hand, are lossy, using fixed-length signatures to represent documents. The signature length can be adjusted using the `siglen` parameter.