Home Explore Blog CI



postgresql

6th chunk of `doc/src/sgml/pgtrgm.sgml`
af28c203e560c1c4030c9c439a83dec9b880d286d2540ef40000000100000fa5
 very fast similarity searches.  These index types support
   the above-described similarity operators, and additionally support
   trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
   <literal>~</literal>, <literal>~*</literal> and <literal>=</literal> queries.
   The similarity comparisons are case-insensitive in a default build of
   <filename>pg_trgm</filename>.
   Inequality operators are not supported.
   Note that those indexes may not be as efficient as regular B-tree indexes
   for equality operator.
  </para>

  <para>
   Example:

<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops);
</programlisting>
or
<programlisting>
CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);
</programlisting>
  </para>

  <para>
   <literal>gist_trgm_ops</literal> GiST opclass approximates a set of
   trigrams as a bitmap signature.  Its optional integer parameter
   <literal>siglen</literal> determines the
   signature length in bytes.  The default length is 12 bytes.
   Valid values of signature length are between 1 and 2024 bytes.  Longer
   signatures lead to a more precise search (scanning a smaller fraction of the index and
   fewer heap pages), at the cost of a larger index.
  </para>

  <para>
   Example of creating such an index with a signature length of 32 bytes:
  </para>
<programlisting>
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops(siglen=32));
</programlisting>

  <para>
   At this point, you will have an index on the <structfield>t</structfield> column that
   you can use for similarity searching.  A typical query is
<programlisting>
SELECT t, similarity(t, '<replaceable>word</replaceable>') AS sml
  FROM test_trgm
  WHERE t % '<replaceable>word</replaceable>'
  ORDER BY sml DESC, t;
</programlisting>
   This will return all values in the text column that are sufficiently
   similar to <replaceable>word</replaceable>, sorted from best match to worst.  The
   index will be used to make this a fast operation even over very large data
   sets.
  </para>

  <para>
   A variant of the above query is
<programlisting>
SELECT t, t &lt;-&gt; '<replaceable>word</replaceable>' AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;
</programlisting>
   This can be implemented quite efficiently by GiST indexes, but not
   by GIN indexes.  It will usually beat the first formulation when only
   a small number of the closest matches is wanted.
  </para>

  <para>
   Also you can use an index on the <structfield>t</structfield> column for word
   similarity or strict word similarity.  Typical queries are:
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
  FROM test_trgm
  WHERE '<replaceable>word</replaceable>' &lt;% t
  ORDER BY sml DESC, t;
</programlisting>
   and
<programlisting>
SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
  FROM test_trgm
  WHERE '<replaceable>word</replaceable>' &lt;&lt;% t
  ORDER BY sml DESC, t;
</programlisting>
   This will return all values in the text column for which there is a
   continuous extent in the corresponding ordered trigram set that is
   sufficiently similar to the trigram set of <replaceable>word</replaceable>,
   sorted from best match to worst.  The index will be used to make this
   a fast operation even over very large data sets.
  </para>

  <para>
   Possible variants of the above queries are:
<programlisting>
SELECT t, '<replaceable>word</replaceable>' &lt;&lt;-&gt; t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;
</programlisting>
   and
<programlisting>
SELECT t, '<replaceable>word</replaceable>' &lt;&lt;&lt;-&gt; t AS dist
  FROM test_trgm
  ORDER BY dist LIMIT 10;
</programlisting>
   This can be implemented quite efficiently by GiST indexes, but not
   by GIN indexes.
  </para>


  <para>
   Beginning in <productname>PostgreSQL</productname> 9.1, these index types also support
   index searches

Title: pg_trgm Index Usage and Examples
Summary
This section provides examples of using GiST and GIN indexes created with the pg_trgm extension for efficient similarity searches. It details how to query for similar text using the '%' operator and how to order results by similarity. It also explains how to use the '<->' operator with GiST indexes for finding the closest matches and provides examples for word and strict word similarity searches. It also discusses index searches.