very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
<literal>~</literal>, <literal>~*</literal> and <literal>=</literal> queries.
The similarity comparisons are case-insensitive in a default build of
<filename>pg_trgm</filename>.
Inequality operators are not supported.
Note that those indexes may not be as efficient as regular B-tree indexes
for equality operator.
</para>
<para>
Example:
<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops);
</programlisting>
or
<programlisting>
CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);
</programlisting>
</para>
<para>
<literal>gist_trgm_ops</literal> GiST opclass approximates a set of
trigrams as a bitmap signature. Its optional integer parameter
<literal>siglen</literal> determines the
signature length in bytes. The default length is 12 bytes.
Valid values of signature length are between 1 and 2024 bytes. Longer
signatures lead to a more precise search (scanning a smaller fraction of the index and
fewer heap pages), at the cost of a larger index.
</para>
<para>
Example of creating such an index with a signature length of 32 bytes:
</para>
<programlisting>
CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops(siglen=32));
</programlisting>
<para>
At this point, you will have an index on the <structfield>t</structfield> column that
you can use for similarity searching. A typical query is
<programlisting>
SELECT t, similarity(t, '<replaceable>word</replaceable>') AS sml
FROM test_trgm
WHERE t % '<replaceable>word</replaceable>'
ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column that are sufficiently
similar to <replaceable>word</replaceable>, sorted from best match to worst. The
index will be used to make this a fast operation even over very large data
sets.
</para>
<para>
A variant of the above query is
<programlisting>
SELECT t, t <-> '<replaceable>word</replaceable>' AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes. It will usually beat the first formulation when only
a small number of the closest matches is wanted.
</para>
<para>
Also you can use an index on the <structfield>t</structfield> column for word
similarity or strict word similarity. Typical queries are:
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
</programlisting>
and
<programlisting>
SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' <<% t
ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column for which there is a
continuous extent in the corresponding ordered trigram set that is
sufficiently similar to the trigram set of <replaceable>word</replaceable>,
sorted from best match to worst. The index will be used to make this
a fast operation even over very large data sets.
</para>
<para>
Possible variants of the above queries are:
<programlisting>
SELECT t, '<replaceable>word</replaceable>' <<-> t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
</programlisting>
and
<programlisting>
SELECT t, '<replaceable>word</replaceable>' <<<-> t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes.
</para>
<para>
Beginning in <productname>PostgreSQL</productname> 9.1, these index types also support
index searches