pg_trgm Index Usage and Examples

very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>, <literal>~</literal>, <literal>~*</literal> and <literal>=</literal> queries. The similarity comparisons are case-insensitive in a default build of <filename>pg_trgm</filename>. Inequality operators are not supported. Note that those indexes may not be as efficient as regular B-tree indexes for equality operator. </para> <para> Example: <programlisting> CREATE TABLE test_trgm (t text); CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops); </programlisting> or <programlisting> CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops); </programlisting> </para> <para> <literal>gist_trgm_ops</literal> GiST opclass approximates a set of trigrams as a bitmap signature. Its optional integer parameter <literal>siglen</literal> determines the signature length in bytes. The default length is 12 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures lead to a more precise search (scanning a smaller fraction of the index and fewer heap pages), at the cost of a larger index. </para> <para> Example of creating such an index with a signature length of 32 bytes: </para> <programlisting> CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops(siglen=32)); </programlisting> <para> At this point, you will have an index on the <structfield>t</structfield> column that you can use for similarity searching. A typical query is <programlisting> SELECT t, similarity(t, '<replaceable>word</replaceable>') AS sml FROM test_trgm WHERE t % '<replaceable>word</replaceable>' ORDER BY sml DESC, t; </programlisting> This will return all values in the text column that are sufficiently similar to <replaceable>word</replaceable>, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets. </para> <para> A variant of the above query is <programlisting> SELECT t, t <-> '<replaceable>word</replaceable>' AS dist FROM test_trgm ORDER BY dist LIMIT 10; </programlisting> This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. It will usually beat the first formulation when only a small number of the closest matches is wanted. </para> <para> Also you can use an index on the <structfield>t</structfield> column for word similarity or strict word similarity. Typical queries are: <programlisting> SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml FROM test_trgm WHERE '<replaceable>word</replaceable>' <% t ORDER BY sml DESC, t; </programlisting> and <programlisting> SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml FROM test_trgm WHERE '<replaceable>word</replaceable>' <<% t ORDER BY sml DESC, t; </programlisting> This will return all values in the text column for which there is a continuous extent in the corresponding ordered trigram set that is sufficiently similar to the trigram set of <replaceable>word</replaceable>, sorted from best match to worst. The index will be used to make this a fast operation even over very large data sets. </para> <para> Possible variants of the above queries are: <programlisting> SELECT t, '<replaceable>word</replaceable>' <<-> t AS dist FROM test_trgm ORDER BY dist LIMIT 10; </programlisting> and <programlisting> SELECT t, '<replaceable>word</replaceable>' <<<-> t AS dist FROM test_trgm ORDER BY dist LIMIT 10; </programlisting> This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. </para> <para> Beginning in <productname>PostgreSQL</productname> 9.1, these index types also support index searches

This section provides examples of using GiST and GIN indexes created with the pg_trgm extension for efficient similarity searches. It details how to query for similar text using the '%' operator and how to order results by similarity. It also explains how to use the '<->' operator with GiST indexes for finding the closest matches and provides examples for word and strict word similarity searches. It also discusses index searches.