<!-- doc/src/sgml/pgtrgm.sgml -->

<sect1 id="pgtrgm" xreflabel="pg_trgm">
 <title>pg_trgm &mdash;
   support for similarity of text using trigram matching</title>

 <indexterm zone="pgtrgm">
  <primary>pg_trgm</primary>
 </indexterm>

 <para>
  The <filename>pg_trgm</filename> module provides functions and operators
  for determining the similarity of
  alphanumeric text based on trigram matching, as
  well as index operator classes that support fast searching for similar
  strings.
 </para>

 <para>
  This module is considered <quote>trusted</quote>, meaning that it can be
  installed by non-superusers who have <literal>CREATE</literal> privilege
  on the current database.
 </para>

 <sect2 id="pgtrgm-concepts">
  <title>Trigram (or Trigraph) Concepts</title>

  <para>
   A trigram is a group of three consecutive characters taken
   from a string.  We can measure the similarity of two strings by
   counting the number of trigrams they share.  This simple idea
   turns out to be very effective for measuring the similarity of
   words in many natural languages.
  </para>
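  <para>
   As a rough illustration of the counting idea, the following Python sketch
   (not part of <filename>pg_trgm</filename>) collects every run of three
   consecutive characters from each string and counts how many the two
   strings share. It deliberately skips the word splitting and space padding
   that <filename>pg_trgm</filename> applies. The helper names are
   hypothetical. The normalized ratio, which divides the shared count by the
   number of distinct trigrams found in either string, is an assumption made
   here for illustration.
  </para>

```python
def trigrams(s):
    """Return the set of all three-character substrings of s.

    Simplified: no lowercasing, word splitting, or space padding.
    """
    return {s[i:i + 3] for i in range(len(s) - 2)}


def shared_trigram_count(a, b):
    """Count the distinct trigrams that a and b have in common."""
    return len(trigrams(a).intersection(trigrams(b)))


def trigram_ratio(a, b):
    """Hypothetical normalization: shared trigrams / all distinct trigrams.

    Yields 0.0 for strings with no trigram in common and 1.0 for strings
    whose trigram sets are identical.
    """
    union = trigrams(a).union(trigrams(b))
    if not union:
        # Both strings are shorter than three characters; treat as no match.
        return 0.0
    return shared_trigram_count(a, b) / len(union)


print(shared_trigram_count("word", "words"))  # 2
print(trigram_ratio("word", "words"))  # 0.6666666666666666
```

  <para>
   Here <literal>word</literal> yields two trigrams and
   <literal>words</literal> yields three. Both trigrams of the shorter string
   also appear in the longer one, so the count is 2 and the ratio is 2/3.
  </para>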

  <note>
   <para>
    <filename>pg_trgm</filename> ignores non-word characters
    (non-alphanumerics) when extracting trigrams from a string.
    Each word is considered to have two spaces
    prefixed and one space suffixed when determining the set
    of trigrams contained in the string.
    For example, the set of trigrams in the string
    <quote><literal>cat</literal></quote> is
    <quote><literal>  c</literal></quote>,
    <quote><literal> ca</literal></quote>,
    <quote><literal>cat</literal></quote>, and
    <quote><literal>at </literal></quote>.
    The set of trigrams in the string
    <quote><literal>foo|bar</literal></quote> is
    <quote><literal>  f</literal></quote>,
    <quote><literal> fo</literal></quote>,
    <quote><literal>foo</literal></quote>,
    <quote><literal>oo </literal></quote>,
    <quote><literal>  b</literal></quote>,
    <quote><literal> ba</literal></quote>,
    <quote><literal>bar</literal></quote>, and
    <quote><literal>ar </literal></quote>.
   </para>
  </note>
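  <para>
   The extraction rules in the note above can be sketched in Python as
   follows. This is an illustration, not the module's C implementation. The
   helper names are hypothetical, <literal>str.isalnum()</literal> stands in
   for <filename>pg_trgm</filename>'s notion of a word character, and
   lowercasing each word is an assumption about case-insensitive matching
   that this section does not state.
  </para>

```python
def word_trigrams(word):
    """Trigrams of one word, padded with two leading spaces and one trailing space."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def extract_trigrams(text):
    """Split text on non-alphanumeric characters and merge each word's trigrams.

    Assumptions: str.isalnum() approximates pg_trgm's word characters,
    and words are lowercased before extraction.
    """
    result = set()
    word = []
    # Append a separator so the final word is flushed inside the loop.
    for ch in text + " ":
        if ch.isalnum():
            word.append(ch.lower())
        elif word:
            result |= word_trigrams("".join(word))
            word = []
    return result


print(sorted(extract_trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
print(sorted(extract_trigrams("foo|bar")))
```

  <para>
   The sketch returns a set, so a trigram that occurs more than once in the
   input is kept only once. The <literal>|</literal> in
   <literal>foo|bar</literal> acts purely as a word separator and contributes
   no trigrams of its own.
  </para>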
 </sect2>

 <sect2 id="pgtrgm-funcs-ops">
  <title>Functions and Operators</title>

  <para>
   The functions provided by the <filename>pg_trgm</filename> module
   are shown in <xref linkend="pgtrgm-func-table"/>, and the operators
   in <xref linkend="pgtrgm-op-table"/>.
  </para>

  <table id="pgtrgm-func-table">
   <title><filename>pg_trgm</filename> Functions</title>
    <tgroup cols="1">
     <thead>
      <row>
       <entry role="func_table_entry"><para role="func_signature">
        Function
       </para>
       <para>
        Description
       </para></entry>
      </row>
     </thead>

     <tbody>
      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <indexterm><primary>similarity</primary></indexterm>
        <function>similarity</function> ( <type>text</type>, <type>text</type> )
        <returnvalue>real</returnvalue>
       </para>
       <para>
        Returns a number that indicates how similar the two arguments are.
        The range of the result is zero (indicating that the two strings are
        completely dissimilar) to one (indicating that the two strings are
        identical).
       </para></entry>
      </row>

      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <indexterm><primary>show_trgm</primary></indexterm>
        <function>show_trgm</function> ( <type>text</type> )
        <returnvalue>text[]</returnvalue>
       </para>
       <para>
        Returns an array of all the trigrams in the given string.
        (In practice this is seldom useful except for debugging.)
       </para></entry>
      </row>

      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <indexterm><primary>word_similarity</primary></indexterm>
        <function>word_similarity</function> ( <type>text</type>, <type>text</type> )
        <returnvalue>real</returnvalue>
