postgresql

36th chunk of `doc/src/sgml/textsearch.sgml`
 <sect3 id="textsearch-thesaurus-config">
   <title>Thesaurus Configuration</title>

   <para>
    To define a new thesaurus dictionary, use the <literal>thesaurus</literal>
    template.  For example:

<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_simple (
    TEMPLATE = thesaurus,
    DictFile = mythesaurus,
    Dictionary = pg_catalog.english_stem
);
</programlisting>

    Here:
    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       <literal>thesaurus_simple</literal> is the new dictionary's name.
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>mythesaurus</literal> is the base name of the thesaurus
       configuration file.
       (Its full name will be <filename>$SHAREDIR/tsearch_data/mythesaurus.ths</filename>,
       where <literal>$SHAREDIR</literal> means the installation shared-data
       directory.)
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>pg_catalog.english_stem</literal> is the subdictionary (here,
       a Snowball English stemmer) to use for thesaurus normalization.
       Notice that the subdictionary will have its own
       configuration (for example, stop words), which is not shown here.
      </para>
     </listitem>
    </itemizedlist>

    Now it is possible to bind the thesaurus dictionary <literal>thesaurus_simple</literal>
    to the desired token types in a configuration, for example:

<programlisting>
ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_simple;
</programlisting>
   </para>
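   <para>
    As a rough check that the mapping took effect, one could inspect which
    dictionaries are consulted for each token with
    <function>ts_debug</function>.  The following is only a sketch: it
    assumes that the <literal>thesaurus_simple</literal> dictionary and the
    <literal>russian</literal> mapping shown above are in place, and the
    sample text is arbitrary:

<programlisting>
-- List each token, its type, and the dictionaries mapped to that type.
-- For the token types altered above, the dictionaries column should
-- include thesaurus_simple.
SELECT alias, token, dictionaries
FROM ts_debug('russian', 'a sample phrase');
</programlisting>
   </para>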

  </sect3>

  <sect3 id="textsearch-thesaurus-examples">
   <title>Thesaurus Example</title>

   <para>
    Consider a simple astronomical thesaurus <literal>thesaurus_astro</literal>,
    which contains some astronomical word combinations:

<programlisting>
supernovae stars : sn
crab nebulae : crab
</programlisting>

    Below we create the dictionary and bind some token types to
    the astronomical thesaurus and the English stemmer:

<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_astro (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_astro,
    Dictionary = english_stem
);

ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_astro, english_stem;
</programlisting>

    Now we can see how it works.
    <function>ts_lexize</function> is not very useful for testing a thesaurus,
    because it treats its input as a single token.  Instead we can use
    <function>plainto_tsquery</function> and <function>to_tsvector</function>,
    which break their input strings into multiple tokens:

<screen>
SELECT plainto_tsquery('supernova star');
 plainto_tsquery
-----------------
 'sn'

SELECT to_tsvector('supernova star');
 to_tsvector
-------------
 'sn':1
</screen>

    In principle, you can also use <function>to_tsquery</function>, provided
    that you quote the argument:

<screen>
SELECT to_tsquery('''supernova star''');
 to_tsquery
------------
 'sn'
</screen>

    Notice that <literal>supernova star</literal> matches <literal>supernovae
    stars</literal> in <literal>thesaurus_astro</literal> because we specified
    the <literal>english_stem</literal> stemmer in the thesaurus definition.
    The stemmer removed the trailing <literal>e</literal> from
    <literal>supernovae</literal> and the trailing <literal>s</literal> from
    <literal>stars</literal>.
   </para>
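   <para>
    Because both the document and the query are normalized through the same
    thesaurus, a match test is a convenient end-to-end check.  A minimal
    sketch, assuming that the <literal>russian</literal> configuration
    modified above is named explicitly rather than taken from
    <varname>default_text_search_config</varname>:

<programlisting>
-- Both sides should reduce to the thesaurus substitute 'sn',
-- so the match operator is expected to succeed.
SELECT to_tsvector('russian', 'supernova star')
       @@ plainto_tsquery('russian', 'supernovae stars') AS matches;
</programlisting>
   </para>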

   <para>
    To index the original phrase as well as the substitute, just include it
    in the right-hand part of the definition:

<screen>
supernovae stars : sn supernovae stars

SELECT plainto_tsquery('supernova star');
       plainto_tsquery
-----------------------------
 'sn' &amp; 'supernova' &amp; 'star'
</screen>
   </para>
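   <para>
    A session that has already loaded the dictionary will normally keep using
    the old file contents after the thesaurus file is edited.  One way to make
    it reread the file is a <quote>dummy</quote> update of the dictionary.
    The sketch below assumes that the dictionary and file are both named
    <literal>thesaurus_astro</literal>, as in the example above:

<programlisting>
-- Restating an unchanged option is enough to force a reload.
ALTER TEXT SEARCH DICTIONARY thesaurus_astro (
    DictFile = thesaurus_astro
);

-- Rerun the test query to confirm that the edited entry is in effect.
SELECT plainto_tsquery('russian', 'supernova star');
</programlisting>
   </para>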

  </sect3>

  </sect2>

  <sect2 id="textsearch-ispell-dictionary">
   <title><application>Ispell</application> Dictionary</title>

   <para>
    The <application>Ispell</application> dictionary template supports
    <firstterm>morphological dictionaries</firstterm>, which can normalize many

Title: Thesaurus Configuration and Examples
Summary
This section explains how to configure and use a thesaurus dictionary in PostgreSQL. It details the use of the `thesaurus` template to define a new dictionary, specifying the configuration file and subdictionary for normalization. It provides an example of binding the thesaurus dictionary to specific token types. The section also presents a practical example with an astronomical thesaurus, demonstrating how to create the dictionary, bind token types, and test its functionality using `plainto_tsquery` and `to_tsvector`. Additionally, it explains how to index the original phrase along with the substitute. Finally, it transitions to the Ispell dictionary.