postgresql

34th chunk of `doc/src/sgml/textsearch.sgml`
 <literal>synonym</literal> template is
    <literal>SYNONYMS</literal>, which is the base name of its configuration file
    &mdash; <literal>my_synonyms</literal> in the above example.
    The file's full name will be
    <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</filename>
    (where <literal>$SHAREDIR</literal> means the
    <productname>PostgreSQL</productname> installation's shared-data directory).
    The file contains one line per word to be substituted: the word, followed
    by its synonym, separated by white space.  Blank lines and trailing spaces
    are ignored.
   </para>

   <para>
    The <literal>synonym</literal> template also has an optional parameter
    <literal>CaseSensitive</literal>, which defaults to <literal>false</literal>.  When
    <literal>CaseSensitive</literal> is <literal>false</literal>, words in the synonym file
    are folded to lower case, as are input tokens.  When it is
    <literal>true</literal>, words and tokens are not folded to lower case,
    but are compared as-is.
   </para>
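
   <para>
    As a minimal sketch of setting this parameter, the following statement
    defines a case-sensitive synonym dictionary.  The dictionary name
    <literal>syn_cs</literal> is a hypothetical name chosen for illustration,
    and the file base name <literal>my_synonyms</literal> is assumed to be
    the one used in the example above:
<programlisting>
-- syn_cs is a hypothetical dictionary name; my_synonyms is assumed to exist
-- as $SHAREDIR/tsearch_data/my_synonyms.syn
CREATE TEXT SEARCH DICTIONARY syn_cs (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms,
    CaseSensitive = true
);
</programlisting>
    With this setting, a file entry for <literal>Postgres</literal> would be
    compared as-is, so it would not match the input token
    <literal>postgres</literal>.
   </para>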

   <para>
    An asterisk (<literal>*</literal>) can be placed at the end of a synonym
    in the configuration file.  This indicates that the synonym is a prefix.
    The asterisk is ignored when the entry is used in
    <function>to_tsvector()</function>, but when it is used in
    <function>to_tsquery()</function>, the result will be a query item with
    the prefix match marker (see
    <xref linkend="textsearch-parsing-queries"/>).
    For example, suppose we have these entries in
    <filename>$SHAREDIR/tsearch_data/synonym_sample.syn</filename>:
<programlisting>
postgres        pgsql
postgresql      pgsql
postgre pgsql
gogle   googl
indices index*
</programlisting>
    Then we will get these results:
<screen>
mydb=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym, synonyms='synonym_sample');
mydb=# SELECT ts_lexize('syn', 'indices');
 ts_lexize
-----------
 {index}
(1 row)

mydb=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);
mydb=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;
mydb=# SELECT to_tsvector('tst', 'indices');
 to_tsvector
-------------
 'index':1
(1 row)

mydb=# SELECT to_tsquery('tst', 'indices');
 to_tsquery
------------
 'index':*
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector;
            tsvector
---------------------------------
 'are' 'indexes' 'useful' 'very'
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst', 'indices');
 ?column?
----------
 t
(1 row)
</screen>
   </para>
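
   <para>
    For contrast, an entry without the asterisk, such as
    <literal>gogle</literal> in the sample file above, should produce an
    ordinary query item with no prefix match marker.  A brief sketch, assuming
    the <literal>syn</literal> dictionary and <literal>tst</literal>
    configuration created above:
<screen>
mydb=# SELECT ts_lexize('syn', 'gogle');
 ts_lexize
-----------
 {googl}
(1 row)

mydb=# SELECT to_tsquery('tst', 'gogle');
 to_tsquery
------------
 'googl'
(1 row)
</screen>
   </para>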
  </sect2>

  <sect2 id="textsearch-thesaurus">
   <title>Thesaurus Dictionary</title>

   <para>
    A thesaurus dictionary (sometimes abbreviated as <acronym>TZ</acronym>) is
    a collection of words that includes information about the relationships
    of words and phrases, i.e., broader terms (<acronym>BT</acronym>), narrower
    terms (<acronym>NT</acronym>), preferred terms, non-preferred terms, related
    terms, etc.
   </para>

   <para>
    In essence, a thesaurus dictionary replaces all non-preferred terms with one
    preferred term and, optionally, preserves the original terms for indexing
    as well.  <productname>PostgreSQL</productname>'s current implementation of the
    thesaurus dictionary is an extension of the synonym dictionary with added
    <firstterm>phrase</firstterm> support.  A thesaurus dictionary requires
    a configuration file of the following format:

<programlisting>
# this is a comment
sample word(s) : indexed word(s)
more sample word(s) : more indexed word(s)
...
</programlisting>

    where the colon (<symbol>:</symbol>) acts as the delimiter between a
    phrase and its replacement.
   </para>
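
   <para>
    As an illustration of this format, a small thesaurus file might look like
    the following.  The entries are hypothetical astronomy phrases chosen only
    to show a multi-word phrase on the left of the colon and its replacement
    on the right:
<programlisting>
# hypothetical sample entries
supernovae stars : sn
crab nebulae : crab
</programlisting>
   </para>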

   <para>
    A thesaurus dictionary uses a <firstterm>subdictionary</firstterm> (which
    is specified in the dictionary's configuration) to normalize the input
    text before checking for phrase matches.  Only one subdictionary can be
    selected.  An error is reported if the
