33rd chunk of `doc/src/sgml/textsearch.sgml`
SELECT ts_lexize('public.simple_dict', 'YeS');
 ts_lexize
-----------


SELECT ts_lexize('public.simple_dict', 'The');
 ts_lexize
-----------
 {}
</screen>
   </para>

   <para>
    With the default setting of <literal>Accept</literal> = <literal>true</literal>,
    it is only useful to place a <literal>simple</literal> dictionary at the end
    of a list of dictionaries, since it will never pass on any token to
    a following dictionary.  Conversely, <literal>Accept</literal> = <literal>false</literal>
    is only useful when there is at least one following dictionary.
   </para>
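
   <para>
    As a sketch of the second case, a <literal>simple</literal> dictionary with
    <literal>Accept</literal> = <literal>false</literal> can be listed ahead of
    a stemmer, so that it only discards stop words and lets every other token
    fall through.  The commands below assume the
    <literal>public.simple_dict</literal> dictionary from the example above
    and, purely for illustration, change the mapping of the built-in
    <literal>english</literal> configuration; working on a copy of the
    configuration may be preferable in practice:

<programlisting>
-- Return NULL for words that are not stop words, so they are passed on.
ALTER TEXT SEARCH DICTIONARY public.simple_dict ( Accept = false );

-- simple_dict discards stop words; english_stem handles everything else.
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH public.simple_dict, english_stem;
</programlisting>
   </para>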

   <caution>
    <para>
     Most types of dictionaries rely on configuration files, such as files of
     stop words.  These files <emphasis>must</emphasis> be stored in UTF-8 encoding.
     They will be translated to the actual database encoding, if that is
     different, when they are read into the server.
    </para>
   </caution>

   <caution>
    <para>
     Normally, a database session will read a dictionary configuration file
     only once, when it is first used within the session.  If you modify a
     configuration file and want to force existing sessions to pick up the
     new contents, issue an <command>ALTER TEXT SEARCH DICTIONARY</command> command
     on the dictionary.  This can be a <quote>dummy</quote> update that doesn't
     actually change any parameter values.
    </para>
   </caution>
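
   <para>
    A minimal sketch of such a dummy update follows.  It assumes that
    <literal>public.simple_dict</literal> was created with
    <literal>STOPWORDS = english</literal>, so the command merely restates
    the value the dictionary already has:

<programlisting>
-- Restating an unchanged parameter still makes sessions reload the file.
ALTER TEXT SEARCH DICTIONARY public.simple_dict ( StopWords = english );
</programlisting>
   </para>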

  </sect2>

  <sect2 id="textsearch-synonym-dictionary">
   <title>Synonym Dictionary</title>

   <para>
    This dictionary template is used to create dictionaries that replace a
    word with a synonym.  Phrases are not supported; use the thesaurus
    template (<xref linkend="textsearch-thesaurus"/>) for that.  A synonym
    dictionary can be used to overcome linguistic problems, for example, to
    prevent an English stemmer dictionary from reducing the word <quote>Paris</quote> to
    <quote>pari</quote>.  It is enough to have a <literal>Paris paris</literal> line in the
    synonym dictionary and put it before the <literal>english_stem</literal>
    dictionary.  For example:

<screen>
SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari}

CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;

SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |       dictionaries        | dictionary | lexemes
-----------+-----------------+-------+---------------------------+------------+---------
 asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
</screen>
   </para>

   <para>
    The only parameter required by the <literal>synonym</literal> template is
    <literal>SYNONYMS</literal>, which is the base name of its configuration file
    &mdash; <literal>my_synonyms</literal> in the above example.
    The file's full name will be
    <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</filename>
    (where <literal>$SHAREDIR</literal> means the
    <productname>PostgreSQL</productname> installation's shared-data directory).
    The file format is just one line
    per word to be substituted, with the word followed by its synonym,
    separated by white space.  Blank lines and trailing spaces are ignored.
   </para>
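
   <para>
    For illustration, a file
    <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</filename> holding only
    the entry discussed above could look like this (the contents are an
    assumed minimal example, not a file shipped with
    <productname>PostgreSQL</productname>):

<programlisting>
Paris paris
</programlisting>

    Assuming the <literal>my_synonym</literal> dictionary from the example
    above, the dictionary can then be checked by itself with
    <function>ts_lexize</function>.  The call is expected to return
    <literal>{paris}</literal>, consistent with the
    <function>ts_debug</function> output shown earlier:

<programlisting>
SELECT ts_lexize('my_synonym', 'Paris');
</programlisting>
   </para>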

   <para>
    The <literal>synonym</literal> template also has an optional parameter
    <literal>CaseSensitive</literal>, which defaults to <literal>false</literal>.  When
    <literal>CaseSensitive</literal> is <literal>false</literal>, words in the synonym file
    are folded to lower case, as are input tokens.  When it is
    <literal>true</literal>, words and tokens are not folded to lower case,
    but are

Title: Simple and Synonym Dictionaries Configuration and Usage
Summary
With the default setting Accept = true, a simple dictionary is only useful at the end of a dictionary list, because it never passes a token on to a following dictionary. With Accept = false, it is only useful when at least one dictionary follows it. Configuration files such as stop word lists must be stored in UTF-8 encoding. To make existing sessions pick up changes to a dictionary configuration file, issue ALTER TEXT SEARCH DICTIONARY on the dictionary. The synonym dictionary template replaces a word with a synonym, using the configuration file named by the SYNONYMS parameter, in which each line contains a word followed by its synonym. The optional CaseSensitive parameter, false by default, controls whether words and tokens are folded to lower case.