Simple and Synonym Dictionaries Configuration and Usage

ts_lexize('public.simple_dict', 'YeS');
 ts_lexize
-----------


SELECT ts_lexize('public.simple_dict', 'The');
 ts_lexize
-----------
 {}
</screen>
   </para>

   <para>
    With the default setting of <literal>Accept</literal> = <literal>true</literal>,
    it is only useful to place a <literal>simple</literal> dictionary at the end
    of a list of dictionaries, since it will never pass on any token to
    a following dictionary. Conversely, <literal>Accept</literal> = <literal>false</literal>
    is only useful when there is at least one following dictionary.
   </para>

   <caution>
    <para>
     Most types of dictionaries rely on configuration files, such as files of
     stop words. These files <emphasis>must</emphasis> be stored in UTF-8 encoding.
     They will be translated to the actual database encoding, if that is
     different, when they are read into the server.
    </para>
   </caution>

   <caution>
    <para>
     Normally, a database session will read a dictionary configuration file
     only once, when it is first used within the session. If you modify a
     configuration file and want to force existing sessions to pick up the
     new contents, issue an <command>ALTER TEXT SEARCH DICTIONARY</command> command
     on the dictionary. This can be a <quote>dummy</quote> update that doesn't
     actually change any parameter values.
    </para>
   </caution>

  </sect2>

  <sect2 id="textsearch-synonym-dictionary">
   <title>Synonym Dictionary</title>

   <para>
    This dictionary template is used to create dictionaries that replace a
    word with a synonym. Phrases are not supported (use the thesaurus
    template (<xref linkend="textsearch-thesaurus"/>) for that). A synonym
    dictionary can be used to overcome linguistic problems, for example, to
    prevent an English stemmer dictionary from reducing the word <quote>Paris</quote> to
    <quote>pari</quote>. It is enough to have a <literal>Paris paris</literal> line in the
    synonym dictionary and put it before the <literal>english_stem</literal>
    dictionary.
    For example:

<screen>
SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari}

CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;

SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |       dictionaries        | dictionary | lexemes
-----------+-----------------+-------+---------------------------+------------+---------
 asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
</screen>
   </para>

   <para>
    The only parameter required by the <literal>synonym</literal> template is
    <literal>SYNONYMS</literal>, which is the base name of its configuration file
    (<literal>my_synonyms</literal> in the above example).
    The file's full name will be
    <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</filename>
    (where <literal>$SHAREDIR</literal> means the
    <productname>PostgreSQL</productname> installation's shared-data directory).
    The file format is just one line
    per word to be substituted, with the word followed by its synonym,
    separated by white space. Blank lines and trailing spaces are ignored.
   </para>

   <para>
    The <literal>synonym</literal> template also has an optional parameter
    <literal>CaseSensitive</literal>, which defaults to <literal>false</literal>. When
    <literal>CaseSensitive</literal> is <literal>false</literal>, words in the synonym file
    are folded to lower case, as are input tokens. When it is
    <literal>true</literal>, words and tokens are not folded to lower case, but are

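With the default setting Accept = true, a simple dictionary is only useful at the end of a dictionary list, because it never passes a token on to a following dictionary. With Accept = false, it is only useful when at least one other dictionary follows it. Configuration files such as stop word lists must be stored in UTF-8 encoding; the server translates them to the database encoding, if that is different, when it reads them. A session normally reads a dictionary configuration file only once, so to make existing sessions pick up a modified file, issue ALTER TEXT SEARCH DICTIONARY on the dictionary; a dummy update that changes no parameter value is enough. The synonym dictionary replaces a single word with a synonym (phrases are not supported), using the configuration file named by the SYNONYMS parameter, in which each line holds a word followed by its synonym, separated by white space. The optional CaseSensitive parameter defaults to false, in which case both the words in the synonym file and the input tokens are folded to lower case.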
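
The statements below are a minimal sketch of two of the points above: declaring a case-sensitive synonym dictionary, and issuing a dummy update so that existing sessions re-read a modified synonym file. The dictionary name my_synonym_cs is hypothetical and does not appear in the original example. The sketch also assumes that the my_synonyms.syn file from the example is present under $SHAREDIR/tsearch_data and that the my_synonym dictionary has already been created.

```sql
-- Hypothetical dictionary name (my_synonym_cs); it reuses the my_synonyms
-- file from the example, which is assumed to exist in $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY my_synonym_cs (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms,
    CaseSensitive = true      -- do not fold file words or input tokens to lower case
);

-- Probe the dictionary directly with a single token.
SELECT ts_lexize('my_synonym_cs', 'Paris');

-- Dummy update: re-specify SYNONYMS with its current value after editing
-- my_synonyms.syn, so that existing sessions pick up the new file contents.
ALTER TEXT SEARCH DICTIONARY my_synonym ( SYNONYMS = my_synonyms );
```

The dummy update changes no parameter value. It only exists to trigger the reload described above, so any parameter re-specified with its current setting should serve the same purpose.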