Synonym Dictionary Configuration and Thesaurus Dictionary Introduction

<literal>synonym</literal> template is <literal>SYNONYMS</literal>, which is the base name of its configuration file (<literal>my_synonyms</literal> in the above example). The file's full name will be <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</filename> (where <literal>$SHAREDIR</literal> means the <productname>PostgreSQL</productname> installation's shared-data directory). The file format is just one line per word to be substituted, with the word followed by its synonym, separated by white space. Blank lines and trailing spaces are ignored.
</para>

<para>
 The <literal>synonym</literal> template also has an optional parameter <literal>CaseSensitive</literal>, which defaults to <literal>false</literal>. When <literal>CaseSensitive</literal> is <literal>false</literal>, words in the synonym file are folded to lower case, as are input tokens. When it is <literal>true</literal>, words and tokens are not folded to lower case, but are compared as-is.
</para>

<para>
 An asterisk (<literal>*</literal>) can be placed at the end of a synonym in the configuration file. This indicates that the synonym is a prefix. The asterisk is ignored when the entry is used in <function>to_tsvector()</function>, but when it is used in <function>to_tsquery()</function>, the result will be a query item with the prefix match marker (see <xref linkend="textsearch-parsing-queries"/>).
 For example, suppose we have these entries in <filename>$SHAREDIR/tsearch_data/synonym_sample.syn</filename>:
<programlisting>
postgres        pgsql
postgresql      pgsql
postgre pgsql
gogle   googl
indices index*
</programlisting>
 Then we will get these results:
<screen>
mydb=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym, synonyms='synonym_sample');
mydb=# SELECT ts_lexize('syn', 'indices');
 ts_lexize
-----------
 {index}
(1 row)

mydb=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);
mydb=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;
mydb=# SELECT to_tsvector('tst', 'indices');
 to_tsvector
-------------
 'index':1
(1 row)

mydb=# SELECT to_tsquery('tst', 'indices');
 to_tsquery
------------
 'index':*
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector;
            tsvector
---------------------------------
 'are' 'indexes' 'useful' 'very'
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst', 'indices');
 ?column?
----------
 t
(1 row)
</screen>
</para>

</sect2>

<sect2 id="textsearch-thesaurus">
<title>Thesaurus Dictionary</title>

<para>
 A thesaurus dictionary (sometimes abbreviated as <acronym>TZ</acronym>) is a collection of words that includes information about the relationships of words and phrases, i.e., broader terms (<acronym>BT</acronym>), narrower terms (<acronym>NT</acronym>), preferred terms, non-preferred terms, related terms, etc.
</para>

<para>
 Basically a thesaurus dictionary replaces all non-preferred terms by one preferred term and, optionally, preserves the original terms for indexing as well. <productname>PostgreSQL</productname>'s current implementation of the thesaurus dictionary is an extension of the synonym dictionary with added <firstterm>phrase</firstterm> support. A thesaurus dictionary requires a configuration file of the following format:
<programlisting>
# this is a comment
sample word(s) : indexed word(s)
more sample word(s) : more indexed word(s)
...
</programlisting>
 where the colon (<symbol>:</symbol>) symbol acts as a delimiter between a phrase and its replacement.
</para>

<para>
 A thesaurus dictionary uses a <firstterm>subdictionary</firstterm> (which is specified in the dictionary's configuration) to normalize the input text before checking for phrase matches. It is only possible to select one subdictionary. An error is reported if the

The synonym dictionary's configuration file, specified by the SYNONYMS parameter, contains lines of words and their synonyms. An optional CaseSensitive parameter determines if case is considered. An asterisk at the end of a synonym indicates a prefix match in to_tsquery(). The thesaurus dictionary replaces non-preferred terms with preferred terms, optionally preserving the original terms. It supports phrases and requires a configuration file where phrases and their replacements are delimited by a colon.
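
The effect of the CaseSensitive parameter described above can be tried with a sketch along the following lines. The dictionary name syn_cs is hypothetical, and the commands assume that the synonym_sample.syn file from the earlier example is installed under $SHAREDIR/tsearch_data/; no output is shown because results depend on that file's contents.

```sql
-- Hypothetical dictionary name (syn_cs); assumes synonym_sample.syn from the
-- example above exists in $SHAREDIR/tsearch_data/.
CREATE TEXT SEARCH DICTIONARY syn_cs (
    template = synonym,
    synonyms = 'synonym_sample',
    CaseSensitive = true
);

-- With case folding disabled, words and tokens are compared as-is, so only
-- the exact-case spelling listed in the file should be substituted.
SELECT ts_lexize('syn_cs', 'postgres');
SELECT ts_lexize('syn_cs', 'Postgres');
```

Running the same two queries against the case-insensitive syn dictionary from the example is a quick way to compare the two behaviors side by side.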