<sect3 id="textsearch-thesaurus-config">
<title>Thesaurus Configuration</title>
<para>
To define a new thesaurus dictionary, use the <literal>thesaurus</literal>
template. For example:
<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_simple (
    TEMPLATE = thesaurus,
    DictFile = mythesaurus,
    Dictionary = pg_catalog.english_stem
);
</programlisting>
Here:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
<literal>thesaurus_simple</literal> is the new dictionary's name.
</para>
</listitem>
<listitem>
<para>
<literal>mythesaurus</literal> is the base name of the thesaurus
configuration file.
(Its full name will be <filename>$SHAREDIR/tsearch_data/mythesaurus.ths</filename>,
where <literal>$SHAREDIR</literal> means the installation shared-data
directory.)
</para>
</listitem>
<listitem>
<para>
<literal>pg_catalog.english_stem</literal> is the subdictionary (here,
a Snowball English stemmer) to use for thesaurus normalization.
Notice that the subdictionary will have its own
configuration (for example, stop words), which is not shown here.
</para>
</listitem>
</itemizedlist>
Now it is possible to bind the thesaurus dictionary <literal>thesaurus_simple</literal>
to the desired token types in a configuration, for example:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_simple;
</programlisting>
</para>
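<para>
To confirm which dictionaries are now consulted for each token type, you can
describe the configuration with the <application>psql</application> command
<literal>\dF+</literal> (shown here only as an optional sanity check):
<screen>
=&gt; \dF+ russian
</screen>
The output lists each token type together with its dictionary list, so
<literal>thesaurus_simple</literal> should appear for
<literal>asciiword</literal>, <literal>asciihword</literal>, and
<literal>hword_asciipart</literal>.
</para>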
</sect3>
<sect3 id="textsearch-thesaurus-examples">
<title>Thesaurus Example</title>
<para>
Consider a simple astronomical thesaurus <literal>thesaurus_astro</literal>,
which contains some astronomical word combinations:
<programlisting>
supernovae stars : sn
crab nebulae : crab
</programlisting>
Below we create a dictionary and bind some token types to
the astronomical thesaurus and the English stemmer:
<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_astro (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_astro,
    Dictionary = english_stem
);

ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_astro, english_stem;
</programlisting>
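The examples below call <function>plainto_tsquery</function> and
<function>to_tsvector</function> without an explicit configuration name, so
they assume that <literal>russian</literal> is the session's default text
search configuration. One way to arrange that (a convenience sketch, not part
of the thesaurus setup itself) is:
<programlisting>
SET default_text_search_config = 'russian';
</programlisting>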
Now we can see how it works.
<function>ts_lexize</function> is not very useful for testing a thesaurus,
because it treats its input as a single token. Instead we can use
<function>plainto_tsquery</function> and <function>to_tsvector</function>,
which will break their input strings into multiple tokens:
<screen>
SELECT plainto_tsquery('supernova star');
 plainto_tsquery
-----------------
 'sn'

SELECT to_tsvector('supernova star');
 to_tsvector
-------------
 'sn':1
</screen>
In principle, you can use <function>to_tsquery</function> if you quote
the argument:
<screen>
SELECT to_tsquery('''supernova star''');
 to_tsquery
------------
 'sn'
</screen>
Notice that <literal>supernova star</literal> matches <literal>supernovae
stars</literal> in <literal>thesaurus_astro</literal> because we specified
the <literal>english_stem</literal> stemmer in the thesaurus definition.
The stemmer removed the <literal>e</literal> and <literal>s</literal>.
</para>
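<para>
The practical effect is that a document and a query phrased with different
word combinations can still match, because both sides are normalized to
<literal>sn</literal>. As a quick sanity check (a sketch; it assumes the same
<literal>russian</literal> configuration is still the session default):
<screen>
SELECT to_tsvector('supernovae stars explode') @@ plainto_tsquery('supernova star');
 ?column?
----------
 t
</screen>
</para>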
<para>
To index the original phrase as well as the substitute, just include it
in the right-hand part of the definition:
<screen>
supernovae stars : sn supernovae stars

SELECT plainto_tsquery('supernova star');
       plainto_tsquery
-----------------------------
 'sn' &amp; 'supernova' &amp; 'star'
</screen>
</para>
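<para>
With the extended entry, the indexed document keeps the original lexemes in
addition to <literal>sn</literal>, so a query on just one of the original
words can match again. A sketch, under the same assumptions as above:
<screen>
SELECT to_tsvector('supernovae stars') @@ plainto_tsquery('supernova');
 ?column?
----------
 t
</screen>
</para>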
</sect3>
</sect2>
<sect2 id="textsearch-ispell-dictionary">
<title><application>Ispell</application> Dictionary</title>
<para>
The <application>Ispell</application> dictionary template supports
<firstterm>morphological dictionaries</firstterm>, which can normalize many