Thesaurus Configuration and Examples

  <sect3 id="textsearch-thesaurus-config">
   <title>Thesaurus Configuration</title>

   <para>
    To define a new thesaurus dictionary, use the <literal>thesaurus</literal>
    template. For example:

<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_simple (
    TEMPLATE = thesaurus,
    DictFile = mythesaurus,
    Dictionary = pg_catalog.english_stem
);
</programlisting>

    Here:
    <itemizedlist spacing="compact" mark="bullet">
     <listitem>
      <para>
       <literal>thesaurus_simple</literal> is the new dictionary's name
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>mythesaurus</literal> is the base name of the thesaurus
       configuration file.
       (Its full name will be <filename>$SHAREDIR/tsearch_data/mythesaurus.ths</filename>,
       where <literal>$SHAREDIR</literal> means the installation shared-data
       directory.)
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>pg_catalog.english_stem</literal> is the subdictionary (here,
       a Snowball English stemmer) to use for thesaurus normalization.
       Notice that the subdictionary will have its own
       configuration (for example, stop words), which is not shown here.
      </para>
     </listitem>
    </itemizedlist>

    Now it is possible to bind the thesaurus dictionary <literal>thesaurus_simple</literal>
    to the desired token types in a configuration, for example:

<programlisting>
ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_simple;
</programlisting>
   </para>
  </sect3>

  <sect3 id="textsearch-thesaurus-examples">
   <title>Thesaurus Example</title>

   <para>
    Consider a simple astronomical thesaurus <literal>thesaurus_astro</literal>,
    which contains some astronomical word combinations:

<programlisting>
supernovae stars : sn
crab nebulae : crab
</programlisting>

    Below we create a dictionary and bind some token types to
    an astronomical thesaurus and English stemmer:

<programlisting>
CREATE TEXT SEARCH DICTIONARY thesaurus_astro (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_astro,
    Dictionary = english_stem
);

ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_astro, english_stem;
</programlisting>

    Now we can see how it works.
    <function>ts_lexize</function> is not very useful for testing a thesaurus,
    because it treats its input as a single token. Instead we can use
    <function>plainto_tsquery</function> and <function>to_tsvector</function>,
    which break their input strings into multiple tokens. The one-argument
    forms of these functions use the configuration named by
    <varname>default_text_search_config</varname>, so the output below assumes
    that setting is <literal>russian</literal>:

<screen>
SELECT plainto_tsquery('supernova star');
 plainto_tsquery
-----------------
 'sn'

SELECT to_tsvector('supernova star');
 to_tsvector
-------------
 'sn':1
</screen>

    In principle, <function>to_tsquery</function> can also be used if the
    argument is quoted:

<screen>
SELECT to_tsquery('''supernova star''');
 to_tsquery
------------
 'sn'
</screen>

    Notice that <literal>supernova star</literal> matches <literal>supernovae stars</literal>
    in <literal>thesaurus_astro</literal> because we specified
    the <literal>english_stem</literal> stemmer in the thesaurus definition.
    The stemmer removes the trailing <literal>e</literal> and <literal>s</literal>,
    so the thesaurus phrase and the input text reduce to the same stems.
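
    The examples above modify the built-in <literal>russian</literal>
    configuration and call the one-argument forms of the functions. A possible
    alternative, sketched below, copies a configuration instead and names it
    explicitly in every call, which avoids depending on
    <varname>default_text_search_config</varname>. It assumes the
    <literal>thesaurus_astro</literal> dictionary created above already exists;
    the configuration name <literal>astro_en</literal> is hypothetical:

<programlisting>
-- Hypothetical configuration name; COPY duplicates the parser and
-- token mappings of the built-in english configuration.
CREATE TEXT SEARCH CONFIGURATION astro_en ( COPY = pg_catalog.english );

-- Route ASCII words through the thesaurus first, then the stemmer.
ALTER TEXT SEARCH CONFIGURATION astro_en
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_astro, english_stem;

-- Pass the configuration explicitly instead of relying on
-- default_text_search_config.
SELECT plainto_tsquery('astro_en', 'supernova star');
SELECT to_tsvector('astro_en', 'supernova star');

-- Both sides should be rewritten by the thesaurus, so this match is
-- expected to succeed if the setup above is in place.
SELECT to_tsvector('astro_en', 'supernovae stars')
       @@ plainto_tsquery('astro_en', 'supernova star');
</programlisting>

    Working on a copy leaves the built-in configurations unchanged for other
    users of the database.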
   </para>

   <para>
    To index the original phrase as well as the substitute, just include it
    in the right-hand part of the definition:

<screen>
supernovae stars : sn supernovae stars

SELECT plainto_tsquery('supernova star');
       plainto_tsquery
-----------------------------
 'sn' &amp; 'supernova' &amp; 'star'
</screen>
   </para>
  </sect3>

 </sect2>

 <sect2 id="textsearch-ispell-dictionary">
  <title><application>Ispell</application> Dictionary</title>

  <para>
   The <application>Ispell</application> dictionary template supports
   <firstterm>morphological dictionaries</firstterm>, which can normalize many
