Ispell Dictionary Configuration

star''');
 to_tsquery
------------
 'sn'
</screen>

   Notice that <literal>supernova star</literal> matches
   <literal>supernovae stars</literal> in <literal>thesaurus_astro</literal>
   because we specified the <literal>english_stem</literal> stemmer in the
   thesaurus definition. The stemmer removed the <literal>e</literal> and
   <literal>s</literal>.
  </para>

  <para>
   To index the original phrase as well as the substitute, just include it
   in the right-hand part of the definition:

<screen>
supernovae stars : sn supernovae stars

SELECT plainto_tsquery('supernova star');
       plainto_tsquery
-----------------------------
 'sn' &amp; 'supernova' &amp; 'star'
</screen>
  </para>

 </sect3>

</sect2>

<sect2 id="textsearch-ispell-dictionary">
  <title><application>Ispell</application> Dictionary</title>

  <para>
   The <application>Ispell</application> dictionary template supports
   <firstterm>morphological dictionaries</firstterm>, which can normalize many
   different linguistic forms of a word into the same lexeme. For example,
   an English <application>Ispell</application> dictionary can match all
   declensions and conjugations of the search term <literal>bank</literal>,
   e.g., <literal>banking</literal>, <literal>banked</literal>,
   <literal>banks</literal>, <literal>banks'</literal>, and
   <literal>bank's</literal>.
  </para>

  <para>
   The standard <productname>PostgreSQL</productname> distribution does
   not include any <application>Ispell</application> configuration files.
   Dictionaries for a large number of languages are available from
   <ulink url="https://www.cs.hmc.edu/~geoff/ispell.html">Ispell</ulink>.
   Some more modern dictionary file formats are also supported:
   <ulink url="https://en.wikipedia.org/wiki/MySpell">MySpell</ulink>
   (OO &lt; 2.0.1) and
   <ulink url="https://hunspell.github.io/">Hunspell</ulink>
   (OO &gt;= 2.0.2). A large list of dictionaries is available on the
   <ulink url="https://wiki.openoffice.org/wiki/Dictionaries">OpenOffice Wiki</ulink>.
  </para>

  <para>
   To create an <application>Ispell</application> dictionary, perform these steps:
  </para>
  <itemizedlist spacing="compact" mark="bullet">
   <listitem>
    <para>
     Download the dictionary configuration files.
     <productname>OpenOffice</productname> extension files have the
     <filename>.oxt</filename> extension. Extract the
     <filename>.aff</filename> and <filename>.dic</filename> files and change
     their extensions to <filename>.affix</filename> and
     <filename>.dict</filename>. Some dictionary files must also be converted
     to the UTF-8 encoding, for example, for a Norwegian language dictionary:
<programlisting>
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
</programlisting>
    </para>
   </listitem>
   <listitem>
    <para>
     Copy the files to the <filename>$SHAREDIR/tsearch_data</filename> directory.
    </para>
   </listitem>
   <listitem>
    <para>
     Load the files into PostgreSQL with the following command:
<programlisting>
CREATE TEXT SEARCH DICTIONARY english_hunspell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english);
</programlisting>
    </para>
   </listitem>
  </itemizedlist>

  <para>
   Here, <literal>DictFile</literal>, <literal>AffFile</literal>, and
   <literal>StopWords</literal> specify the base names of the dictionary,
   affixes, and stop-words files. The stop-words file has the same format
   explained above for the <literal>simple</literal> dictionary type. The
   format of the other files is not specified here but is available from
   the above-mentioned web sites.
  </para>

  <para>
   Ispell dictionaries usually recognize a limited set of words, so they
   should be followed by another broader dictionary; for example,
   a Snowball dictionary,

This section explains how to configure and use an Ispell dictionary in PostgreSQL for morphological normalization. It highlights that Ispell dictionaries can normalize various linguistic forms of a word into a single lexeme. It notes that PostgreSQL doesn't include Ispell configuration files by default but provides links to resources for downloading dictionaries. The section outlines the steps to create an Ispell dictionary: downloading and converting dictionary files to UTF-8 encoding, copying them to the `$SHAREDIR/tsearch_data` directory, and loading them into PostgreSQL using the `CREATE TEXT SEARCH DICTIONARY` command. It clarifies the roles of `DictFile`, `AffFile`, and `StopWords` parameters, and suggests using a broader dictionary (like Snowball) after the Ispell dictionary due to its limited word recognition.
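
The convert-and-rename step described above can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name `prepare_ispell_files` and its argument order are hypothetical and not part of PostgreSQL, and it assumes an `iconv` that accepts the source encoding name you pass. It writes through shell redirection instead of the `-o` option, since `-o` is not available in every `iconv` implementation.

```shell
# Hypothetical helper (not part of PostgreSQL): re-encode one .aff/.dic pair
# to UTF-8 and write it with the .affix/.dict extensions described above.
# Usage: prepare_ispell_files SRC_BASE DEST_BASE SRC_ENCODING DEST_DIR
prepare_ispell_files() {
    src_base=$1
    dest_base=$2
    src_enc=$3
    dest_dir=$4

    # Create the target directory if it does not exist yet.
    mkdir -p "$dest_dir" || return 1

    # SRC_BASE.aff -> DEST_DIR/DEST_BASE.affix, converted to UTF-8.
    iconv -f "$src_enc" -t UTF-8 "$src_base.aff" \
        > "$dest_dir/$dest_base.affix" || return 1

    # SRC_BASE.dic -> DEST_DIR/DEST_BASE.dict, converted to UTF-8.
    iconv -f "$src_enc" -t UTF-8 "$src_base.dic" \
        > "$dest_dir/$dest_base.dict" || return 1
}
```

For the Norwegian files used in the section, a call might look like `prepare_ispell_files nn_NO nn_no ISO_8859-1 "$(pg_config --sharedir)/tsearch_data"`, assuming `pg_config` on your `PATH` belongs to the server installation you intend to use and that you have write access to that directory.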