the <filename>$SHAREDIR/tsearch_data</filename> directory
</para>
</listitem>
<listitem>
<para>
load files into PostgreSQL with the following command:
<programlisting>
CREATE TEXT SEARCH DICTIONARY english_hunspell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english);
</programlisting>
</para>
</listitem>
</itemizedlist>
<para>
Here, <literal>DictFile</literal>, <literal>AffFile</literal>, and <literal>StopWords</literal>
specify the base names of the dictionary, affix, and stop-word files. The
files themselves carry the extensions <filename>.dict</filename>,
<filename>.affix</filename>, and <filename>.stop</filename>, respectively.
The stop-word file has the same format explained above for the
<literal>simple</literal> dictionary type. The format of the other files is
not specified here but is available from the above-mentioned web sites.
</para>
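<para>
The mapping from the options of the command above to files on disk can be
sketched in Python. This is an illustrative model, not PostgreSQL code. The
helper <function>ispell_file_paths</function> is hypothetical, and
<filename>/usr/local/pgsql/share</filename> is only an example value of
<filename>$SHAREDIR</filename>. Each base name is joined to the
<filename>tsearch_data</filename> directory and given the extension for its
option. Option names are lowercased first, because unquoted option names in
the SQL command are case-insensitive.
<programlisting>
from pathlib import Path

# Extension added to the base name given for each file option.
EXTENSIONS = {"dictfile": "dict", "afffile": "affix", "stopwords": "stop"}


def ispell_file_paths(sharedir: str, options: dict) -> dict:
    """Return {option: path} for the file-naming options in `options`.

    Hypothetical helper. Option names are lowercased before lookup;
    options that do not name a file (such as TEMPLATE) are skipped.
    """
    paths = {}
    for name, base in options.items():
        ext = EXTENSIONS.get(name.lower())
        if ext is not None:
            paths[name.lower()] = Path(sharedir) / "tsearch_data" / f"{base}.{ext}"
    return paths


paths = ispell_file_paths(
    "/usr/local/pgsql/share",  # example value of $SHAREDIR
    {"TEMPLATE": "ispell", "DictFile": "en_us",
     "AffFile": "en_us", "Stopwords": "english"},
)
for name, path in paths.items():
    print(name, path)
</programlisting>
For the <literal>english_hunspell</literal> example this yields
<filename>en_us.dict</filename>, <filename>en_us.affix</filename>, and
<filename>english.stop</filename>, all in the <filename>tsearch_data</filename> directory.
</para>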
<para>
Ispell dictionaries usually recognize a limited set of words, so in a text
search configuration they should be followed by a broader dictionary, for
example a Snowball dictionary, which recognizes everything.
</para>
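<para>
The reason for this ordering can be modeled in a few lines of Python. In this
sketch, each dictionary returns a list of lexemes when it recognizes a token
and <literal>None</literal> when it does not, mirroring the way a dictionary
returns NULL for an unknown word. The dictionaries are consulted in order,
and the first one that recognizes the token wins. Both dictionaries are
hypothetical stand-ins: a small lookup table plays the Ispell dictionary, and
a crude catch-all function plays the Snowball stemmer.
<programlisting>
from typing import Callable, Optional

# A dictionary returns a list of lexemes, or None if it does not
# recognize the token.
Dictionary = Callable[[str], Optional[list]]


def make_narrow_dictionary(known: dict) -> Dictionary:
    """Hypothetical Ispell stand-in: recognizes only the forms in `known`."""
    return lambda token: known.get(token)


def catch_all_stemmer(token: str) -> list:
    """Hypothetical Snowball stand-in: accepts every token (crude stemming)."""
    return [token.rstrip("s")]


def lexize(token: str, chain: list) -> Optional[list]:
    # Consult the dictionaries in order; the first one that recognizes
    # the token (returns something other than None) decides the result.
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return None  # no dictionary recognized the token


ispell = make_narrow_dictionary({"larks": ["lark"], "largest": ["large"]})
chain = [ispell, catch_all_stemmer]
print(lexize("largest", chain))  # handled by the narrow dictionary
print(lexize("zorbs", chain))    # falls through to the catch-all
</programlisting>
Without the catch-all stand-in, <literal>lexize("zorbs", [ispell])</literal>
returns <literal>None</literal>. This is why a narrow dictionary should not be
the last one consulted.
</para>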
<para>
The <filename>.affix</filename> file of <application>Ispell</application> has the following
structure:
<programlisting>
prefixes
flag *A:
    .           >   RE      # As in enter > reenter
suffixes
flag T:
    E           >   ST      # As in late > latest
    [^AEIOU]Y   >   -Y,IEST # As in dirty > dirtiest
    [AEIOU]Y    >   EST     # As in gray > grayest
    [^EY]       >   EST     # As in small > smallest
</programlisting>
</para>
<para>
The <filename>.dict</filename> file has the following structure:
<programlisting>
lapse/ADGRS
lard/DGRS
large/PRTY
lark/MRS
</programlisting>
</para>
<para>
The format of the <filename>.dict</filename> file is:
<programlisting>
basic_form/affix_class_name
</programlisting>
</para>
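<para>
Each entry pairs a base form with the flags of the affix classes that may be
applied to it. The following Python sketch reads entries in that format. The
function <function>parse_dict_line</function> is a hypothetical name, and the
sketch assumes one character per flag, as in the example above.
<programlisting>
def parse_dict_line(line: str) -> tuple:
    """Split 'basic_form/affix_class_name' into (word, set of flags).

    Assumes one character per flag; an entry without a slash has no flags.
    """
    word, _, flags = line.strip().partition("/")
    return word, set(flags)


entries = ["lapse/ADGRS", "lard/DGRS", "large/PRTY", "lark/MRS"]
dictionary = dict(parse_dict_line(entry) for entry in entries)

# "large" carries flag T, so the suffix rules of flag T may be applied
# to it; "lark" does not carry T.
print(sorted(dictionary["large"]))
print("T" in dictionary["lark"])
</programlisting>
</para>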
<para>
In the <filename>.affix</filename> file every affix flag is described in the
following format:
<programlisting>
condition > [-stripping_letters,] adding_affix
</programlisting>
</para>
<para>
Here, condition has a format similar to that of regular expressions.
It can use groupings <literal>[...]</literal> and <literal>[^...]</literal>.
For example, <literal>[AEIOU]Y</literal> means that the last letter of the word
is <literal>"y"</literal> and the penultimate letter is <literal>"a"</literal>,
<literal>"e"</literal>, <literal>"i"</literal>, <literal>"o"</literal> or <literal>"u"</literal>.
<literal>[^EY]</literal> means that the last letter is neither <literal>"e"</literal>
nor <literal>"y"</literal>.
</para>
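<para>
The rule format and conditions described above can be modeled in Python by
turning each condition into a regular expression anchored at the end of the
word. This is a simplified sketch, not PostgreSQL's implementation. It handles
only suffix rules, and the function names are hypothetical. It applies the
four rules of flag <literal>T</literal> from the example
<filename>.affix</filename> file and returns every form whose rule condition
matches.
<programlisting>
import re


def parse_rule(text: str):
    """Parse 'condition > [-stripping_letters,] adding_affix' into
    (compiled condition, letters to strip, affix to add).
    A trailing '# comment', as in the example file, is ignored."""
    text = text.split("#", 1)[0]
    condition, action = (part.strip() for part in text.split(">", 1))
    if action.startswith("-"):
        strip, add = action[1:].split(",", 1)
    else:
        strip, add = "", action
    # Rules are written in upper case; words are matched in lower case.
    # A suffix condition must match the end of the word.
    pattern = re.compile("(?:" + condition.lower() + ")$")
    return pattern, strip.strip().lower(), add.strip().lower()


def apply_suffixes(word: str, rules: list) -> list:
    """Return every form produced by a rule whose condition matches `word`."""
    forms = []
    for pattern, strip, add in rules:
        if pattern.search(word) and word.endswith(strip):
            forms.append(word[: len(word) - len(strip)] + add)
    return forms


# The suffix rules of flag T from the example .affix file.
flag_t = [parse_rule(r) for r in [
    "E          >   ST       # As in late > latest",
    "[^AEIOU]Y  >   -Y,IEST  # As in dirty > dirtiest",
    "[AEIOU]Y   >   EST      # As in gray > grayest",
    "[^EY]      >   EST      # As in small > smallest",
]]
for word in ["late", "dirty", "gray", "small"]:
    print(word, "->", apply_suffixes(word, flag_t))
</programlisting>
The conditions of flag <literal>T</literal> are mutually exclusive, so each
word gets exactly one form. For <literal>dirty</literal>, the
<literal>-Y,IEST</literal> action first strips <literal>y</literal> and then
adds <literal>iest</literal>.
</para>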
<para>
Ispell dictionaries support splitting compound words, which is a useful
feature. Note that the affix file must specify a special flag, using the
<literal>compoundwords controlled</literal> statement, to mark the dictionary
words that can participate in compound formation:
<programlisting>
compoundwords controlled z
</programlisting>
Here are some examples for the Norwegian language:
<programlisting>
SELECT ts_lexize('norwegian_ispell', 'overbuljongterningpakkmesterassistent');
{over,buljong,terning,pakk,mester,assistent}
SELECT ts_lexize('norwegian_ispell', 'sjokoladefabrikk');
{sjokoladefabrikk,sjokolade,fabrikk}
</programlisting>
</para>
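<para>
Splitting a compound into dictionary words can be sketched as a search over
the word's prefixes. This is a simplified model, not the algorithm PostgreSQL
uses. The set of allowed parts is a hypothetical vocabulary taken from the
<function>ts_lexize</function> output above. It stands in for the dictionary
words marked with the compound flag, and affixes are ignored. Unlike
<function>ts_lexize</function>, which also returned
<literal>sjokoladefabrikk</literal> itself, the sketch reports only the parts.
<programlisting>
from functools import lru_cache


def split_compound(word: str, parts: frozenset):
    """Split `word` into a sequence of allowed parts, or return None.

    `parts` plays the role of the dictionary words that carry the
    compound flag. Longer first parts are tried before shorter ones.
    """
    @lru_cache(maxsize=None)
    def split_from(start: int):
        if start == len(word):
            return ()
        for end in range(len(word), start, -1):  # longest candidate first
            head = word[start:end]
            if head in parts:
                rest = split_from(end)
                if rest is not None:
                    return (head,) + rest
        return None

    result = split_from(0)
    return list(result) if result else None


# Hypothetical vocabulary of compound-eligible words.
parts = frozenset({"over", "buljong", "terning", "pakk", "mester",
                   "assistent", "sjokolade", "fabrikk"})
print(split_compound("overbuljongterningpakkmesterassistent", parts))
print(split_compound("sjokoladefabrikk", parts))
</programlisting>
</para>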
<para>
<application>MySpell</application> format is a subset of <application>Hunspell</application>.
The <filename>.affix</filename> file of <application>Hunspell</application> has the following
structure:
<programlisting>
PFX A Y 1
PFX A   0     re         .
SFX T N 4
SFX T   0     st         e
SFX T   y     iest       [^aeiou]y
SFX T   0     est        [aeiou]y
SFX T   0     est        [^ey]
</programlisting>
</para>
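<para>
The following Python sketch parses the example above and applies its rules.
It is an illustrative model, not PostgreSQL's parser, and the function names
are hypothetical. It follows the usual Hunspell layout. A header line has
four fields: the kind, the flag, the cross-product indicator, and the rule
count. A rule line has five fields: the kind, the flag, the letters to strip,
the affix to add, and the condition. In a rule line, <literal>0</literal>
stands for an empty string. Suffix conditions are matched at the end of the
word and prefix conditions at the beginning.
<programlisting>
import re

# The Hunspell affix example from above.
AFFIX_TEXT = """\
PFX A Y 1
PFX A   0     re         .
SFX T N 4
SFX T   0     st         e
SFX T   y     iest       [^aeiou]y
SFX T   0     est        [aeiou]y
SFX T   0     est        [^ey]
"""


def parse_affixes(text: str) -> dict:
    """Group header data and rules by (kind, flag), kind being PFX or SFX."""
    classes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:    # header: kind flag cross-product count
            kind, flag, cross, count = fields
            classes[(kind, flag)] = {"cross": cross == "Y",
                                     "count": int(count), "rules": []}
        elif len(fields) >= 5:  # rule: kind flag strip add condition
            kind, flag, strip, add, cond = fields[:5]
            classes[(kind, flag)]["rules"].append(
                ("" if strip == "0" else strip, "" if add == "0" else add, cond))
    return classes


def expand(word: str, kind: str, rules: list) -> list:
    """Apply every rule of one class whose condition matches `word`."""
    forms = []
    for strip, add, cond in rules:
        if kind == "SFX" and re.search("(?:" + cond + ")$", word) and word.endswith(strip):
            forms.append(word[: len(word) - len(strip)] + add)
        elif kind == "PFX" and re.match("(?:" + cond + ")", word) and word.startswith(strip):
            forms.append(add + word[len(strip):])
    return forms


classes = parse_affixes(AFFIX_TEXT)
print(expand("dirty", "SFX", classes[("SFX", "T")]["rules"]))
print(expand("enter", "PFX", classes[("PFX", "A")]["rules"]))
</programlisting>
</para>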
<para>
The first line of an affix class is the header. The fields of the affix rules
are listed after the header:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
parameter name (PFX or SFX)
</para>
</listitem>