|
asciiword | Word, all ASCII | it | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | ate | {english_stem} | english_stem | {ate}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | a | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | fat | {english_stem} | english_stem | {fat}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | rats | {english_stem} | english_stem | {rat}
</screen>
</para>
<para>
For a more extensive demonstration, we
first create a <literal>public.english</literal> configuration and an
Ispell dictionary for the English language:
</para>
<programlisting>
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
    ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
</programlisting>
<screen>
SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes
-----------+-----------------+-------------+-------------------------------+----------------+-------------
 asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}
</screen>
<para>
In this example, the word <literal>Brightest</literal> was recognized by the
parser as an <literal>ASCII word</literal> (alias <literal>asciiword</literal>).
For this token type the dictionary list is
<literal>english_ispell</literal> and
<literal>english_stem</literal>. The word was recognized by
<literal>english_ispell</literal>, which reduced it to its base form
<literal>bright</literal>. The word <literal>supernovaes</literal> is
unknown to the <literal>english_ispell</literal> dictionary, so it
was passed to the next dictionary, <literal>english_stem</literal>, which
recognized it. This is by design: <literal>english_stem</literal> is a
Snowball dictionary, which recognizes every word, and that is why it was
placed at the end of the dictionary list.
</para>
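<para>
As a sketch of the fall-through behavior described above, each dictionary
in the list can also be queried on its own with
<function>ts_lexize</function>. This assumes the
<literal>english_ispell</literal> dictionary created earlier is installed.
A dictionary that does not recognize a word returns <literal>NULL</literal>,
which is what causes the token to be passed on to the next dictionary in
the mapping:
</para>
<programlisting>
-- Unknown to the Ispell dictionary: expected to return NULL,
-- so the token falls through to the next dictionary in the list
SELECT ts_lexize('english_ispell', 'supernovaes');

-- The Snowball dictionary recognizes every word, so it stems this one
SELECT ts_lexize('english_stem', 'supernovaes');
</programlisting>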
<para>
The word <literal>The</literal> was recognized by the
<literal>english_ispell</literal> dictionary as a stop word (<xref
linkend="textsearch-stopwords"/>) and will not be indexed.
The spaces are discarded too, since the configuration provides no
dictionaries at all for them.
</para>
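<para>
A stop word differs from an unrecognized word in what the dictionary
returns: an empty array rather than <literal>NULL</literal>. Because the
result is not <literal>NULL</literal>, processing of the token stops at that
dictionary. A short sketch, again assuming the
<literal>english_ispell</literal> dictionary defined above with its
<literal>english</literal> stop word list:
</para>
<programlisting>
-- A stop word: expected to return an empty array ({}), so the token
-- is consumed here and english_stem is never consulted for it
SELECT ts_lexize('english_ispell', 'The');

-- In the resulting tsvector, the stop word contributes no lexeme,
-- while Brightest and supernovaes each contribute one
SELECT to_tsvector('public.english', 'The Brightest supernovaes');
</programlisting>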
<para>
You can reduce the width of the output by explicitly specifying which columns
you want to see:
<screen>
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}
</screen>
</para>
</sect2>
<sect2 id="textsearch-parser-testing">
<title>Parser Testing</title>
<para>
The following functions allow direct testing of a text search parser.
</para>
<indexterm>
<primary>ts_parse</primary>
</indexterm>
<synopsis>
ts_parse(<replaceable