 example, there is a built-in definition equivalent to

<programlisting>
CREATE TEXT SEARCH DICTIONARY english_stem (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english
);
</programlisting>

    The stop word file format is the same as explained earlier.
   </para>

   <para>
    A <application>Snowball</application> dictionary recognizes everything, whether
    or not it is able to simplify the word, so it should be placed
    at the end of the dictionary list. Placing it before any other
    dictionary is useless, because a token will never pass through it to
    the next dictionary.
   </para>
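
   <para>
    As a small illustration of this behavior, the built-in
    <literal>english_stem</literal> dictionary can be probed with
    <function>ts_lexize</function>.  The input words here are arbitrary
    samples chosen for this sketch:

<programlisting>
-- A word the stemmer can simplify is reduced to its stem.
SELECT ts_lexize('english_stem', 'stars');

-- A stop word is recognized too: the result is an empty array, not NULL.
SELECT ts_lexize('english_stem', 'a');
</programlisting>

    Neither call returns <literal>NULL</literal>, which is the result a
    dictionary gives for a token it does not recognize, so a dictionary
    listed after <literal>english_stem</literal> would not be consulted
    for these tokens.
   </para>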

  </sect2>

 </sect1>

 <sect1 id="textsearch-configuration">
  <title>Configuration Example</title>

   <para>
    A text search configuration specifies all options necessary to transform a
    document into a <type>tsvector</type>: the parser to use to break text
    into tokens, and the dictionaries to use to transform each token into a
    lexeme.  Every call to
    <function>to_tsvector</function> or <function>to_tsquery</function>
    needs a text search configuration to perform its processing.
    The configuration parameter
    <xref linkend="guc-default-text-search-config"/>
    specifies the name of the default configuration, which is the
    one used by text search functions if an explicit configuration
    parameter is omitted.
    It can be set in <filename>postgresql.conf</filename>, or set for an
    individual session using the <command>SET</command> command.
   </para>
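
   <para>
    A minimal sketch of both ways of choosing a configuration, assuming the
    built-in <literal>english</literal> configuration and an arbitrary
    sample sentence:

<programlisting>
-- Set the default configuration for the current session only.
SET default_text_search_config = 'pg_catalog.english';
SHOW default_text_search_config;

-- Uses the session default because no configuration is named.
SELECT to_tsvector('The quick brown foxes');

-- Names the configuration explicitly, regardless of the session default.
SELECT to_tsvector('pg_catalog.english', 'The quick brown foxes');
</programlisting>
   </para>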

   <para>
    Several predefined text search configurations are available, and
    you can create custom configurations easily.  To facilitate management
    of text search objects, a set of <acronym>SQL</acronym> commands
    is available, and there are several <application>psql</application> commands that display information
    about text search objects (<xref linkend="textsearch-psql"/>).
   </para>
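
   <para>
    Besides the <application>psql</application> commands, the available
    configurations can be listed with an ordinary query against the system
    catalogs.  The following is one possible sketch; the column aliases
    are chosen here for illustration only:

<programlisting>
-- List every text search configuration together with its schema.
SELECT n.nspname AS schema_name, c.cfgname AS config_name
FROM pg_catalog.pg_ts_config c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.cfgnamespace
ORDER BY 1, 2;
</programlisting>
   </para>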

   <para>
    As an example, we will create a configuration
    <literal>pg</literal>, starting by duplicating the built-in
    <literal>english</literal> configuration:

<programlisting>
CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );
</programlisting>
   </para>

   <para>
    We will use a PostgreSQL-specific synonym list
    and store it in <filename>$SHAREDIR/tsearch_data/pg_dict.syn</filename>.
    The file contents look like:

<programlisting>
postgres    pg
pgsql       pg
postgresql  pg
</programlisting>

    We define the synonym dictionary like this:

<programlisting>
CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = synonym,
    SYNONYMS = pg_dict
);
</programlisting>

    Next we register the <productname>Ispell</productname> dictionary
    <literal>english_ispell</literal>, which has its own configuration files:

<programlisting>
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);
</programlisting>

    Now we can set up the mappings for words in configuration
    <literal>pg</literal>:

<programlisting>
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_dict, english_ispell, english_stem;
</programlisting>

    We choose not to index or search some token types that the built-in
    configuration does handle:

<programlisting>
ALTER TEXT SEARCH CONFIGURATION pg
    DROP MAPPING FOR email, url, url_path, sfloat, float;
</programlisting>
   </para>

   <para>
    Now we can test our configuration:

<programlisting>
SELECT * FROM ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software.
');
</programlisting>
   </para>
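
   <para>
    Individual dictionaries can also be checked in isolation.  The sketch
    below assumes that the <literal>pg_dict</literal> dictionary and the
    <literal>public.pg</literal> configuration were created as shown above;
    the sample inputs are arbitrary:

<programlisting>
-- The synonym dictionary should map each listed variant to 'pg'.
SELECT ts_lexize('pg_dict', 'postgresql');

-- A word absent from the synonym file is not recognized, so NULL is
-- returned.  Within the pg configuration, such a token would then be
-- passed on to english_ispell and, if needed, english_stem.
SELECT ts_lexize('pg_dict', 'database');

-- Run the whole configuration over a short document.
SELECT to_tsvector('public.pg', 'PostgreSQL and pgsql are synonyms here');
</programlisting>
   </para>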

   <para>
    The next step is to set the session to use the new configuration, which was
    created in the <literal>public</literal> schema:
