example, to weight title differently from body — it is necessary
to write a custom trigger. Here is an example using
<application>PL/pgSQL</application> as the trigger language:
<programlisting>
CREATE FUNCTION messages_trigger() RETURNS trigger AS $$
begin
new.tsv :=
setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'A') ||
setweight(to_tsvector('pg_catalog.english', coalesce(new.body,'')), 'D');
return new;
end
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON messages FOR EACH ROW EXECUTE FUNCTION messages_trigger();
</programlisting>
</para>
<para>
Keep in mind that it is important to specify the configuration name
explicitly when creating <type>tsvector</type> values inside triggers,
so that the column's contents will not be affected by changes to
<varname>default_text_search_config</varname>. Failure to do this is likely to
lead to problems such as search results changing after a dump and restore.
</para>
</sect2>
<sect2 id="textsearch-statistics">
<title>Gathering Document Statistics</title>
<indexterm>
<primary>ts_stat</primary>
</indexterm>
<para>
The function <function>ts_stat</function> is useful for checking your
configuration and for finding stop-word candidates.
</para>
<synopsis>
ts_stat(<replaceable class="parameter">sqlquery</replaceable> <type>text</type>, <optional> <replaceable class="parameter">weights</replaceable> <type>text</type>, </optional>
OUT <replaceable class="parameter">word</replaceable> <type>text</type>, OUT <replaceable class="parameter">ndoc</replaceable> <type>integer</type>,
OUT <replaceable class="parameter">nentry</replaceable> <type>integer</type>) returns <type>setof record</type>
</synopsis>
<para>
<replaceable>sqlquery</replaceable> is a text value containing an SQL
query which must return a single <type>tsvector</type> column.
<function>ts_stat</function> executes the query and returns statistics about
each distinct lexeme (word) contained in the <type>tsvector</type>
data. The columns returned are
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
<replaceable>word</replaceable> <type>text</type> — the value of a lexeme
</para>
</listitem>
<listitem>
<para>
<replaceable>ndoc</replaceable> <type>integer</type> — number of documents
(<type>tsvector</type>s) the word occurred in
</para>
</listitem>
<listitem>
<para>
<replaceable>nentry</replaceable> <type>integer</type> — total number of
occurrences of the word
</para>
</listitem>
</itemizedlist>
If <replaceable>weights</replaceable> is supplied, only occurrences
having one of those weights are counted.
</para>
<para>
For example, to find the ten most frequent words in a document collection:
<programlisting>
SELECT * FROM ts_stat('SELECT vector FROM apod')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
</programlisting>
The same, but counting only word occurrences with weight <literal>A</literal>
or <literal>B</literal>:
<programlisting>
SELECT * FROM ts_stat('SELECT vector FROM apod', 'ab')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="textsearch-parsers">
<title>Parsers</title>
<para>
Text search parsers are responsible for splitting raw document text
into <firstterm>tokens</firstterm> and identifying each token's type, where
the set of possible types is defined by the parser itself.
Note that a parser does not modify the text at all — it simply
identifies plausible word boundaries. Because of this limited scope,
there is less need for application-specific custom parsers than there is
for custom dictionaries. At present <productname>PostgreSQL</productname>
provides just