Home Explore Blog CI



postgresql

26th chunk of `doc/src/sgml/textsearch.sgml`
8ddda3b6632bfebcf380ba6797b2864e19a52295407015e50000000100000fa0
 example, to weight title differently from body — it is necessary
    to write a custom trigger.  Here is an example using
    <application>PL/pgSQL</application> as the trigger language:

<programlisting>
CREATE FUNCTION messages_trigger() RETURNS trigger AS $$
begin
  new.tsv :=
     setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'A') ||
     setweight(to_tsvector('pg_catalog.english', coalesce(new.body,'')), 'D');
  return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
    ON messages FOR EACH ROW EXECUTE FUNCTION messages_trigger();
</programlisting>
   </para>

   <para>
    Keep in mind that it is important to specify the configuration name
    explicitly when creating <type>tsvector</type> values inside triggers,
    so that the column's contents will not be affected by changes to
    <varname>default_text_search_config</varname>.  Failure to do this is likely to
    lead to problems such as search results changing after a dump and restore.
   </para>

  </sect2>

  <sect2 id="textsearch-statistics">
   <title>Gathering Document Statistics</title>

   <indexterm>
    <primary>ts_stat</primary>
   </indexterm>

   <para>
    The function <function>ts_stat</function> is useful for checking your
    configuration and for finding stop-word candidates.
   </para>

<synopsis>
ts_stat(<replaceable class="parameter">sqlquery</replaceable> <type>text</type>, <optional> <replaceable class="parameter">weights</replaceable> <type>text</type>, </optional>
        OUT <replaceable class="parameter">word</replaceable> <type>text</type>, OUT <replaceable class="parameter">ndoc</replaceable> <type>integer</type>,
        OUT <replaceable class="parameter">nentry</replaceable> <type>integer</type>) returns <type>setof record</type>
</synopsis>

   <para>
    <replaceable>sqlquery</replaceable> is a text value containing an SQL
    query which must return a single <type>tsvector</type> column.
    <function>ts_stat</function> executes the query and returns statistics about
    each distinct lexeme (word) contained in the <type>tsvector</type>
    data.  The columns returned are

    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       <replaceable>word</replaceable> <type>text</type> &mdash; the value of a lexeme
      </para>
     </listitem>
     <listitem>
      <para>
       <replaceable>ndoc</replaceable> <type>integer</type> &mdash; number of documents
       (<type>tsvector</type>s) the word occurred in
      </para>
     </listitem>
     <listitem>
      <para>
       <replaceable>nentry</replaceable> <type>integer</type> &mdash; total number of
       occurrences of the word
      </para>
     </listitem>
    </itemizedlist>

    If <replaceable>weights</replaceable> is supplied, only occurrences
    having one of those weights are counted.
   </para>

   <para>
    For example, to find the ten most frequent words in a document collection:

<programlisting>
SELECT * FROM ts_stat('SELECT vector FROM apod')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
</programlisting>

    The same, but counting only word occurrences with weight <literal>A</literal>
    or <literal>B</literal>:

<programlisting>
SELECT * FROM ts_stat('SELECT vector FROM apod', 'ab')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
</programlisting>
   </para>

  </sect2>

 </sect1>

 <sect1 id="textsearch-parsers">
  <title>Parsers</title>

  <para>
   Text search parsers are responsible for splitting raw document text
   into <firstterm>tokens</firstterm> and identifying each token's type, where
   the set of possible types is defined by the parser itself.
   Note that a parser does not modify the text at all &mdash; it simply
   identifies plausible word boundaries.  Because of this limited scope,
   there is less need for application-specific custom parsers than there is
   for custom dictionaries.  At present <productname>PostgreSQL</productname>
   provides just

Title: Custom Triggers, Configuration Importance, and Gathering Document Statistics
Summary
This section first demonstrates how to create custom triggers to weight title differently from body, using PL/pgSQL. Then it emphasizes the importance of explicitly specifying the configuration name when creating tsvector values inside triggers to avoid issues with search results changing after a dump and restore. Finally, it introduces the ts_stat function for checking text search configurations and finding stop-word candidates. It describes the function's parameters and the columns returned, providing examples of its usage to find the most frequent words in a document collection, optionally filtering by weight.