 linkend="textsearch-configuration"/>).  It is possible to have
    many different configurations in the same database, and predefined
    configurations are available for various languages. In our example
    we used the default configuration <literal>english</literal> for the
    English language.
   </para>
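   <para>
    For instance, the same text can be processed quite differently depending
    on the configuration.  In the following illustration, the sample string
    is arbitrary and both configurations are assumed to be the predefined ones.
    <literal>simple</literal> only lowercases the words here, while
    <literal>english</literal> also stems them and drops the stop word
    <literal>the</literal>.  The dropped word still counts toward the
    positions:

<screen>
-- arbitrary sample text; assumes the predefined simple and english configurations
SELECT to_tsvector('simple', 'The Fat Rats');
       to_tsvector
--------------------------
 'fat':2 'rats':3 'the':1

SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector
-----------------
 'fat':2 'rat':3
</screen>
   </para>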

   <para>
    The function <function>setweight</function> can be used to label the
    entries of a <type>tsvector</type> with a given <firstterm>weight</firstterm>,
    where a weight is one of the letters <literal>A</literal>, <literal>B</literal>,
    <literal>C</literal>, or <literal>D</literal>.
    This is typically used to mark entries coming from
    different parts of a document, such as title versus body.  Later, this
    information can be used for ranking search results, so that matches in,
    say, the title count for more than matches in the body.
   </para>
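   <para>
    A minimal illustration with an arbitrary sample text follows.  Every
    position in the result carries the new label, printed after the position
    number.  Weight <literal>D</literal> is the default and is not shown on
    output.

<screen>
-- arbitrary sample text: label every lexeme with weight A
SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A');
     setweight
-------------------
 'fat':2A 'rat':3A
</screen>
   </para>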

   <para>
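    Because <function>to_tsvector</function>(<literal>NULL</literal>) will
    return <literal>NULL</literal>, and concatenating <literal>NULL</literal>
    with any other <type>tsvector</type> also yields <literal>NULL</literal>,
    a single null field would wipe out the whole combined result.  It is
    therefore recommended to use <function>coalesce</function> whenever a
    field might be null.  Here is the recommended method for creating
    a <type>tsvector</type> from a structured document: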

<programlisting>
UPDATE tt SET ti =
    setweight(to_tsvector(coalesce(title,'')), 'A')    ||
    setweight(to_tsvector(coalesce(keyword,'')), 'B')  ||
    setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
    setweight(to_tsvector(coalesce(body,'')), 'D');
</programlisting>

    Here we have used <function>setweight</function> to label the source
    of each lexeme in the finished <type>tsvector</type>, and then merged
    the labeled <type>tsvector</type> values using the <type>tsvector</type>
    concatenation operator <literal>||</literal>.  (<xref
    linkend="textsearch-manipulate-tsvector"/> gives details about these
    operations.)
   </para>
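   <para>
    The following sketch uses arbitrary literal strings in place of table
    columns.  It shows why <function>coalesce</function> matters and how the
    concatenation operator merges positions.  Concatenating with a null
    <type>tsvector</type> yields null, whereas an empty string produces an
    empty <type>tsvector</type> that leaves the other operand intact.  In the
    second query, positions from the right-hand operand are shifted past the
    largest position of the left-hand operand.  As a result,
    <literal>fat</literal> ends up with one position labeled
    <literal>A</literal> and one with the default label <literal>D</literal>:

<screen>
-- literals stand in for possibly-null columns such as title or body
SELECT (to_tsvector('english', 'Fat Rats') || to_tsvector('english', NULL::text)) IS NULL AS without_coalesce,
       (to_tsvector('english', 'Fat Rats') || to_tsvector('english', coalesce(NULL::text, ''))) IS NULL AS with_coalesce;
 without_coalesce | with_coalesce
------------------+---------------
 t                | f

SELECT setweight(to_tsvector('english', 'Fat Rats'), 'A') ||
       setweight(to_tsvector('english', 'fat cats'), 'D');
          ?column?
-----------------------------
 'cat':4 'fat':1A,3 'rat':2A
</screen>
   </para>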

  </sect2>

  <sect2 id="textsearch-parsing-queries">
   <title>Parsing Queries</title>

   <para>
    <productname>PostgreSQL</productname> provides the
    functions <function>to_tsquery</function>,
    <function>plainto_tsquery</function>,
    <function>phraseto_tsquery</function> and
    <function>websearch_to_tsquery</function>
    for converting a query to the <type>tsquery</type> data type.
    <function>to_tsquery</function> offers access to more features
    than either <function>plainto_tsquery</function> or
    <function>phraseto_tsquery</function>, but it is less forgiving about its
    input. <function>websearch_to_tsquery</function> is a simplified version
    of <function>to_tsquery</function> with an alternative syntax, similar
    to the one used by web search engines.
   </para>
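   <para>
    As a rough comparison with arbitrary sample inputs, the three functions
    below accept free text.  <function>to_tsquery</function>, by contrast,
    would reject input such as <literal>'fat rats'</literal> with a syntax
    error, because the tokens are not joined by an operator:

<screen>
-- arbitrary sample inputs
SELECT plainto_tsquery('english', 'The Fat Rats');
 plainto_tsquery
-----------------
 'fat' &amp; 'rat'

SELECT phraseto_tsquery('english', 'The Fat Rats');
 phraseto_tsquery
------------------
 'fat' &lt;-&gt; 'rat'

SELECT websearch_to_tsquery('english', '"fat rat" or cat');
  websearch_to_tsquery
-------------------------
 'fat' &lt;-&gt; 'rat' | 'cat'
</screen>
   </para>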

   <indexterm>
    <primary>to_tsquery</primary>
   </indexterm>

<synopsis>
to_tsquery(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">querytext</replaceable> <type>text</type>) returns <type>tsquery</type>
</synopsis>
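   <para>
    When <replaceable>config</replaceable> is omitted, the configuration named
    by <varname>default_text_search_config</varname> is used.  The short
    sketch below assumes a session in which that setting has been changed to
    the predefined <literal>simple</literal> configuration, which has no stop
    words.  That choice is made only for this illustration:

<screen>
-- session-local setting, assumed only for this illustration
SET default_text_search_config = 'pg_catalog.simple';
SET
SELECT to_tsquery('The &amp; Fat &amp; Rats');
       to_tsquery
------------------------
 'the' &amp; 'fat' &amp; 'rats'
</screen>
   </para>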

   <para>
    <function>to_tsquery</function> creates a <type>tsquery</type> value from
    <replaceable>querytext</replaceable>, which must consist of single tokens
    separated by the <type>tsquery</type> operators <literal>&amp;</literal> (AND),
    <literal>|</literal> (OR), <literal>!</literal> (NOT), and
    <literal>&lt;-&gt;</literal> (FOLLOWED BY), possibly grouped
    using parentheses.  In other words, the input to
    <function>to_tsquery</function> must already follow the general rules for
    <type>tsquery</type> input, as described in <xref
    linkend="datatype-tsquery"/>.  The difference is that while basic
    <type>tsquery</type> input takes the tokens at face value,
    <function>to_tsquery</function> normalizes each token into a lexeme using
    the specified or default configuration, and discards any tokens that are
    stop words according to the configuration.  For example:

<screen>
SELECT to_tsquery('english', 'The &amp; Fat &amp; Rats');
  to_tsquery
---------------
 'fat' &amp; 'rat'
</screen>
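
    Operators can also be combined.  Parentheses override the default
    precedence, in which <literal>!</literal> binds most tightly, then
    <literal>&lt;-&gt;</literal>, then <literal>&amp;</literal>, and
    <literal>|</literal> binds least tightly.  A brief illustration with an
    arbitrary query:

<screen>
-- arbitrary query: parentheses group the OR before the AND is applied
SELECT to_tsquery('english', 'Fat &amp; (Rats | Cats)');
        to_tsquery
---------------------------
 'fat' &amp; ( 'rat' | 'cat' )
</screen>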

    As in basic <type>tsquery</type> input, weight(s)
