Home Explore Blog CI



postgresql

9th chunk of `doc/src/sgml/textsearch.sgml`
a9e815eadb081e5afda896d9401135e6e9d112fe81aacd220000000100000fa3
 entry.  This
    would be useful, for example, if the document collection contained
    documents in different languages.  Again,
    queries that are meant to use the index must be phrased to match, e.g.,
    <literal>WHERE to_tsvector(config_name, body) @@ 'a &amp; b'</literal>.
   </para>

   <para>
    Indexes can even concatenate columns:

<programlisting>
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body));
</programlisting>
   </para>

   <para>
    Another approach is to create a separate <type>tsvector</type> column
    to hold the output of <function>to_tsvector</function>.  To keep this
    column automatically up to date with its source data, use a stored
    generated column.  This example is a
    concatenation of <literal>title</literal> and <literal>body</literal>,
    using <function>coalesce</function> to ensure that one field will still be
    indexed when the other is <literal>NULL</literal>:

<programlisting>
ALTER TABLE pgweb
    ADD COLUMN textsearchable_index_col tsvector
               GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
</programlisting>

    Then we create a <acronym>GIN</acronym> index to speed up the search:

<programlisting>
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);
</programlisting>

    Now we are ready to perform a fast full text search:

<programlisting>
SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create &amp; table')
ORDER BY last_mod_date DESC
LIMIT 10;
</programlisting>
   </para>

   <para>
    One advantage of the separate-column approach over an expression index
    is that it is not necessary to explicitly specify the text search
    configuration in queries in order to make use of the index.  As shown
    in the example above, the query can depend on
    <varname>default_text_search_config</varname>.  Another advantage is that
    searches will be faster, since it will not be necessary to redo the
    <function>to_tsvector</function> calls to verify index matches.  (This is more
    important when using a GiST index than a GIN index; see <xref
    linkend="textsearch-indexes"/>.)  The expression-index approach is
    simpler to set up, however, and it requires less disk space since the
    <type>tsvector</type> representation is not stored explicitly.
   </para>

  </sect2>

 </sect1>

 <sect1 id="textsearch-controls">
  <title>Controlling Text Search</title>

  <para>
   To implement full text searching there must be a function to create a
   <type>tsvector</type> from a document and a <type>tsquery</type> from a
   user query. Also, we need to return results in a useful order, so we need
   a function that compares documents with respect to their relevance to
   the query. It's also important to be able to display the results nicely.
   <productname>PostgreSQL</productname> provides support for all of these
   functions.
  </para>

  <sect2 id="textsearch-parsing-documents">
   <title>Parsing Documents</title>

   <para>
    <productname>PostgreSQL</productname> provides the
    function <function>to_tsvector</function> for converting a document to
    the <type>tsvector</type> data type.
   </para>

   <indexterm>
    <primary>to_tsvector</primary>
   </indexterm>

<synopsis>
to_tsvector(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">document</replaceable> <type>text</type>) returns <type>tsvector</type>
</synopsis>

   <para>
    <function>to_tsvector</function> parses a textual document into tokens,
    reduces the tokens to lexemes, and returns a <type>tsvector</type> which
    lists the lexemes together with their positions in the document.
    The document is processed according to the specified or default
    text search configuration.
    Here is a simple example:

<screen>
SELECT to_tsvector('english', 'a fat  cat sat on a mat - it ate

Title: Creating and Using a Separate tsvector Column for Text Search
Summary
A separate tsvector column, updated using a stored generated column, holds the output of to_tsvector for efficient text searching. This allows queries to rely on default_text_search_config and avoids redoing to_tsvector calls. While easier to set up, expression indexes require less disk space. PostgreSQL offers functions for creating tsvector and tsquery data types, ranking relevance, and displaying results.