entry. This
would be useful, for example, if the document collection contained
documents in different languages. Again,
queries that are meant to use the index must be phrased to match, e.g.,
<literal>WHERE to_tsvector(config_name, body) @@ 'a & b'</literal>.
</para>
<para>
Indexes can even concatenate columns:
<programlisting>
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body));
</programlisting>
</para>
<para>
Another approach is to create a separate <type>tsvector</type> column
to hold the output of <function>to_tsvector</function>. To keep this
column automatically up to date with its source data, use a stored
generated column. This example is a
concatenation of <literal>title</literal> and <literal>body</literal>,
using <function>coalesce</function> to ensure that one field will still be
indexed when the other is <literal>NULL</literal>:
<programlisting>
ALTER TABLE pgweb
ADD COLUMN textsearchable_index_col tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
</programlisting>
Then we create a <acronym>GIN</acronym> index to speed up the search:
<programlisting>
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);
</programlisting>
Now we are ready to perform a fast full text search:
<programlisting>
SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;
</programlisting>
</para>
<para>
One advantage of the separate-column approach over an expression index
is that it is not necessary to explicitly specify the text search
configuration in queries in order to make use of the index. As shown
in the example above, the query can depend on
<varname>default_text_search_config</varname>. Another advantage is that
searches will be faster, since it will not be necessary to redo the
<function>to_tsvector</function> calls to verify index matches. (This is more
important when using a GiST index than a GIN index; see <xref
linkend="textsearch-indexes"/>.) The expression-index approach is
simpler to set up, however, and it requires less disk space since the
<type>tsvector</type> representation is not stored explicitly.
</para>
</sect2>
</sect1>
<sect1 id="textsearch-controls">
<title>Controlling Text Search</title>
<para>
To implement full text searching there must be a function to create a
<type>tsvector</type> from a document and a <type>tsquery</type> from a
user query. Also, we need to return results in a useful order, so we need
a function that compares documents with respect to their relevance to
the query. It's also important to be able to display the results nicely.
<productname>PostgreSQL</productname> provides support for all of these
functions.
</para>
<sect2 id="textsearch-parsing-documents">
<title>Parsing Documents</title>
<para>
<productname>PostgreSQL</productname> provides the
function <function>to_tsvector</function> for converting a document to
the <type>tsvector</type> data type.
</para>
<indexterm>
<primary>to_tsvector</primary>
</indexterm>
<synopsis>
to_tsvector(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">document</replaceable> <type>text</type>) returns <type>tsvector</type>
</synopsis>
<para>
<function>to_tsvector</function> parses a textual document into tokens,
reduces the tokens to lexemes, and returns a <type>tsvector</type> which
lists the lexemes together with their positions in the document.
The document is processed according to the specified or default
text search configuration.
Here is a simple example:
<screen>
SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate