Creating and Using a Separate tsvector Column for Text Search

entry. This would be useful, for example, if the document collection contained documents in different languages. Again, queries that are meant to use the index must be phrased to match, e.g., <literal>WHERE to_tsvector(config_name, body) @@ 'a & b'</literal>. </para> <para> Indexes can even concatenate columns: <programlisting> CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body)); </programlisting> </para> <para> Another approach is to create a separate <type>tsvector</type> column to hold the output of <function>to_tsvector</function>. To keep this column automatically up to date with its source data, use a stored generated column. This example is a concatenation of <literal>title</literal> and <literal>body</literal>, using <function>coalesce</function> to ensure that one field will still be indexed when the other is <literal>NULL</literal>: <programlisting> ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED; </programlisting> Then we create a <acronym>GIN</acronym> index to speed up the search: <programlisting> CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col); </programlisting> Now we are ready to perform a fast full text search: <programlisting> SELECT title FROM pgweb WHERE textsearchable_index_col @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10; </programlisting> </para> <para> One advantage of the separate-column approach over an expression index is that it is not necessary to explicitly specify the text search configuration in queries in order to make use of the index. As shown in the example above, the query can depend on <varname>default_text_search_config</varname>. Another advantage is that searches will be faster, since it will not be necessary to redo the <function>to_tsvector</function> calls to verify index matches. (This is more important when using a GiST index than a GIN index; see <xref linkend="textsearch-indexes"/>.) The expression-index approach is simpler to set up, however, and it requires less disk space since the <type>tsvector</type> representation is not stored explicitly. </para> </sect2> </sect1> <sect1 id="textsearch-controls"> <title>Controlling Text Search</title> <para> To implement full text searching there must be a function to create a <type>tsvector</type> from a document and a <type>tsquery</type> from a user query. Also, we need to return results in a useful order, so we need a function that compares documents with respect to their relevance to the query. It's also important to be able to display the results nicely. <productname>PostgreSQL</productname> provides support for all of these functions. </para> <sect2 id="textsearch-parsing-documents"> <title>Parsing Documents</title> <para> <productname>PostgreSQL</productname> provides the function <function>to_tsvector</function> for converting a document to the <type>tsvector</type> data type. </para> <indexterm> <primary>to_tsvector</primary> </indexterm> <synopsis> to_tsvector(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">document</replaceable> <type>text</type>) returns <type>tsvector</type> </synopsis> <para> <function>to_tsvector</function> parses a textual document into tokens, reduces the tokens to lexemes, and returns a <type>tsvector</type> which lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration. Here is a simple example: <screen> SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate

A separate tsvector column, updated using a stored generated column, holds the output of to_tsvector for efficient text searching. This allows queries to rely on default_text_search_config and avoids redoing to_tsvector calls. While easier to set up, expression indexes require less disk space. PostgreSQL offers functions for creating tsvector and tsquery data types, ranking relevance, and displaying results.