<type>tsvector</type> is provided for storing preprocessed
documents, along with a type <type>tsquery</type> for representing processed
queries (<xref linkend="datatype-textsearch"/>). There are many
functions and operators available for these data types
(<xref linkend="functions-textsearch"/>), the most important of which is
the match operator <literal>@@</literal>, which we introduce in
<xref linkend="textsearch-matching"/>. Full text searches can be accelerated
using indexes (<xref linkend="textsearch-indexes"/>).
</para>
<sect2 id="textsearch-document">
<title>What Is a Document?</title>
<indexterm zone="textsearch-document">
<primary>document</primary>
<secondary>text search</secondary>
</indexterm>
<para>
A <firstterm>document</firstterm> is the unit of searching in a full text search
system; for example, a magazine article or email message. The text search
engine must be able to parse documents and store associations of lexemes
(key words) with their parent document. Later, these associations are
used to search for documents that contain query words.
</para>
<para>
For searches within <productname>PostgreSQL</productname>,
a document is normally a textual field within a row of a database table,
or possibly a combination (concatenation) of such fields, perhaps stored
in several tables or obtained dynamically. In other words, a document can
be constructed from different parts for indexing and it might not be
stored anywhere as a whole. For example:
<programlisting>
SELECT title || ' ' || author || ' ' || abstract || ' ' || body AS document
FROM messages
WHERE mid = 12;
SELECT m.title || ' ' || m.author || ' ' || m.abstract || ' ' || d.body AS document
FROM messages m, docs d
WHERE m.mid = d.did AND m.mid = 12;
</programlisting>
</para>
<note>
<para>
Actually, in these example queries, <function>coalesce</function>
should be used to prevent a single <literal>NULL</literal> attribute from
causing a <literal>NULL</literal> result for the whole document.
</para>
</note>
<para>
Another possibility is to store the documents as simple text files in the
file system. In this case, the database can be used to store the full text
index and to execute searches, and some unique identifier can be used to
retrieve the document from the file system. However, retrieving files
from outside the database requires superuser permissions or special
function support, so this is usually less convenient than keeping all
the data inside <productname>PostgreSQL</productname>. Also, keeping
everything inside the database allows easy access
to document metadata to assist in indexing and display.
</para>
<para>
For text search purposes, each document must be reduced to the
preprocessed <type>tsvector</type> format. Searching and ranking
are performed entirely on the <type>tsvector</type> representation
of a document — the original text need only be retrieved
when the document has been selected for display to a user.
We therefore often speak of the <type>tsvector</type> as being the
document, but of course it is only a compact representation of
the full document.
</para>
</sect2>
<sect2 id="textsearch-matching">
<title>Basic Text Matching</title>
<para>
Full text searching in <productname>PostgreSQL</productname> is based on
the match operator <literal>@@</literal>, which returns
<literal>true</literal> if a <type>tsvector</type>
(document) matches a <type>tsquery</type> (query).
It doesn't matter which data type is written first:
<programlisting>
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
?column?
----------
t
SELECT 'fat & cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
?column?
----------
f
</programlisting>
</para>