Basic Text Matching in PostgreSQL

<type>tsvector</type> representation of a document; the original text need only be retrieved when the document has been selected for display to a user. We therefore often speak of the <type>tsvector</type> as being the document, but of course it is only a compact representation of the full document.
   </para>
  </sect2>

  <sect2 id="textsearch-matching">
   <title>Basic Text Matching</title>

   <para>
    Full text searching in <productname>PostgreSQL</productname> is based on the match operator <literal>@@</literal>, which returns <literal>true</literal> if a <type>tsvector</type> (document) matches a <type>tsquery</type> (query). It doesn't matter which data type is written first:

<programlisting>
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
 ?column?
----------
 t

SELECT 'fat & cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
 ?column?
----------
 f
</programlisting>
   </para>

   <para>
    As the above example suggests, a <type>tsquery</type> is not just raw text, any more than a <type>tsvector</type> is. A <type>tsquery</type> contains search terms, which must be already-normalized lexemes, and may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators. (For syntax details see <xref linkend="datatype-tsquery"/>.) There are functions <function>to_tsquery</function>, <function>plainto_tsquery</function>, and <function>phraseto_tsquery</function> that are helpful in converting user-written text into a proper <type>tsquery</type>, primarily by normalizing words appearing in the text. Similarly, <function>to_tsvector</function> is used to parse and normalize a document string. So in practice a text search match would look more like this:

<programlisting>
SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');
 ?column?
----------
 t
</programlisting>

    Observe that this match would not succeed if written as

<programlisting>
SELECT 'fat cats ate fat rats'::tsvector @@ to_tsquery('fat & rat');
 ?column?
----------
 f
</programlisting>

    since here no normalization of the word <literal>rats</literal> will occur. The elements of a <type>tsvector</type> are lexemes, which are assumed already normalized, so <literal>rats</literal> does not match <literal>rat</literal>.
   </para>

   <para>
    The <literal>@@</literal> operator also supports <type>text</type> input, allowing explicit conversion of a text string to <type>tsvector</type> or <type>tsquery</type> to be skipped in simple cases. The variants available are:

<programlisting>
tsvector @@ tsquery
tsquery  @@ tsvector
text @@ tsquery
text @@ text
</programlisting>
   </para>

   <para>
    The first two of these we saw already. The form <type>text</type> <literal>@@</literal> <type>tsquery</type> is equivalent to <literal>to_tsvector(x) @@ y</literal>. The form <type>text</type> <literal>@@</literal> <type>text</type> is equivalent to <literal>to_tsvector(x) @@ plainto_tsquery(y)</literal>.
   </para>

   <para>
    Within a <type>tsquery</type>, the <literal>&</literal> (AND) operator specifies that both its arguments must appear in the document to have a match. Similarly, the <literal>|</literal> (OR) operator specifies that at least one of its arguments must appear, while the <literal>!</literal> (NOT) operator specifies that its argument must <emphasis>not</emphasis> appear in order to have a match. For example, the query <literal>fat & ! rat</literal> matches documents that contain <literal>fat</literal> but not <literal>rat</literal>.
   </para>

   <para>
    Searching for phrases is possible with the help of the <literal><-></literal> (FOLLOWED BY) <type>tsquery</type> operator, which matches only if its arguments have matches that are adjacent and in the given order.
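    As a minimal sketch of this adjacency rule (the configuration name <literal>english</literal> is passed explicitly here as an assumption, so that the results do not depend on the server's <varname>default_text_search_config</varname> setting), the first query below matches because <literal>error</literal> immediately follows <literal>fatal</literal>, while the second does not because the two words occur in the opposite order and are not adjacent:

<programlisting>
-- "error" directly follows "fatal", so the phrase query matches.
SELECT to_tsvector('english', 'fatal error') @@ to_tsquery('english', 'fatal <-> error');
 ?column?
----------
 t

-- Both words are present, but not adjacent and not in the requested order.
SELECT to_tsvector('english', 'error is not fatal') @@ to_tsquery('english', 'fatal <-> error');
 ?column?
----------
 f
</programlisting>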

PostgreSQL's full-text search relies on the @@ operator, which returns true when a tsvector (document) matches a tsquery (query), whichever of the two is written first. A tsquery contains already-normalized lexemes combined with the AND, OR, NOT, and FOLLOWED BY operators. The functions to_tsquery, plainto_tsquery, and phraseto_tsquery convert user-written text into a tsquery, and to_tsvector parses and normalizes a document string into a tsvector. The @@ operator also accepts text input: text @@ tsquery is equivalent to to_tsvector(x) @@ y, and text @@ text is equivalent to to_tsvector(x) @@ plainto_tsquery(y). Within a tsquery, & (AND) requires both arguments to appear in the document, | (OR) requires at least one of them, and ! (NOT) requires its argument to be absent. The <-> (FOLLOWED BY) operator matches phrases: its arguments must have matches that are adjacent and in the given order.
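
The boolean operators and the text-input forms summarized above can be sketched as follows. This is an illustration, not part of the reference text: the sample sentence is reused from the earlier example, and the session-level SET of default_text_search_config to the built-in english configuration is an assumption, made so that the implicit conversions normalize words the same way on any server.

<programlisting>
-- Assumption: use the built-in english configuration for this session.
SET default_text_search_config = 'pg_catalog.english';

-- AND with NOT: the document contains both fat and rat, so "fat but not rat" fails.
SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & ! rat');
 ?column?
----------
 f

-- OR: only one of the two terms has to be present.
SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('cow | rat');
 ?column?
----------
 t

-- text @@ tsquery: the left side is passed through to_tsvector implicitly.
SELECT 'fat cats ate fat rats'::text @@ to_tsquery('fat & rat');
 ?column?
----------
 t

-- text @@ text: the right side is passed through plainto_tsquery implicitly.
SELECT 'fat cats ate fat rats'::text @@ 'fat rats'::text;
 ?column?
----------
 t
</programlisting>

The explicit ::text casts are a deliberate choice in this sketch: they make it unambiguous that the text variants of @@ are being used rather than a direct cast of the string literal to tsvector, which would skip normalization.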