dictionary <literal>thesaurus_astro</literal> does know the
phrase <literal>supernovae stars</literal>, but <function>ts_lexize</function>
fails since it does not parse the input text but treats it as a single
token. Use <function>plainto_tsquery</function> or <function>to_tsvector</function> to
test thesaurus dictionaries, for example:
<screen>
SELECT plainto_tsquery('supernovae stars');
plainto_tsquery
-----------------
'sn'
</screen>
</para>
</note>
</sect2>
</sect1>
<sect1 id="textsearch-indexes">
<title>Preferred Index Types for Text Search</title>
<indexterm zone="textsearch-indexes">
<primary>text search</primary>
<secondary>indexes</secondary>
</indexterm>
<para>
There are two kinds of indexes that can be used to speed up full text
searches:
<link linkend="gin"><acronym>GIN</acronym></link> and
<link linkend="gist"><acronym>GiST</acronym></link>.
Note that indexes are not mandatory for full text searching, but in
cases where a column is searched on a regular basis, an index is
usually desirable.
</para>
<para>
To create such an index, do one of:
<variablelist>
<varlistentry>
<term>
<indexterm zone="textsearch-indexes">
<primary>index</primary>
<secondary>GIN</secondary>
<tertiary>text search</tertiary>
</indexterm>
<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
</term>
<listitem>
<para>
Creates a GIN (Generalized Inverted Index)-based index.
The <replaceable>column</replaceable> must be of <type>tsvector</type> type.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<indexterm zone="textsearch-indexes">
<primary>index</primary>
<secondary>GiST</secondary>
<tertiary>text search</tertiary>
</indexterm>
<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable> [ { DEFAULT | tsvector_ops } (siglen = <replaceable>number</replaceable>) ] );</literal>
</term>
<listitem>
<para>
Creates a GiST (Generalized Search Tree)-based index.
The <replaceable>column</replaceable> can be of <type>tsvector</type> or
<type>tsquery</type> type.
Optional integer parameter <literal>siglen</literal> determines
signature length in bytes (see below for details).
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
<para>
GIN indexes are the preferred text search index type. As inverted
indexes, they contain an index entry for each word (lexeme), with a
compressed list of matching locations. Multi-word searches can find
the first match, then use the index to remove rows that are lacking
additional words. GIN indexes store only the words (lexemes) of
<type>tsvector</type> values, and not their weight labels. Thus a table
row recheck is needed when using a query that involves weights.
</para>
<para>
A GiST index is <firstterm>lossy</firstterm>, meaning that the index
might produce false matches, and it is necessary
to check the actual table row to eliminate such false matches.
(<productname>PostgreSQL</productname> does this automatically when needed.)
GiST indexes are lossy because each document is represented in the
index by a fixed-length signature. The signature length in bytes is determined
by the value of the optional integer parameter <literal>siglen</literal>.
The default signature length (when <literal>siglen</literal> is not specified) is
124 bytes, the maximum signature length is 2024 bytes. The signature is generated by hashing
each word into a single bit in an n-bit string, with all these bits OR-ed
together to produce an n-bit document signature. When two words hash to
the same bit position