51st chunk of `doc/src/sgml/textsearch.sgml`
 Signed integer
 numhword        | Hyphenated word, letters and digits
 numword         | Word, letters and digits
 protocol        | Protocol head
 sfloat          | Scientific notation
 tag             | XML tag
 uint            | Unsigned integer
 url             | URL
 url_path        | URL path
 version         | Version number
 word            | Word, all letters
(23 rows)
</screen>
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
   <term><literal>\dFt<optional>+</optional> <optional>PATTERN</optional></literal></term>
    <listitem>
     <para>
      List text search templates (add <literal>+</literal> for more detail).
<screen>
=&gt; \dFt
                           List of text search templates
   Schema   |   Name    |                        Description
------------+-----------+-----------------------------------------------------------
 pg_catalog | ispell    | ispell dictionary
 pg_catalog | simple    | simple dictionary: just lower case and check for stopword
 pg_catalog | snowball  | snowball stemmer
 pg_catalog | synonym   | synonym dictionary: replace word by its synonym
 pg_catalog | thesaurus | thesaurus dictionary: phrase by phrase substitution
</screen>
     </para>
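     <para>
      Roughly the same information can be read directly from the system
      catalogs. The query below is a minimal sketch and not necessarily
      the exact query <application>psql</application> issues; it assumes
      the <structname>pg_ts_template</structname> and
      <structname>pg_namespace</structname> catalogs and omits the
      visibility and pattern filtering that <literal>\dFt</literal> applies.
<programlisting>
-- List text search templates with their schema and description.
SELECT n.nspname  AS "Schema",
       t.tmplname AS "Name",
       pg_catalog.obj_description(t.oid, 'pg_ts_template') AS "Description"
FROM pg_catalog.pg_ts_template t
     JOIN pg_catalog.pg_namespace n ON n.oid = t.tmplnamespace
ORDER BY 1, 2;
</programlisting>
     </para>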
    </listitem>
   </varlistentry>
  </variablelist>

 </sect1>

 <sect1 id="textsearch-limitations">
  <title>Limitations</title>

  <para>
   The current limitations of <productname>PostgreSQL</productname>'s
   text search features are:
   <itemizedlist  spacing="compact" mark="bullet">
    <listitem>
     <para>The length of each lexeme must be less than 2 kilobytes</para>
    </listitem>
    <listitem>
     <para>The length of a <type>tsvector</type> (lexemes + positions) must be
     less than 1 megabyte</para>
    </listitem>
    <listitem>
     <!-- TODO: number of lexemes in what?  This is unclear -->
     <para>The number of lexemes must be less than
     2<superscript>64</superscript></para>
    </listitem>
    <listitem>
     <para>Position values in <type>tsvector</type> must be greater than 0 and
     no more than 16,383</para>
    </listitem>
    <listitem>
     <para>The match distance in a <literal>&lt;<replaceable>N</replaceable>&gt;</literal>
     (FOLLOWED BY) <type>tsquery</type> operator cannot be more than
     16,384</para>
    </listitem>
    <listitem>
     <para>No more than 256 positions per lexeme</para>
    </listitem>
    <listitem>
     <para>The number of nodes (lexemes + operators) in a <type>tsquery</type>
     must be less than 32,768</para>
    </listitem>
   </itemizedlist>
  </para>
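
  <para>
   The position limit listed above can be observed with a minimal sketch
   such as the following. It relies on the behavior described for the
   <type>tsvector</type> data type, where position values larger than
   16,383 are silently set to 16,383 rather than rejected; the lexemes
   and the out-of-range value are arbitrary placeholders.
<programlisting>
-- The position 20000 exceeds the maximum of 16383, so it is expected
-- to be stored as 16383 instead of raising an error.
SELECT 'fat:16383 cat:20000'::tsvector;
</programlisting>
  </para>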

  <para>
   For comparison, the <productname>PostgreSQL</productname> 8.1 documentation
   contained 10,441 unique words, a total of 335,420 words, and the most
   frequent word <quote>postgresql</quote> was mentioned 6,127 times in 655
   documents.
  </para>

   <!-- TODO we need to put a date on these numbers? -->
  <para>
   As another example, the <productname>PostgreSQL</productname> mailing
   list archives contained 910,989 unique words with 57,491,343 lexemes in
   461,020 messages.
  </para>

 </sect1>

</chapter>

Title: Listing Text Search Templates and Text Search Limitations
Summary
The text lists text search templates such as 'ispell', 'simple', 'snowball', 'synonym' and 'thesaurus'. It also describes the limitations of PostgreSQL's text search features, including restrictions on lexeme length, tsvector size, number of lexemes, position values, match distance, positions per lexeme, and nodes in a tsquery. It provides statistics from PostgreSQL documentation and mailing list archives as examples.