<synopsis>
ts_token_type(<replaceable class="parameter">parser_name</replaceable> <type>text</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
ts_token_type(<replaceable class="parameter">parser_oid</replaceable> <type>oid</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
</synopsis>

  <para>
   <function>ts_token_type</function> returns a table which describes each type of
   token the specified parser can recognize.  For each token type, the table
   gives the integer <varname>tokid</varname> that the parser uses to label a
   token of that type, the <varname>alias</varname> that names the token type
   in configuration commands, and a short <varname>description</varname>.  For
   example:

<screen>
SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated word part, letters and digits
    10 | hword_part      | Hyphenated word part, all letters
    11 | hword_asciipart | Hyphenated word part, all ASCII
    12 | blank           | Space symbols
    13 | tag             | XML tag
    14 | protocol        | Protocol head
    15 | numhword        | Hyphenated word, letters and digits
    16 | asciihword      | Hyphenated word, all ASCII
    17 | hword           | Hyphenated word, all letters
    18 | url_path        | URL path
    19 | file            | File or path name
    20 | float           | Decimal notation
    21 | int             | Signed integer
    22 | uint            | Unsigned integer
    23 | entity          | XML entity
</screen>
   </para>
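
   <para>
    The token type names can be attached to the output of
    <function>ts_parse</function> (described earlier in this section) by
    joining on <varname>tokid</varname>.  As a sketch, reusing the
    <literal>123 - a number</literal> sample string from the
    <function>ts_parse</function> example:

<screen>
SELECT t.alias, p.token
FROM ts_parse('default', '123 - a number') AS p,
     ts_token_type('default') AS t
WHERE p.tokid = t.tokid;
   alias   | token
-----------+--------
 uint      | 123
 blank     |
 blank     | -
 asciiword | a
 blank     |
 asciiword | number
</screen>
   </para>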

  </sect2>

  <sect2 id="textsearch-dictionary-testing">
   <title>Dictionary Testing</title>

   <para>
    The <function>ts_lexize</function> function facilitates dictionary testing.
   </para>

   <indexterm>
    <primary>ts_lexize</primary>
   </indexterm>

<synopsis>
ts_lexize(<replaceable class="parameter">dict</replaceable> <type>regdictionary</type>, <replaceable class="parameter">token</replaceable> <type>text</type>) returns <type>text[]</type>
</synopsis>

   <para>
    <function>ts_lexize</function> returns an array of lexemes if the input
    <replaceable>token</replaceable> is known to the dictionary,
    or an empty array if the token
    is known to the dictionary but it is a stop word, or
    <literal>NULL</literal> if it is an unknown word.
   </para>

   <para>
    Examples:

<screen>
SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
-----------
 {star}

SELECT ts_lexize('english_stem', 'a');
 ts_lexize
-----------
 {}
</screen>
   </para>
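
   <para>
    A Snowball dictionary such as <literal>english_stem</literal> recognizes
    everything, so it never produces the <literal>NULL</literal> result.  To
    see that case, a dictionary that can reject unknown words is required;
    for example, assuming an <application>Ispell</application> dictionary
    named <literal>english_ispell</literal> has been installed (the name
    here is illustrative):

<screen>
SELECT ts_lexize('english_ispell', 'qwerty') IS NULL;
 ?column?
----------
 t
</screen>
   </para>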

   <note>
    <para>
     The <function>ts_lexize</function> function expects a single
     <emphasis>token</emphasis>, not text. Here is a case
     where this can be confusing:

<screen>
SELECT ts_lexize('thesaurus_astro', 'supernovae stars') is null;
 ?column?
----------
 t
</screen>

     The thesaurus dictionary <literal>thesaurus_astro</literal> does know the
     phrase <literal>supernovae stars</literal>, but <function>ts_lexize</function>
     fails since it does not parse the input text but treats it as a single
     token. Use <function>plainto_tsquery</function> or <function>to_tsvector</function> to
     test thesaurus dictionaries, for example:

<screen>
SELECT plainto_tsquery('supernovae stars');
 plainto_tsquery
-----------------
 'sn'
</screen>
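
     The equivalent check with <function>to_tsvector</function> (assuming
     <literal>thesaurus_astro</literal> is part of the default text search
     configuration) is:

<screen>
SELECT to_tsvector('supernovae stars');
 to_tsvector
-------------
 'sn':1
</screen>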
    </para>
   </note>

  </sect2>

 </sect1>

 <sect1 id="textsearch-indexes">
  <title>Preferred Index Types for Text Search</title>

  <indexterm zone="textsearch-indexes">
   <primary>text search</primary>
   <secondary>indexes</secondary>
  </indexterm>

  <para>
   There are two kinds of indexes that can be used to speed up full text
   searches:
   <link linkend="gin"><acronym>GIN</acronym></link> and
   <link linkend="gist"><acronym>GiST</acronym></link>.
   Note that indexes are not mandatory for full text searching, but
