


postgresql

44th chunk of `doc/src/sgml/textsearch.sgml`
 not be indexed.
   The spaces are discarded too, since the configuration provides no
   dictionaries at all for them.
  </para>

  <para>
   You can reduce the width of the output by explicitly specifying which columns
   you want to see:

<screen>
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}
</screen>
  </para>
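
  <para>
   Because <function>ts_debug</function> is called like any other
   set-returning function, its rows can also be filtered with an ordinary
   <literal>WHERE</literal> clause.  The following is a minimal sketch that
   assumes the same <literal>public.english</literal> configuration used
   above and drops the <literal>blank</literal> rows:

<programlisting>
-- Keep only the tokens that are not whitespace
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes')
WHERE alias != 'blank';
</programlisting>
  </para>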

  </sect2>

  <sect2 id="textsearch-parser-testing">
   <title>Parser Testing</title>

  <para>
   The following functions allow direct testing of a text search parser.
  </para>

  <indexterm>
   <primary>ts_parse</primary>
  </indexterm>

<synopsis>
ts_parse(<replaceable class="parameter">parser_name</replaceable> <type>text</type>, <replaceable class="parameter">document</replaceable> <type>text</type>,
         OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>, OUT <replaceable class="parameter">token</replaceable> <type>text</type>) returns <type>setof record</type>
ts_parse(<replaceable class="parameter">parser_oid</replaceable> <type>oid</type>, <replaceable class="parameter">document</replaceable> <type>text</type>,
         OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>, OUT <replaceable class="parameter">token</replaceable> <type>text</type>) returns <type>setof record</type>
</synopsis>

  <para>
   <function>ts_parse</function> parses the given <replaceable>document</replaceable>
   with the parser specified by name or OID and returns a series of records,
   one for each token produced by parsing. Each record includes a
   <varname>tokid</varname> showing the assigned token type and a
   <varname>token</varname>, which is the text of the token.  For example:

<screen>
SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number
</screen>
  </para>
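
  <para>
   Because <function>ts_parse</function> reports only numeric token types,
   it can be convenient to join its output with
   <function>ts_token_type</function> (described below) to see the alias
   of each token.  The following is a minimal sketch: the table aliases
   <literal>p</literal> and <literal>t</literal> are arbitrary, the row
   order of the join is not guaranteed, and the second query assumes that
   exactly one parser named <literal>default</literal> is listed in the
   <structname>pg_ts_parser</structname> catalog:

<programlisting>
-- Label each token with the alias of its token type
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', '123 - a number') AS p
JOIN ts_token_type('default') AS t USING (tokid);

-- Call the OID variant by looking up the parser's OID in the catalog
SELECT *
FROM ts_parse((SELECT oid FROM pg_ts_parser WHERE prsname = 'default'),
              '123 - a number');
</programlisting>
  </para>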

  <indexterm>
   <primary>ts_token_type</primary>
  </indexterm>

<synopsis>
ts_token_type(<replaceable class="parameter">parser_name</replaceable> <type>text</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
ts_token_type(<replaceable class="parameter">parser_oid</replaceable> <type>oid</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
</synopsis>

  <para>
   <function>ts_token_type</function> returns a table that describes each type of
   token the specified parser can recognize.  For each token type, the table
   gives the integer <varname>tokid</varname> that the parser uses to label a
   token of that type, the <varname>alias</varname> that names the token type
   in configuration commands, and a short <varname>description</varname>.  For
   example:

<screen>
SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated
