Parser Testing with `ts_parse` and `ts_token_type`

not be indexed.  The spaces are discarded too, since the configuration
provides no dictionaries at all for them.
</para>

<para>
You can reduce the width of the output by explicitly specifying which columns
you want to see:

<screen>
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}
</screen>
</para>

</sect2>

<sect2 id="textsearch-parser-testing">
<title>Parser Testing</title>

<para>
The following functions allow direct testing of a text search parser.
</para>

<indexterm>
<primary>ts_parse</primary>
</indexterm>

<synopsis>
ts_parse(<replaceable class="parameter">parser_name</replaceable> <type>text</type>, <replaceable class="parameter">document</replaceable> <type>text</type>,
         OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>, OUT <replaceable class="parameter">token</replaceable> <type>text</type>) returns <type>setof record</type>
ts_parse(<replaceable class="parameter">parser_oid</replaceable> <type>oid</type>, <replaceable class="parameter">document</replaceable> <type>text</type>,
         OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>, OUT <replaceable class="parameter">token</replaceable> <type>text</type>) returns <type>setof record</type>
</synopsis>

<para>
<function>ts_parse</function> parses the given <replaceable>document</replaceable>
and returns a series of records, one for each token produced by
parsing. Each record includes a <varname>tokid</varname> showing the
assigned token type and a <varname>token</varname> which is the text of the
token.
For example:

<screen>
SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number
</screen>
</para>

<indexterm>
<primary>ts_token_type</primary>
</indexterm>

<synopsis>
ts_token_type(<replaceable class="parameter">parser_name</replaceable> <type>text</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
ts_token_type(<replaceable class="parameter">parser_oid</replaceable> <type>oid</type>, OUT <replaceable class="parameter">tokid</replaceable> <type>integer</type>,
              OUT <replaceable class="parameter">alias</replaceable> <type>text</type>, OUT <replaceable class="parameter">description</replaceable> <type>text</type>) returns <type>setof record</type>
</synopsis>

<para>
<function>ts_token_type</function> returns a table which describes each type of
token the specified parser can recognize.  For each token type, the table
gives the integer <varname>tokid</varname> that the parser uses to label a
token of that type, the <varname>alias</varname> that names the token type
in configuration commands, and a short <varname>description</varname>.  For
example:

<screen>
SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated

This section introduces functions for directly testing text search parsers. It explains how to use `ts_parse` to parse a document and return a series of records representing the tokens produced, including their token ID and text. It also introduces `ts_token_type`, which returns a table describing the types of tokens a parser can recognize, including the token ID, alias, and description for each type. Examples are provided to illustrate the usage of both functions.
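
Because both functions report the same `tokid` values, their results can be joined to label each parsed token with its alias and description. The query below is a minimal sketch of that idea, not something taken from the section itself. It assumes the built-in `default` parser and a server version that supports `WITH ORDINALITY`. The aliases `p` and `t` and the column name `ord` are illustrative names chosen here.

```sql
-- Label each token produced by ts_parse with the alias and description
-- that ts_token_type reports for the same tokid.
-- "ord" is a hypothetical column name; it records the position of each
-- token in the document so the original order can be restored after the join.
SELECT p.ord, p.tokid, t.alias, t.description, p.token
FROM ts_parse('default', '123 - a number')
       WITH ORDINALITY AS p(tokid, token, ord)
JOIN ts_token_type('default') AS t
  ON t.tokid = p.tokid
ORDER BY p.ord;
```

Sorting by `ord` matters because a join does not guarantee that rows come back in the order the parser emitted them.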