 <entry>Hyphenated word part, all letters</entry>
      <entry><literal>l&oacute;gico</literal> or <literal>matem&aacute;tica</literal>
       in the context <literal>l&oacute;gico-matem&aacute;tica</literal></entry>
     </row>
     <row>
      <entry><literal>hword_numpart</literal></entry>
      <entry>Hyphenated word part, letters and digits</entry>
      <entry><literal>beta1</literal> in the context
       <literal>postgresql-beta1</literal></entry>
     </row>
     <row>
      <entry><literal>email</literal></entry>
      <entry>Email address</entry>
      <entry><literal>foo@example.com</literal></entry>
     </row>
     <row>
      <entry><literal>protocol</literal></entry>
      <entry>Protocol head</entry>
      <entry><literal>http://</literal></entry>
     </row>
     <row>
      <entry><literal>url</literal></entry>
      <entry>URL</entry>
      <entry><literal>example.com/stuff/index.html</literal></entry>
     </row>
     <row>
      <entry><literal>host</literal></entry>
      <entry>Host</entry>
      <entry><literal>example.com</literal></entry>
     </row>
     <row>
      <entry><literal>url_path</literal></entry>
      <entry>URL path</entry>
      <entry><literal>/stuff/index.html</literal>, in the context of a URL</entry>
     </row>
     <row>
      <entry><literal>file</literal></entry>
      <entry>File or path name</entry>
      <entry><literal>/usr/local/foo.txt</literal>, if not within a URL</entry>
     </row>
     <row>
      <entry><literal>sfloat</literal></entry>
      <entry>Scientific notation</entry>
      <entry><literal>-1.234e56</literal></entry>
     </row>
     <row>
      <entry><literal>float</literal></entry>
      <entry>Decimal notation</entry>
      <entry><literal>-1.234</literal></entry>
     </row>
     <row>
      <entry><literal>int</literal></entry>
      <entry>Signed integer</entry>
      <entry><literal>-1234</literal></entry>
     </row>
     <row>
      <entry><literal>uint</literal></entry>
      <entry>Unsigned integer</entry>
      <entry><literal>1234</literal></entry>
     </row>
     <row>
      <entry><literal>version</literal></entry>
      <entry>Version number</entry>
      <entry><literal>8.3.0</literal></entry>
     </row>
     <row>
      <entry><literal>tag</literal></entry>
      <entry>XML tag</entry>
      <entry><literal>&lt;a href="dictionaries.html"&gt;</literal></entry>
     </row>
     <row>
      <entry><literal>entity</literal></entry>
      <entry>XML entity</entry>
      <entry><literal>&amp;amp;</literal></entry>
     </row>
     <row>
      <entry><literal>blank</literal></entry>
      <entry>Space symbols</entry>
      <entry>(any whitespace or punctuation not otherwise recognized)</entry>
     </row>
    </tbody>
   </tgroup>
  </table>
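
  <para>
   One way to see these token types in practice is sketched below. It lists
   every type known to the default parser with
   <function>ts_token_type</function>, then runs <function>ts_debug</function>
   on some text so that the reported <literal>alias</literal> of each token
   can be compared with the table above. The sample string is arbitrary, and
   the <literal>simple</literal> configuration is used only because it is
   built in.
  </para>

<programlisting>
-- List every token type of the default parser, with its description.
SELECT tokid, alias, description FROM ts_token_type('default');

-- Show which token type each piece of some sample text is assigned.
SELECT alias, token
FROM ts_debug('simple', 'postgresql-beta1 foo@example.com -1.234e56 8.3.0');
</programlisting>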

  <note>
   <para>
    The parser's notion of a <quote>letter</quote> is determined by the database's
    locale setting, specifically <varname>lc_ctype</varname>.  Words containing
    only the basic ASCII letters are reported as a separate token type,
    since it is sometimes useful to distinguish them.  In most European
    languages, token types <literal>word</literal> and <literal>asciiword</literal>
    should be treated alike.
   </para>
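
   <para>
    As an illustration, suppose the database's character-classification locale
    treats <literal>&oacute;</literal> as a letter, as a UTF-8 locale normally
    does. The query below should then report <literal>l&oacute;gico</literal>
    as <literal>word</literal> and <literal>logic</literal> as
    <literal>asciiword</literal>. Because the two types usually deserve the
    same handling, a configuration can map them to the same dictionaries.
    The configuration name <literal>my_config</literal> is hypothetical.
   </para>

<programlisting>
-- Check the character-classification locale of the current database.
SELECT datctype FROM pg_database WHERE datname = current_database();

-- Compare the token types assigned to a non-ASCII and an ASCII word.
SELECT alias, token FROM ts_debug('simple', 'l&oacute;gico logic');

-- Treat word and asciiword alike in a hypothetical custom configuration.
CREATE TEXT SEARCH CONFIGURATION my_config (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION my_config
    ALTER MAPPING FOR asciiword, word WITH english_stem;
</programlisting>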

   <para>
    <literal>email</literal> does not support all valid email characters as
    defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc5322">RFC 5322</ulink>.
    Specifically, the only non-alphanumeric characters supported for
    email user names are period, dash, and underscore.
   </para>
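
   <para>
    A quick way to observe this limitation is to compare how
    <function>ts_debug</function> handles two illustrative addresses. The
    first uses only the supported punctuation in its user name and should be
    reported as a single <literal>email</literal> token. The second contains
    <literal>+</literal>, which RFC 5322 allows in a user name, so the full
    address is not expected to appear as one <literal>email</literal> token.
   </para>

<programlisting>
-- User name contains only period, dash, and underscore: one email token.
SELECT alias, token FROM ts_debug('simple', 'foo.bar-baz_qux@example.com');

-- User name contains "+": not reported in full as an email token.
SELECT alias, token FROM ts_debug('simple', 'foo+tag@example.com');
</programlisting>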

   <para>
    <literal>tag</literal> does not support all valid tag names as defined by
    <ulink url="https://www.w3.org/TR/xml/">W3C Recommendation, XML</ulink>.
    Specifically, the only tag names supported are those starting with an
    ASCII letter, underscore, or colon, and containing only letters, digits,
    hyphens, underscores, periods, and colons. <literal>tag</literal> also
    includes XML comments starting with <literal>&lt;!--</literal> and
