postgresql

84th chunk of `doc/src/sgml/func.sgml`
 is
    treated as a single element of the bracket expression's list.  This
    allows a bracket
    expression containing a multi-character collating element to
    match more than one character.  For example, if the collating sequence
    includes a <literal>ch</literal> collating element, then the RE
    <literal>[[.ch.]]*c</literal> matches the first five characters of
    <literal>chchcc</literal>.
   </para>

   <note>
    <para>
     <productname>PostgreSQL</productname> currently does not support multi-character collating
     elements. This information describes possible future behavior.
    </para>
   </note>

   <para>
    Within a bracket expression, a collating element enclosed in
    <literal>[=</literal> and <literal>=]</literal> is an <firstterm>equivalence
    class</firstterm>, standing for the sequences of characters of all collating
    elements equivalent to that one, including itself.  (If there are
    no other equivalent collating elements, the treatment is as if the
    enclosing delimiters were <literal>[.</literal> and
    <literal>.]</literal>.)  For example, if <literal>o</literal> and
    <literal>^</literal> are the members of an equivalence class, then
    <literal>[[=o=]]</literal>, <literal>[[=^=]]</literal>, and
    <literal>[o^]</literal> are all synonymous.  An equivalence class
    cannot be an endpoint of a range.
   </para>
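
   <para>
    As a minimal sketch of the fallback case described above, the queries
    below write the single character <literal>o</literal> as an
    equivalence class, as a collating element, and as a plain bracket
    expression.  They assume that no other collating element is
    equivalent to <literal>o</literal>, in which case all three forms
    should behave alike; the test strings are made up for illustration:
<programlisting>
-- Assumption: no other collating element is equivalent to "o".
SELECT 'o' ~ '^[[=o=]]$';   -- equivalence class
SELECT 'o' ~ '^[[.o.]]$';   -- the same character as a collating element
SELECT 'o' ~ '^[o]$';       -- plain bracket expression
</programlisting>
   </para>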

   <para>
    Within a bracket expression, the name of a character class
    enclosed in <literal>[:</literal> and <literal>:]</literal> stands
    for the list of all characters belonging to that class.  A character
    class cannot be used as an endpoint of a range.
    The <acronym>POSIX</acronym> standard defines these character class
    names:
    <literal>alnum</literal> (letters and numeric digits),
    <literal>alpha</literal> (letters),
    <literal>blank</literal> (space and tab),
    <literal>cntrl</literal> (control characters),
    <literal>digit</literal> (numeric digits),
    <literal>graph</literal> (printable characters except space),
    <literal>lower</literal> (lower-case letters),
    <literal>print</literal> (printable characters including space),
    <literal>punct</literal> (punctuation),
    <literal>space</literal> (any white space),
    <literal>upper</literal> (upper-case letters),
    and <literal>xdigit</literal> (hexadecimal digits).
    The behavior of these standard character classes is generally
    consistent across platforms for characters in the 7-bit ASCII set.
    Whether a given non-ASCII character is considered to belong to one
    of these classes depends on the <firstterm>collation</firstterm>
    that is used for the regular-expression function or operator
    (see <xref linkend="collation"/>), or by default on the
    database's <envar>LC_CTYPE</envar> locale setting (see
    <xref linkend="locale"/>).  The classification of non-ASCII
    characters can vary across platforms even in similarly-named
    locales.  (But the <literal>C</literal> locale never considers any
    non-ASCII characters to belong to any of these classes.)
    In addition to these standard character
    classes, <productname>PostgreSQL</productname> defines
    the <literal>word</literal> character class, which is the same as
    <literal>alnum</literal> plus the underscore (<literal>_</literal>)
    character, and
    the <literal>ascii</literal> character class, which contains exactly
    the 7-bit ASCII set.
   </para>
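
   <para>
    The following queries sketch how these class names might be used with
    the <literal>~</literal> operator and
    <function>regexp_replace</function>.  The input strings are
    illustrative only, and the results noted in the comments assume plain
    ASCII input, for which classification does not depend on the locale:
<programlisting>
SELECT 'abc123' ~ '^[[:alnum:]]+$';                         -- true
SELECT 'abc_123' ~ '^[[:alnum:]]+$';                        -- false: underscore is not in alnum
SELECT 'abc_123' ~ '^[[:word:]]+$';                         -- true: word is alnum plus underscore
SELECT regexp_replace('a1b22c', '[[:digit:]]+', '#', 'g');  -- a#b#c
SELECT 'a b' ~ '^[[:ascii:]]+$';                            -- true: every character is 7-bit ASCII
</programlisting>
    Note that a class name is only recognized inside a bracket expression,
    which is why each pattern uses two pairs of brackets.
   </para>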

   <para>
    There are two special cases of bracket expressions:  the bracket
    expressions <literal>[[:&lt;:]]</literal> and
    <literal>[[:&gt;:]]</literal> are constraints,
    matching empty strings at the beginning
    and end of a word respectively.  A word is defined as a sequence
    of word characters that is neither preceded nor followed by word
    characters.  A word character is any character belonging to the
    <literal>word</literal> character class, that is, any letter, digit,
    or underscore.  This is an extension,

Title: Equivalence Classes, Character Classes, and Word Boundaries in Bracket Expressions
Summary
This section elaborates on bracket expressions in regular expressions, explaining equivalence classes (characters treated as equivalent) represented by '[= =]'. It also details character classes enclosed in '[: :]', which denote a list of characters belonging to a specific class like 'alnum', 'alpha', etc., as defined by POSIX. The behavior of these classes with non-ASCII characters depends on collation settings. PostgreSQL adds 'word' and 'ascii' classes. It also describes '[[:<:]]' and '[[:>:]]' as constraints matching the beginning and end of words respectively, where a word is a sequence of word characters not surrounded by word characters.