Equivalence Classes, Character Classes, and Word Boundaries in Bracket Expressions

is treated as a single element of the bracket expression's list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a <literal>ch</literal> collating element, then the RE <literal>[[.ch.]]*c</literal> matches the first five characters of <literal>chchcc</literal>.
</para>

<note>
<para>
<productname>PostgreSQL</productname> currently does not support multi-character collating elements. This information describes possible future behavior.
</para>
</note>

<para>
Within a bracket expression, a collating element enclosed in <literal>[=</literal> and <literal>=]</literal> is an <firstterm>equivalence class</firstterm>, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were <literal>[.</literal> and <literal>.]</literal>.) For example, if <literal>o</literal> and <literal>^</literal> are the members of an equivalence class, then <literal>[[=o=]]</literal>, <literal>[[=^=]]</literal>, and <literal>[o^]</literal> are all synonymous. An equivalence class cannot be an endpoint of a range.
</para>

<para>
Within a bracket expression, the name of a character class enclosed in <literal>[:</literal> and <literal>:]</literal> stands for the list of all characters belonging to that class. A character class cannot be used as an endpoint of a range.
The <acronym>POSIX</acronym> standard defines these character class names: <literal>alnum</literal> (letters and numeric digits), <literal>alpha</literal> (letters), <literal>blank</literal> (space and tab), <literal>cntrl</literal> (control characters), <literal>digit</literal> (numeric digits), <literal>graph</literal> (printable characters except space), <literal>lower</literal> (lower-case letters), <literal>print</literal> (printable characters including space), <literal>punct</literal> (punctuation), <literal>space</literal> (any white space), <literal>upper</literal> (upper-case letters), and <literal>xdigit</literal> (hexadecimal digits). The behavior of these standard character classes is generally consistent across platforms for characters in the 7-bit ASCII set. Whether a given non-ASCII character is considered to belong to one of these classes depends on the <firstterm>collation</firstterm> that is used for the regular-expression function or operator (see <xref linkend="collation"/>), or by default on the database's <envar>LC_CTYPE</envar> locale setting (see <xref linkend="locale"/>). The classification of non-ASCII characters can vary across platforms even in similarly-named locales. (But the <literal>C</literal> locale never considers any non-ASCII characters to belong to any of these classes.) In addition to these standard character classes, <productname>PostgreSQL</productname> defines the <literal>word</literal> character class, which is the same as <literal>alnum</literal> plus the underscore (<literal>_</literal>) character, and the <literal>ascii</literal> character class, which contains exactly the 7-bit ASCII set.
</para>

<para>
There are two special cases of bracket expressions: the bracket expressions <literal>[[:&lt;:]]</literal> and <literal>[[:&gt;:]]</literal> are constraints, matching empty strings at the beginning and end of a word respectively.
A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is any character belonging to the <literal>word</literal> character class, that is, any letter, digit, or underscore. This is an extension,

This section elaborates on bracket expressions in regular expressions. It explains equivalence classes, written between '[=' and '=]', which stand for all collating elements equivalent to the named one, and character classes, written between '[:' and ':]', which stand for all characters belonging to a named class such as 'alnum' or 'alpha', as defined by POSIX. Neither kind of class can be an endpoint of a range. How non-ASCII characters are classified depends on the collation in effect, or by default on the database's LC_CTYPE setting. PostgreSQL adds the 'word' and 'ascii' classes. The section also describes '[[:<:]]' and '[[:>:]]', constraints that match empty strings at the beginning and end of a word respectively, where a word is a sequence of word characters that is neither preceded nor followed by word characters.
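
The character classes and word-boundary constraints described here can be tried out with a few queries. The following is a minimal sketch, not taken from the original text: the sample strings and column aliases are made up for illustration, and it assumes a PostgreSQL server whose regular-expression engine supports the 'word' and 'ascii' classes as described above.

```sql
-- Illustrative queries only; the sample strings and aliases are hypothetical.

-- POSIX character class: tests whether the string consists solely of
-- letters and digits. Expected to be true for this ASCII input.
SELECT 'abc123' ~ '^[[:alnum:]]+$' AS only_alnum;

-- PostgreSQL-specific class: word is alnum plus the underscore, so an
-- identifier-like string is expected to match.
SELECT 'user_name1' ~ '^[[:word:]]+$' AS only_word_chars;

-- PostgreSQL-specific class: ascii contains exactly the 7-bit ASCII set,
-- so a string containing a non-ASCII letter is expected not to match.
SELECT 'naïve' ~ '^[[:ascii:]]+$' AS all_ascii;

-- Word-boundary constraints: [[:<:]] and [[:>:]] match empty strings at the
-- beginning and end of a word, so only the standalone occurrences of "cat"
-- are replaced and the "cat" inside "concat" is left alone.
SELECT regexp_replace('cat concat cat', '[[:<:]]cat[[:>:]]', 'dog', 'g') AS replaced;
```

The last query should yield 'dog concat dog', because the "cat" in "concat" is preceded by a word character and so does not start a word. Results for non-ASCII input in the other classes may differ with the collation or LC_CTYPE setting in effect.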