Regular Expression Lookarounds and Bracket Expressions

<literal>(?&lt;=</literal><replaceable>re</replaceable><literal>)</literal> </entry>
       <entry> <firstterm>positive lookbehind</firstterm> matches at any point
       where a substring matching <replaceable>re</replaceable> ends
       (AREs only) </entry>
       </row>

       <row>
       <entry> <literal>(?&lt;!</literal><replaceable>re</replaceable><literal>)</literal> </entry>
       <entry> <firstterm>negative lookbehind</firstterm> matches at any point
       where no substring matching <replaceable>re</replaceable> ends
       (AREs only) </entry>
       </row>
      </tbody>
     </tgroup>
    </table>

   <para>
    Lookahead and lookbehind constraints cannot contain <firstterm>back references</firstterm>
    (see <xref linkend="posix-escape-sequences"/>),
    and all parentheses within them are considered non-capturing.
   </para>
   </sect3>

   <sect3 id="posix-bracket-expressions">
    <title>Bracket Expressions</title>

   <para>
    A <firstterm>bracket expression</firstterm> is a list of
    characters enclosed in <literal>[]</literal>. It normally matches
    any single character from the list (but see below). If the list
    begins with <literal>^</literal>, it matches any single character
    <emphasis>not</emphasis> from the rest of the list.
    If two characters
    in the list are separated by <literal>-</literal>, this is
    shorthand for the full range of characters between those two
    (inclusive) in the collating sequence,
    e.g., <literal>[0-9]</literal> in <acronym>ASCII</acronym> matches
    any decimal digit. It is illegal for two ranges to share an
    endpoint, e.g., <literal>a-c-e</literal>. Ranges are very
    collating-sequence-dependent, so portable programs should avoid
    relying on them.
   </para>

   <para>
    To include a literal <literal>]</literal> in the list, make it the
    first character (after <literal>^</literal>, if that is used). To
    include a literal <literal>-</literal>, make it the first or last
    character, or the second endpoint of a range.
    To use a literal <literal>-</literal> as the first endpoint of a range,
    enclose it in <literal>[.</literal> and <literal>.]</literal> to make it a
    collating element (see below). With the exception of these characters,
    some combinations using <literal>[</literal> (see next paragraphs), and
    escapes (AREs only), all other special characters lose their special
    significance within a bracket expression. In particular,
    <literal>\</literal> is not special when following ERE or BRE rules,
    though it is special (as introducing an escape) in AREs.
   </para>

   <para>
    Within a bracket expression, a collating element (a character, a
    multiple-character sequence that collates as if it were a single
    character, or a collating-sequence name for either) enclosed in
    <literal>[.</literal> and <literal>.]</literal> stands for the sequence
    of characters of that collating element. The sequence is treated as a
    single element of the bracket expression's list. This allows a bracket
    expression containing a multiple-character collating element to match
    more than one character, e.g., if the collating sequence includes a
    <literal>ch</literal> collating element, then the RE
    <literal>[[.ch.]]*c</literal> matches the first five characters of
    <literal>chchcc</literal>.
   </para>

   <note>
    <para>
     <productname>PostgreSQL</productname> currently does not support
     multi-character collating elements. This information describes
     possible future behavior.
    </para>
   </note>

   <para>
    Within a bracket expression, a collating element enclosed in
    <literal>[=</literal> and <literal>=]</literal> is an <firstterm>equivalence class</firstterm>,
    standing for the sequences of characters of all collating
    elements equivalent to that one, including itself. (If there are no other
    equivalent collating elements, the treatment is as if the

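The positive and negative lookbehind constraints described above can be illustrated with a short sketch. It uses Python's standard `re` module rather than PostgreSQL, so treat it as an analogy and not a description of ARE behavior: Python's engine requires lookbehind patterns to have a fixed width, and groups inside a Python lookaround still capture, unlike the non-capturing rule stated above. The function names `prices` and `plain_numbers` and the sample text are hypothetical.

```python
import re


def prices(text: str) -> list[str]:
    """Digit runs immediately preceded by "$" (positive lookbehind)."""
    # (?<=\$) succeeds at any point where a substring matching \$ ends;
    # the "$" itself is not consumed, so it is not part of the match.
    return re.findall(r"(?<=\$)\d+", text)


def plain_numbers(text: str) -> list[str]:
    """Digit runs NOT immediately preceded by "$" (negative lookbehind)."""
    # \b stops a match from starting inside "$15" (e.g. at the "5"),
    # and (?<!\$) rejects any start point where a "$" has just ended.
    return re.findall(r"\b(?<!\$)\d+", text)


if __name__ == "__main__":
    sample = "Pay $15 for 3 items or $40 for 10"
    print(prices(sample))         # ['15', '40']
    print(plain_numbers(sample))  # ['3', '10']
```

In both patterns the lookbehind consumes no characters, so the `$` never appears in the results; the `\b` in `plain_numbers` is what keeps a match from beginning in the middle of `$15`.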
This section continues the discussion of regular expression constraints with positive and negative lookbehind, which are available only in AREs. It then turns to bracket expressions: how they match any single character from a list or, when the list begins with '^', any single character not in it; how '-' defines a character range; and how to include a literal ']', '-', or other special character in the list. Finally, it describes collating elements enclosed in '[.' and '.]', which stand for the characters of that element, and equivalence classes enclosed in '[=' and '=]', which stand for all collating elements equivalent to the given one.
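The list rules for bracket expressions (a leading '^', placement of a literal ']' or '-', ranges, '[.-.]' for a '-' as a range's first endpoint, and no shared endpoints) can be sketched as a small parser. This is a hypothetical illustration, not PostgreSQL's implementation: the names `BracketSet`, `parse_bracket`, and `_element` are made up, ranges compare Unicode code points as a stand-in for a real collating sequence, only single-character '[.x.]' elements are accepted, '[: :]' and '[= =]' are rejected rather than interpreted, and a backslash is treated as an ordinary character (the ERE/BRE rule, not the ARE one).

```python
from dataclasses import dataclass, field


@dataclass
class BracketSet:
    """A parsed bracket expression: listed characters, ranges, and ^ negation."""
    negated: bool = False
    chars: set[str] = field(default_factory=set)
    ranges: list[tuple[str, str]] = field(default_factory=list)

    def matches(self, ch: str) -> bool:
        # Ranges compare code points (assumed stand-in for a collating sequence).
        hit = ch in self.chars or any(lo <= ch <= hi for lo, hi in self.ranges)
        return hit != self.negated


def _element(expr: str, i: int) -> tuple[str, bool, int]:
    """Read one list element at expr[i]; return (char, is_bare, next_index)."""
    if expr.startswith(("[:", "[="), i):
        raise ValueError("[: :] and [= =] are not handled by this sketch")
    if expr.startswith("[.", i):
        end = expr.find(".]", i + 2)
        if end == -1:
            raise ValueError("unterminated [. .] collating element")
        name = expr[i + 2:end]
        if len(name) != 1:
            raise ValueError("only single-character collating elements are supported")
        return name, False, end + 2
    if i >= len(expr):
        raise ValueError("unterminated bracket expression")
    return expr[i], True, i + 1


def parse_bracket(expr: str) -> BracketSet:
    """Parse an expression such as "[^]a-]" following the rules in the text."""
    if not expr.startswith("["):
        raise ValueError("a bracket expression starts with '['")
    result = BracketSet(negated=expr.startswith("^", 1))
    i = 2 if result.negated else 1
    first = True
    while True:
        if i >= len(expr):
            raise ValueError("unterminated bracket expression")
        if expr[i] == "]" and not first:
            break  # a ']' that is not the first list item closes the list
        at_start, first = first, False
        lo, lo_bare, i = _element(expr, i)
        if expr.startswith("-", i) and not expr.startswith("-]", i):
            # "lo-hi" is a range; a bare "-" may only be its second endpoint.
            if lo_bare and lo == "-":
                raise ValueError("write [.-.] to use '-' as a range's first endpoint")
            hi, _, i = _element(expr, i + 1)
            if lo > hi:
                raise ValueError(f"invalid range {lo!r}-{hi!r}")
            result.ranges.append((lo, hi))
            if expr.startswith("-", i) and not expr.startswith("-]", i):
                raise ValueError("two ranges may not share an endpoint")
        else:
            # A bare "-" outside a range must be the first or last list item.
            if lo_bare and lo == "-" and not (at_start or expr.startswith("]", i)):
                raise ValueError("a literal '-' must be first or last in the list")
            result.chars.add(lo)
    if i != len(expr) - 1:
        raise ValueError("unexpected text after the closing ']'")
    return result


if __name__ == "__main__":
    digits = parse_bracket("[0-9]")
    print(digits.matches("7"), digits.matches("x"))  # True False
    others = parse_bracket("[^]a-]")  # anything except "]", "a", or "-"
    print(others.matches("]"), others.matches("b"))  # False True
    dash_range = parse_bracket("[[.-.]-0]")  # "-" as a range's first endpoint
    print(dash_range.matches("."), dash_range.matches("a"))  # True False
```

Under these assumptions, `parse_bracket("[a-c-e]")` raises `ValueError` because the ranges `a-c` and `c-e` would share the endpoint `c`, mirroring the rule stated in the text.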