Regular Expression Back References and Metasyntax

<title>Regular Expression Back References</title>
<tgroup cols="2">
 <thead>
  <row>
   <entry>Escape</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry> <literal>\</literal><replaceable>m</replaceable> </entry>
   <entry> (where <replaceable>m</replaceable> is a nonzero digit) a back reference to the <replaceable>m</replaceable>'th subexpression </entry>
  </row>
  <row>
   <entry> <literal>\</literal><replaceable>mnn</replaceable> </entry>
   <entry> (where <replaceable>m</replaceable> is a nonzero digit, and <replaceable>nn</replaceable> is some more digits, and the decimal value <replaceable>mnn</replaceable> is not greater than the number of closing capturing parentheses seen so far) a back reference to the <replaceable>mnn</replaceable>'th subexpression </entry>
  </row>
 </tbody>
</tgroup>
</table>
<note>
 <para>
  There is an inherent ambiguity between octal character-entry escapes and
  back references, which is resolved by the following heuristics, as hinted
  at above. A leading zero always indicates an octal escape. A single
  non-zero digit, not followed by another digit, is always taken as a back
  reference. A multi-digit sequence not starting with a zero is taken as a
  back reference if it comes after a suitable subexpression (i.e., the
  number is in the legal range for a back reference), and otherwise is
  taken as octal.
 </para>
</note>
</sect3>
<sect3 id="posix-metasyntax">
<title>Regular Expression Metasyntax</title>
<para>
 In addition to the main syntax described above, there are some special
 forms and miscellaneous syntactic facilities available.
</para>
<para>
 An RE can begin with one of two special <firstterm>director</firstterm>
 prefixes. If an RE begins with <literal>***:</literal>, the rest of the RE
 is taken as an ARE. (This normally has no effect in
 <productname>PostgreSQL</productname>, since REs are assumed to be AREs;
 but it does have an effect if ERE or BRE mode had been specified by the
 <replaceable>flags</replaceable> parameter to a regex function.) If an RE
 begins with <literal>***=</literal>, the rest of the RE is taken to be a
 literal string, with all characters considered ordinary characters.
</para>
<para>
 An ARE can begin with <firstterm>embedded options</firstterm>: a sequence
 <literal>(?</literal><replaceable>xyz</replaceable><literal>)</literal>
 (where <replaceable>xyz</replaceable> is one or more alphabetic
 characters) specifies options affecting the rest of the RE. These options
 override any previously determined options; in particular, they can
 override the case-sensitivity behavior implied by a regex operator, or the
 <replaceable>flags</replaceable> parameter to a regex function. The
 available option letters are shown in
 <xref linkend="posix-embedded-options-table"/>. Note that these same
 option letters are used in the <replaceable>flags</replaceable> parameters
 of regex functions.
</para>
<table id="posix-embedded-options-table">
<title>ARE Embedded-Option Letters</title>
<tgroup cols="2">
 <thead>
  <row>
   <entry>Option</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry> <literal>b</literal> </entry>
   <entry> rest of RE is a BRE </entry>
  </row>
  <row>
   <entry> <literal>c</literal> </entry>
   <entry> case-sensitive matching (overrides operator type) </entry>
  </row>
  <row>
   <entry> <literal>e</literal> </entry>
   <entry> rest of RE is an ERE </entry>
  </row>
  <row>
   <entry> <literal>i</literal> </entry>
   <entry> case-insensitive matching (see <xref linkend="posix-matching-rules"/>)

This section details regular expression back references (`\m` and `\mnn`) and explains how the system resolves ambiguities between octal character-entry escapes and back references. It then discusses regular expression metasyntax, including director prefixes (`***:` for ARE, `***=` for literal string) and embedded options within AREs using the format `(?xyz)`. It lists available option letters such as `b` (BRE), `c` (case-sensitive), `e` (ERE), and `i` (case-insensitive).
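
The octal-versus-back-reference heuristics described in the note can be illustrated with a small classifier. This is only a sketch of the three stated rules, not the regex engine's actual code: the names `EscapeKind` and `classify_digit_escape` are hypothetical, the digit run after the backslash is assumed to have been isolated already, and the sketch does not model how the engine decides how many digits to consume or what it does with digits that are not valid octal.

```python
from enum import Enum


class EscapeKind(Enum):
    """Possible readings of a backslash followed by digits (hypothetical names)."""
    BACK_REFERENCE = "back reference"
    OCTAL = "octal escape"


def classify_digit_escape(digits: str, groups_closed: int) -> EscapeKind:
    """Apply the documented heuristics to the digits that follow a backslash.

    digits        -- the run of decimal digits after the backslash
    groups_closed -- number of closing capturing parentheses seen so far
    """
    if not digits or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected one or more ASCII decimal digits")

    # Rule 1: a leading zero always indicates an octal escape.
    if digits[0] == "0":
        return EscapeKind.OCTAL

    # Rule 2: a single non-zero digit is always a back reference.
    if len(digits) == 1:
        return EscapeKind.BACK_REFERENCE

    # Rule 3: a multi-digit sequence is a back reference only if its decimal
    # value is within the range of subexpressions closed so far.
    if int(digits) <= groups_closed:
        return EscapeKind.BACK_REFERENCE
    return EscapeKind.OCTAL


if __name__ == "__main__":
    for digits, closed in [("012", 20), ("7", 0), ("12", 12), ("12", 3)]:
        kind = classify_digit_escape(digits, closed)
        print(f"\\{digits} with {closed} closed groups: {kind.value}")
```

Under these assumptions, the same text `\12` is read as a back reference when at least twelve capturing groups have been closed before it, and as an octal escape otherwise, which is why the position of the escape within the RE matters.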