 <title>Regular Expression Back References</title>

    <tgroup cols="2">
     <thead>
      <row>
       <entry>Escape</entry>
       <entry>Description</entry>
      </row>
     </thead>

      <tbody>
       <row>
       <entry> <literal>\</literal><replaceable>m</replaceable> </entry>
       <entry> (where <replaceable>m</replaceable> is a nonzero digit)
       a back reference to the <replaceable>m</replaceable>'th subexpression </entry>
       </row>

       <row>
       <entry> <literal>\</literal><replaceable>mnn</replaceable> </entry>
       <entry> (where <replaceable>m</replaceable> is a nonzero digit,
       <replaceable>nn</replaceable> is one or more further digits, and the decimal value
       <replaceable>mnn</replaceable> is not greater than the number of closing capturing
       parentheses seen so far)
       a back reference to the <replaceable>mnn</replaceable>'th subexpression </entry>
       </row>
      </tbody>
     </tgroup>
    </table>
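   <para>
    The following Python snippet illustrates both forms from the table: a
    single-digit reference and a multi-digit reference to the tenth group.
    It uses Python's standard <literal>re</literal> module purely as a stand-in,
    not <productname>PostgreSQL</productname>'s ARE engine. For these simple
    patterns the back-reference behavior coincides, but Python's rules for
    choosing between octal escapes and back references differ from
    <productname>PostgreSQL</productname>'s. The function names are hypothetical.
   </para>

```python
import re

# Python's re module is only a stand-in here; it is NOT PostgreSQL's ARE engine.
# For these simple patterns its \m back-reference behavior matches the table.


def same_char_twice(s: str) -> bool:
    """True if s is b or c written twice, i.e. it matches ([bc])\\1."""
    return re.fullmatch(r"([bc])\1", s) is not None


# The \mnn form: ten capturing groups, then \10 refers back to the tenth one.
TEN_GROUPS = r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10"


def matches_ten(s: str) -> bool:
    """True if s is a..j followed by a second copy of group 10 (j)."""
    return re.fullmatch(TEN_GROUPS, s) is not None


if __name__ == "__main__":
    for s in ["bb", "cc", "bc", "cb"]:
        print(s, same_char_twice(s))
    print(matches_ten("abcdefghijj"))
    print(matches_ten("abcdefghija"))
```

   <para>
    The back reference matches the same text the group captured, not merely
    any string the group's pattern could match, which is why <literal>bc</literal>
    fails even though both characters individually match <literal>[bc]</literal>.
   </para>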

   <note>
    <para>
     There is an inherent ambiguity between octal character-entry
     escapes and back references, which is resolved by the following heuristics,
     as hinted at above.
     A leading zero always indicates an octal escape.
     A single nonzero digit, not followed by another digit,
     is always taken as a back reference.
     A multi-digit sequence not starting with a zero is taken as a back
     reference if it comes after a suitable subexpression
     (i.e., the number is not greater than the number of closing capturing
     parentheses seen so far, and so is in the legal range for a back reference),
     and otherwise is taken as octal.
    </para>
   </note>
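   <para>
    The heuristics in the note above can be written out as a small decision
    function. This is a hedged Python sketch of the documented rules only, not
    the C code of <productname>PostgreSQL</productname>'s regex lexer. The
    function name <literal>classify_digit_escape</literal> and its parameters are
    hypothetical, and the sketch neither computes octal values nor checks that a
    single-digit back reference refers to a subexpression that actually exists.
   </para>

```python
def classify_digit_escape(digits: str, closed_groups: int) -> str:
    """Classify the digits after a backslash as 'octal' or 'backref'.

    digits        -- the run of decimal digits following the backslash
    closed_groups -- closing capturing parentheses seen so far in the RE
    (both parameters are hypothetical inputs for this sketch)
    """
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected one or more decimal digits")

    # Rule 1: a leading zero always indicates an octal escape.
    if digits[0] == "0":
        return "octal"

    # Rule 2: a single nonzero digit is always a back reference
    # (this sketch does not verify that the group exists).
    if len(digits) == 1:
        return "backref"

    # Rule 3: a multi-digit number is a back reference only if it is in range,
    # i.e. not greater than the number of closed capturing groups so far.
    if int(digits) <= closed_groups:
        return "backref"
    return "octal"


if __name__ == "__main__":
    print(classify_digit_escape("012", 20))
    print(classify_digit_escape("7", 0))
    print(classify_digit_escape("12", 12))
    print(classify_digit_escape("12", 11))
```

   <para>
    The same digits <literal>12</literal> are therefore read as a back reference
    after twelve capturing groups have closed, but as octal after only eleven.
   </para>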
   </sect3>

   <sect3 id="posix-metasyntax">
    <title>Regular Expression Metasyntax</title>

   <para>
    In addition to the main syntax described above, there are some special
    forms and miscellaneous syntactic facilities available.
   </para>

   <para>
    An RE can begin with one of two special <firstterm>director</firstterm> prefixes.
    If an RE begins with <literal>***:</literal>,
    the rest of the RE is taken as an ARE.  (This normally has no effect in
    <productname>PostgreSQL</productname>, since REs are assumed to be AREs;
    but it does have an effect if ERE or BRE mode has been specified by
    the <replaceable>flags</replaceable> parameter to a regex function.)
    If an RE begins with <literal>***=</literal>,
    the rest of the RE is taken to be a literal string,
    with all characters considered ordinary characters.
   </para>
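   <para>
    As an illustration of how a director prefix is consumed before the rest of
    the pattern is interpreted, here is a Python sketch. The function names and
    the string mode labels are hypothetical; <literal>flags_mode</literal> merely
    stands in for whatever mode the <replaceable>flags</replaceable> parameter
    selected, and Python's <literal>re.escape</literal> is used as a stand-in for
    treating every character of a <literal>***=</literal> pattern as ordinary.
   </para>

```python
import re


def apply_director(pattern: str, flags_mode: str = "ARE") -> tuple[str, str]:
    """Return (mode, remaining_pattern) after handling a director prefix.

    flags_mode is a hypothetical stand-in for the mode chosen by the flags
    parameter ("ARE", "ERE" or "BRE"); a director prefix overrides it.
    """
    if pattern.startswith("***:"):
        return "ARE", pattern[4:]      # rest is an ARE, whatever the flags said
    if pattern.startswith("***="):
        return "literal", pattern[4:]  # rest is a literal string
    return flags_mode, pattern         # no director: keep the flags' mode


def literal_to_python(body: str) -> str:
    """Make every character ordinary, using Python's re.escape as a stand-in."""
    return re.escape(body)


if __name__ == "__main__":
    print(apply_director("***:a+", "BRE"))
    mode, body = apply_director("***=a.b")
    # In literal mode the dot is an ordinary character, so "axb" does not match.
    print(mode, re.search(literal_to_python(body), "axb"))
    print(re.search(literal_to_python(body), "xa.bx") is not None)
```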

   <para>
    An ARE can begin with <firstterm>embedded options</firstterm>:
    a sequence <literal>(?</literal><replaceable>xyz</replaceable><literal>)</literal>
    (where <replaceable>xyz</replaceable> is one or more alphabetic characters)
    specifies options affecting the rest of the RE.
    These options override any previously determined options &mdash;
    in particular, they can override the case-sensitivity behavior implied by
    a regex operator, or the <replaceable>flags</replaceable> parameter to a regex
    function.
    The available option letters are
    shown in <xref linkend="posix-embedded-options-table"/>.
    Note that these same option letters are used in the <replaceable>flags</replaceable>
    parameters of regex functions.
   </para>
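   <para>
    The override behavior can be sketched in Python as follows. The sketch
    models only a subset of the option letters (<literal>b</literal>,
    <literal>c</literal>, <literal>e</literal> and <literal>i</literal>), assumes
    the letters in one sequence are applied left to right so that a later letter
    wins, and uses hypothetical names (<literal>Options</literal>,
    <literal>parse_embedded_options</literal>); it is not
    <productname>PostgreSQL</productname>'s parser.
   </para>

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Options:
    """Hypothetical bundle of the options a regex starts out with."""
    syntax: str = "ARE"             # "ARE", "ERE" or "BRE"
    case_insensitive: bool = False  # e.g. implied by the operator or flags


def parse_embedded_options(pattern: str, opts: Options) -> tuple[Options, str]:
    """Apply a leading (?xyz) sequence to opts; return (new_opts, rest)."""
    m = re.match(r"\(\?([A-Za-z]+)\)", pattern)
    if m is None:
        return opts, pattern  # no embedded options: keep what we were given
    for letter in m.group(1):  # assumed: applied left to right, later wins
        if letter == "b":
            opts = replace(opts, syntax="BRE")
        elif letter == "e":
            opts = replace(opts, syntax="ERE")
        elif letter == "c":
            opts = replace(opts, case_insensitive=False)
        elif letter == "i":
            opts = replace(opts, case_insensitive=True)
        else:
            # Other letters exist but are simply not modeled by this sketch.
            raise ValueError(f"option letter {letter!r} not modeled here")
    return opts, pattern[m.end():]


if __name__ == "__main__":
    from_operator = Options(case_insensitive=True)  # e.g. as implied by ~*
    print(parse_embedded_options("(?c)abc", from_operator))
```

   <para>
    Here the <literal>(?c)</literal> prefix turns off the case-insensitivity that
    an operator such as <literal>~*</literal> would otherwise imply, which is the
    kind of override the paragraph above describes.
   </para>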

   <table id="posix-embedded-options-table">
    <title>ARE Embedded-Option Letters</title>

    <tgroup cols="2">
     <thead>
      <row>
       <entry>Option</entry>
       <entry>Description</entry>
      </row>
     </thead>

      <tbody>
       <row>
       <entry> <literal>b</literal> </entry>
       <entry> rest of RE is a BRE </entry>
       </row>

       <row>
       <entry> <literal>c</literal> </entry>
       <entry> case-sensitive matching (overrides operator type) </entry>
       </row>

       <row>
       <entry> <literal>e</literal> </entry>
       <entry> rest of RE is an ERE </entry>
       </row>

       <row>
       <entry> <literal>i</literal> </entry>
       <entry> case-insensitive matching (see
       <xref linkend="posix-matching-rules"/>)
