Regular Expression Details in PostgreSQL

indicating which subexpression is of interest: the result is the
substring matching that subexpression. Subexpressions are numbered in
the order of their leading parentheses. When
<replaceable>subexpr</replaceable> is omitted or zero, the result is the
whole match regardless of parenthesized subexpressions.
</para>

<para>
 Some examples:
<programlisting>
regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
                                    <lineannotation> town zip</lineannotation>
regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
                                    <lineannotation>FGH</lineannotation>
</programlisting>
</para>

<sect3 id="posix-syntax-details">
 <title>Regular Expression Details</title>

 <para>
  <productname>PostgreSQL</productname>'s regular expressions are implemented
  using a software package written by Henry Spencer. Much of
  the description of regular expressions below is copied verbatim from his
  manual.
 </para>

 <para>
  Regular expressions (<acronym>RE</acronym>s), as defined in
  <acronym>POSIX</acronym> 1003.2, come in two forms:
  <firstterm>extended</firstterm> <acronym>RE</acronym>s or <acronym>ERE</acronym>s
  (roughly those of <command>egrep</command>), and
  <firstterm>basic</firstterm> <acronym>RE</acronym>s or <acronym>BRE</acronym>s
  (roughly those of <command>ed</command>).
  <productname>PostgreSQL</productname> supports both forms, and
  also implements some extensions
  that are not in the POSIX standard, but have become widely used
  due to their availability in programming languages such as Perl and Tcl.
  <acronym>RE</acronym>s using these non-POSIX extensions are called
  <firstterm>advanced</firstterm> <acronym>RE</acronym>s or <acronym>ARE</acronym>s
  in this documentation. AREs are almost an exact superset of EREs,
  but BREs have several notational incompatibilities (as well as being
  much more limited).
  We first describe the ARE and ERE forms, noting features that apply
  only to AREs, and then describe how BREs differ.
 </para>

 <note>
  <para>
   <productname>PostgreSQL</productname> always initially presumes that a regular
   expression follows the ARE rules. However, the more limited ERE or
   BRE rules can be chosen by prepending an <firstterm>embedded option</firstterm>
   to the RE pattern, as described in <xref linkend="posix-metasyntax"/>.
   This can be useful for compatibility with applications that expect
   exactly the <acronym>POSIX</acronym> 1003.2 rules.
  </para>
 </note>

 <para>
  A regular expression is defined as one or more
  <firstterm>branches</firstterm>, separated by
  <literal>|</literal>. It matches anything that matches one of the
  branches.
 </para>

 <para>
  A branch is zero or more <firstterm>quantified atoms</firstterm> or
  <firstterm>constraints</firstterm>, concatenated.
  It matches a match for the first, followed by a match for the second, etc.;
  an empty branch matches the empty string.
 </para>

 <para>
  A quantified atom is an <firstterm>atom</firstterm> possibly followed
  by a single <firstterm>quantifier</firstterm>.
  Without a quantifier, it matches a match for the atom.
  With a quantifier, it can match some number of matches of the atom.
  An <firstterm>atom</firstterm> can be any of the possibilities
  shown in <xref linkend="posix-atoms-table"/>.
  The possible quantifiers and their meanings are shown in
  <xref linkend="posix-quantifiers-table"/>.
 </para>

 <para>
  A <firstterm>constraint</firstterm> matches an empty string, but matches only when
  specific conditions are met. A constraint can be used where an atom
  could be used, except it cannot be followed by a quantifier.
  The simple constraints are shown in
  <xref linkend="posix-constraints-table"/>;
  some more

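This section describes PostgreSQL's regular expressions, which are implemented using a software package written by Henry Spencer. It covers the two POSIX 1003.2 forms, extended REs (EREs) and basic REs (BREs), as well as PostgreSQL's non-POSIX extensions, called advanced REs (AREs). AREs are almost an exact superset of EREs, while BREs have several notational incompatibilities and are much more limited. PostgreSQL initially presumes ARE rules, but ERE or BRE rules can be selected by prepending an embedded option to the pattern. A regular expression consists of one or more branches separated by '|' and matches anything that matches one of them. A branch is a concatenation of zero or more quantified atoms or constraints. A quantified atom is an atom optionally followed by a single quantifier; a constraint matches an empty string only when specific conditions are met and cannot be followed by a quantifier.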
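
<para>
 The following queries are an illustrative sketch, not taken from the
 original text, of how the terms above map onto actual patterns. The
 sample strings are arbitrary. The <literal>^</literal> and
 <literal>$</literal> constraints and the <literal>+</literal> quantifier
 are assumed from the constraint and quantifier tables referenced in this
 section, and the <literal>(?b)</literal> embedded option, which selects
 BRE rules, is assumed from <xref linkend="posix-metasyntax"/>.
<programlisting>
-- Two branches separated by |: the RE matches if either branch matches.
SELECT 'cat' ~ 'dog|cat';
                                    <lineannotation>true</lineannotation>
-- One branch: the quantified atom o+ followed by the atom b.
SELECT substring('foobar' from 'o+b');
                                    <lineannotation>oob</lineannotation>
-- Constraints match an empty string only at a particular position.
SELECT 'foobar' ~ '^bar';
                                    <lineannotation>false</lineannotation>
SELECT 'foobar' ~ 'bar$';
                                    <lineannotation>true</lineannotation>
-- ARE rules are presumed by default, so + is a quantifier here.
SELECT 'aa' ~ 'a+';
                                    <lineannotation>true</lineannotation>
-- With (?b) the rest of the pattern follows BRE rules, where + is an
-- ordinary character, so only a literal "a+" matches.
SELECT 'aa' ~ '(?b)a+';
                                    <lineannotation>false</lineannotation>
SELECT 'a+' ~ '(?b)a+';
                                    <lineannotation>true</lineannotation>
</programlisting>
 The last three queries show why the choice of rule set matters: the same
 pattern text can describe different sets of strings under ARE and BRE
 rules.
</para>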