Home Explore Blog CI



postgresql

91th chunk of `doc/src/sgml/func.sgml`
2acde976c630d2c1dda240429fe42f43c0b029b45b50d0dd0000000100000fa6
 following newline (or the end of the RE).  This
    permits paragraphing and commenting a complex RE.
    There are three exceptions to that basic rule:

    <itemizedlist>
     <listitem>
      <para>
       a white-space character or <literal>#</literal> preceded by <literal>\</literal> is
       retained
      </para>
     </listitem>
     <listitem>
      <para>
       white space or <literal>#</literal> within a bracket expression is retained
      </para>
     </listitem>
     <listitem>
      <para>
       white space and comments cannot appear within multi-character symbols,
       such as <literal>(?:</literal>
      </para>
     </listitem>
    </itemizedlist>

    For this purpose, white-space characters are blank, tab, newline, and
    any character that belongs to the <replaceable>space</replaceable> character class.
   </para>

   <para>
    Finally, in an ARE, outside bracket expressions, the sequence
    <literal>(?#</literal><replaceable>ttt</replaceable><literal>)</literal>
    (where <replaceable>ttt</replaceable> is any text not containing a <literal>)</literal>)
    is a comment, completely ignored.
    Again, this is not allowed between the characters of
    multi-character symbols, like <literal>(?:</literal>.
    Such comments are more a historical artifact than a useful facility,
    and their use is deprecated; use the expanded syntax instead.
   </para>

   <para>
    <emphasis>None</emphasis> of these metasyntax extensions is available if
    an initial <literal>***=</literal> director
    has specified that the user's input be treated as a literal string
    rather than as an RE.
   </para>
   </sect3>

   <sect3 id="posix-matching-rules">
    <title>Regular Expression Matching Rules</title>

   <para>
    In the event that an RE could match more than one substring of a given
    string, the RE matches the one starting earliest in the string.
    If the RE could match more than one substring starting at that point,
    either the longest possible match or the shortest possible match will
    be taken, depending on whether the RE is <firstterm>greedy</firstterm> or
    <firstterm>non-greedy</firstterm>.
   </para>

   <para>
    Whether an RE is greedy or not is determined by the following rules:
    <itemizedlist>
     <listitem>
      <para>
       Most atoms, and all constraints, have no greediness attribute (because
       they cannot match variable amounts of text anyway).
      </para>
     </listitem>
     <listitem>
      <para>
       Adding parentheses around an RE does not change its greediness.
      </para>
     </listitem>
     <listitem>
      <para>
       A quantified atom with a fixed-repetition quantifier
       (<literal>{</literal><replaceable>m</replaceable><literal>}</literal>
       or
       <literal>{</literal><replaceable>m</replaceable><literal>}?</literal>)
       has the same greediness (possibly none) as the atom itself.
      </para>
     </listitem>
     <listitem>
      <para>
       A quantified atom with other normal quantifiers (including
       <literal>{</literal><replaceable>m</replaceable><literal>,</literal><replaceable>n</replaceable><literal>}</literal>
       with <replaceable>m</replaceable> equal to <replaceable>n</replaceable>)
       is greedy (prefers longest match).
      </para>
     </listitem>
     <listitem>
      <para>
       A quantified atom with a non-greedy quantifier (including
       <literal>{</literal><replaceable>m</replaceable><literal>,</literal><replaceable>n</replaceable><literal>}?</literal>
       with <replaceable>m</replaceable> equal to <replaceable>n</replaceable>)
       is non-greedy (prefers shortest match).
      </para>
     </listitem>
     <listitem>
      <para>
       A branch &mdash; that is, an RE that has no top-level
       <literal>|</literal> operator &mdash; has the same greediness as the first
       quantified atom in it that has a greediness attribute.
      </para>
     </listitem>
     <listitem>
      <para>

Title: ARE Comments, Literal Strings, and Regular Expression Matching Rules
Summary
This section describes comments in AREs using the form `(?#ttt)`, which are ignored but deprecated in favor of the expanded syntax. It notes that metasyntax extensions are unavailable if the input is treated as a literal string using the `***=` director. The section then explains the rules for regular expression matching, stating that the earliest match in the string is preferred. When multiple matches start at the same point, the longest (greedy) or shortest (non-greedy) match is chosen, based on the greediness of the RE, which is determined by factors such as quantifiers and the structure of the expression.