postgresql

93rd chunk of `doc/src/sgml/func.sgml`
 non-greedy subexpressions,
    the total match length is either as long as possible or as short as
    possible, according to the attribute assigned to the whole RE.  The
    attributes assigned to the subexpressions only affect how much of that
    match they are allowed to <quote>eat</quote> relative to each other.
   </para>

   <para>
    The quantifiers <literal>{1,1}</literal> and <literal>{1,1}?</literal>
    can be used to force greediness or non-greediness, respectively,
    on a subexpression or a whole RE.
    This is useful when you need the whole RE to have a greediness attribute
    different from what's deduced from its elements.  As an example,
    suppose that we are trying to separate a string containing some digits
    into the digits and the parts before and after them.  We might try to
    do that like this:
<screen>
SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
</screen>
    That didn't work: the first <literal>.*</literal> is greedy so
    it <quote>eats</quote> as much as it can, leaving the <literal>\d+</literal> to
    match at the last possible place, the last digit.  We might try to fix
    that by making it non-greedy:
<screen>
SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
</screen>
    That didn't work either, because now the RE as a whole is non-greedy
    and so it ends the overall match as soon as possible.  We can get what
    we want by forcing the RE as a whole to be greedy:
<screen>
SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
</screen>
    Controlling the RE's overall greediness separately from its components'
    greediness allows great flexibility in handling variable-length patterns.
   </para>
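
   <para>
    As a further illustration, assuming the same input string, applying
    <literal>{1,1}?</literal> to the original all-greedy pattern should
    force the RE as a whole to be non-greedy while leaving its
    subexpressions greedy:
<screen>
SELECT regexp_match('abc01234xyz', '(?:(.*)(\d+)(.*)){1,1}?');
</screen>
    By the rules described above, the overall match would then be expected
    to end as soon as possible, right after the first digit, with the first
    <literal>.*</literal> still taking as much of that short match as it can.
   </para>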

   <para>
    When deciding what is a longer or shorter match,
    match lengths are measured in characters, not collating elements.
    An empty string is considered longer than no match at all.
    For example:
    <literal>bb*</literal>
    matches the three middle characters of <literal>abbbc</literal>;
    <literal>(week|wee)(night|knights)</literal>
    matches all ten characters of <literal>weeknights</literal>;
    when <literal>(.*).*</literal>
    is matched against <literal>abc</literal> the parenthesized subexpression
    matches all three characters; and when
    <literal>(a*)*</literal> is matched against <literal>bc</literal>
    both the whole RE and the parenthesized
    subexpression match an empty string.
   </para>
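
   <para>
    The first three of these cases can be checked with queries along the
    following lines; using <function>regexp_match</function> here is just
    one convenient way to display the matched text:
<screen>
SELECT regexp_match('abbbc', 'bb*');
<lineannotation>Result: </lineannotation><computeroutput>{bbb}</computeroutput>
SELECT regexp_match('weeknights', '(week|wee)(night|knights)');
<lineannotation>Result: </lineannotation><computeroutput>{wee,knights}</computeroutput>
SELECT regexp_match('abc', '(.*).*');
<lineannotation>Result: </lineannotation><computeroutput>{abc}</computeroutput>
</screen>
    In the second query, the ten-character match is possible only when the
    first subexpression takes <literal>wee</literal> and the second takes
    <literal>knights</literal>.
   </para>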

   <para>
    If case-independent matching is specified,
    the effect is much as if all case distinctions had vanished from the
    alphabet.
    When an alphabetic character that exists in multiple cases appears as an
    ordinary character outside a bracket expression, it is effectively
    transformed into a bracket expression containing both cases,
    e.g., <literal>x</literal> becomes <literal>[xX]</literal>.
    When it appears inside a bracket expression, all case counterparts
    of it are added to the bracket expression, e.g.,
    <literal>[x]</literal> becomes <literal>[xX]</literal>
    and <literal>[^x]</literal> becomes <literal>[^xX]</literal>.
   </para>
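
   <para>
    A minimal sketch of these rules, using the case-insensitive match
    operator <literal>~*</literal> and the <literal>i</literal> flag of
    <function>regexp_match</function> (the sample strings are chosen only
    for illustration):
<screen>
SELECT 'X' ~* 'x';
<lineannotation>Result: </lineannotation><computeroutput>t</computeroutput>
SELECT 'X' ~* '[^x]';
<lineannotation>Result: </lineannotation><computeroutput>f</computeroutput>
SELECT regexp_match('aXb', 'x', 'i');
<lineannotation>Result: </lineannotation><computeroutput>{X}</computeroutput>
</screen>
    The second query returns false because <literal>[^x]</literal> behaves
    as <literal>[^xX]</literal>, which excludes both cases.
   </para>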

   <para>
    If newline-sensitive matching is specified, <literal>.</literal>
    and bracket expressions using <literal>^</literal>
    will never match the newline character
    (so that matches will not cross lines unless the RE
    explicitly includes a newline)
    and <literal>^</literal> and <literal>$</literal>
    will match the empty string after and before a newline,
    respectively, in addition to matching at the beginning and end of
    the string.
    But the ARE escapes <literal>\A</literal> and <literal>\Z</literal>
    continue to match beginning or end of string <emphasis>only</emphasis>.
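    As a rough illustration, assuming the embedded option
    <literal>(?n)</literal> is used to request newline-sensitive matching,
    that <varname>standard_conforming_strings</varname> is on, and that the
    sample string is <literal>foo</literal>, a newline, then
    <literal>bar</literal>:
<screen>
SELECT E'foo\nbar' ~ 'foo.bar';
<lineannotation>Result: </lineannotation><computeroutput>t</computeroutput>
SELECT E'foo\nbar' ~ '(?n)foo.bar';
<lineannotation>Result: </lineannotation><computeroutput>f</computeroutput>
SELECT E'foo\nbar' ~ '(?n)^bar';
<lineannotation>Result: </lineannotation><computeroutput>t</computeroutput>
SELECT E'foo\nbar' ~ '(?n)\Abar';
<lineannotation>Result: </lineannotation><computeroutput>f</computeroutput>
</screen>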

Title: Controlling Greediness and Case Sensitivity in Regular Expressions
Summary
This section explains how to control greediness in regular expressions with the quantifiers `{1,1}` and `{1,1}?`. It shows why the greediness of the whole RE sometimes has to be set separately from that of its components to get the intended match, especially for variable-length patterns, and it defines how match lengths are compared. It also covers case-independent and newline-sensitive matching, explaining how these options change the behavior of ordinary characters, bracket expressions, `.`, and the anchors `^` and `$`.