non-greedy subexpressions,
the total match length is either as long as possible or as short as
possible, according to the attribute assigned to the whole RE. The
attributes assigned to the subexpressions only affect how much of that
match they are allowed to <quote>eat</quote> relative to each other.
</para>
<para>
The quantifiers <literal>{1,1}</literal> and <literal>{1,1}?</literal>
can be used to force greediness or non-greediness, respectively,
on a subexpression or a whole RE.
This is useful when you need the whole RE to have a greediness attribute
different from what's deduced from its elements. As an example,
suppose that we are trying to separate a string containing some digits
into the digits and the parts before and after them. We might try to
do that like this:
<screen>
SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
</screen>
That didn't work: the first <literal>.*</literal> is greedy so
it <quote>eats</quote> as much as it can, leaving the <literal>\d+</literal> to
match at the last possible place, the last digit. We might try to fix
that by making it non-greedy:
<screen>
SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
</screen>
That didn't work either, because now the RE as a whole is non-greedy
and so it ends the overall match as soon as possible. We can get what
we want by forcing the RE as a whole to be greedy:
<screen>
SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
</screen>
Controlling the RE's overall greediness separately from its components'
greediness allows great flexibility in handling variable-length patterns.
</para>
<para>
When deciding what is a longer or shorter match,
match lengths are measured in characters, not collating elements.
An empty string is considered longer than no match at all.
For example:
<literal>bb*</literal>
matches the three middle characters of <literal>abbbc</literal>;
<literal>(week|wee)(night|knights)</literal>
matches all ten characters of <literal>weeknights</literal>;
when <literal>(.*).*</literal>
is matched against <literal>abc</literal> the parenthesized subexpression
matches all three characters; and when
<literal>(a*)*</literal> is matched against <literal>bc</literal>
both the whole RE and the parenthesized
subexpression match an empty string.
</para>
<para>
If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
e.g., <literal>x</literal> becomes <literal>[xX]</literal>.
When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, e.g.,
<literal>[x]</literal> becomes <literal>[xX]</literal>
and <literal>[^x]</literal> becomes <literal>[^xX]</literal>.
</para>
<para>
If newline-sensitive matching is specified, <literal>.</literal>
and bracket expressions using <literal>^</literal>
will never match the newline character
(so that matches will not cross lines unless the RE
explicitly includes a newline)
and <literal>^</literal> and <literal>$</literal>
will match the empty string after and before a newline
respectively, in addition to matching at beginning and end of string
respectively.
But the ARE escapes <literal>\A</literal> and <literal>\Z</literal>
continue to match beginning or end of string <emphasis>only</emphasis>.