ARE Comments, Literal Strings, and Regular Expression Matching Rules

following newline (or the end of the RE).
This permits paragraphing and commenting a complex RE.
There are three exceptions to that basic rule:

<itemizedlist>
 <listitem>
  <para>
   a white-space character or <literal>#</literal> preceded by
   <literal>\</literal> is retained
  </para>
 </listitem>
 <listitem>
  <para>
   white space or <literal>#</literal> within a bracket expression is retained
  </para>
 </listitem>
 <listitem>
  <para>
   white space and comments cannot appear within multi-character symbols,
   such as <literal>(?:</literal>
  </para>
 </listitem>
</itemizedlist>

For this purpose, white-space characters are blank, tab, newline, and
any character that belongs to the <replaceable>space</replaceable> character class.
</para>

<para>
 Finally, in an ARE, outside bracket expressions, the sequence
 <literal>(?#</literal><replaceable>ttt</replaceable><literal>)</literal>
 (where <replaceable>ttt</replaceable> is any text not containing a
 <literal>)</literal>) is a comment, completely ignored.
 Again, this is not allowed between the characters of multi-character
 symbols, like <literal>(?:</literal>.
 Such comments are more a historical artifact than a useful facility,
 and their use is deprecated; use the expanded syntax instead.
</para>

<para>
 <emphasis>None</emphasis> of these metasyntax extensions is available if
 an initial <literal>***=</literal> director
 has specified that the user's input be treated as a literal string
 rather than as an RE.
</para>
</sect3>

<sect3 id="posix-matching-rules">
<title>Regular Expression Matching Rules</title>

<para>
 In the event that an RE could match more than one substring of a given
 string, the RE matches the one starting earliest in the string.
 If the RE could match more than one substring starting at that point,
 either the longest possible match or the shortest possible match will
 be taken, depending on whether the RE is <firstterm>greedy</firstterm> or
 <firstterm>non-greedy</firstterm>.
</para>

<para>
 Whether an RE is greedy or not is determined by the following rules:
 <itemizedlist>
  <listitem>
   <para>
    Most atoms, and all constraints, have no greediness attribute (because
    they cannot match variable amounts of text anyway).
   </para>
  </listitem>
  <listitem>
   <para>
    Adding parentheses around an RE does not change its greediness.
   </para>
  </listitem>
  <listitem>
   <para>
    A quantified atom with a fixed-repetition quantifier
    (<literal>{</literal><replaceable>m</replaceable><literal>}</literal>
    or
    <literal>{</literal><replaceable>m</replaceable><literal>}?</literal>)
    has the same greediness (possibly none) as the atom itself.
   </para>
  </listitem>
  <listitem>
   <para>
    A quantified atom with other normal quantifiers (including
    <literal>{</literal><replaceable>m</replaceable><literal>,</literal><replaceable>n</replaceable><literal>}</literal>
    with <replaceable>m</replaceable> equal to <replaceable>n</replaceable>)
    is greedy (prefers longest match).
   </para>
  </listitem>
  <listitem>
   <para>
    A quantified atom with a non-greedy quantifier (including
    <literal>{</literal><replaceable>m</replaceable><literal>,</literal><replaceable>n</replaceable><literal>}?</literal>
    with <replaceable>m</replaceable> equal to <replaceable>n</replaceable>)
    is non-greedy (prefers shortest match).
   </para>
  </listitem>
  <listitem>
   <para>
    A branch (that is, an RE that has no top-level
    <literal>|</literal> operator) has the same greediness as the first
    quantified atom in it that has a greediness attribute.
   </para>
  </listitem>
  <listitem>
   <para>

This section covers the exceptions to white-space and `#`-comment stripping in the expanded syntax, and the older `(?#ttt)` comments in AREs, which are ignored entirely and deprecated in favor of the expanded syntax. None of these metasyntax extensions is available when an initial `***=` director makes the input a literal string. The section then gives the matching rules: the match starting earliest in the string is preferred, and when several matches start at that point, the longest (greedy) or shortest (non-greedy) one is chosen. Which applies depends on the greediness of the RE, determined by its quantifiers and by how its atoms and branches are structured.
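
The stripping rules described in this section (white space and `#` comments are dropped, except after `\` or inside a bracket expression, and `(?#ttt)` is skipped) can be sketched in Python. This is only an illustration of the rules as stated, not PostgreSQL's actual lexer. The names `strip_expanded` and `_bracket_end` are hypothetical, white space is approximated by Python's `str.isspace()`, and the sketch does not check the third exception (that white space must not split a multi-character symbol such as `(?:`).

```python
def _bracket_end(pattern: str, start: int) -> int:
    """Return the index just past the bracket expression opening at `start`."""
    i, n = start + 1, len(pattern)
    if i < n and pattern[i] == "^":
        i += 1
    if i < n and pattern[i] == "]":
        i += 1  # a leading "]" is a literal member, not the terminator
    while i < n:
        if pattern[i] == "\\" and i + 1 < n:
            i += 2  # escape inside the brackets: keep both characters
        elif pattern[i] == "[" and i + 1 < n and pattern[i + 1] in ":.=":
            # Skip a nested [:class:], [.collating.], or [=equivalence=] element.
            end = pattern.find(pattern[i + 1] + "]", i + 2)
            i = n if end == -1 else end + 2
        elif pattern[i] == "]":
            return i + 1
        else:
            i += 1
    return n  # unterminated bracket: treat the rest as part of it


def strip_expanded(pattern: str) -> str:
    """Drop white space and comments as the expanded-syntax rules describe."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Exception 1: a character preceded by a backslash is retained.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Exception 2: everything inside a bracket expression is retained.
            j = _bracket_end(pattern, i)
            out.append(pattern[i:j])
            i = j
        elif pattern.startswith("(?#", i):
            # (?#ttt) comment: ignored up to and including the first ")".
            close = pattern.find(")", i)
            i = n if close == -1 else close + 1
        elif ch == "#":
            # Comment: ignored up to the following newline or the end of the RE.
            newline = pattern.find("\n", i)
            i = n if newline == -1 else newline + 1
        elif ch.isspace():
            i += 1  # unescaped white space outside brackets is ignored
        else:
            out.append(ch)
            i += 1
    return "".join(out)


example = r"""
    \d{3}    # three digits
    [ -]     # blank or hyphen, kept because it is inside brackets
    \d{4}    # four digits
    \#       # a literal hash, kept because it is escaped
"""
print(strip_expanded(example))  # prints \d{3}[ -]\d{4}\#
```

The sketch processes the pattern in a single left-to-right pass, so an escaped `[` or `#` is consumed by the backslash branch before it can open a bracket expression or a comment.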