Differences Between POSIX and XQuery Regular Expressions (Continued)

<itemizedlist>
 <listitem>
  <para>
   XQuery character class subtraction is not supported. An example of
   this feature is using the following to match only English
   consonants: <literal>[a-z-[aeiou]]</literal>.
  </para>
 </listitem>
 <listitem>
  <para>
   XQuery character class shorthands <literal>\c</literal>,
   <literal>\C</literal>, <literal>\i</literal>,
   and <literal>\I</literal> are not supported.
  </para>
 </listitem>
 <listitem>
  <para>
   XQuery character class elements
   using <literal>\p{UnicodeProperty}</literal> or the
   inverse <literal>\P{UnicodeProperty}</literal> are not supported.
  </para>
 </listitem>
 <listitem>
  <para>
   POSIX interprets character classes such as <literal>\w</literal>
   (see <xref linkend="posix-class-shorthand-escapes-table"/>)
   according to the prevailing locale (which you can control by
   attaching a <literal>COLLATE</literal> clause to the operator or
   function). XQuery specifies these classes by reference to Unicode
   character properties, so equivalent behavior is obtained only with
   a locale that follows the Unicode rules.
  </para>
 </listitem>
 <listitem>
  <para>
   The SQL standard (not XQuery itself) attempts to cater for more
   variants of <quote>newline</quote> than POSIX does. The
   newline-sensitive matching options described above consider only
   ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
   us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
   (a Windows-style newline), and some Unicode-only characters like
   LINE SEPARATOR (U+2028) as newlines as well.
   Notably, <literal>.</literal> and <literal>\s</literal> should
   count <literal>\r\n</literal> as one character not two according to
   SQL.
  </para>
 </listitem>
 <listitem>
  <para>
   Of the character-entry escapes described in
   <xref linkend="posix-character-entry-escapes-table"/>,
   XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
   and <literal>\t</literal>.
  </para>
 </listitem>
 <listitem>
  <para>
   XQuery does not support
   the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
   for character classes within bracket expressions.
  </para>
 </listitem>
 <listitem>
  <para>
   XQuery does not have lookahead or lookbehind constraints,
   nor any of the constraint escapes described in
   <xref linkend="posix-constraint-escapes-table"/>.
  </para>
 </listitem>
 <listitem>
  <para>
   The metasyntax forms described in <xref linkend="posix-metasyntax"/>
   do not exist in XQuery.
  </para>
 </listitem>
 <listitem>
  <para>
   The regular expression flag letters defined by XQuery are
   related to but not the same as the option letters for POSIX
   (<xref linkend="posix-embedded-options-table"/>). While the
   <literal>i</literal> and <literal>q</literal> options behave the
   same, others do not:
   <itemizedlist>
    <listitem>
     <para>
      XQuery's <literal>s</literal> (allow dot to match newline)
      and <literal>m</literal> (allow <literal>^</literal>
      and <literal>$</literal> to match at newlines) flags provide
      access to the same behaviors as
      POSIX's <literal>n</literal>, <literal>p</literal>
      and <literal>w</literal> flags, but they
      do <emphasis>not</emphasis> match the behavior of
      POSIX's <literal>s</literal> and <literal>m</literal> flags.
      Note in particular that dot-matches-newline is the default
      behavior in POSIX but not XQuery.
     </para>
    </listitem>
    <listitem>

This section continues the list of differences between POSIX and XQuery regular expressions. XQuery character class subtraction, the \c, \C, \i, and \I shorthands, and \p{UnicodeProperty} class elements are not supported. POSIX interprets character classes such as \w according to the prevailing locale, whereas XQuery defines them by reference to Unicode character properties. The SQL standard recognizes more newline variants than POSIX does. Of the POSIX character-entry escapes, XQuery supports only \n, \r, and \t. XQuery has no [:name:] character class syntax within bracket expressions, no lookahead or lookbehind constraints, no constraint escapes, and no metasyntax forms. Finally, the flag letters differ: i and q behave the same in both, but XQuery's s and m flags do not match the behavior of POSIX's s and m flags, and dot-matches-newline is the default behavior in POSIX but not in XQuery.
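
Two of the points above can be illustrated with a short sketch. It assumes a PostgreSQL session using the default advanced regular expression flavor, and the column aliases are hypothetical names chosen for illustration. The first two queries show that dot matches a newline by default under POSIX rules and that the embedded n option turns this off. The last two show one possible way to approximate the unsupported XQuery class subtraction [a-z-[aeiou]] with a negative lookahead constraint. This is a workaround under those assumptions, not an equivalent of the XQuery feature in every locale.

```sql
-- Dot matches newline by default in POSIX-style matching.
SELECT E'a\nb' ~ 'a.b' AS dot_matches_newline;          -- true

-- The embedded "n" option requests newline-sensitive matching,
-- so dot no longer matches the newline.
SELECT E'a\nb' ~ '(?n)a.b' AS dot_with_n_option;        -- false

-- Approximating [a-z-[aeiou]]: a lowercase letter that is not a vowel.
SELECT 'b' ~ '^(?![aeiou])[a-z]$' AS b_is_consonant;    -- true
SELECT 'e' ~ '^(?![aeiou])[a-z]$' AS e_is_consonant;    -- false
```

Because the range a-z is interpreted according to the prevailing locale, the last two results are stated for plain ASCII input only.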