POSIX Regular Expressions: Definition, Examples, and substring Function

        <returnvalue>t</returnvalue>
       </para></entry>
      </row>

      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <type>text</type> <literal>!~*</literal> <type>text</type>
        <returnvalue>boolean</returnvalue>
       </para>
       <para>
        String does not match regular expression, case-insensitively
       </para>
       <para>
        <literal>'thomas' !~* 'T.*ma'</literal>
        <returnvalue>f</returnvalue>
       </para></entry>
      </row>
     </tbody>
    </tgroup>
   </table>

   <para>
    <acronym>POSIX</acronym> regular expressions provide a more
    powerful means for pattern matching than the <function>LIKE</function> and
    <function>SIMILAR TO</function> operators.
    Many Unix tools such as <command>egrep</command>,
    <command>sed</command>, or <command>awk</command> use a pattern
    matching language that is similar to the one described here.
   </para>

   <para>
    A regular expression is a character sequence that is an
    abbreviated definition of a set of strings (a <firstterm>regular
    set</firstterm>). A string is said to match a regular expression
    if it is a member of the regular set described by the regular
    expression. As with <function>LIKE</function>, pattern characters
    match string characters exactly unless they are special characters
    in the regular expression language, but regular expressions use
    different special characters than <function>LIKE</function> does.
    Unlike <function>LIKE</function> patterns, a
    regular expression is allowed to match anywhere within a string, unless
    the regular expression is explicitly anchored to the beginning or
    end of the string.
   </para>

   <para>
    Some examples:
<programlisting>
'abcd' ~ 'bc'     <lineannotation>true</lineannotation>
'abcd' ~ 'a.c'    <lineannotation>true: dot matches any character</lineannotation>
'abcd' ~ 'a.*d'   <lineannotation>true: <literal>*</literal> repeats the preceding pattern item</lineannotation>
'abcd' ~ '(b|x)'  <lineannotation>true: <literal>|</literal> means OR, parentheses group</lineannotation>
'abcd' ~ '^a'     <lineannotation>true: <literal>^</literal> anchors to start of string</lineannotation>
'abcd' ~ '^(b|c)' <lineannotation>false: would match except for anchoring</lineannotation>
</programlisting>
   </para>

   <para>
    The <acronym>POSIX</acronym> pattern language is described in much
    greater detail below.
   </para>

   <para>
    The <function>substring</function> function with two parameters,
    <function>substring(<replaceable>string</replaceable> from
    <replaceable>pattern</replaceable>)</function>, provides extraction of a
    substring that matches a POSIX regular expression pattern. It returns null
    if there is no match, otherwise the first portion of the text that matched
    the pattern. But if the pattern contains any parentheses, the portion of
    the text that matched the first parenthesized subexpression (the one whose
    left parenthesis comes first) is returned. You can put parentheses around
    the whole expression if you want to use parentheses within it without
    triggering this exception. If you need parentheses in the pattern before
    the subexpression you want to extract, see the non-capturing parentheses
    described below.
   </para>

   <para>
    Some examples:
<programlisting>
substring('foobar' from 'o.b')     <lineannotation>oob</lineannotation>
substring('foobar' from 'o(.)b')   <lineannotation>o</lineannotation>
</programlisting>
   </para>

   <para>
    The <function>regexp_count</function> function counts the number of
    places where a POSIX regular expression pattern matches a string.
It has the syntax <function>regexp_count</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable>

This section defines a POSIX regular expression as an abbreviated definition of a set of strings: a string matches if it belongs to that set. It notes that regular expressions use different special characters than `LIKE` and, unlike `LIKE` patterns, may match anywhere within a string unless explicitly anchored. The examples cover unanchored matching, `.`, `*`, alternation with `|`, grouping with parentheses, and anchoring with `^`. The section then describes `substring(string from pattern)`, which returns null when nothing matches, the text matched by the first parenthesized subexpression when the pattern contains parentheses, and the first matching portion otherwise. It ends with the opening of the `regexp_count` description, which counts the places where a pattern matches a string.
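
The parenthesis rule for `substring` is easy to misread, so here is a minimal sketch of the cases side by side. The first two queries are the section's own examples. The remaining ones are illustrative additions, not taken from the section: the wrapped pattern, the no-match pattern, and the `(?:...)` non-capturing form. The section only mentions non-capturing parentheses as "described below", and the `(?:...)` syntax assumed here comes from PostgreSQL's advanced regular expression syntax.

```sql
-- No parentheses: the first portion of the text that matches is returned.
SELECT substring('foobar' from 'o.b');        -- oob

-- Parentheses present: only the first parenthesized subexpression is returned.
SELECT substring('foobar' from 'o(.)b');      -- o

-- Illustrative: wrapping the whole pattern in parentheses makes the
-- outermost group the first one, so the whole match comes back again.
SELECT substring('foobar' from '(o(.)b)');    -- oob

-- Illustrative: no match yields NULL.
SELECT substring('foobar' from 'x.b');        -- NULL

-- Illustrative, assuming (?:...) non-capturing syntax: the leading group
-- is skipped, so the later capturing group is the one returned.
SELECT substring('foobar' from '(?:o.)(b)');  -- b
```

Wrapping the pattern is usually the simpler choice when the whole match is wanted. The non-capturing form helps when a group is needed only for alternation or repetition ahead of the part to extract.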