substring Function with Regular Expressions

expressions; see <xref linkend="posix-character-entry-escapes-table"/>, <xref linkend="posix-class-shorthand-escapes-table"/>, and <xref linkend="posix-constraint-escapes-table"/> below. </para> <para> Some examples: <programlisting> 'abc' SIMILAR TO 'abc' <lineannotation>true</lineannotation> 'abc' SIMILAR TO 'a' <lineannotation>false</lineannotation> 'abc' SIMILAR TO '%(b|d)%' <lineannotation>true</lineannotation> 'abc' SIMILAR TO '(b|c)%' <lineannotation>false</lineannotation> '-abc-' SIMILAR TO '%\mabc\M%' <lineannotation>true</lineannotation> 'xabcy' SIMILAR TO '%\mabc\M%' <lineannotation>false</lineannotation> </programlisting> </para> <para> The <function>substring</function> function with three parameters provides extraction of a substring that matches an SQL regular expression pattern. The function can be written according to standard SQL syntax: <synopsis> substring(<replaceable>string</replaceable> similar <replaceable>pattern</replaceable> escape <replaceable>escape-character</replaceable>) </synopsis> or using the now obsolete SQL:1999 syntax: <synopsis> substring(<replaceable>string</replaceable> from <replaceable>pattern</replaceable> for <replaceable>escape-character</replaceable>) </synopsis> or as a plain three-argument function: <synopsis> substring(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>, <replaceable>escape-character</replaceable>) </synopsis> As with <literal>SIMILAR TO</literal>, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern for which the matching data sub-string is of interest, the pattern should contain two occurrences of the escape character followed by a double quote (<literal>"</literal>).  The text matching the portion of the pattern between these separators is returned when the match is successful. </para> <para> The escape-double-quote separators actually divide <function>substring</function>'s pattern into three independent regular expressions; for example, a vertical bar (<literal>|</literal>) in any of the three sections affects only that section. Also, the first and third of these regular expressions are defined to match the smallest possible amount of text, not the largest, when there is any ambiguity about how much of the data string matches which pattern. (In POSIX parlance, the first and third regular expressions are forced to be non-greedy.) </para> <para> As an extension to the SQL standard, <productname>PostgreSQL</productname> allows there to be just one escape-double-quote separator, in which case the third regular expression is taken as empty; or no separators, in which case the first and third regular expressions are taken as empty. </para> <para> Some examples, with <literal>#"</literal> delimiting the return string: <programlisting> substring('foobar' similar '%#"o_b#"%' escape '#') <lineannotation>oob</lineannotation> substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</lineannotation> </programlisting> </para> </sect2> <sect2 id="functions-posix-regexp"> <title><acronym>POSIX</acronym> Regular Expressions</title> <indexterm zone="functions-posix-regexp"> <primary>regular expression</primary> <seealso>pattern matching</seealso> </indexterm> <indexterm> <primary>substring</primary> </indexterm> <indexterm> <primary>regexp_count</primary> </indexterm> <indexterm> <primary>regexp_instr</primary> </indexterm> <indexterm> <primary>regexp_like</primary> </indexterm> <indexterm> <primary>regexp_match</primary> </indexterm> <indexterm> <primary>regexp_matches</primary> </indexterm> <indexterm> <primary>regexp_replace</primary>

This section describes the `substring` function used with SQL regular expressions for extracting substrings. It details the function's syntax, including standard SQL and older SQL:1999 versions, and explains how to use escape characters and double quotes to specify the part of the pattern to extract. The function requires the pattern to match the entire string, returning null otherwise. It also covers PostgreSQL's extension allowing for one or no escape-double-quote separators. Examples are provided to illustrate the function's usage. The section then transitions to discussing POSIX regular expressions.