postgresql

7th chunk of `doc/src/sgml/syntax.sgml`
 <indexterm  zone="sql-syntax-strings-uescape">
     <primary>Unicode escape</primary>
     <secondary>in string constants</secondary>
    </indexterm>

    <para>
     <productname>PostgreSQL</productname> also supports another type
     of escape syntax for strings that allows specifying arbitrary
     Unicode characters by code point.  A Unicode escape string
     constant starts with <literal>U&amp;</literal> (upper or lower case
     letter U followed by ampersand) immediately before the opening
     quote, without any spaces in between, for
     example <literal>U&amp;'foo'</literal>.  (Note that this creates an
     ambiguity with the operator <literal>&amp;</literal>.  Use spaces
     around the operator to avoid this problem.)  Inside the quotes,
     Unicode characters can be specified in escaped form by writing a
     backslash followed by a four-digit hexadecimal code point
     number, or alternatively a backslash followed by a plus sign
     and a six-digit hexadecimal code point number.  For
     example, the string <literal>'data'</literal> could be written as
<programlisting>
U&amp;'d\0061t\+000061'
</programlisting>
     The following less trivial example writes the Russian
     word <quote>slon</quote> (elephant) in Cyrillic letters:
<programlisting>
U&amp;'\0441\043B\043E\043D'
</programlisting>
    </para>
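
    <para>
     As a further illustration, the following sketch uses the six-digit
     form to name a code point above U+FFFF and mixes both escape forms
     in one constant.  It assumes a UTF-8 server encoding and that
     <literal>standard_conforming_strings</literal> is on; the column
     aliases are chosen only for this example:
<programlisting>
-- U+1F418 (the elephant emoji) lies above U+FFFF, so the six-digit form is used
SELECT U&amp;'\+01F418' AS elephant;

-- the four-digit and six-digit forms can be mixed in one constant;
-- \+00043B names the same code point as \043B
SELECT U&amp;'\0441\+00043B\043E\043D' AS slon;
</programlisting>
    </para>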

    <para>
     If an escape character other than backslash is desired, it can
     be specified using
     the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
     clause after the string, for example:
<programlisting>
U&amp;'d!0061t!+000061' UESCAPE '!'
</programlisting>
     The escape character can be any single character other than a
     hexadecimal digit, the plus sign, a single quote, a double quote,
     or a whitespace character.
    </para>

    <para>
     To include the escape character in the string literally, write
     it twice.
    </para>
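
    <para>
     A brief sketch of the doubling rule, first with the default
     backslash and then with a custom escape character (the sample
     strings are invented for illustration):
<programlisting>
-- the doubled backslash yields one literal backslash: C:\temp
SELECT U&amp;'C:\\temp';

-- with ! as the escape character, !! yields one literal !: wow!
SELECT U&amp;'wow!!' UESCAPE '!';
</programlisting>
    </para>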

    <para>
     Either the 4-digit or the 6-digit escape form can be used to
     specify UTF-16 surrogate pairs to compose characters with code
     points larger than U+FFFF, although the availability of the
     6-digit form technically makes this unnecessary.  (Surrogate
     pairs are not stored directly, but are combined into a single
     code point.)
    </para>
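
    <para>
     For instance, the code point U+1F418 corresponds to the UTF-16
     surrogate pair D83D, DC18.  Assuming a UTF-8 server encoding, the
     following comparison is expected to return true, since both
     constants denote the same single character (the column alias is
     chosen only for this example):
<programlisting>
-- surrogate pair in four-digit form versus the direct six-digit form
SELECT U&amp;'\D83D\DC18' = U&amp;'\+01F418' AS same_character;
</programlisting>
    </para>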

    <para>
     If the server encoding is not UTF-8, the Unicode code point identified
     by one of these escape sequences is converted to the actual server
     encoding; an error is reported if that's not possible.
    </para>

    <para>
     The Unicode escape syntax for string constants works only
     when the configuration
     parameter <xref linkend="guc-standard-conforming-strings"/> is
     turned on.  Otherwise, this syntax could confuse clients that
     parse SQL statements, which could lead to SQL injection and
     similar security issues.  If the parameter is set to off, this
     syntax is rejected with an error message.
    </para>
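
    <para>
     As a minimal sketch, a session can inspect the parameter and turn
     it on before using the Unicode escape syntax; whether changing it
     is appropriate depends on what the client software expects:
<programlisting>
-- show the current setting for this session
SHOW standard_conforming_strings;

-- turn it on for the session, then use a Unicode escape string constant
SET standard_conforming_strings = on;
SELECT U&amp;'\0441\043B\043E\043D';
</programlisting>
    </para>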
   </sect3>

   <sect3 id="sql-syntax-dollar-quoting">
    <title>Dollar-Quoted String Constants</title>

     <indexterm>
      <primary>dollar quoting</primary>
     </indexterm>

    <para>
     While the standard syntax for specifying string constants is usually
     convenient, it can be difficult to understand when the desired string
     contains many single quotes, since each of those must
     be doubled. To allow more readable queries in such situations,
     <productname>PostgreSQL</productname> provides another way, called
     <quote>dollar quoting</quote>, to write string constants.
     A dollar-quoted string constant
     consists of a dollar sign (<literal>$</literal>), an optional
     <quote>tag</quote> of zero or more characters, another dollar
     sign, an arbitrary sequence of characters that makes up the
     string content, a dollar sign, the same tag that began this
     dollar quote, and a dollar sign. For example, here are

Title: Details and Usage of Unicode Escape Syntax in PostgreSQL
Summary
This section elaborates on the Unicode escape syntax in PostgreSQL, explaining how to specify Unicode characters using 4 or 6-digit hexadecimal code points, and how to use a custom escape character with the UESCAPE clause. It covers handling the escape character literally, using surrogate pairs for characters beyond U+FFFF, and the conversion to the server encoding. It emphasizes that this syntax is only enabled when standard_conforming_strings is turned on to prevent SQL injection risks. Furthermore, it introduces dollar-quoted string constants as an alternative to standard syntax, especially useful when dealing with many single quotes.