<indexterm zone="sql-syntax-strings-uescape">
<primary>Unicode escape</primary>
<secondary>in string constants</secondary>
</indexterm>
<para>
<productname>PostgreSQL</productname> supports another escape
syntax for strings that allows arbitrary Unicode characters to be
specified by code point. A Unicode escape string
constant starts with <literal>U&amp;</literal> (upper or lower case
letter U followed by ampersand) immediately before the opening
quote, without any spaces in between, for
example <literal>U&amp;'foo'</literal>. (Note that this creates an
ambiguity with the operator <literal>&amp;</literal>. Use spaces
around the operator to avoid this problem.) Inside the quotes,
Unicode characters can be specified in escaped form by writing a
backslash followed by a four-digit hexadecimal code point
number, or alternatively a backslash followed by a plus sign
and a six-digit hexadecimal code point number. For
example, the string <literal>'data'</literal> could be written as
<programlisting>
U&amp;'d\0061t\+000061'
</programlisting>
The following less trivial example writes the Russian
word <quote>slon</quote> (elephant) in Cyrillic letters:
<programlisting>
U&amp;'\0441\043B\043E\043D'
</programlisting>
</para>
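<para>
As a minimal sketch of the two escape forms, assuming
<varname>standard_conforming_strings</varname> is on, the following
query mixes the four-digit and six-digit forms in one constant; the
column alias <literal>word</literal> is chosen only for
illustration:
<programlisting>
-- \0061 (four-digit form) and \+000061 (six-digit form) both denote "a"
SELECT U&amp;'d\0061t\+000061' AS word;   -- yields: data
</programlisting>
</para>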
<para>
To use an escape character other than backslash, specify it with
the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
clause after the string, for example:
<programlisting>
U&amp;'d!0061t!+000061' UESCAPE '!'
</programlisting>
The escape character can be any single character other than a
hexadecimal digit, the plus sign, a single quote, a double quote,
or a whitespace character.
</para>
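<para>
One situation where a different escape character may be convenient
is a string that itself contains backslashes. The following is an
illustrative sketch, assuming
<varname>standard_conforming_strings</varname> is on; the path shown
is made up for the example:
<programlisting>
-- With "!" as the escape character, backslashes are ordinary characters,
-- and !0041 denotes "A"
SELECT U&amp;'C:\temp\!0041' UESCAPE '!' AS path;   -- yields: C:\temp\A
</programlisting>
</para>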
<para>
To include the escape character in the string literally, write
it twice.
</para>
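<para>
For instance, the doubling rule applies to the default backslash as
well as to a character named in <literal>UESCAPE</literal>. This
sketch assumes <varname>standard_conforming_strings</varname> is on;
the column aliases are illustrative only:
<programlisting>
-- A doubled backslash produces one literal backslash
SELECT U&amp;'a\\b' AS doubled_backslash;              -- yields: a\b

-- A doubled custom escape character produces one literal "!"
SELECT U&amp;'wow!!' UESCAPE '!' AS doubled_custom;    -- yields: wow!
</programlisting>
</para>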
<para>
Either the 4-digit or the 6-digit escape form can be used to
specify UTF-16 surrogate pairs to compose characters with code
points larger than U+FFFF, although the availability of the
6-digit form technically makes this unnecessary. (Surrogate
pairs are not stored directly, but are combined into a single
code point.)
</para>
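<para>
As an illustration, the code point U+1F600 corresponds to the UTF-16
surrogate pair D83D DE00, so the two constants below should denote
the same one-character string. The sketch assumes a server encoding
that can represent this character (for example UTF-8) and that
<varname>standard_conforming_strings</varname> is on:
<programlisting>
-- Surrogate pair written with four-digit escapes versus the
-- direct six-digit escape for the same code point
SELECT U&amp;'\D83D\DE00' = U&amp;'\+01F600' AS same_character;   -- yields: t
</programlisting>
</para>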
<para>
If the server encoding is not UTF-8, the Unicode code point identified
by one of these escape sequences is converted to the actual server
encoding; an error is reported if that's not possible.
</para>
<para>
The Unicode escape syntax for string constants works only
when the configuration
parameter <xref linkend="guc-standard-conforming-strings"/> is
turned on. Otherwise, this syntax could confuse
clients that parse SQL statements, which could
lead to SQL injection and similar security issues. If the
parameter is set to off, this syntax is rejected with an
error message.
</para>
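<para>
A session can check the setting before relying on this syntax. The
following is a minimal sketch and assumes the session is permitted
to change the parameter:
<programlisting>
-- Must report "on" for Unicode escape string constants to be accepted
SHOW standard_conforming_strings;

-- Enable it for the current session if necessary, then use the syntax
SET standard_conforming_strings = on;
SELECT U&amp;'d\0061t\+000061';
</programlisting>
</para>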
</sect3>
<sect3 id="sql-syntax-dollar-quoting">
<title>Dollar-Quoted String Constants</title>
<indexterm>
<primary>dollar quoting</primary>
</indexterm>
<para>
While the standard syntax for specifying string constants is usually
convenient, it can be difficult to understand when the desired string
contains many single quotes, since each of those must
be doubled. To allow more readable queries in such situations,
<productname>PostgreSQL</productname> provides another way, called
<quote>dollar quoting</quote>, to write string constants.
A dollar-quoted string constant
consists of a dollar sign (<literal>$</literal>), an optional
<quote>tag</quote> of zero or more characters, another dollar
sign, an arbitrary sequence of characters that makes up the
string content, a dollar sign, the same tag that began this
dollar quote, and a dollar sign. For example, here are