<literal>C.UTF-8</literal> locale is available only when the
database encoding is <literal>UTF-8</literal>, and the behavior is
based on Unicode. The collation uses the code point values only. The
regular expression character classes are based on the "POSIX
Compatible" semantics, and the case mapping is the "simple" variant.
</para>
<para>
The <literal>PG_UNICODE_FAST</literal> locale is available only when
the database encoding is <literal>UTF-8</literal>, and the behavior is
based on Unicode. The collation uses the code point values only. The
regular expression character classes are based on the "Standard"
semantics, and the case mapping is the "full" variant.
</para>
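<para>
As a minimal sketch of the difference between the two locales above, one
collation can be created for each and the case mapping compared on the
same input. The collation names <literal>my_c_utf8</literal> and
<literal>my_unicode_fast</literal> are hypothetical, and the example
assumes a <literal>UTF-8</literal> database on a server version that
provides both locales:
<programlisting>
-- Hypothetical collation names, chosen for illustration only.
CREATE COLLATION my_c_utf8 (provider = builtin, locale = 'C.UTF-8');
CREATE COLLATION my_unicode_fast (provider = builtin, locale = 'PG_UNICODE_FAST');

-- Compare the "simple" and "full" case mapping variants.
SELECT upper('straße' COLLATE my_c_utf8) AS simple_mapping,
       upper('straße' COLLATE my_unicode_fast) AS full_mapping;
</programlisting>
</para>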
</listitem>
</varlistentry>
<varlistentry>
<term><literal>icu</literal></term>
<listitem>
<para>
The <literal>icu</literal> provider uses the external
ICU<indexterm><primary>ICU</primary></indexterm>
library. <productname>PostgreSQL</productname> must have been
configured with ICU support.
</para>
<para>
ICU provides collation and character classification behavior that is
independent of the operating system and database encoding, which is
preferable if you expect to transition to other platforms without any
change in results. <literal>LC_COLLATE</literal> and
<literal>LC_CTYPE</literal> can be set independently of the ICU
locale.
</para>
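<para>
The following sketch shows one way to create a database that uses the
ICU provider while choosing <literal>LC_COLLATE</literal> and
<literal>LC_CTYPE</literal> separately from the ICU locale. The database
name <literal>icu_example</literal> and the locale values are
illustrative assumptions, not recommendations:
<programlisting>
-- Hypothetical database name; template0 is used so that the
-- locale settings may differ from those of the template.
CREATE DATABASE icu_example
    TEMPLATE template0
    ENCODING 'UTF8'
    LOCALE_PROVIDER icu
    ICU_LOCALE 'en-US'
    LC_COLLATE 'C'
    LC_CTYPE 'C';
</programlisting>
</para>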
<note>
<para>
For the ICU provider, results may depend on the version of the ICU
library used, as it is updated to reflect changes in natural language
over time.
</para>
</note>
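<para>
One way to check whether the ICU library version has changed since a
collation was created is to compare the version recorded in
<structname>pg_collation</structname> with the version currently
reported by the library. This is a sketch only; the column aliases are
chosen for illustration:
<programlisting>
-- List ICU collations whose recorded version differs from the
-- version reported by the ICU library currently in use.
SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS current_version
FROM pg_collation
WHERE collprovider = 'i'
  AND collversion IS DISTINCT FROM pg_collation_actual_version(oid);
</programlisting>
</para>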
</listitem>
</varlistentry>
<varlistentry>
<term><literal>libc</literal></term>
<listitem>
<para>
The <literal>libc</literal> provider uses the operating system's C
library. The collation and character classification behavior is
controlled by the settings <literal>LC_COLLATE</literal> and
<literal>LC_CTYPE</literal>, so, unlike with the ICU provider, these
settings cannot be chosen independently of the provider's locale.
</para>
<note>
<para>
The same locale name may have different behavior on different
platforms when using the libc provider.
</para>
</note>
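<para>
As a brief sketch, a libc collation can be created as follows. The
collation name <literal>german_libc</literal> is hypothetical, and the
locale name <literal>de_DE.utf8</literal> is an assumption: its
availability and spelling depend on the operating system.
<programlisting>
-- With the libc provider, "locale" sets both LC_COLLATE and LC_CTYPE.
-- 'de_DE.utf8' is an assumed, platform-dependent locale name.
CREATE COLLATION german_libc (provider = libc, locale = 'de_DE.utf8');
</programlisting>
</para>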
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="icu-locales">
<title>ICU Locales</title>
<sect3 id="icu-locale-names">
<title>ICU Locale Names</title>
<para>
The ICU format for the locale name is a <link
linkend="icu-language-tag">Language Tag</link>.
<programlisting>
CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
</programlisting>
</para>
</sect3>
<sect3 id="icu-canonicalization">
<title>Locale Canonicalization and Validation</title>
<para>
When defining a new ICU collation object or database with ICU as the
provider, the given locale name is transformed ("canonicalized") into a
language tag if not already in that form. For instance,
<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
</screen>
If you see this notice, verify that the <symbol>provider</symbol> and
<symbol>locale</symbol> are what you intended. For consistent results
when using the ICU provider, specify the canonical <link
linkend="icu-language-tag">language tag</link> instead of relying on the
transformation.
</para>
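<para>
For comparison, a sketch that specifies the language tag directly, so
that no transformation should be needed. The collation name
<literal>mycollation_de</literal> is hypothetical:
<programlisting>
-- 'de-DE' is already a language tag, unlike 'de_DE.utf8'.
CREATE COLLATION mycollation_de (provider = icu, locale = 'de-DE');
</programlisting>
</para>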
<para>
A locale with no language name, or the special language name
<literal>root</literal>, is transformed to have the language
<literal>und</literal> ("undefined").
</para>
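<para>
This transformation can be observed with a sketch such as the following.
The collation name <literal>myrootcoll</literal> is hypothetical, and
the catalog column name <structfield>colllocale</structfield> is an
assumption that holds only for recent server versions; older releases
use a different column name.
<programlisting>
-- Create a collation using the special language name "root".
CREATE COLLATION myrootcoll (provider = icu, locale = 'root');

-- Inspect the locale name that was actually stored.
SELECT collname, colllocale
FROM pg_collation
WHERE collname = 'myrootcoll';
</programlisting>
</para>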