Locale Providers and ICU Locales in PostgreSQL

<literal>C.UTF-8</literal> locale is available only when the database
encoding is <literal>UTF-8</literal>, and the behavior is based on
Unicode. The collation uses the code point values only. The regular
expression character classes are based on the "POSIX Compatible"
semantics, and the case mapping is the "simple" variant.
</para>
<para>
 The <literal>PG_UNICODE_FAST</literal> locale is available only when
 the database encoding is <literal>UTF-8</literal>, and the behavior is
 based on Unicode. The collation uses the code point values only. The
 regular expression character classes are based on the "Standard"
 semantics, and the case mapping is the "full" variant.
</para>
</listitem>
</varlistentry>

<varlistentry>
 <term><literal>icu</literal></term>
 <listitem>
  <para>
   The <literal>icu</literal> provider uses the external
   ICU<indexterm><primary>ICU</primary></indexterm>
   library. <productname>PostgreSQL</productname> must have been
   configured with ICU support.
  </para>
  <para>
   ICU provides collation and character classification behavior that is
   independent of the operating system and database encoding, which is
   preferable if you expect to transition to other platforms without
   any change in results. <literal>LC_COLLATE</literal> and
   <literal>LC_CTYPE</literal> can be set independently of the ICU
   locale.
  </para>
  <note>
   <para>
    For the ICU provider, results may depend on the version of the ICU
    library used, as it is updated to reflect changes in natural
    language over time.
   </para>
  </note>
 </listitem>
</varlistentry>

<varlistentry>
 <term><literal>libc</literal></term>
 <listitem>
  <para>
   The <literal>libc</literal> provider uses the operating system's C
   library. The collation and character classification behavior is
   controlled by the settings <literal>LC_COLLATE</literal> and
   <literal>LC_CTYPE</literal>, so they cannot be set independently.
  </para>
  <note>
   <para>
    The same locale name may have different behavior on different
    platforms when using the libc provider.
   </para>
  </note>
 </listitem>
</varlistentry>
</variablelist>
</sect2>

<sect2 id="icu-locales">
 <title>ICU Locales</title>

 <sect3 id="icu-locale-names">
  <title>ICU Locale Names</title>

  <para>
   The ICU format for the locale name is a <link
   linkend="icu-language-tag">Language Tag</link>.

<programlisting>
CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
</programlisting>
  </para>
 </sect3>

 <sect3 id="icu-canonicalization">
  <title>Locale Canonicalization and Validation</title>

  <para>
   When defining a new ICU collation object or database with ICU as the
   provider, the given locale name is transformed ("canonicalized") into
   a language tag if not already in that form. For instance,

<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
</screen>

   If you see this notice, verify that the <symbol>provider</symbol> and
   <symbol>locale</symbol> are what you intended. For consistent results
   when using the ICU provider, specify the canonical <link
   linkend="icu-language-tag">language tag</link> instead of relying on
   the transformation.
  </para>
  <para>
   A locale with no language name, or the special language name
   <literal>root</literal>, is transformed to have the language
   <literal>und</literal> ("undefined").
  </para>
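  <para>
   As an illustrative sketch of the transformation described above, one
   way to check which locale was actually stored is to create a
   collation from each spelling and compare the catalog entries. The
   collation names <literal>root_canonical</literal> and
   <literal>root_legacy</literal> are hypothetical, and the
   <literal>colllocale</literal> column of
   <literal>pg_collation</literal> is assumed from recent releases;
   older releases may use a different column name.

<programlisting>
-- Hypothetical collation names, used only for illustration.
-- 'und' is the canonical language tag; 'root' relies on canonicalization.
CREATE COLLATION root_canonical (provider = icu, locale = 'und');
CREATE COLLATION root_legacy (provider = icu, locale = 'root');

-- Inspect what was stored. The colllocale column name is an assumption
-- based on recent releases of the pg_collation catalog.
SELECT collname, collprovider, colllocale
  FROM pg_collation
 WHERE collname IN ('root_canonical', 'root_legacy');
</programlisting>

   If the transformation applies as described, both rows should report
   the same stored locale. Writing the canonical form directly avoids
   depending on that transformation.
  </para>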

This section describes the available locale providers in PostgreSQL, including 'builtin', 'icu', and 'libc'. The 'icu' provider uses the external ICU library and provides collation and character classification behavior independent of the operating system and database encoding. The section also explains how to use ICU locales, including locale names, canonicalization, and validation, to ensure consistent results when using the ICU provider.