6th chunk of `doc/src/sgml/charset.sgml`
 <literal>C.UTF-8</literal> locale is available only when the
       database encoding is <literal>UTF-8</literal>, and the behavior is
       based on Unicode. The collation uses the code point values only. The
       regular expression character classes are based on the "POSIX
       Compatible" semantics, and the case mapping is the "simple" variant.
      </para>
      <para>
       The <literal>PG_UNICODE_FAST</literal> locale is available only when
       the database encoding is <literal>UTF-8</literal>, and the behavior is
       based on Unicode. The collation uses the code point values only. The
       regular expression character classes are based on the "Standard"
       semantics, and the case mapping is the "full" variant.
      </para>
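      <para>
       As a rough illustration of the difference between the "simple" and
       "full" case mapping variants, the following sketch creates one
       collation for each locale and upper-cases a string containing
       <literal>ß</literal>. The collation names
       <literal>demo_c_utf8</literal> and
       <literal>demo_unicode_fast</literal> are hypothetical, and the example
       assumes a <literal>UTF-8</literal> database on a server version that
       supports both builtin locales.
<programlisting>
-- Hypothetical collation names; requires a UTF-8 database.
CREATE COLLATION demo_c_utf8 (provider = builtin, locale = 'C.UTF-8');
CREATE COLLATION demo_unicode_fast (provider = builtin, locale = 'PG_UNICODE_FAST');

-- Simple case mapping converts one character to exactly one character.
SELECT upper('straße' COLLATE demo_c_utf8);

-- Full case mapping may change the number of characters in the result.
SELECT upper('straße' COLLATE demo_unicode_fast);
</programlisting>
      </para>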
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>icu</literal></term>
     <listitem>
      <para>
       The <literal>icu</literal> provider uses the external
       ICU<indexterm><primary>ICU</primary></indexterm>
       library. <productname>PostgreSQL</productname> must have been
       configured with ICU support.
      </para>
      <para>
       ICU provides collation and character classification behavior that is
       independent of the operating system and database encoding, which is
       preferable if you expect to transition to other platforms without any
       change in results. <literal>LC_COLLATE</literal> and
       <literal>LC_CTYPE</literal> can be set independently of the ICU
       locale.
      </para>
      <note>
       <para>
        For the ICU provider, results may depend on the version of the ICU
        library used, as it is updated to reflect changes in natural language
        over time.
       </para>
      </note>
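      <para>
       One way to notice such a change is to compare the version recorded
       when a collation was created with the version reported by the
       library currently in use. The query below is a minimal sketch using
       the <structname>pg_collation</structname> catalog and the
       <function>pg_collation_actual_version</function> function. It lists a
       few ICU collations and is not a complete upgrade check.
<programlisting>
-- collversion: version recorded when the collation was created
-- pg_collation_actual_version(oid): version reported by the library now
SELECT collname,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collprovider = 'i'   -- 'i' marks ICU collations
ORDER BY collname
LIMIT 5;
</programlisting>
      </para>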
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>libc</literal></term>
     <listitem>
      <para>
       The <literal>libc</literal> provider uses the operating system's C
       library. The collation and character classification behavior is
       controlled by the settings <literal>LC_COLLATE</literal> and
       <literal>LC_CTYPE</literal>, so the behavior cannot be chosen
       independently of those settings.
      </para>
      <note>
       <para>
        The same locale name may have different behavior on different
        platforms when using the libc provider.
       </para>
      </note>
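      <para>
       As a sketch, a libc collation can be created from a single locale
       name, or with the two settings given explicitly. The collation names
       are hypothetical, and the locale name <literal>de_DE.utf8</literal>
       is an assumption: it must be provided by the operating system and be
       compatible with the database encoding.
<programlisting>
-- Assumes the operating system provides a locale named de_DE.utf8.
CREATE COLLATION demo_german (provider = libc, locale = 'de_DE.utf8');

-- The same settings, spelled out separately.
CREATE COLLATION demo_german_explicit (
    provider = libc,
    lc_collate = 'de_DE.utf8',
    lc_ctype = 'de_DE.utf8'
);
</programlisting>
      </para>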
     </listitem>
    </varlistentry>
   </variablelist>
  </sect2>

  <sect2 id="icu-locales">
   <title>ICU Locales</title>

   <sect3 id="icu-locale-names">
    <title>ICU Locale Names</title>

    <para>
     The ICU format for the locale name is a <link
     linkend="icu-language-tag">Language Tag</link>.

<programlisting>
CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
</programlisting>
    </para>
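    <para>
     Once created, such a collation can be attached to a column or applied
     to an individual expression. The following is a minimal sketch; the
     table <literal>demo_names</literal> and its column are hypothetical.
<programlisting>
-- Hypothetical table whose column uses mycollation2 by default.
CREATE TABLE demo_names (name text COLLATE mycollation2);

-- Override the column's collation for a single query.
SELECT name FROM demo_names ORDER BY name COLLATE mycollation1;
</programlisting>
    </para>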
   </sect3>

   <sect3 id="icu-canonicalization">
    <title>Locale Canonicalization and Validation</title>
    <para>
     When defining a new ICU collation object or database with ICU as the
     provider, the given locale name is transformed ("canonicalized") into a
     language tag if not already in that form. For instance,

<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
</screen>

     If you see this notice, verify that the <symbol>provider</symbol> and
     <symbol>locale</symbol> are what you intended. For consistent results
     when using the ICU provider, specify the canonical <link
     linkend="icu-language-tag">language tag</link> instead of relying on the
     transformation.
    </para>
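    <para>
     To check which form was actually stored, the catalog can be queried.
     This sketch assumes the <structfield>colllocale</structfield> column of
     <structname>pg_collation</structname>; older releases may use a
     different column name for the ICU locale.
<programlisting>
-- Show the provider and the stored (canonicalized) locale name.
SELECT collname, collprovider, colllocale
FROM pg_collation
WHERE collname IN ('mycollation3', 'mycollation4');
</programlisting>
    </para>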

    <para>
     A locale with no language name, or the special language name
     <literal>root</literal>, is transformed to have the language
     <literal>und</literal> ("undetermined").
    </para>

 

Title: Locale Providers and ICU Locales in PostgreSQL
Summary
This section describes the available locale providers in PostgreSQL, including 'builtin', 'icu', and 'libc'. The 'icu' provider uses the external ICU library and provides collation and character classification behavior independent of the operating system and database encoding. The section also explains how to use ICU locales, including locale names, canonicalization, and validation, to ensure consistent results when using the ICU provider.