7th chunk of `doc/src/sgml/charset.sgml`
 object or database with ICU as the
     provider, the given locale name is transformed ("canonicalized") into a
     language tag if not already in that form. For instance,

<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
</screen>

     If you see this notice, verify that the <symbol>provider</symbol> and
     <symbol>locale</symbol> are what you intended. For consistent results
     when using the ICU provider, specify the canonical <link
     linkend="icu-language-tag">language tag</link> instead of relying on the
     transformation.
    </para>

    <para>
     A locale with no language name, or the special language name
     <literal>root</literal>, is transformed to have the language
     <literal>und</literal> ("undefined").
    </para>
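
    <para>
     As an illustrative sketch (the collation names here are hypothetical),
     the first statement below uses the special language name
     <literal>root</literal> and should be canonicalized to
     <literal>und</literal>, while the second specifies that language tag
     directly:
<programlisting>
-- Hypothetical names, used only for illustration.
-- 'root' is expected to be transformed to the language tag 'und'.
CREATE COLLATION root_by_name (provider = icu, locale = 'root');
-- Already in canonical form, so no transformation should be needed.
CREATE COLLATION root_canonical (provider = icu, locale = 'und');
</programlisting>
    </para>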

    <para>
     ICU can transform most libc locale names, as well as some other formats,
     into language tags to ease the transition to ICU. A libc locale name
     used with ICU may not behave exactly as it does in libc.
    </para>

    <para>
     If there is a problem interpreting the locale name, or if the locale name
     represents a language or region that ICU does not recognize, you will see
     the following warning:

<screen>
CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
WARNING:  ICU locale "nonsense" has unknown language "nonsense"
HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE COLLATION
</screen>

     <xref linkend="guc-icu-validation-level"/> controls how the message is
     reported. Unless set to <literal>ERROR</literal>, the collation will
     still be created, but the behavior may not be what the user intended.
    </para>
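
    <para>
     As a sketch of the stricter setting, raising the validation level for
     the current session is expected to make the same statement fail rather
     than create the collation:
<programlisting>
-- Report unrecognized ICU locales as errors for this session.
SET icu_validation_level = 'ERROR';
-- Expected to be rejected with an error instead of a warning.
CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
</programlisting>
    </para>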
   </sect3>

   <sect3 id="icu-language-tag">
    <title>Language Tag</title>

    <para>
     A language tag, defined in BCP 47, is a standardized identifier for
     languages, regions, and other information about a locale.
    </para>

    <para>
     Basic language tags are simply
     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>,
     or even just <replaceable>language</replaceable>. The
     <replaceable>language</replaceable> is a language code
     (e.g. <literal>fr</literal> for French), and
     <replaceable>region</replaceable> is a region code
     (e.g. <literal>CA</literal> for Canada). Examples:
     <literal>ja-JP</literal>, <literal>de</literal>, or
     <literal>fr-CA</literal>.
    </para>
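
    <para>
     For illustration, a basic language tag can be given directly as the
     locale of an ICU collation (the collation names below are hypothetical):
<programlisting>
-- Language and region.
CREATE COLLATION french_canada (provider = icu, locale = 'fr-CA');
-- Language only.
CREATE COLLATION german (provider = icu, locale = 'de');
</programlisting>
    </para>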

    <para>
     Collation settings may be included in the language tag to customize
     collation behavior. ICU allows extensive customization, such as
     sensitivity (or insensitivity) to accents, case, and punctuation;
     treatment of digits within text; and many other options to satisfy a
     variety of uses.
    </para>

    <para>
     To include this additional collation information in a language tag,
     append <literal>-u</literal>, which indicates there are additional
     collation settings, followed by one or more
     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
     pairs. The <replaceable>key</replaceable> is the key for a <link
     linkend="icu-collation-settings">collation setting</link> and
     <replaceable>value</replaceable> is a valid value for that setting. For
     boolean settings, the <literal>-</literal><replaceable>key</replaceable>
     may be specified without a corresponding
     <literal>-</literal><replaceable>value</replaceable>, which implies a
     value of <literal>true</literal>.
    </para>
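
    <para>
     As a brief sketch (the collation names are hypothetical), the following
     two definitions request the same setting, because <literal>kn</literal>
     is a boolean key and omitting its value implies <literal>true</literal>:
<programlisting>
-- Short form: boolean key without a value.
CREATE COLLATION numeric_short (provider = icu, locale = 'en-US-u-kn');
-- Explicit form; canonicalization reduces it to the short form above.
CREATE COLLATION numeric_explicit (provider = icu, locale = 'en-US-u-kn-true');
</programlisting>
    </para>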

    <para>
     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
     means the locale with the English language in the US region, with
     collation

Title: ICU Locale Canonicalization and Language Tags
Summary
This section explains how ICU locale names are transformed into language tags, and how to use language tags to specify collation settings. It also describes the format of language tags, including basic language tags and extensions for customizing collation behavior. The section provides examples of language tags and explains how to include additional collation information in a language tag using the '-u' extension and key-value pairs.