object or database with ICU as the
provider, the given locale name is transformed ("canonicalized") into a
language tag if not already in that form. For instance,
<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
</screen>
If you see this notice, verify that the <symbol>provider</symbol> and
<symbol>locale</symbol> are what you intended. For consistent results
when using the ICU provider, specify the canonical <link
linkend="icu-language-tag">language tag</link> instead of relying on the
transformation.
</para>
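<para>
As a minimal sketch of that recommendation, the statements below pass
language tags that are already in the standard form reported by the
notices above, so no transformation should be needed and no notice is
expected. The collation names <literal>german_canonical</literal> and
<literal>english_numeric</literal> are hypothetical and chosen only for
illustration.
<programlisting>
-- Standard forms of the two locales from the preceding example
CREATE COLLATION german_canonical (provider = icu, locale = 'de-DE');
CREATE COLLATION english_numeric (provider = icu, locale = 'en-US-u-kn');
</programlisting>
</para>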
<para>
A locale with no language name, or with the special language name
<literal>root</literal>, is transformed to have the language
<literal>und</literal> ("undetermined").
</para>
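<para>
For illustration, the root locale can also be requested directly by its
language tag, which avoids depending on the transformation. This is only
a sketch; the collation name <literal>root_collation</literal> is
hypothetical.
<programlisting>
-- "und" is the language tag form of the root locale
CREATE COLLATION root_collation (provider = icu, locale = 'und');
</programlisting>
</para>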
<para>
ICU can transform most libc locale names, as well as some other formats,
into language tags, which eases the transition to ICU. A libc locale name
used with ICU may not behave exactly as it does in libc.
</para>
<para>
If there is a problem interpreting the locale name, or if the locale name
represents a language or region that ICU does not recognize, you will see
the following warning:
<screen>
CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
WARNING: ICU locale "nonsense" has unknown language "nonsense"
HINT: To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE COLLATION
</screen>
<xref linkend="guc-icu-validation-level"/> controls how the message is
reported. Unless it is set to <literal>ERROR</literal>, the collation is
still created, but its behavior may not be what you intended.
</para>
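<para>
The following sketch shows one way to make an unrecognized locale fail
instead of producing a warning. It assumes the parameter can be changed
for the current session with <command>SET</command>. The exact error
text is not reproduced here.
<programlisting>
-- Report locale validation problems as errors in this session
SET icu_validation_level = 'ERROR';

-- With the setting above, this statement is expected to fail,
-- and the collation is not created
CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');

-- Return to the configured default
RESET icu_validation_level;
</programlisting>
</para>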
</sect3>
<sect3 id="icu-language-tag">
<title>Language Tag</title>
<para>
A language tag, defined in BCP 47, is a standardized identifier for
languages, regions, and other information about a locale.
</para>
<para>
Basic language tags are simply
<replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>,
or even just <replaceable>language</replaceable>. The
<replaceable>language</replaceable> is a language code
(e.g. <literal>fr</literal> for French), and
<replaceable>region</replaceable> is a region code
(e.g. <literal>CA</literal> for Canada). Examples:
<literal>ja-JP</literal>, <literal>de</literal>, or
<literal>fr-CA</literal>.
</para>
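<para>
As a short sketch, the basic language tags listed above can be passed
directly as the <symbol>locale</symbol> of an ICU collation. The
collation names are hypothetical.
<programlisting>
-- language-region form
CREATE COLLATION japanese_japan (provider = icu, locale = 'ja-JP');
CREATE COLLATION french_canada (provider = icu, locale = 'fr-CA');

-- language-only form
CREATE COLLATION german (provider = icu, locale = 'de');
</programlisting>
</para>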
<para>
Collation settings may be included in the language tag to customize
collation behavior. ICU allows extensive customization, such as
sensitivity (or insensitivity) to accents, case, and punctuation;
treatment of digits within text; and many other options to satisfy a
variety of uses.
</para>
<para>
To include this additional collation information in a language tag,
append <literal>-u</literal>, which indicates there are additional
collation settings, followed by one or more
<literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
pairs. The <replaceable>key</replaceable> is the key for a <link
linkend="icu-collation-settings">collation setting</link> and
<replaceable>value</replaceable> is a valid value for that setting. For
boolean settings, the <literal>-</literal><replaceable>key</replaceable>
may be specified without a corresponding
<literal>-</literal><replaceable>value</replaceable>, which implies a
value of <literal>true</literal>.
</para>
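<para>
A minimal sketch of the boolean shorthand: the key
<literal>kn</literal> is given without a value, which is equivalent to
<literal>kn-true</literal>. The example assumes that
<literal>kn</literal> requests numeric ordering of digit sequences. The
collation name <literal>numeric_sort</literal> is hypothetical.
<programlisting>
-- "-u-kn" is shorthand for "-u-kn-true"
CREATE COLLATION numeric_sort (provider = icu, locale = 'en-US-u-kn');

-- With numeric ordering, 45 sorts before 123, so this comparison
-- is expected to return true
SELECT 'id-45' &lt; 'id-123' COLLATE numeric_sort AS result;
</programlisting>
</para>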
<para>
For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
means the locale with the English language in the US region, with
collation