See also <ulink url="https://www.unicode.org/reports/tr10">Unicode Technical
Standard 10</ulink> for more information on the terminology.
</para>
<para>
To create a nondeterministic collation, specify the property
<literal>deterministic = false</literal> to <command>CREATE
COLLATION</command>, for example:
<programlisting>
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
</programlisting>
This example uses the standard Unicode collation in a
nondeterministic way. In particular, it allows strings in
different normal forms to be compared correctly. More interesting
examples make use of the ICU customization facilities explained in
<xref linkend="icu-custom-collations"/>.
For example:
<programlisting>
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
</programlisting>
</para>
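<para>
 As a sketch of how these collations behave, the following queries assume
 that <literal>ndcoll</literal>, <literal>case_insensitive</literal>, and
 <literal>ignore_accents</literal> from the examples above have been
 created in a database with <literal>UTF8</literal> server encoding. The
 string literals are illustrative only. The first pair of queries compares
 a decomposed <quote>a</quote> plus U+0308 COMBINING DIAERESIS with the
 precomposed U+00E4:
<programlisting>
-- different normal forms of the same character
SELECT E'a\u0308' = E'\u00E4';                 -- false (deterministic default collation)
SELECT E'a\u0308' = E'\u00E4' COLLATE ndcoll;  -- true
-- level 2 strength: case differences are ignored
SELECT 'ABC' = 'abc' COLLATE case_insensitive;      -- true
-- level 1 strength ignores accents, but kc-true keeps case significant
SELECT 'résumé' = 'resume' COLLATE ignore_accents;  -- true
SELECT 'Resume' = 'resume' COLLATE ignore_accents;  -- false
</programlisting>
</para>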
<para>
All standard and predefined collations are deterministic, and all
user-defined collations are deterministic by default. While
nondeterministic collations give a more <quote>correct</quote> behavior,
especially when considering the full power of Unicode and its many
special cases, they also have some drawbacks. Foremost, their use leads
to a performance penalty. In particular, B-tree indexes that use a
nondeterministic collation cannot use deduplication. Also, certain
operations, such as some pattern matching operations, are not possible
with nondeterministic collations. Therefore, they should be used only in
cases where they are specifically wanted.
</para>
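<para>
 For instance, assuming the <literal>case_insensitive</literal> collation
 from the example above, a regular expression match that uses it is
 expected to be rejected with an error rather than silently ignoring the
 collation. A minimal sketch (the exact error text may differ between
 versions), with case-insensitive regular expression matching under the
 default deterministic collation shown as one possible workaround:
<programlisting>
SELECT 'ABC' ~ 'abc' COLLATE case_insensitive;  -- ERROR: nondeterministic collations are not supported
SELECT 'ABC' ~* 'abc';                          -- true
</programlisting>
</para>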
<tip>
<para>
To deal with text in different Unicode normalization forms, an
alternative to nondeterministic collations is to preprocess or check the
strings with the function <function>normalize</function> and the
expression <literal>is normalized</literal>. Each approach has its own
trade-offs.
</para>
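<para>
 A minimal sketch of this alternative, assuming a database with
 <literal>UTF8</literal> server encoding: normalizing both sides to NFC
 makes the decomposed and precomposed forms of <quote>ä</quote> compare
 equal even under a deterministic collation.
<programlisting>
SELECT E'a\u0308' IS NFC NORMALIZED;                           -- false
SELECT E'\u00E4' IS NFC NORMALIZED;                            -- true
SELECT normalize(E'a\u0308', NFC) = normalize(E'\u00E4', NFC);  -- true
</programlisting>
</para>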
</tip>
</sect3>
</sect2>
<sect2 id="icu-custom-collations">
<title>ICU Custom Collations</title>
<para>
ICU allows extensive control over collation behavior by defining new
collations with collation settings as a part of the language tag. These
settings can modify the collation order to suit a variety of needs. For
instance:
<programlisting>
-- ignore differences in accents and case
CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
-- uppercase letters sort before lowercase
CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
SELECT 'B' < 'b' COLLATE upper_first; -- true
-- treat digits numerically and ignore punctuation
CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
SELECT 'id-45' < 'id-123' COLLATE num_ignore_punct; -- true
SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
</programlisting>
Many of the available options are described in <xref
linkend="icu-collation-settings"/>; see <xref
linkend="icu-external-references"/> for more details.
</para>
<sect3 id="icu-collation-comparison-levels">
<title>ICU Comparison Levels</title>
<para>
Comparison of two strings (collation) in ICU is determined by a
multi-level process, where textual features are grouped into
<quote>levels</quote>. Treatment of each level is controlled by the <link
linkend="icu-collation-settings-table">collation settings</link>. Higher
levels correspond to finer textual features.
</para>
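<para>
 As an illustration, the following sketch defines three collations with
 hypothetical names (<literal>level1</literal>, <literal>level2</literal>,
 and <literal>level3</literal>) that differ only in the comparison
 strength set with the <literal>ks</literal> key. In ICU's default
 settings an accent difference is a level 2 feature and a case difference
 is a level 3 feature, so each is ignored only when comparison stops
 below its level:
<programlisting>
-- hypothetical collation names, one per comparison strength
CREATE COLLATION level1 (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
CREATE COLLATION level2 (provider = icu, deterministic = false, locale = 'und-u-ks-level2');
CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ks-level3');
SELECT 'a' = 'á' COLLATE level1;  -- true:  accents ignored
SELECT 'a' = 'á' COLLATE level2;  -- false: accents significant
SELECT 'a' = 'A' COLLATE level2;  -- true:  case ignored
SELECT 'a' = 'A' COLLATE level3;  -- false: case significant
</programlisting>
</para>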
<para>
<xref linkend="icu-collation-levels"/> shows which textual feature
differences are considered significant when