ICU Tailoring Rules and External References

default is Latin before Greek.)
      </para>
     </listitem>
    </varlistentry>

    <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
     <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
     <listitem>
      <para>
       Sort upper-case letters before lower-case letters.  (The default is
       lower-case letters first.)
      </para>
     </listitem>
    </varlistentry>

    <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
     <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
     <listitem>
      <para>
       Combines both of the above options.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </sect3>

  <sect3 id="icu-tailoring-rules">
   <title>ICU Tailoring Rules</title>

   <para>
    If the options provided by the collation settings shown above are not
    sufficient, the order of collation elements can be changed with tailoring
    rules, whose syntax is detailed at
    <ulink url="https://unicode-org.github.io/icu/userguide/collation/customization/"></ulink>.
   </para>

   <para>
    This small example creates a collation based on the root locale with a
    tailoring rule:
<programlisting>
<![CDATA[CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W');]]>
</programlisting>
    With this rule, the letter <quote>W</quote> is sorted after
    <quote>V</quote>, but is treated as a secondary difference similar to an
    accent.  Rules like this are contained in the locale definitions of some
    languages.  (Of course, if a locale definition already contains the
    desired rules, then they don't need to be specified again explicitly.)
   </para>

   <para>
    Here is a more complex example.  The following statement sets up a
    collation named <literal>ebcdic</literal> with rules to sort US-ASCII
    characters in the order of the EBCDIC encoding.
<programlisting>
<![CDATA[CREATE COLLATION ebcdic (provider = icu, locale = 'und',
rules = $$
& ' ' < '.' < '<' < '(' < '+' < \|
< '&' < '!' < '$' < '*' < ')' < ';'
< '-' < '/' < ',' < '%' < '_' < '>' < '?'
< '`' < ':' < '#' < '@' < \' < '=' < '"'
<*a-r < '~' <*s-z < '^' < '[' < ']'
< '{' <*A-I < '}' <*J-R < '\' <*S-Z <*0-9
$$);]]>

SELECT c FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c) ORDER BY c COLLATE ebcdic;
 c
---
 !
 a
 b
 ^
 A
 B
 1
 2
</programlisting>
   </para>
  </sect3>

  <sect3 id="icu-external-references">
   <title>External References for ICU</title>

   <para>
    This section (<xref linkend="icu-custom-collations"/>) is only a brief
    overview of ICU behavior and language tags.  Refer to the following
    documents for technical details, additional options, and new behavior:
   </para>

   <itemizedlist>
    <listitem>
     <para>
      <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
     </para>
    </listitem>
    <listitem>
     <para>
      <ulink url="https://www.rfc-editor.org/info/bcp47">BCP 47</ulink>
     </para>
    </listitem>
    <listitem>
     <para>
      <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
     </para>
    </listitem>
    <listitem>
     <para>
      <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
     </para>
    </listitem>
    <listitem>
     <para>
      <ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink>
     </para>
    </listitem>
   </itemizedlist>
  </sect3>
 </sect2>
</sect1>

<sect1 id="multibyte">
 <title>Character Set Support</title>

 <indexterm zone="multibyte"><primary>character set</primary></indexterm>

 <para>
  The character set support in <productname>PostgreSQL</productname> allows

This section explains how to customize ICU collations with tailoring rules, which change the order of collation elements. It gives two examples of custom collations built from such rules and lists external documents with further technical detail on ICU behavior and language tags. The Character Set Support section begins immediately afterward.
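As a hedged illustration of the tailoring rule `&V << w <<< W` discussed in the section, the sketch below orders the same values twice: once with the root ICU collation and once with a tailored one. It assumes a PostgreSQL server built with ICU support and recent enough to accept the `rules` option of `CREATE COLLATION`. The collation name `custom_vw_demo` and the sample strings are hypothetical, chosen only for this example.

```sql
-- Hypothetical collation name; the rule is the one from the small example in the section.
CREATE COLLATION custom_vw_demo (provider = icu, locale = 'und', rules = '&V << w <<< W');

-- Baseline: root ICU ordering ("und-x-icu" is the predefined root ICU collation).
SELECT c
FROM (VALUES ('Va'), ('Vb'), ('wa'), ('Wa'), ('Xa')) AS x(c)
ORDER BY c COLLATE "und-x-icu";

-- Same values, ordered with the tailored collation.
SELECT c
FROM (VALUES ('Va'), ('Vb'), ('wa'), ('Wa'), ('Xa')) AS x(c)
ORDER BY c COLLATE custom_vw_demo;

-- Remove the demonstration collation again.
DROP COLLATION custom_vw_demo;
```

If the rule behaves as the section describes, the second query should place the strings beginning with w and W between Va and Vb, because w and W then differ from V only below the primary level. The first query should keep every V string ahead of the w strings.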