Collation Management and Standard Collations

<symbol>LC_COLLATE</symbol>, which controls the sort order. But it is rarely necessary in practice to have an <symbol>LC_CTYPE</symbol> setting that is different from <symbol>LC_COLLATE</symbol>, so it is more convenient to collect these under one concept than to create another infrastructure for setting <symbol>LC_CTYPE</symbol> per expression.) Also, a <literal>libc</literal> collation is tied to a character set encoding (see <xref linkend="multibyte"/>). The same collation name may exist for different encodings. </para> <para> A collation object provided by <literal>icu</literal> maps to a named collator provided by the ICU library. ICU does not support separate <quote>collate</quote> and <quote>ctype</quote> settings, so they are always the same. Also, ICU collations are independent of the encoding, so there is always only one ICU collation of a given name in a database. </para> <sect3 id="collation-managing-standard"> <title>Standard Collations</title> <para> On all platforms, the following collations are supported: <variablelist> <varlistentry> <term><literal>unicode</literal></term> <listitem> <para> This SQL standard collation sorts using the Unicode Collation Algorithm with the Default Unicode Collation Element Table. It is available in all encodings. ICU support is required to use this collation, and behavior may change if <productname>PostgreSQL</productname> is built with a different version of ICU. (This collation has the same behavior as the ICU root locale; see <xref linkend="collation-managing-predefined-icu-und-x-icu"/>.) </para> </listitem> </varlistentry> <varlistentry> <term><literal>ucs_basic</literal></term> <listitem> <para> This SQL standard collation sorts using the Unicode code point values rather than natural language order, and only the ASCII letters <quote><literal>A</literal></quote> through <quote><literal>Z</literal></quote> are treated as letters. The behavior is efficient and stable across all versions. 
It is only available for encoding <literal>UTF8</literal>. (This collation has the same behavior as the libc locale specification <literal>C</literal> in <literal>UTF8</literal> encoding.) </para> </listitem> </varlistentry> <varlistentry> <term><literal>pg_unicode_fast</literal></term> <listitem> <para> This collation sorts by Unicode code point values rather than natural language order. For the functions <function>lower</function>, <function>initcap</function>, and <function>upper</function>, it uses Unicode full case mapping. For pattern matching (including regular expressions), it uses the Standard variant of Unicode <ulink url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility Properties</ulink>. Behavior is efficient and stable within a <productname>PostgreSQL</productname> major version. It is only available for encoding <literal>UTF8</literal>. </para> </listitem> </varlistentry> <varlistentry> <term><literal>pg_c_utf8</literal></term> <listitem> <para> This collation sorts by Unicode code point values rather than natural language order. For the functions <function>lower</function>, <function>initcap</function>, and <function>upper</function>, it uses Unicode simple case mapping. For pattern matching (including regular expressions), it uses the POSIX Compatible variant of Unicode <ulink url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility Properties</ulink>. Behavior is efficient and stable within a <productname>PostgreSQL</productname>

This section covers collation management: how libc and ICU collations differ and how each relates to character set encodings. It also describes the standard collations unicode, ucs_basic, pg_unicode_fast, and pg_c_utf8, each with its own sort order, case mapping, and stability characteristics. The unicode collation is available in all encodings, while the other three are available only for encoding UTF8.
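
The two ideas that separate these collations, code point ordering and full versus simple case mapping, can be illustrated outside the database. The following is a minimal Python sketch offered as an analogy only; it is not how PostgreSQL implements any collation. All function names are hypothetical, and approx_simple_upper is a rough approximation of simple case mapping (it leaves a character unchanged whenever its full uppercase form is longer than one code point), which matches the real simple mapping for many characters but not all.

```python
def code_point_sort(words):
    """Sort strings by Unicode code point, with no language-specific rules."""
    # Python compares str values code point by code point.
    return sorted(words)


def utf8_byte_sort(words):
    """Sort strings by their UTF-8 bytes."""
    # UTF-8 byte order preserves code point order, so this gives the same result.
    return sorted(words, key=lambda w: w.encode("utf-8"))


def full_upper(text):
    """Uppercase with full case mapping: one character may expand to several."""
    return text.upper()


def approx_simple_upper(text):
    """Rough stand-in for simple case mapping (an assumption, see lead-in).

    Each character maps to at most one character; anything whose full
    mapping would expand is left unchanged.
    """
    return "".join(ch.upper() if len(ch.upper()) == 1 else ch for ch in text)


if __name__ == "__main__":
    words = ["b", "a", "B", "\u00e9", "Z"]  # "\u00e9" is e with acute accent
    # Uppercase ASCII sorts before lowercase, and the accented letter sorts last.
    print(code_point_sort(words))
    print(utf8_byte_sort(words) == code_point_sort(words))
    # Sharp s expands to "SS" under full mapping but stays as is under simple mapping.
    print(full_upper("stra\u00dfe"), approx_simple_upper("stra\u00dfe"))
```

In code point order, "Z" sorts before "a", which is the kind of result the text contrasts with natural language order. The sharp s case shows why full case mapping can change the length of a string while simple case mapping cannot.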