Managing Predefined Collations

inspect the currently available locales, use the query <literal>SELECT * FROM pg_collation</literal>, or the command <command>\dOS+</command> in <application>psql</application>. </para> <sect4 id="collation-managing-predefined-libc"> <title>libc Collations</title> <para> For example, the operating system might provide a locale named <literal>de_DE.utf8</literal>. <command>initdb</command> would then create a collation named <literal>de_DE.utf8</literal> for encoding <literal>UTF8</literal> that has both <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> set to <literal>de_DE.utf8</literal>. It will also create a collation with the <literal>.utf8</literal> tag stripped off the name. So you could also use the collation under the name <literal>de_DE</literal>, which is less cumbersome to write and makes the name less encoding-dependent. Note that, nevertheless, the initial set of collation names is platform-dependent. </para> <para> The default set of collations provided by <literal>libc</literal> map directly to the locales installed in the operating system, which can be listed using the command <literal>locale -a</literal>. In case a <literal>libc</literal> collation is needed that has different values for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or if new locales are installed in the operating system after the database system was initialized, then a new collation may be created using the <xref linkend="sql-createcollation"/> command. New operating system locales can also be imported en masse using the <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link> function. </para> <para> Within any particular database, only collations that use that database's encoding are of interest. Other entries in <literal>pg_collation</literal> are ignored. Thus, a stripped collation name such as <literal>de_DE</literal> can be considered unique within a given database even though it would not be unique globally. Use of the stripped collation names is recommended, since it will make one fewer thing you need to change if you decide to change to another database encoding. Note however that the <literal>default</literal>, <literal>C</literal>, and <literal>POSIX</literal> collations can be used regardless of the database encoding. </para> <para> <productname>PostgreSQL</productname> considers distinct collation objects to be incompatible even when they have identical properties. Thus for example, <programlisting> SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; </programlisting> will draw an error even though the <literal>C</literal> and <literal>POSIX</literal> collations have identical behaviors. Mixing stripped and non-stripped collation names is therefore not recommended. </para> </sect4> <sect4 id="collation-managing-predefined-icu"> <title>ICU Collations</title> <para> With ICU, it is not sensible to enumerate all possible locale names. ICU uses a particular naming system for locales, but there are many more ways to name a locale than there are actually distinct locales. <command>initdb</command> uses the ICU APIs to extract a set of distinct locales to populate the initial set of collations. Collations provided by ICU are created in the SQL environment with names in BCP 47 language tag format, with a <quote>private use</quote> extension <literal>-x-icu</literal> appended, to distinguish them from libc locales. </para> <para> Here are some example collations that might be created: <variablelist> <varlistentry id="collation-managing-predefined-icu-de-x-icu"> <term><literal>de-x-icu</literal></term> <listitem> <para>German collation, default variant</para> </listitem> </varlistentry>

This section explains how to manage predefined collations in PostgreSQL, including inspecting available locales, creating new collations, and understanding the differences between libc and ICU collations. It also discusses how collations are stored in the pg_collation system catalog and how to use them in SQL queries.