Collation Resolution and Management

explicit collation derivation overrides the implicit one. Furthermore, given <programlisting> SELECT a < b FROM test1; </programlisting> the parser cannot determine which collation to apply, since the <structfield>a</structfield> and <structfield>b</structfield> columns have conflicting implicit collations. Since the <literal><</literal> operator does need to know which collation to use, this will result in an error. The error can be resolved by attaching an explicit collation specifier to either input expression, thus: <programlisting> SELECT a < b COLLATE "de_DE" FROM test1; </programlisting> or equivalently <programlisting> SELECT a COLLATE "de_DE" < b FROM test1; </programlisting> On the other hand, the structurally similar case <programlisting> SELECT a || b FROM test1; </programlisting> does not result in an error, because the <literal>||</literal> operator does not care about collations: its result is the same regardless of the collation. </para> <para> The collation assigned to a function or operator's combined input expressions is also considered to apply to the function or operator's result, if the function or operator delivers a result of a collatable data type. So, in <programlisting> SELECT * FROM test1 ORDER BY a || 'foo'; </programlisting> the ordering will be done according to <literal>de_DE</literal> rules. But this query: <programlisting> SELECT * FROM test1 ORDER BY a || b; </programlisting> results in an error, because even though the <literal>||</literal> operator doesn't need to know a collation, the <literal>ORDER BY</literal> clause does. As before, the conflict can be resolved with an explicit collation specifier: <programlisting> SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR"; </programlisting> </para> </sect2> <sect2 id="collation-managing"> <title>Managing Collations</title> <para> A collation is an SQL schema object that maps an SQL name to locales provided by libraries installed in the operating system. A collation definition has a <firstterm>provider</firstterm> that specifies which library supplies the locale data. One standard provider name is <literal>libc</literal>, which uses the locales provided by the operating system C library. These are the locales used by most tools provided by the operating system. Another provider is <literal>icu</literal>, which uses the external ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can only be used if support for ICU was configured when PostgreSQL was built. </para> <para> A collation object provided by <literal>libc</literal> maps to a combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings, as accepted by the <literal>setlocale()</literal> system library call. (As the name would suggest, the main purpose of a collation is to set <symbol>LC_COLLATE</symbol>, which controls the sort order. But it is rarely necessary in practice to have an <symbol>LC_CTYPE</symbol> setting that is different from <symbol>LC_COLLATE</symbol>, so it is more convenient to collect these under one concept than to create another infrastructure for setting <symbol>LC_CTYPE</symbol> per expression.) Also, a <literal>libc</literal> collation is tied to a character set encoding (see <xref linkend="multibyte"/>). The same collation name may exist for different encodings. </para> <para> A collation object provided by <literal>icu</literal> maps to a named collator provided by the ICU library. ICU does not support separate <quote>collate</quote> and <quote>ctype</quote> settings, so they are always the same. Also, ICU collations are independent of the encoding, so there is always only one ICU collation of a given name in a database. </para> <sect3 id="collation-managing-standard">

This section discusses how collations are resolved and managed in PostgreSQL, including how explicit collations override implicit ones, and how collations are applied to function and operator results. It also introduces the concept of managing collations, including the role of providers such as libc and ICU, and how collations are tied to character set encodings and locales.