Home Explore Blog CI



postgresql

11th chunk of `doc/src/sgml/charset.sgml`
9c741da49df9f0e2513fc6bd3146541b4f002beda47cd9db0000000100000fae
 explicit collation derivation overrides the implicit one.
    Furthermore, given
<programlisting>
SELECT a &lt; b FROM test1;
</programlisting>
    the parser cannot determine which collation to apply, since the
    <structfield>a</structfield> and <structfield>b</structfield> columns have conflicting
    implicit collations.  Since the <literal>&lt;</literal> operator
    does need to know which collation to use, this will result in an
    error.  The error can be resolved by attaching an explicit collation
    specifier to either input expression, thus:
<programlisting>
SELECT a &lt; b COLLATE "de_DE" FROM test1;
</programlisting>
    or equivalently
<programlisting>
SELECT a COLLATE "de_DE" &lt; b FROM test1;
</programlisting>
    On the other hand, the structurally similar case
<programlisting>
SELECT a || b FROM test1;
</programlisting>
    does not result in an error, because the <literal>||</literal> operator
    does not care about collations: its result is the same regardless
    of the collation.
   </para>

   <para>
    The collation assigned to a function or operator's combined input
    expressions is also considered to apply to the function or operator's
    result, if the function or operator delivers a result of a collatable
    data type.  So, in
<programlisting>
SELECT * FROM test1 ORDER BY a || 'foo';
</programlisting>
    the ordering will be done according to <literal>de_DE</literal> rules.
    But this query:
<programlisting>
SELECT * FROM test1 ORDER BY a || b;
</programlisting>
    results in an error, because even though the <literal>||</literal> operator
    doesn't need to know a collation, the <literal>ORDER BY</literal> clause does.
    As before, the conflict can be resolved with an explicit collation
    specifier:
<programlisting>
SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
</programlisting>
   </para>
  </sect2>

  <sect2 id="collation-managing">
   <title>Managing Collations</title>

   <para>
    A collation is an SQL schema object that maps an SQL name to locales
    provided by libraries installed in the operating system.  A collation
    definition has a <firstterm>provider</firstterm> that specifies which
    library supplies the locale data.  One standard provider name
    is <literal>libc</literal>, which uses the locales provided by the
    operating system C library.  These are the locales used by most tools
    provided by the operating system.  Another provider
    is <literal>icu</literal>, which uses the external
    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can only be
    used if support for ICU was configured when PostgreSQL was built.
   </para>

   <para>
    A collation object provided by <literal>libc</literal> maps to a
    combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>
    settings, as accepted by the <literal>setlocale()</literal> system library call.  (As
    the name would suggest, the main purpose of a collation is to set
    <symbol>LC_COLLATE</symbol>, which controls the sort order.  But
    it is rarely necessary in practice to have an
    <symbol>LC_CTYPE</symbol> setting that is different from
    <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
    these under one concept than to create another infrastructure for
    setting <symbol>LC_CTYPE</symbol> per expression.)  Also,
    a <literal>libc</literal> collation
    is tied to a character set encoding (see <xref linkend="multibyte"/>).
    The same collation name may exist for different encodings.
   </para>

   <para>
    A collation object provided by <literal>icu</literal> maps to a named
    collator provided by the ICU library.  ICU does not support
    separate <quote>collate</quote> and <quote>ctype</quote> settings, so
    they are always the same.  Also, ICU collations are independent of the
    encoding, so there is always only one ICU collation of a given name in
    a database.
   </para>

   <sect3 id="collation-managing-standard">

Title: Collation Resolution and Management
Summary
This section discusses how collations are resolved and managed in PostgreSQL, including how explicit collations override implicit ones, and how collations are applied to function and operator results. It also introduces the concept of managing collations, including the role of providers such as libc and ICU, and how collations are tied to character set encodings and locales.