postgresql

12th chunk of `doc/src/sgml/charset.sgml`
 <symbol>LC_COLLATE</symbol>, which controls the sort order.  But
    it is rarely necessary in practice to have an
    <symbol>LC_CTYPE</symbol> setting that is different from
    <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
    these under one concept than to create another infrastructure for
    setting <symbol>LC_CTYPE</symbol> per expression.)  Also,
    a <literal>libc</literal> collation
    is tied to a character set encoding (see <xref linkend="multibyte"/>).
    The same collation name may exist for different encodings.
   </para>

   <para>
    A collation object provided by <literal>icu</literal> maps to a named
    collator provided by the ICU library.  ICU does not support
    separate <quote>collate</quote> and <quote>ctype</quote> settings, so
    they are always the same.  Also, ICU collations are independent of the
    encoding, so there is always only one ICU collation of a given name in
    a database.
   </para>
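
   <para>
    As a rough illustration of the difference between the two providers,
    the following sketch creates one collation object of each kind and then
    inspects them in the <structname>pg_collation</structname> catalog.  The
    collation names <literal>german_libc</literal> and
    <literal>german_icu</literal> are hypothetical, and the locale name
    <literal>de_DE.utf8</literal> is an assumption about what the operating
    system provides; the ICU statement assumes a build with ICU support.
<programlisting>
-- libc: collate and ctype are given as separate settings (here set to the
-- same value), and the locale must match the database encoding.
CREATE COLLATION german_libc (provider = libc,
                              lc_collate = 'de_DE.utf8',
                              lc_ctype = 'de_DE.utf8');

-- ICU: a single locale setting covers both collate and ctype.
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');

-- collencoding is -1 for collations that work with any encoding.
SELECT collname, collprovider, collencoding
  FROM pg_collation
 WHERE collname IN ('german_libc', 'german_icu');
</programlisting>
   </para>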

   <sect3 id="collation-managing-standard">
    <title>Standard Collations</title>

   <para>
    On all platforms, the following collations are supported:

    <variablelist>
     <varlistentry>
      <term><literal>unicode</literal></term>
      <listitem>
       <para>
        This SQL standard collation sorts using the Unicode Collation
        Algorithm with the Default Unicode Collation Element Table.  It is
        available in all encodings.  ICU support is required to use this
        collation, and behavior may change if <productname>PostgreSQL</productname> is built with a
        different version of ICU.  (This collation has the same behavior as
        the ICU root locale; see <xref
        linkend="collation-managing-predefined-icu-und-x-icu"/>.)
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term><literal>ucs_basic</literal></term>
      <listitem>
       <para>
        This SQL standard collation sorts using the Unicode code point values
        rather than natural language order, and only the ASCII letters
        <quote><literal>A</literal></quote> through
        <quote><literal>Z</literal></quote> are treated as letters.  The
        behavior is efficient and stable across all versions.  It is only
        available for encoding <literal>UTF8</literal>.  (This collation has the same
        behavior as the libc locale specification <literal>C</literal> in
        <literal>UTF8</literal> encoding.)
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term><literal>pg_unicode_fast</literal></term>
      <listitem>
       <para>
        This collation sorts by Unicode code point values rather than natural
        language order.  For the functions <function>lower</function>,
        <function>initcap</function>, and <function>upper</function>, it uses
        Unicode full case mapping.  For pattern matching (including regular
        expressions), it uses the Standard variant of Unicode <ulink
        url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility
        Properties</ulink>.  Behavior is efficient and stable within a
        <productname>PostgreSQL</productname> major version.  It is only
        available for encoding <literal>UTF8</literal>.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term><literal>pg_c_utf8</literal></term>
      <listitem>
       <para>
        This collation sorts by Unicode code point values rather than natural
        language order.  For the functions <function>lower</function>,
        <function>initcap</function>, and <function>upper</function>, it uses
        Unicode simple case mapping.  For pattern matching (including regular
        expressions), it uses the POSIX Compatible variant of Unicode <ulink
        url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility
        Properties</ulink>.  Behavior is efficient and stable within a
        <productname>PostgreSQL</productname>

Title: Collation Management and Standard Collations
Summary
This section discusses the management of collations, including the difference between libc and ICU collations and their relationship to character set encodings. It also describes several standard collations (unicode, ucs_basic, pg_unicode_fast, and pg_c_utf8), each with its own sorting behavior, stability guarantees, and encoding availability: unicode is available in all encodings, while ucs_basic and pg_unicode_fast are available only for UTF8.