postgresql

24th chunk of `doc/src/sgml/charset.sgml`
 url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
      </para>
     </listitem>
     <listitem>
      <para>
       <ulink url="https://www.rfc-editor.org/info/bcp47">BCP 47</ulink>
      </para>
     </listitem>
     <listitem>
      <para>
       <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
      </para>
     </listitem>
     <listitem>
      <para>
       <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
      </para>
     </listitem>
     <listitem>
      <para>
       <ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink>
      </para>
     </listitem>
    </itemizedlist>
   </sect3>
  </sect2>
 </sect1>

 <sect1 id="multibyte">
  <title>Character Set Support</title>

  <indexterm zone="multibyte"><primary>character set</primary></indexterm>

  <para>
   The character set support in <productname>PostgreSQL</productname>
   allows you to store text in a variety of character sets (also called
   encodings), including
   single-byte character sets such as the ISO 8859 series and
   multibyte character sets such as <acronym>EUC</acronym> (Extended Unix
   Code), UTF-8, and Mule internal code.  All supported character sets
   can be used transparently by clients, but a few are not supported
   for use within the server (that is, as a server-side encoding).
   The default character set is selected while
   initializing your <productname>PostgreSQL</productname> database
   cluster using <command>initdb</command>.  It can be overridden when you
   create a database, so you can have multiple
   databases each with a different character set.
  </para>
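
  <para>
   As a minimal sketch of the per-database override described above, the
   following statements create a database whose encoding differs from the
   cluster default, list the encoding of every database, and then select
   a client encoding for the current session.  The database name
   <literal>sales_latin1</literal> is purely illustrative, and the
   <literal>C</literal> locale is assumed here only because it accepts
   any encoding.
  </para>

<programlisting>
-- Hypothetical database name.  template0 is used because copying a
-- template that already has a different encoding may be rejected.
CREATE DATABASE sales_latin1
    ENCODING 'LATIN1'
    LC_COLLATE 'C'
    LC_CTYPE 'C'
    TEMPLATE template0;

-- Show the server-side encoding of each database in the cluster.
SELECT datname, pg_encoding_to_char(encoding) AS encoding
  FROM pg_database;

-- A client can request its own encoding for the session; the server
-- converts between the two where a conversion exists.
SET client_encoding TO 'UTF8';
SHOW client_encoding;
</programlisting>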

  <para>
   An important restriction, however, is that each database's character set
   must be compatible with the database's <envar>LC_CTYPE</envar> (character
   classification) and <envar>LC_COLLATE</envar> (string sort order) locale
   settings. For the <literal>C</literal> or
   <literal>POSIX</literal> locale, any character set is allowed, but for other
   libc-provided locales there is only one character set that will work
   correctly.
   (On Windows, however, UTF-8 encoding can be used with any locale.)
   If you have ICU support configured, ICU-provided locales can be used
   with most but not all server-side encodings.
  </para>
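
  <para>
   To illustrate this restriction, the sketch below pairs a libc locale
   with the one encoding it expects, then attempts a mismatched
   combination.  The database names and the locale name
   <literal>ko_KR.euckr</literal> are assumptions: locale names are
   operating-system specific, and the first statement works only if
   that locale is installed.
  </para>

<programlisting>
-- Matching pair: a Korean EUC locale with the EUC_KR encoding.
CREATE DATABASE korean
    ENCODING 'EUC_KR'
    LC_COLLATE 'ko_KR.euckr'
    LC_CTYPE 'ko_KR.euckr'
    TEMPLATE template0;

-- Mismatched pair: on non-Windows platforms this is expected to be
-- rejected, because the locale requires a different encoding.
CREATE DATABASE korean_mismatch
    ENCODING 'LATIN1'
    LC_COLLATE 'ko_KR.euckr'
    LC_CTYPE 'ko_KR.euckr'
    TEMPLATE template0;
</programlisting>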

   <sect2 id="multibyte-charset-supported">
    <title>Supported Character Sets</title>

    <para>
     <xref linkend="charset-table"/> shows the character sets available
     for use in <productname>PostgreSQL</productname>.
    </para>

     <table id="charset-table">
      <title><productname>PostgreSQL</productname> Character Sets</title>
      <tgroup cols="7">
       <colspec colname="col1" colwidth="3*"/>
       <colspec colname="col2" colwidth="2*"/>
       <colspec colname="col3" colwidth="2*"/>
       <colspec colname="col4" colwidth="1.25*"/>
       <colspec colname="col5" colwidth="1*"/>
       <colspec colname="col6" colwidth="1*"/>
       <colspec colname="col7" colwidth="2*"/>
       <thead>
        <row>
         <entry>Name</entry>
         <entry>Description</entry>
         <entry>Language</entry>
         <entry>Server?</entry>
         <entry>ICU?</entry>
         <!--
          The Bytes/Char field is populated by looking at the values returned
          by the pg_wchar_table.mblen function for each encoding.
         -->
         <entry>Bytes/&zwsp;Char</entry>
         <entry>Aliases</entry>
        </row>
       </thead>
       <tbody>
        <row>
         <entry><literal>BIG5</literal></entry>
         <entry>Big Five</entry>
         <entry>Traditional Chinese</entry>
         <entry>No</entry>
         <entry>No</entry>
         <entry>1&ndash;2</entry>
         <entry><literal>WIN950</literal>, <literal>Windows950</literal></entry>
        </row>
        <row>
         <entry><literal>EUC_CN</literal></entry>
         <entry>Extended UNIX Code-CN</entry>

Title: Character Set Support in PostgreSQL
Summary
PostgreSQL can store text in a variety of character sets (encodings), both single-byte and multibyte. The default character set is selected when the database cluster is initialized with initdb, and it can be overridden for each database. Each database's character set must be compatible with that database's LC_CTYPE and LC_COLLATE locale settings. A table lists the supported character sets.