url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
</para>
</listitem>
<listitem>
<para>
<ulink url="https://www.rfc-editor.org/info/bcp47">BCP 47</ulink>
</para>
</listitem>
<listitem>
<para>
<ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
</para>
</listitem>
<listitem>
<para>
<ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
</para>
</listitem>
<listitem>
<para>
<ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink>
</para>
</listitem>
</itemizedlist>
</sect3>
</sect2>
</sect1>
<sect1 id="multibyte">
<title>Character Set Support</title>
<indexterm zone="multibyte"><primary>character set</primary></indexterm>
<para>
The character set support in <productname>PostgreSQL</productname>
allows you to store text in a variety of character sets (also called
encodings), including
single-byte character sets such as the ISO 8859 series and
multibyte character sets such as <acronym>EUC</acronym> (Extended Unix
Code), UTF-8, and Mule internal code. All supported character sets
can be used transparently by clients, but a few are not supported
for use within the server (that is, as a server-side encoding).
The default character set is selected when you
initialize your <productname>PostgreSQL</productname> database
cluster with <command>initdb</command>. It can be overridden when you
create a database, so a single cluster can contain multiple
databases, each with a different character set.
</para>
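<para>
As a minimal sketch of how to check which character set is in effect,
the following queries report the encoding of the current database and of
every database in the cluster. The output depends on how the cluster and
its databases were created.
<programlisting>
-- encoding of the database you are connected to
SHOW server_encoding;

-- encoding of each database in the cluster
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database
ORDER BY datname;
</programlisting>
</para>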
<para>
An important restriction, however, is that each database's character set
must be compatible with the database's <envar>LC_CTYPE</envar> (character
classification) and <envar>LC_COLLATE</envar> (string sort order) locale
settings. For the <literal>C</literal> or
<literal>POSIX</literal> locale, any character set is allowed, but for other
libc-provided locales there is only one character set that will work
correctly.
(On Windows, however, UTF-8 encoding can be used with any locale.)
If you have ICU support configured, ICU-provided locales can be used
with most but not all server-side encodings.
</para>
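<para>
For illustration, and assuming a hypothetical database name
<literal>demo_latin1</literal>, a database that uses the
<literal>C</literal> locale can be created with an encoding that
differs from the cluster default. Copying from
<literal>template0</literal> is needed when the requested encoding or
locale differs from that of <literal>template1</literal>. This is a
sketch only; adjust the encoding to one that your installation supports
as a server-side encoding.
<programlisting>
-- demo_latin1 is a hypothetical name; the C locale accepts any server-side encoding
CREATE DATABASE demo_latin1
    ENCODING 'LATIN1'
    LC_COLLATE 'C'
    LC_CTYPE 'C'
    TEMPLATE template0;
</programlisting>
</para>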
<sect2 id="multibyte-charset-supported">
<title>Supported Character Sets</title>
<para>
<xref linkend="charset-table"/> shows the character sets available
for use in <productname>PostgreSQL</productname>.
</para>
<table id="charset-table">
<title><productname>PostgreSQL</productname> Character Sets</title>
<tgroup cols="7">
<colspec colname="col1" colwidth="3*"/>
<colspec colname="col2" colwidth="2*"/>
<colspec colname="col3" colwidth="2*"/>
<colspec colname="col4" colwidth="1.25*"/>
<colspec colname="col5" colwidth="1*"/>
<colspec colname="col6" colwidth="1*"/>
<colspec colname="col7" colwidth="2*"/>
<thead>
<row>
<entry>Name</entry>
<entry>Description</entry>
<entry>Language</entry>
<entry>Server?</entry>
<entry>ICU?</entry>
<!--
The Bytes/Char field is populated by looking at the values returned
by the pg_wchar_table.mblen function for each encoding.
-->
<entry>Bytes/&zwsp;Char</entry>
<entry>Aliases</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>BIG5</literal></entry>
<entry>Big Five</entry>
<entry>Traditional Chinese</entry>
<entry>No</entry>
<entry>No</entry>
<entry>1–2</entry>
<entry><literal>WIN950</literal>, <literal>Windows950</literal></entry>
</row>
<row>
<entry><literal>EUC_CN</literal></entry>
<entry>Extended UNIX Code-CN</entry>