Character Set Support in PostgreSQL

url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink> </para> </listitem> <listitem> <para> <ulink url="https://www.rfc-editor.org/info/bcp47">BCP 47</ulink> </para> </listitem> <listitem> <para> <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink> </para> </listitem> <listitem> <para> <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink> </para> </listitem> <listitem> <para> <ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink> </para> </listitem> </itemizedlist> </sect3> </sect2> </sect1> <sect1 id="multibyte"> <title>Character Set Support</title> <indexterm zone="multibyte"><primary>character set</primary></indexterm> <para> The character set support in <productname>PostgreSQL</productname> allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as <acronym>EUC</acronym> (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your <productname>PostgreSQL</productname> database cluster using <command>initdb</command>. It can be overridden when you create a database, so you can have multiple databases each with a different character set. </para> <para> An important restriction, however, is that each database's character set must be compatible with the database's <envar>LC_CTYPE</envar> (character classification) and <envar>LC_COLLATE</envar> (string sort order) locale settings. For <literal>C</literal> or <literal>POSIX</literal> locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. 
(On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings. </para> <sect2 id="multibyte-charset-supported"> <title>Supported Character Sets</title> <para> <xref linkend="charset-table"/> shows the character sets available for use in <productname>PostgreSQL</productname>. </para> <table id="charset-table"> <title><productname>PostgreSQL</productname> Character Sets</title> <tgroup cols="7"> <colspec colname="col1" colwidth="3*"/> <colspec colname="col2" colwidth="2*"/> <colspec colname="col3" colwidth="2*"/> <colspec colname="col4" colwidth="1.25*"/> <colspec colname="col5" colwidth="1*"/> <colspec colname="col6" colwidth="1*"/> <colspec colname="col7" colwidth="2*"/> <thead> <row> <entry>Name</entry> <entry>Description</entry> <entry>Language</entry> <entry>Server?</entry> <entry>ICU?</entry>  <entry>Bytes/&zwsp;Char</entry> <entry>Aliases</entry> </row> </thead> <tbody> <row> <entry><literal>BIG5</literal></entry> <entry>Big Five</entry> <entry>Traditional Chinese</entry> <entry>No</entry> <entry>No</entry> <entry>1–2</entry> <entry><literal>WIN950</literal>, <literal>Windows950</literal></entry> </row> <row> <entry><literal>EUC_CN</literal></entry> <entry>Extended UNIX Code-CN</entry>

PostgreSQL can store text in a variety of character sets (also called encodings), including single-byte and multiple-byte character sets. The default character set is selected when the database cluster is initialized with initdb, and it can be overridden for each database. Each database's character set must be compatible with that database's locale settings, and the supported character sets are listed in a reference table.