Character Set Settings and Limitations

<entry>Windows CP1256</entry>
<entry>Arabic</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<row>
<entry><literal>WIN1257</literal></entry>
<entry>Windows CP1257</entry>
<entry>Baltic</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry>
<entry></entry>
</row>
<row>
<entry><literal>WIN1258</literal></entry>
<entry>Windows CP1258</entry>
<entry>Vietnamese</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>1</entry>
<entry><literal>ABC</literal>, <literal>TCVN</literal>, <literal>TCVN5712</literal>, <literal>VSCII</literal></entry>
</row>
</tbody>
</tgroup>
</table>

<para>
Not all client <acronym>API</acronym>s support all the listed character sets. For example, the <productname>PostgreSQL</productname> JDBC driver does not support <literal>MULE_INTERNAL</literal>, <literal>LATIN6</literal>, <literal>LATIN8</literal>, and <literal>LATIN10</literal>.
</para>

<para>
The <literal>SQL_ASCII</literal> setting behaves considerably differently from the other settings. When the server character set is <literal>SQL_ASCII</literal>, the server interprets byte values 0–127 according to the ASCII standard, while byte values 128–255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is <literal>SQL_ASCII</literal>. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the <literal>SQL_ASCII</literal> setting because <productname>PostgreSQL</productname> will be unable to help you by converting or validating non-ASCII characters.
</para>
</sect2>

<sect2 id="multibyte-setting">
<title>Setting the Character Set</title>

<para>
<command>initdb</command> defines the default character set (encoding) for a <productname>PostgreSQL</productname> cluster.
For example,
<screen>
initdb -E EUC_JP
</screen>
sets the default character set to <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You can use <option>--encoding</option> instead of <option>-E</option> if you prefer longer option strings. If no <option>-E</option> or <option>--encoding</option> option is given, <command>initdb</command> attempts to determine the appropriate encoding to use based on the specified or default locale.
</para>

<para>
You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale:
<screen>
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
</screen>
This will create a database named <literal>korean</literal> that uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>. Another way to accomplish this is to use this SQL command:
<programlisting>
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
</programlisting>
Notice that the above commands specify copying the <literal>template0</literal> database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information see <xref linkend="manage-ag-templatedbs"/>.
</para>

<para>
The encoding for a database is stored in the system catalog <literal>pg_database</literal>. You can see it by using the <command>psql</command> <option>-l</option> option or the

The document discusses character set settings in PostgreSQL. It covers the SQL_ASCII setting, which treats byte values 0–127 as ASCII and 128–255 as uninterpreted characters; how to set the default character set for a PostgreSQL cluster using initdb; and how to create a database with a non-default encoding using createdb or the equivalent SQL command.
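
The SQL_ASCII behavior described above can be illustrated with a small Python model. This is only a sketch of the rule as stated in the text (bytes 0–127 read as ASCII, bytes 128–255 left uninterpreted, nothing validated or converted); it is not PostgreSQL code, and the helper names classify_sql_ascii and validates_as are hypothetical, introduced here for illustration.

```python
def classify_sql_ascii(data: bytes) -> list:
    """Label each byte as the text describes for SQL_ASCII:
    0-127 is read as ASCII, 128-255 is left uninterpreted."""
    labels = []
    for b in data:
        kind = "ascii" if b <= 127 else "uninterpreted"
        labels.append((b, kind))
    return labels


def validates_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are well-formed in the named encoding.
    This stands in for the validation a non-SQL_ASCII setting allows."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The same text encoded two ways by two hypothetical clients.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'

# With a declared encoding, malformed input can be detected.
utf8_ok = validates_as(utf8_bytes, "utf-8")
latin1_ok_as_utf8 = validates_as(latin1_bytes, "utf-8")

# Under the SQL_ASCII model, both byte strings pass through unchanged;
# the high bytes are simply not interpreted.
latin1_labels = classify_sql_ascii(latin1_bytes)
utf8_labels = classify_sql_ascii(utf8_bytes)
```

In this model the two byte strings are both accepted as-is, even though they represent the same text in incompatible ways. That is the practical meaning of "a declaration of ignorance about the encoding": nothing in the stored bytes records which interpretation was intended.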