29th chunk of `doc/src/sgml/charset.sgml`
 <entry>Windows CP1256</entry>
         <entry>Arabic</entry>
         <entry>Yes</entry>
         <entry>Yes</entry>
         <entry>1</entry>
         <entry></entry>
        </row>
        <row>
         <entry><literal>WIN1257</literal></entry>
         <entry>Windows CP1257</entry>
         <entry>Baltic</entry>
         <entry>Yes</entry>
         <entry>Yes</entry>
         <entry>1</entry>
         <entry></entry>
        </row>
        <row>
         <entry><literal>WIN1258</literal></entry>
         <entry>Windows CP1258</entry>
         <entry>Vietnamese</entry>
         <entry>Yes</entry>
         <entry>Yes</entry>
         <entry>1</entry>
         <entry><literal>ABC</literal>, <literal>TCVN</literal>, <literal>TCVN5712</literal>, <literal>VSCII</literal></entry>
        </row>
       </tbody>
      </tgroup>
     </table>

     <para>
      Not all client <acronym>API</acronym>s support all the listed character sets. For example, the
      <productname>PostgreSQL</productname>
      JDBC driver does not support <literal>MULE_INTERNAL</literal>, <literal>LATIN6</literal>,
      <literal>LATIN8</literal>, and <literal>LATIN10</literal>.
     </para>

     <para>
      The <literal>SQL_ASCII</literal> setting behaves considerably differently
      from the other settings.  When the server character set is
      <literal>SQL_ASCII</literal>, the server interprets byte values 0&ndash;127
      according to the ASCII standard, while byte values 128&ndash;255 are taken
      as uninterpreted characters.  No encoding conversion will be done when
      the setting is <literal>SQL_ASCII</literal>.  Thus, this setting is not so
      much a declaration that a specific encoding is in use, as a declaration
      of ignorance about the encoding.  In most cases, if you are
      working with any non-ASCII data, it is unwise to use the
      <literal>SQL_ASCII</literal> setting because
      <productname>PostgreSQL</productname> will be unable to help you by
      converting or validating non-ASCII characters.
     </para>
    </sect2>

   <sect2 id="multibyte-setting">
    <title>Setting the Character Set</title>

    <para>
     <command>initdb</command> defines the default character set (encoding)
     for a <productname>PostgreSQL</productname> cluster. For example,

<screen>
initdb -E EUC_JP
</screen>

     sets the default character set to
     <literal>EUC_JP</literal> (Extended Unix Code for Japanese).  You
     can use <option>--encoding</option> instead of
     <option>-E</option> if you prefer longer option strings.
     If no <option>-E</option> or <option>--encoding</option> option is
     given, <command>initdb</command> attempts to determine the appropriate
     encoding to use based on the specified or default locale.
    </para>
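
    <para>
     One way to confirm which encoding <command>initdb</command> chose is
     to connect to a database of the freshly initialized cluster and query
     the setting.  This is a minimal sketch; it assumes a connection to the
     <literal>postgres</literal> database, which inherits the cluster
     default.

<programlisting>
-- Reports the encoding of the database you are connected to.
-- After "initdb -E EUC_JP" this is expected to report EUC_JP.
SHOW server_encoding;
</programlisting>
    </para>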

    <para>
     You can specify a non-default encoding at database creation time,
     provided that the encoding is compatible with the selected locale:

<screen>
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
</screen>

     This will create a database named <literal>korean</literal> that
     uses the character set <literal>EUC_KR</literal> and the locale <literal>ko_KR</literal>.
     Another way to accomplish this is to use the following SQL command:

<programlisting>
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
</programlisting>

     Notice that the above commands specify copying the <literal>template0</literal>
     database.  When copying any other database, the encoding and locale
     settings cannot be changed from those of the source database, because
     that might result in corrupt data.  For more information, see
     <xref linkend="manage-ag-templatedbs"/>.
    </para>
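
    <para>
     The restriction on templates can be sketched as follows.  The database
     name <literal>korean_copy</literal> is hypothetical, and the example
     assumes that <literal>template1</literal> was not created with the
     <literal>EUC_KR</literal> encoding.

<programlisting>
-- Hypothetical name; assumes template1 does not already use EUC_KR.
-- Expected to be rejected, because the encoding of a database copied
-- from template1 cannot differ from that of template1.
CREATE DATABASE korean_copy WITH ENCODING 'EUC_KR' TEMPLATE=template1;

-- Check the settings of the database created from template0 above.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
  FROM pg_database
 WHERE datname = 'korean';
</programlisting>
    </para>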

    <para>
     The encoding for a database is stored in the system catalog
     <literal>pg_database</literal>.  You can see it by using the
     <command>psql</command> <option>-l</option> option or the
 
