Home Explore Blog CI



postgresql

13th chunk of `doc/src/sgml/ref/copy.sgml`
c1ebd8e6483fe1ebf18f37d5566c05450e52be56438fc50b0000000100000fa0
 <literal>CSV</literal>
     lines with white space out to some fixed width. If such a situation
     arises you might need to preprocess the <literal>CSV</literal> file to remove
     the trailing white space, before importing the data into
     <productname>PostgreSQL</productname>.
    </para>
   </note>

   <note>
    <para>
     <literal>CSV</literal> format will both recognize and produce <literal>CSV</literal> files with quoted
     values containing embedded carriage returns and line feeds. Thus
     the files are not strictly one line per table row like text-format
     files.
    </para>
   </note>

   <note>
    <para>
     Many programs produce strange and occasionally perverse <literal>CSV</literal> files,
     so the file format is more a convention than a standard. Thus you
     might encounter some files that cannot be imported using this
     mechanism, and <command>COPY</command> might produce files that other
     programs cannot process.
    </para>
   </note>

  </refsect2>

  <refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
   <title>Binary Format</title>

   <para>
    The <literal>binary</literal> format option causes all data to be
    stored/read as binary format rather than as text.  It is
    somewhat faster than the text and <literal>CSV</literal> formats,
    but a binary-format file is less portable across machine architectures and
    <productname>PostgreSQL</productname> versions.
    Also, the binary format is very data type specific; for example
    it will not work to output binary data from a <type>smallint</type> column
    and read it into an <type>integer</type> column, even though that would work
    fine in text format.
   </para>

   <para>
    The <literal>binary</literal> file format consists
    of a file header, zero or more tuples containing the row data, and
    a file trailer.  Headers and data are in network byte order.
   </para>

   <note>
    <para>
     <productname>PostgreSQL</productname> releases before 7.4 used a
     different binary file format.
    </para>
   </note>

   <refsect3>
    <title>File Header</title>

    <para>
     The file header consists of 15 bytes of fixed fields, followed
     by a variable-length header extension area.  The fixed fields are:

    <variablelist>
     <varlistentry>
      <term>Signature</term>
      <listitem>
       <para>
11-byte sequence <literal>PGCOPY\n\377\r\n\0</literal> &mdash; note that the zero byte
is a required part of the signature.  (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer.  This signature will be changed by end-of-line-translation
filters, dropped zero bytes, dropped high bits, or parity changes.)
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>Flags field</term>
      <listitem>
       <para>
32-bit integer bit mask to denote important aspects of the file format. Bits
are numbered from 0 (<acronym>LSB</acronym>) to 31 (<acronym>MSB</acronym>).  Note that
this field is stored in network byte order (most significant byte first),
as are all the integer fields used in the file format.  Bits
16&ndash;31 are reserved to denote critical file format issues; a reader
should abort if it finds an unexpected bit set in this range. Bits 0&ndash;15
are reserved to signal backwards-compatible format issues; a reader
should simply ignore any unexpected bits set in this range. Currently
only one flag bit is defined, and the rest must be zero:
        <variablelist>
         <varlistentry>
          <term>Bit 16</term>
          <listitem>
           <para>
            If 1, OIDs are included in the data; if 0, not. Oid system columns
            are not supported in <productname>PostgreSQL</productname>
            anymore, but the format still contains the indicator.
           </para>
          </listitem>
         </varlistentry>
        </variablelist></para>
      </listitem>
     </varlistentry>


Title: CSV Quirks and Binary Format for COPY
Summary
This section contains notes on the CSV format, highlighting issues with whitespace padding, carriage returns, line feeds, and the lack of a strict standard. It then transitions to the binary format for COPY, describing it as faster but less portable than text or CSV, and very data-type specific. The binary format consists of a file header, tuples of row data, and a file trailer, all in network byte order. A compatibility note mentions that PostgreSQL releases before 7.4 used a different binary file format. The section details the structure of the file header, including a signature for file identification and a flags field used to denote aspects of the file format, especially the inclusion of OIDs.