<literal>CSV</literal>
lines with white space out to some fixed width. If such a situation
arises you might need to preprocess the <literal>CSV</literal> file to remove
the trailing white space, before importing the data into
<productname>PostgreSQL</productname>.
</para>
</note>
<note>
<para>
<literal>CSV</literal> format will both recognize and produce <literal>CSV</literal> files with quoted
values containing embedded carriage returns and line feeds. Thus
the files are not strictly one line per table row like text-format
files.
</para>
</note>
<note>
<para>
Many programs produce strange and occasionally perverse <literal>CSV</literal> files,
so the file format is more a convention than a standard. Thus you
might encounter some files that cannot be imported using this
mechanism, and <command>COPY</command> might produce files that other
programs cannot process.
</para>
</note>
</refsect2>
<refsect2 id="sql-copy-binary-format" xreflabel="Binary Format">
<title>Binary Format</title>
<para>
The <literal>binary</literal> format option causes all data to be
stored/read as binary format rather than as text. It is
somewhat faster than the text and <literal>CSV</literal> formats,
but a binary-format file is less portable across machine architectures and
<productname>PostgreSQL</productname> versions.
Also, the binary format is very data type specific; for example
it will not work to output binary data from a <type>smallint</type> column
and read it into an <type>integer</type> column, even though that would work
fine in text format.
</para>
<para>
The <literal>binary</literal> file format consists
of a file header, zero or more tuples containing the row data, and
a file trailer. Headers and data are in network byte order.
</para>
<note>
<para>
<productname>PostgreSQL</productname> releases before 7.4 used a
different binary file format.
</para>
</note>
<refsect3>
<title>File Header</title>
<para>
The file header consists of 15 bytes of fixed fields, followed
by a variable-length header extension area. The fixed fields are:
<variablelist>
<varlistentry>
<term>Signature</term>
<listitem>
<para>
11-byte sequence <literal>PGCOPY\n\377\r\n\0</literal> — note that the zero byte
is a required part of the signature. (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer. This signature will be changed by end-of-line-translation
filters, dropped zero bytes, dropped high bits, or parity changes.)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Flags field</term>
<listitem>
<para>
32-bit integer bit mask to denote important aspects of the file format. Bits
are numbered from 0 (<acronym>LSB</acronym>) to 31 (<acronym>MSB</acronym>). Note that
this field is stored in network byte order (most significant byte first),
as are all the integer fields used in the file format. Bits
16–31 are reserved to denote critical file format issues; a reader
should abort if it finds an unexpected bit set in this range. Bits 0–15
are reserved to signal backwards-compatible format issues; a reader
should simply ignore any unexpected bits set in this range. Currently
only one flag bit is defined, and the rest must be zero:
<variablelist>
<varlistentry>
<term>Bit 16</term>
<listitem>
<para>
If 1, OIDs are included in the data; if 0, not. Oid system columns
are not supported in <productname>PostgreSQL</productname>
anymore, but the format still contains the indicator.
</para>
</listitem>
</varlistentry>
</variablelist></para>
</listitem>
</varlistentry>