COPY Text Format: Backslash Sequences, Newlines, and CSV Format Introduction

<entry>Backslash <literal>x</literal> followed by one or two hex digits specifies the byte with that numeric code</entry> </row> </tbody> </tgroup> </informaltable> Presently, <command>COPY TO</command> will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters. </para> <para> Any other backslashed character that is not mentioned in the above table will be taken to represent itself. However, beware of adding backslashes unnecessarily, since that might accidentally produce a string matching the end-of-data marker (<literal>\.</literal>) or the null string (<literal>\N</literal> by default). These strings will be recognized before any other backslash processing is done. </para> <para> It is strongly recommended that applications generating <command>COPY</command> data convert data newlines and carriage returns to the <literal>\n</literal> and <literal>\r</literal> sequences respectively. At present it is possible to represent a data carriage return by a backslash and carriage return, and to represent a data newline by a backslash and newline. However, these representations might not be accepted in future releases. They are also highly vulnerable to corruption if the <command>COPY</command> file is transferred across different machines (for example, from Unix to Windows or vice versa). </para> <para> All backslash sequences are interpreted after encoding conversion. The bytes specified with the octal and hex-digit backslash sequences must form valid characters in the database encoding. </para> <para> <command>COPY TO</command> will terminate each row with a Unix-style newline (<quote><literal>\n</literal></quote>). Servers running on Microsoft Windows instead output carriage return/newline (<quote><literal>\r\n</literal></quote>), but only for <command>COPY</command> to a server file; for consistency across platforms, <command>COPY TO STDOUT</command> always sends <quote><literal>\n</literal></quote> regardless of server platform. <command>COPY FROM</command> can handle lines ending with newlines, carriage returns, or carriage return/newlines. To reduce the risk of error due to un-backslashed newlines or carriage returns that were meant as data, <command>COPY FROM</command> will complain if the line endings in the input are not all alike. </para> </refsect2> <refsect2 id="sql-copy-csv-format" xreflabel="CSV Format"> <title>CSV Format</title> <para> This format option is used for importing and exporting the Comma- Separated Value (<literal>CSV</literal>) file format used by many other programs, such as spreadsheets. Instead of the escaping rules used by <productname>PostgreSQL</productname>'s standard text format, it produces and recognizes the common <literal>CSV</literal> escaping mechanism. </para> <para> The values in each record are separated by the <literal>DELIMITER</literal> character. If the value contains the delimiter character, the <literal>QUOTE</literal> character, the <literal>NULL</literal> string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the <literal>QUOTE</literal> character, and any occurrence within the value of a <literal>QUOTE</literal> character or the <literal>ESCAPE</literal> character is preceded by the escape character. You can also use <literal>FORCE_QUOTE</literal> to force quotes when outputting non-<literal>NULL</literal> values in specific columns. </para> <para> The <literal>CSV</literal> format has no standard way to distinguish a <literal>NULL</literal> value from an empty string. <productname>PostgreSQL</productname>'s <command>COPY</command> handles this by quoting. A <literal>NULL</literal>

This section continues the discussion on the text format for the COPY command, clarifying that unlisted backslashed characters represent themselves, but advises caution to avoid accidental end-of-data markers or null strings. It emphasizes converting newlines and carriage returns to \n and \r and warns against relying on direct backslash representations due to potential corruption across different operating systems. It notes that backslash sequences are interpreted post-encoding conversion, with octal and hex byte codes required to form valid characters. COPY TO uses Unix-style newlines, while COPY FROM handles various line endings but complains if they aren't uniform. The section then introduces the CSV format for compatibility with other programs like spreadsheets, noting its distinct escaping mechanism where values are enclosed in quote characters if they contain delimiters, quote characters, null strings, carriage returns, or line feeds.