Home Explore Blog CI



postgresql

18th chunk of `doc/src/sgml/datatype.sgml`
e36a0447a59e8afb488b5e162ed9b5e3cefe20baa108b0620000000100000fa3
 <entry>1 or 4 bytes plus the actual binary string</entry>
       <entry>variable-length binary string</entry>
      </row>
     </tbody>
    </tgroup>
   </table>

   <para>
    A binary string is a sequence of octets (or bytes).  Binary
    strings are distinguished from character strings in two
    ways.  First, binary strings specifically allow storing
    octets of value zero and other <quote>non-printable</quote>
    octets (usually, octets outside the decimal range 32 to 126).
    Character strings disallow zero octets, and also disallow any
    other octet values and sequences of octet values that are invalid
    according to the database's selected character set encoding.
    Second, operations on binary strings process the actual bytes,
    whereas the processing of character strings depends on locale settings.
    In short, binary strings are appropriate for storing data that the
    programmer thinks of as <quote>raw bytes</quote>, whereas character
    strings are appropriate for storing text.
   </para>

   <para>
    The <type>bytea</type> type supports two
    formats for input and output: <quote>hex</quote> format
    and <productname>PostgreSQL</productname>'s historical
    <quote>escape</quote> format.  Both
    of these are always accepted on input.  The output format depends
    on the configuration parameter <xref linkend="guc-bytea-output"/>;
    the default is hex.  (Note that the hex format was introduced in
    <productname>PostgreSQL</productname> 9.0; earlier versions and some
    tools don't understand it.)
   </para>

   <para>
    The <acronym>SQL</acronym> standard defines a different binary
    string type, called <type>BLOB</type> or <type>BINARY LARGE
    OBJECT</type>.  The input format is different from
    <type>bytea</type>, but the provided functions and operators are
    mostly the same.
   </para>

  <sect2 id="datatype-binary-bytea-hex-format">
   <title><type>bytea</type> Hex Format</title>

   <para>
    The <quote>hex</quote> format encodes binary data as 2 hexadecimal digits
    per byte, most significant nibble first.  The entire string is
    preceded by the sequence <literal>\x</literal> (to distinguish it
    from the escape format).  In some contexts, the initial backslash may
    need to be escaped by doubling it
    (see <xref linkend="sql-syntax-strings"/>).
    For input, the hexadecimal digits can
    be either upper or lower case, and whitespace is permitted between
    digit pairs (but not within a digit pair nor in the starting
    <literal>\x</literal> sequence).
    The hex format is compatible with a wide
    range of external applications and protocols, and it tends to be
    faster to convert than the escape format, so its use is preferred.
   </para>

   <para>
    Example:
<programlisting>
SET bytea_output = 'hex';

SELECT '\xDEADBEEF'::bytea;
   bytea
------------
 \xdeadbeef
</programlisting>
   </para>
  </sect2>

  <sect2 id="datatype-binary-bytea-escape-format">
   <title><type>bytea</type> Escape Format</title>

   <para>
    The <quote>escape</quote> format is the traditional
    <productname>PostgreSQL</productname> format for the <type>bytea</type>
    type.  It
    takes the approach of representing a binary string as a sequence
    of ASCII characters, while converting those bytes that cannot be
    represented as an ASCII character into special escape sequences.
    If, from the point of view of the application, representing bytes
    as characters makes sense, then this representation can be
    convenient.  But in practice it is usually confusing because it
    fuzzes up the distinction between binary strings and character
    strings, and also the particular escape mechanism that was chosen is
    somewhat unwieldy.  Therefore, this format should probably be avoided
    for most new applications.
   </para>

   <para>
    When entering <type>bytea</type> values in escape format,
    octets of certain
    values <emphasis>must</emphasis> be escaped,

Title: Binary Data Type Formats in PostgreSQL
Summary
PostgreSQL's binary data type, bytea, supports two formats for input and output: hex and escape. The hex format encodes binary data as 2 hexadecimal digits per byte, while the escape format represents binary strings as a sequence of ASCII characters. The hex format is preferred due to its compatibility with external applications and faster conversion. The escape format, although traditional, is considered confusing and should be avoided for new applications. Both formats have specific rules for input and output, and the choice of format depends on the configuration parameter bytea_output.