Home Explore Blog CI



postgresql

5th chunk of `doc/src/sgml/storage.sgml`
a346fe6d35a6cacc9944a6f30ad178d44c88d3ddff7a700b0000000100000fa3

<productname>PostgreSQL</productname> uses a fixed page size (commonly
8 kB), and does not allow tuples to span multiple pages.  Therefore, it is
not possible to store very large field values directly.  To overcome
this limitation, large field values are compressed and/or broken up into
multiple physical rows.  This happens transparently to the user, with only
small impact on most of the backend code.  The technique is affectionately
known as <acronym>TOAST</acronym> (or <quote>the best thing since sliced bread</quote>).
The <acronym>TOAST</acronym> infrastructure is also used to improve handling of
large data values in-memory.
</para>

<para>
Only certain data types support <acronym>TOAST</acronym> &mdash; there is no need to
impose the overhead on data types that cannot produce large field values.
To support <acronym>TOAST</acronym>, a data type must have a variable-length
(<firstterm>varlena</firstterm>) representation, in which, ordinarily, the first
four-byte word of any stored value contains the total length of the value in
bytes (including itself).  <acronym>TOAST</acronym> does not constrain the rest
of the data type's representation.  The special representations collectively
called <firstterm><acronym>TOAST</acronym>ed values</firstterm> work by modifying or
reinterpreting this initial length word.  Therefore, the C-level functions
supporting a <acronym>TOAST</acronym>-able data type must be careful about how they
handle potentially <acronym>TOAST</acronym>ed input values: an input might not
actually consist of a four-byte length word and contents until after it's
been <firstterm>detoasted</firstterm>.  (This is normally done by invoking
<function>PG_DETOAST_DATUM</function> before doing anything with an input value,
but in some cases more efficient approaches are possible.
See <xref linkend="xtypes-toast"/> for more detail.)
</para>

<para>
<acronym>TOAST</acronym> usurps two bits of the varlena length word (the high-order
bits on big-endian machines, the low-order bits on little-endian machines),
thereby limiting the logical size of any value of a <acronym>TOAST</acronym>-able
data type to 1 GB (2<superscript>30</superscript> - 1 bytes).  When both bits are zero,
the value is an ordinary un-<acronym>TOAST</acronym>ed value of the data type, and
the remaining bits of the length word give the total datum size (including
length word) in bytes.  When the highest-order or lowest-order bit is set,
the value has only a single-byte header instead of the normal four-byte
header, and the remaining bits of that byte give the total datum size
(including length byte) in bytes.  This alternative supports space-efficient
storage of values shorter than 127 bytes, while still allowing the data type
to grow to 1 GB at need.  Values with single-byte headers aren't aligned on
any particular boundary, whereas values with four-byte headers are aligned on
at least a four-byte boundary; this omission of alignment padding provides
additional space savings that is significant compared to short values.
As a special case, if the remaining bits of a single-byte header are all
zero (which would be impossible for a self-inclusive length), the value is
a pointer to out-of-line data, with several possible alternatives as
described below.  The type and size of such a <firstterm>TOAST pointer</firstterm>
are determined by a code stored in the second byte of the datum.
Lastly, when the highest-order or lowest-order bit is clear but the adjacent
bit is set, the content of the datum has been compressed and must be
decompressed before use.  In this case the remaining bits of the four-byte
length word give the total size of the compressed datum, not the
original data.  Note that compression is also possible for out-of-line data
but the varlena header does not tell whether it has occurred &mdash;
the content of the <acronym>TOAST</acronym> pointer tells that, instead.
</para>

<para>
The compression technique used for either in-line or out-of-line compressed

Title: TOAST Details: Data Types, Representations, and Compression
Summary
This section delves into the specifics of TOAST, explaining that only certain data types with variable-length (varlena) representations support it. It describes how TOAST modifies the initial length word of stored values, using two bits to indicate whether the value is un-TOASTed, has a single-byte header for space efficiency, or is a pointer to out-of-line data. It also covers the use of compression, noting that the varlena header indicates whether in-line data is compressed, while the TOAST pointer indicates compression for out-of-line data. The logical size of a TOAST-able data type is limited to 1 GB due to the length word modification.