postgresql

16th chunk of `doc/src/sgml/wal.sgml`
 <para>
   <acronym>WAL</acronym> records are appended to the <acronym>WAL</acronym>
   files as each new record is written. The insert position is described by
   a Log Sequence Number (<acronym>LSN</acronym>) that is a byte offset into
   the WAL, increasing monotonically with each new record.
   <acronym>LSN</acronym> values are returned as the data type
   <link linkend="datatype-pg-lsn"><type>pg_lsn</type></link>. Values can be
   compared to calculate the volume of <acronym>WAL</acronym> data that
   separates them, so they are used to measure the progress of replication
   and recovery.
  </para>
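
  <para>
   The comparison described above can be sketched outside the server as
   follows.  This is a minimal illustration, assuming the textual
   <type>pg_lsn</type> form of two hexadecimal numbers separated by a slash
   (upper 32 bits, then lower 32 bits); the function names
   <literal>parse_lsn</literal> and <literal>wal_bytes_between</literal> are
   hypothetical and not part of <productname>PostgreSQL</productname>.
   Within the server, the same result is available from
   <function>pg_wal_lsn_diff</function>.
  </para>

```python
def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = text.split("/")
    # The part before the slash holds the upper 32 bits,
    # the part after it the lower 32 bits.
    return int(high, 16) * 2**32 + int(low, 16)


def wal_bytes_between(start: str, end: str) -> int:
    """WAL bytes separating two LSNs (negative if end precedes start)."""
    return parse_lsn(end) - parse_lsn(start)


if __name__ == "__main__":
    # A standby that has replayed up to 0/3000000 while the primary is at
    # 0/3000060 is 96 bytes of WAL behind.
    print(wal_bytes_between("0/3000000", "0/3000060"))  # 96
```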

  <para>
   <acronym>WAL</acronym> files are stored in the directory
   <filename>pg_wal</filename> under the data directory, as a set of
   segment files, normally each 16 MB in size (the size can be changed
   with the <option>--wal-segsize</option> option of <application>initdb</application>).  Each segment is
   divided into pages, normally 8 kB each (this size can be changed via the
   <option>--with-wal-blocksize</option> configure option).  The WAL record headers
   are described in <filename>access/xlogrecord.h</filename>; the record
   content is dependent on the type of event that is being logged.  Segment
   files are given ever-increasing numbers as names, starting at
   <filename>000000010000000000000001</filename>.  The numbers do not wrap,
   but the available stock of numbers is so large that exhausting it is
   not a practical concern.
  </para>
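
  <para>
   The relationship between byte positions, segments, and file names can be
   sketched as follows.  This is an illustration only: it assumes the default
   16 MB segment and 8 kB page sizes, and it assumes the commonly described
   naming scheme in which the first eight hexadecimal digits are the timeline
   ID and the remaining sixteen encode the segment number.  The function
   names are hypothetical.
  </para>

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; chosen at initdb time
WAL_PAGE_SIZE = 8 * 1024             # default; chosen at build time


def segment_number_for(lsn: int, segment_size: int = WAL_SEGMENT_SIZE) -> int:
    """Number of the segment that contains the given byte position."""
    return lsn // segment_size


def segment_file_name(timeline: int, segment_number: int,
                      segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit file name of a WAL segment (assumed scheme)."""
    # Segments that fit into 4 GB of WAL: 256 with 16 MB segments.
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_id,
        segment_number % segments_per_id,
    )


if __name__ == "__main__":
    print(segment_file_name(1, 1))              # 000000010000000000000001
    print(WAL_SEGMENT_SIZE // WAL_PAGE_SIZE)    # 2048 pages per segment
    # Byte position 0x3000060 (LSN 0/3000060) falls into segment 3.
    print(segment_file_name(1, segment_number_for(0x3000060)))
```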

  <para>
   It is advantageous to place the WAL on a different disk from the
   main database files, so that WAL writes do not compete with data-file
   I/O.  This can be achieved by moving the
   <filename>pg_wal</filename> directory to another location (while the server
   is shut down, of course) and creating a symbolic link from the
   original location in the main data directory to the new location.
  </para>
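
  <para>
   The relocation just described might be scripted roughly as follows.  This
   is a sketch under stated assumptions, not a supported tool: the function
   name <literal>relocate_pg_wal</literal> and any paths passed to it are
   hypothetical, the script does not verify that the server is shut down, and
   file ownership and permissions at the new location should be checked by
   hand afterwards, particularly when the move crosses file systems.
  </para>

```python
import os
import shutil


def relocate_pg_wal(data_dir: str, new_location: str) -> None:
    """Move data_dir/pg_wal to new_location and leave a symbolic link behind.

    The server MUST be shut down before this is called; this sketch does
    not check for a running server.
    """
    old_path = os.path.join(data_dir, "pg_wal")
    if os.path.islink(old_path):
        raise RuntimeError("pg_wal is already a symbolic link")
    if os.path.exists(new_location):
        raise RuntimeError("target location already exists")

    # Rename when possible; copy and delete when crossing file systems.
    shutil.move(old_path, new_location)

    # Link from the original location to the new one, using an absolute path
    # so the link stays valid regardless of the working directory.
    os.symlink(os.path.abspath(new_location), old_path,
               target_is_directory=True)
```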

  <para>
   The aim of <acronym>WAL</acronym> is to ensure that the log is
   written before database records are altered, but this can be subverted by
   disk drives<indexterm><primary>disk drive</primary></indexterm> that falsely report a
   successful write to the kernel,
   when in fact they have only cached the data and not yet stored it
   on the disk.  A power failure in such a situation might lead to
   irrecoverable data corruption.  Administrators should try to ensure
   that disks holding <productname>PostgreSQL</productname>'s
   <acronym>WAL</acronym> files do not make such false reports.
   (See <xref linkend="wal-reliability"/>.)
  </para>

  <para>
   After a checkpoint has been made and the WAL flushed, the
   checkpoint's position is saved in the file
   <filename>pg_control</filename>. Therefore, at the start of recovery,
   the server first reads <filename>pg_control</filename> and
   then the checkpoint record; then it performs the REDO operation by
   scanning forward from the WAL location indicated in the checkpoint
   record.  Because the entire content of data pages is saved in the
   WAL on the first page modification after a checkpoint (assuming
   <xref linkend="guc-full-page-writes"/> is not disabled), all pages
   changed since the checkpoint will be restored to a consistent
   state.
  </para>
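
  <para>
   The effect of replaying from the checkpoint's REDO location can be
   illustrated with a toy model.  This is deliberately simplified and is not
   how the server is implemented: a page is reduced to a single integer, and
   the names <literal>WalRecord</literal> and <literal>redo</literal> are
   invented for the example.  It shows only the point made above, namely that
   a full-page image logged on the first modification after a checkpoint
   overwrites whatever is on disk, so later changes are applied to a known
   good page.
  </para>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WalRecord:
    lsn: int                          # position of the record in the toy WAL
    page: str                         # page the record modifies
    delta: int = 0                    # incremental change to the page value
    full_image: Optional[int] = None  # page content after the change, if logged


def redo(pages: Dict[str, int], wal: List[WalRecord],
         redo_lsn: int) -> Dict[str, int]:
    """Replay every record at or after redo_lsn onto a copy of pages."""
    recovered = dict(pages)
    to_replay = [r for r in wal if r.lsn >= redo_lsn]
    for record in sorted(to_replay, key=lambda r: r.lsn):
        if record.full_image is not None:
            # The image replaces the on-disk page, even a torn one;
            # the record's delta is already contained in the image.
            recovered[record.page] = record.full_image
        else:
            recovered[record.page] += record.delta
    return recovered


if __name__ == "__main__":
    on_disk = {"A": -999, "B": 5}     # page A was torn by the crash
    wal = [
        WalRecord(lsn=100, page="A", delta=1),        # before the checkpoint
        WalRecord(lsn=210, page="A", full_image=11),  # first change after it
        WalRecord(lsn=220, page="A", delta=2),
        WalRecord(lsn=230, page="B", full_image=6),
    ]
    print(redo(on_disk, wal, redo_lsn=200))  # {'A': 13, 'B': 6}
```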

  <para>
   To deal with the case where <filename>pg_control</filename> is
   corrupt, we should support the possibility of scanning existing WAL
   segments in reverse order &mdash; newest to oldest &mdash; in order to find the
   latest checkpoint.  This has not been implemented yet.
   <filename>pg_control</filename> is small enough (less than one disk page)
   that it is not subject to partial-write problems, and as of this writing
   there have been no reports of database failures due solely to the inability
   to read <filename>pg_control</filename> itself.  So while it is
   theoretically a weak spot, <filename>pg_control</filename> does not
   seem to be a problem in practice.
  </para>
 </sect1>

</chapter>

Title: WAL Files and Storage
Summary
WAL records are stored in segment files in the pg_wal directory, and each segment is divided into pages. Placing the WAL on a separate disk from the main database files is recommended. Disks that falsely report completed writes can cause irrecoverable corruption after a power failure, so administrators should make sure WAL disks do not do this. Recovery starts from the checkpoint position recorded in pg_control.