system recycles enough
WAL files to cover the estimated need until the next checkpoint, and
removes the rest. The estimate is based on a moving average of the number
of WAL files used in previous checkpoint cycles. The moving average
is increased immediately if the actual usage exceeds the estimate, so it
accommodates peak usage rather than average usage to some extent.
<varname>min_wal_size</varname> puts a minimum on the amount of WAL files
recycled for future usage; that much WAL is always recycled for future use,
even if the system is idle and the WAL usage estimate suggests that little
WAL is needed.
</para>
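  <para>
   For example, these parameters can be inspected and adjusted with ordinary
   SQL; the sizes below are purely illustrative and should be chosen to match
   the workload and the available disk space:
<programlisting>
-- inspect the current WAL sizing settings
SHOW min_wal_size;
SHOW max_wal_size;

-- illustrative values only; adjust for your workload and disk space
ALTER SYSTEM SET min_wal_size = '1GB';
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();
</programlisting>
  </para>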
<para>
Independently of <varname>max_wal_size</varname>,
the most recent <xref linkend="guc-wal-keep-size"/> megabytes of
WAL files plus one additional WAL file are
kept at all times. Also, if WAL archiving is used, old segments cannot be
removed or recycled until they are archived. If WAL archiving cannot keep up
with the pace that WAL is generated, or if <varname>archive_command</varname>
or <varname>archive_library</varname>
fails repeatedly, old WAL files will accumulate in <filename>pg_wal</filename>
until the situation is resolved. A slow or failed standby server that
uses a replication slot will have the same effect (see
<xref linkend="streaming-replication-slots"/>).
Similarly, if <link linkend="runtime-config-wal-summarization">
WAL summarization</link> is enabled, old segments are kept
until they are summarized.
</para>
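  <para>
   As a rough sketch of how to check whether archiving or a replication slot
   is retaining WAL in <filename>pg_wal</filename>, queries along the
   following lines can be used; interpret the results in the context of the
   particular setup:
<programlisting>
-- archiver progress and recent failures
SELECT archived_count, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- replication slots that may be retaining WAL segments
SELECT slot_name, active, restart_lsn, wal_status
FROM pg_replication_slots;
</programlisting>
  </para>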
<para>
In archive recovery or standby mode, the server periodically performs
<firstterm>restartpoints</firstterm>,<indexterm><primary>restartpoint</primary></indexterm>
which are similar to checkpoints in normal operation: the server forces
all its state to disk, updates the <filename>pg_control</filename> file to
indicate that the already-processed WAL data need not be scanned again,
and then recycles any old WAL segment files in the <filename>pg_wal</filename>
directory.
Restartpoints can't be performed more frequently than checkpoints on the
primary because restartpoints can only be performed at checkpoint records.
A restartpoint can be triggered by a schedule or by an external request.
The <structfield>restartpoints_timed</structfield> counter in the
<link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
view counts the scheduled ones, while <structfield>restartpoints_req</structfield>
counts those triggered by request.
A restartpoint is triggered by schedule when a checkpoint record is reached,
if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since
the last restartpoint was performed or when the previous attempt to perform
a restartpoint failed. In the latter case, the next restartpoint
is scheduled 15 seconds later.
A restartpoint is triggered by request for reasons similar to those for
checkpoints, but mostly when the WAL size is about to exceed
<xref linkend="guc-max-wal-size"/>.
However, because of limitations on when a restartpoint can be performed,
<varname>max_wal_size</varname> is often exceeded during recovery,
by up to one checkpoint cycle's worth of WAL.
(<varname>max_wal_size</varname> is never a hard limit anyway, so you should
always leave plenty of headroom to avoid running out of disk space.)
The <structfield>restartpoints_done</structfield> counter in the
<link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
view counts the restartpoints that have actually been performed.
</para>
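  <para>
   For example, the relative frequency of scheduled, requested, and completed
   restartpoints on a standby can be observed with a query such as the
   following (the choice of columns is merely illustrative):
<programlisting>
SELECT restartpoints_timed, restartpoints_req, restartpoints_done
FROM pg_stat_checkpointer;
</programlisting>
  </para>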
<para>
In some cases, when the WAL size on the primary increases quickly,
for instance during a massive <command>INSERT</command>,
the <structfield>restartpoints_req</structfield> counter on the standby
may spike.
This occurs because requests to create a new restartpoint due to increased
WAL consumption cannot