<indexterm>
<primary>continuous archiving</primary>
<secondary>in standby</secondary>
</indexterm>
<para>
When continuous WAL archiving is used in a standby, there are two
different scenarios: the WAL archive can be shared between the primary
and the standby, or the standby can have its own WAL archive. When
the standby has its own WAL archive, set <varname>archive_mode</varname>
to <literal>always</literal>, and the standby will call the archive
command for every WAL segment it receives, whether by restoring it
from the archive or by streaming replication. The shared archive can
be handled similarly, but the <varname>archive_command</varname> or
<varname>archive_library</varname> must test whether the file being
archived already exists and whether the existing file has identical
contents. This requires more care in the
<varname>archive_command</varname> or <varname>archive_library</varname>:
it must never overwrite an existing file with different contents, but
must return success if exactly the same file is archived twice. All
of this must be done free of race conditions, in case two servers
attempt to archive the same file at the same time.
</para>
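<para>
For example, a shared archive could be managed with a wrapper script
along the following lines. This is only a sketch: the script name, the
archive directory, and the reliance on <command>cmp</command> and hard
links are illustrative assumptions, not a supported interface, and a
production script would also need to make the archived file durable
(for example with <command>sync</command>).
<programlisting>
#!/bin/sh
# Hypothetical shared-archive script; invoked as: archive_wal.sh %p %f
ARCHIVE_DIR=/mnt/server/archivedir        # illustrative path
SRC="$1"                                  # %p: path of the WAL file to archive
DST="$ARCHIVE_DIR/$2"                     # %f: file name only

# If a file with this name is already archived, succeed only when the
# contents match; never overwrite an existing file with different contents.
if [ -f "$DST" ]; then
    exec cmp -s "$SRC" "$DST"
fi

# Copy to a unique temporary name, then hard-link into place.  ln fails
# if the destination appears in the meantime, so two servers archiving
# the same segment concurrently cannot clobber each other.
TMP="$DST.tmp.$(hostname).$$"
cp "$SRC" "$TMP" || { rm -f "$TMP"; exit 1; }
if ln "$TMP" "$DST" 2>/dev/null; then
    rm -f "$TMP"
    exit 0
fi
rm -f "$TMP"
# Lost the race: another server archived the file first.  Accept that
# if, and only if, the contents are identical.
exec cmp -s "$SRC" "$DST"
</programlisting>
Both servers would then use something like
<literal>archive_command = '/usr/local/bin/archive_wal.sh %p %f'</literal>.
</para>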
<para>
If <varname>archive_mode</varname> is set to <literal>on</literal>, the
archiver is not enabled during recovery or standby mode. If the standby
server is promoted, it will start archiving after the promotion, but
will not archive any WAL or timeline history files that
it did not generate itself. To get a complete
series of WAL files in the archive, you must ensure that all WAL is
archived before it reaches the standby. This is inherently true with
file-based log shipping, as the standby can only restore files that
are found in the archive, but it is not true if streaming replication
is enabled.
When a server is not in recovery mode, there is no difference between
<literal>on</literal> and <literal>always</literal> modes.
</para>
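<para>
For example, a standby that maintains its own archive might have the
following in <filename>postgresql.conf</filename> (the command shown
is the illustrative script sketched above):
<programlisting>
archive_mode = always
archive_command = '/usr/local/bin/archive_wal.sh %p %f'
</programlisting>
With <literal>archive_mode = on</literal> instead, the same
<varname>archive_command</varname> would only take effect after the
standby is promoted.
</para>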
</sect2>
</sect1>
<sect1 id="warm-standby-failover">
<title>Failover</title>
<para>
If the primary server fails then the standby server should begin
failover procedures.
</para>
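<para>
For example, once the failure of the primary has been confirmed, the
standby can typically be promoted with (the data directory path is
illustrative):
<programlisting>
$ pg_ctl promote -D /usr/local/pgsql/standby
</programlisting>
or by calling <function>pg_promote()</function> from a connection to
the standby.
</para>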
<para>
If the standby server fails then no failover need take place. If the
standby server can be restarted, even some time later, then the recovery
process can also be restarted immediately, taking advantage of
restartable recovery. If the standby server cannot be restarted, then a
full new standby server instance should be created.
</para>
<para>
If the primary server fails and the standby server becomes the
new primary, and then the old primary restarts, you must have
a mechanism for informing the old primary that it is no longer the primary. This is
sometimes known as <acronym>STONITH</acronym> (Shoot The Other Node In The Head), which is
necessary to avoid situations where both systems think they are the
primary, which will lead to confusion and ultimately data loss.
</para>
<para>
Many failover systems use just two systems, the primary and the standby,
connected by some kind of heartbeat mechanism to continually verify the
connectivity between the two and the viability of the primary. It is
also possible to use a third system (called a witness server) to prevent
some cases of inappropriate failover, but the additional complexity
might not be worthwhile unless it is set up with sufficient care and
rigorous testing.
</para>
<para>
<productname>PostgreSQL</productname> does not provide the system
software required to identify a failure on the primary and notify
the standby database server. Many such tools exist and are well
integrated with the operating system facilities required for
successful failover, such as IP address migration.
</para>
<para>
Once failover to the standby