pg_rewind: Synchronize PostgreSQL Data Directories

<refentry id="app-pgrewind"> <indexterm zone="app-pgrewind"> <primary>pg_rewind</primary> </indexterm> <refmeta> <refentrytitle><application>pg_rewind</application></refentrytitle> <manvolnum>1</manvolnum> <refmiscinfo>Application</refmiscinfo> </refmeta> <refnamediv> <refname>pg_rewind</refname> <refpurpose>synchronize a <productname>PostgreSQL</productname> data directory with another data directory that was forked from it</refpurpose> </refnamediv> <refsynopsisdiv> <cmdsynopsis> <command>pg_rewind</command> <arg rep="repeat"><replaceable>option</replaceable></arg> <group choice="plain"> <group choice="req"> <arg choice="plain"><option>-D</option></arg> <arg choice="plain"><option>--target-pgdata</option></arg> </group> <replaceable> directory</replaceable> <group choice="req"> <arg choice="plain"><option>--source-pgdata=<replaceable>directory</replaceable></option></arg> <arg choice="plain"><option>--source-server=<replaceable>connstr</replaceable></option></arg> </group> </group> </cmdsynopsis> </refsynopsisdiv> <refsect1> <title>Description</title> <para> <application>pg_rewind</application> is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster, after the clusters' timelines have diverged. A typical scenario is to bring an old primary server back online after failover as a standby that follows the new primary. </para> <para> After a successful rewind, the state of the target data directory is analogous to a base backup of the source data directory. Unlike taking a new base backup or using a tool like <application>rsync</application>, <application>pg_rewind</application> does not require comparing or copying unchanged relation blocks in the cluster. Only changed blocks from existing relation files are copied; all other files, including new relation files, configuration files, and WAL segments, are copied in full. 
As such, the rewind operation is significantly faster than other approaches when the database is large and only a small fraction of blocks differ between the clusters. </para>
<para> <application>pg_rewind</application> examines the timeline histories of the source and target clusters to determine the point where they diverged, and expects to find WAL in the target cluster's <filename>pg_wal</filename> directory reaching all the way back to the point of divergence. The point of divergence can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the target cluster ran for a long time after the divergence, its old WAL files might no longer be present. In this case, you can manually copy them from the WAL archive to the <filename>pg_wal</filename> directory, or run <application>pg_rewind</application> with the <literal>-c</literal> option to automatically retrieve them from the WAL archive. The use of <application>pg_rewind</application> is not limited to failover; for example, a standby server can be promoted, run some write transactions, and then be rewound to become a standby again. </para>
<para> After running <application>pg_rewind</application>, WAL replay needs to complete for the data directory to be in a consistent state. When the target server is started again, it will enter archive recovery and replay all WAL generated in the source server from the last checkpoint before the point of divergence. If some of the WAL was no longer available in the source server when <application>pg_rewind</application> was run, and therefore could not be copied by the <application>pg_rewind</application> session, it must be made available when the target server is started.

pg_rewind synchronizes a PostgreSQL data directory with another that was forked from it, typically to bring an old primary server back online as a standby after a failover. It copies only the changed blocks of existing relation files and copies all other files in full, which makes it significantly faster than a new base backup or rsync when the database is large and only a small fraction of blocks differ. pg_rewind examines the timeline histories of both clusters and expects WAL in the target cluster's pg_wal directory reaching back to the point of divergence. After running pg_rewind, WAL replay must complete before the data directory is in a consistent state.
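
The failover scenario described above might be driven by a small wrapper like the following. This is a minimal sketch under stated assumptions, not a recommended procedure: the data directory path, the connection string, and the helper name run_rewind are placeholders invented for illustration. The -c option is the one mentioned in the description. The --dry-run option comes from pg_rewind's option list rather than from this excerpt. By default the script only prints the command line it would run.

```shell
#!/bin/sh
# Sketch only. The path and connection string are placeholders (assumptions),
# not values taken from the reference page.
set -eu

TARGET_PGDATA="${TARGET_PGDATA:-/path/to/old-primary/data}"
SOURCE_CONNSTR="${SOURCE_CONNSTR:-host=new-primary.example port=5432 user=postgres dbname=postgres}"

# run_rewind is a hypothetical helper: it assembles a pg_rewind command line
# and prints it one argument per line, or runs it when EXECUTE=1.
run_rewind() {
  set -- pg_rewind \
    --target-pgdata="$TARGET_PGDATA" \
    --source-server="$SOURCE_CONNSTR"

  # -c: retrieve WAL that is missing from the target's pg_wal directory
  # from the WAL archive.
  if [ "${RESTORE_WAL:-0}" = 1 ]; then
    set -- "$@" -c
  fi

  # --dry-run: go through the steps without modifying the target directory.
  # Enabled by default in this sketch.
  if [ "${DRY_RUN:-1}" = 1 ]; then
    set -- "$@" --dry-run
  fi

  if [ "${EXECUTE:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$@"
  fi
}

run_rewind
```

With the defaults, the script prints the assembled arguments and changes nothing. Setting RESTORE_WAL=1 adds -c for a target that ran long enough after the divergence to have recycled its old WAL, and setting DRY_RUN=0 together with EXECUTE=1 would perform the actual rewind.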