Replication Origins and Progress Tracking

<chapter id="replication-origins">
 <title>Replication Progress Tracking</title>
 <indexterm zone="replication-origins">
  <primary>Replication Progress Tracking</primary>
 </indexterm>
 <indexterm zone="replication-origins">
  <primary>Replication Origins</primary>
 </indexterm>

 <para>
  Replication origins are intended to make it easier to implement
  logical replication solutions on top of
  <link linkend="logicaldecoding">logical decoding</link>.
  They provide a solution to two common problems:
  <itemizedlist>
   <listitem>
    <para>How to safely keep track of replication progress</para>
   </listitem>
   <listitem>
    <para>
     How to change replication behavior based on the origin of a row;
     for example, to prevent loops in bi-directional replication setups
    </para>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  Replication origins have just two properties, a name and an ID. The name,
  which is what should be used to refer to the origin across systems, is
  free-form <type>text</type>. It should be used in a way that makes
  conflicts between replication origins created by different replication
  solutions unlikely; e.g., by prefixing the replication solution's name to
  it. The ID is used only to avoid having to store the long version in
  situations where space efficiency is important. It should never be shared
  across systems.
 </para>

 <para>
  Replication origins can be created using the function
  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
  dropped using
  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
  and seen in the
  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
  system catalog.
 </para>

 <para>
  One nontrivial part of building a replication solution is to keep track of
  replay progress in a safe manner. When the applying process, or the whole
  cluster, dies, it needs to be possible to find out up to where data has
  successfully been replicated. Naive solutions to this, such as updating a
  row in a table for every replayed transaction, have problems like run-time
  overhead and database bloat.
 </para>
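
 <para>
  As a minimal sketch of the origin life cycle described above, the
  following statements create an origin, look it up in the catalog, and drop
  it again. The origin name <literal>myrepl_node_a</literal> is hypothetical
  and only illustrates the suggested convention of prefixing the replication
  solution's name. The sketch assumes a role that is allowed to execute these
  functions, which by default means a superuser.
 </para>

<programlisting>
-- Create an origin; the function returns the internal ID assigned to it.
-- 'myrepl_node_a' is a hypothetical name: solution prefix plus node label.
SELECT pg_replication_origin_create('myrepl_node_a');

-- List known origins: roident is the local ID, roname the external name.
SELECT roident, roname FROM pg_replication_origin;

-- Drop the origin again, together with any replay progress stored for it.
SELECT pg_replication_origin_drop('myrepl_node_a');
</programlisting>

 <para>
  The returned ID is local to this cluster, so anything that has to be
  exchanged with another system should refer to the origin by name.
 </para>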

Replication origins are a solution for tracking replication progress and managing replication behavior in logical replication setups, providing a way to safely keep track of progress and change behavior based on row origin.