Introduction to Logical Replication in PostgreSQL

<chapter id="logical-replication">
 <title>Logical Replication</title>

 <para>
  Logical replication is a method of replicating data objects and their
  changes, based upon their replication identity (usually a primary key). We
  use the term logical in contrast to physical replication, which uses exact
  block addresses and byte-by-byte replication. PostgreSQL supports both
  mechanisms concurrently; see <xref linkend="high-availability"/>. Logical
  replication allows fine-grained control over both data replication and
  security.
 </para>

 <para>
  Logical replication uses a <firstterm>publish</firstterm>
  and <firstterm>subscribe</firstterm> model with one or more
  <firstterm>subscribers</firstterm> subscribing to one or more
  <firstterm>publications</firstterm> on a <firstterm>publisher</firstterm>
  node. Subscribers pull data from the publications they subscribe to and may
  subsequently re-publish data to allow cascading replication or more complex
  configurations.
 </para>

 <para>
  Logical replication of a table typically starts with PostgreSQL taking a
  snapshot of the table's data on the publisher database and copying it to
  the subscriber. Once the copy is complete, changes made on the publisher
  since the initial copy are sent continually to the subscriber. The
  subscriber applies the changes in the same order as the publisher, so that
  transactional consistency is guaranteed for publications within a single
  subscription. This method of data replication is sometimes referred to as
  transactional replication.
 </para>

 <para>
  The typical use-cases for logical replication are:

  <itemizedlist>
   <listitem>
    <para>
     Sending incremental changes in a single database or a subset of a
     database to subscribers as they occur.
    </para>
   </listitem>

   <listitem>
    <para>
     Firing triggers for individual changes as they arrive on the
     subscriber.
    </para>
   </listitem>

   <listitem>
    <para>
     Consolidating multiple databases into a single one (for example, for
     analytical purposes).
    </para>
   </listitem>

   <listitem>
    <para>
     Replicating between different major versions of PostgreSQL.
    </para>
   </listitem>

   <listitem>
    <para>
     Replicating between PostgreSQL instances on different platforms (for
     example, Linux to Windows).
    </para>
   </listitem>

   <listitem>
    <para>
     Giving access to replicated data to different groups of users.
    </para>
   </listitem>

   <listitem>
    <para>
     Sharing a subset of the database between multiple databases.
    </para>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  The subscriber database behaves in the same way as any other PostgreSQL
  instance and can be used as a publisher for other databases by defining its
  own publications. When the subscriber is treated as read-only by the
  application, there will be no conflicts from a single subscription. On the
  other hand, if other writes are made to the same set of tables, either by
  an application or by other subscribers, conflicts can arise.
 </para>

 <sect1 id="logical-replication-publication">
  <title>Publication</title>

  <para>
   A <firstterm>publication</firstterm> can be defined on any physical
   replication primary. The node where a publication is defined is referred
   to as the <firstterm>publisher</firstterm>. A publication is a set of
   changes generated from a table or a group of tables, and might also be
   described as a change set or replication set. Each publication exists in
   only one database.
  </para>

  <para>
   Publications are different from schemas and do not affect how the table is
   accessed. Each table can be added to multiple publications if needed.
   Publications may currently only contain tables and all tables in a schema.
   Objects must be added explicitly, except when a publication is created for
   <literal>ALL TABLES</literal>.
  </para>

  <para>
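   As a minimal sketch of the ways a table can end up in a publication, the
   following statements could be run on the publisher. The publication,
   table, and schema names (<literal>sales_pub</literal>,
   <literal>orders</literal>, <literal>reporting</literal>, and so on) are
   hypothetical and chosen only for illustration. The
   <literal>FOR TABLES IN SCHEMA</literal> form assumes a server version that
   supports publishing all tables in a schema.
<programlisting>
-- Publish two explicitly listed tables (hypothetical names).
CREATE PUBLICATION sales_pub FOR TABLE orders, customers;

-- Publish every table in one schema (hypothetical schema name).
CREATE PUBLICATION reporting_pub FOR TABLES IN SCHEMA reporting;

-- Publish all tables in the database without listing them explicitly.
CREATE PUBLICATION everything_pub FOR ALL TABLES;

-- The same table may belong to more than one publication.
CREATE PUBLICATION audit_pub FOR TABLE orders;
</programlisting>
   None of these statements changes how <literal>orders</literal> is queried
   or modified on the publisher; they only determine which change sets are
   offered to subscribers.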

This chapter introduces logical replication in PostgreSQL, a method of replicating data objects and their changes based on their replication identity. It operates on a publish and subscribe model, where subscribers pull data from publications on a publisher node. Logical replication involves taking a snapshot of the table's data on the publisher and copying it to the subscriber, then continuously sending changes. Typical use cases include sending incremental changes, firing triggers, consolidating databases, replicating between different PostgreSQL versions or platforms, providing access to replicated data, and sharing a subset of the database. Publications, which are defined on a physical replication primary (the publisher), represent a set of changes generated from tables. Publications do not affect how the table is accessed.
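
On the subscriber side, the flow summarized above (an initial snapshot copy followed by a continuous stream of changes) could be set up roughly as follows. This is only a sketch under stated assumptions: the subscription name, publication name, and connection string are hypothetical, the publication is assumed to already exist on the publisher, and the published tables are assumed to already exist on the subscriber with matching definitions.
<programlisting>
-- Run on the subscriber (all names and the connection string are hypothetical).
-- copy_data = true requests the initial copy of existing rows; it is the default
-- and is spelled out here only to make the snapshot step visible.
CREATE SUBSCRIPTION sales_sub
    CONNECTION 'host=publisher.example.com dbname=salesdb user=repuser'
    PUBLICATION sales_pub
    WITH (copy_data = true);
</programlisting>
Once the initial copy finishes, changes made on the publisher are applied on the subscriber in the order in which they were committed.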