Logical replication is an advanced feature in PostgreSQL that allows the replication of database changes at a logical level, meaning it replicates the changes made to data rather than replicating the physical data itself.
Introduced in PostgreSQL 10, logical replication offers a flexible and efficient way to manage data across different PostgreSQL instances.
This contrasts with physical replication, which creates an exact copy of the primary server’s data, typically used for failover and disaster recovery.
In this article, we will explore what logical replication is, how it works, its advantages, key considerations, and the new features introduced in PostgreSQL 16.
Get Your Free Linux training!
Join our free Linux training and discover the power of open-source technology. Enhance your skills and boost your career! Learn Linux for Free!Table of Contents
What is Logical Replication in Postgresql?
Logical replication works by streaming changes at the logical level, enabling replication of a subset of data, such as specific tables, rather than the entire database or cluster. This provides a much more granular control over what data is replicated.
Logical replication allows replication between different PostgreSQL versions and platforms, and can even be used to implement more complex replication topologies, such as multi-master replication, without needing to replicate the entire database.
The main advantage of logical replication over physical replication lies in its flexibility. It provides a way to control replication on a per-table or per-database basis, enabling replication across different versions of PostgreSQL. It also allows fine-grained control of the data that gets replicated.
Advantages of Logical Replication
Logical replication offers several advantages compared to physical replication:
- Selective Replication: Logical replication allows you to replicate only specific databases or tables rather than the entire database cluster. This makes it highly efficient when you need to replicate just a subset of your data for a particular use case, such as replicating specific tables for reporting or analytics.
- Cross-Version Compatibility: Since logical replication operates at the logical level (i.e., changes are recorded in terms of SQL operations), it can work across different versions of PostgreSQL. This makes it an ideal solution for upgrading PostgreSQL with minimal downtime or performing replication between different major releases.
- Reduced Bloat on Standby Nodes: As logical replication only sends the data changes (INSERTs, UPDATEs, DELETEs), the standby nodes do not need to replicate the entire dataset. This reduces unnecessary data duplication and can help in managing the storage footprint on standby systems.
- Flexibility in Standby Nodes: Logical replication allows flexibility in the configuration of the standby node. For example, you can use temporary tables, custom triggers, or different user accounts on the subscriber side. This can be useful for use cases such as offloading analytical queries from the primary database.
How Logical Replication Works in Postgres?
Logical replication uses a publish/subscribe model to stream changes from a publisher node to a subscriber node. Here’s an overview of the key components and how the process works:
- Logical Decoding: The publisher node first captures logical changes using logical decoding, which decodes the Write-Ahead Logs (WAL) into a stream of logical changes. This is the process that converts the raw WAL entries (which are physical and low-level) into logical operations that can be understood and applied by the subscriber.
- Publication: A publication is created on the publisher, which defines the specific tables or schemas whose changes will be replicated. A publication can contain one or more tables, and these tables are the ones that will have their changes sent to the subscriber.
- Subscription: On the subscriber node, a subscription is created. A subscription specifies which publication to subscribe to and includes the necessary connection details for the publisher. This means the subscriber node is essentially requesting to receive the changes defined by the publication.
- Data Transfer: Once the subscription is established, the publisher starts streaming changes to the subscriber. These changes include any inserts, updates, or deletes that occur in the tables defined in the publication.
- Change Application: The subscriber applies the changes to its own database, effectively keeping it in sync with the publisher.
Considerations for Logical Replication in Postgresql
While logical replication offers many advantages, it is essential to keep a few considerations in mind when setting it up:
- Replica Identity: Logical replication relies on a mechanism to uniquely identify rows, ensuring that updates and deletes can be applied correctly on the subscriber node. This “replica identity” can be a primary key or a unique index. If a table lacks a primary key, a unique index must be defined to serve as the replica identity.
- Data Consistency: Logical replication is asynchronous by default, meaning that there can be a delay between when data is written on the publisher and when it is applied on the subscriber. While PostgreSQL ensures that the subscriber eventually catches up with the publisher, temporary inconsistencies might occur, especially in high-throughput environments.
- DDL Replication: Unlike physical replication, logical replication does not replicate Data Definition Language (DDL) statements such as
CREATE
,ALTER
, orDROP
commands. DDL operations need to be manually applied to the subscriber database. However, PostgreSQL 16 introduces features that can help streamline this process. - Conflicts: Logical replication typically does not handle conflicts automatically. If both the publisher and the subscriber are allowed to modify the same data, conflicts may arise. Proper conflict resolution mechanisms should be put in place to ensure data integrity.
New Logical Replication Features in PostgreSQL 16
PostgreSQL 16 brings several enhancements to logical replication, making it even more powerful:
- Logical Replication from Physical Standby: One of the most exciting features in PostgreSQL 16 is the ability to create a logical replication slot on a physical standby node. This means you can now replicate data from a standby server, allowing for a more flexible setup where you can replicate a subset of the data without affecting the primary node.
- Parallel Apply: To improve performance, PostgreSQL 16 introduces parallel apply, where large transactions can be applied in parallel by multiple worker processes on the subscriber side. This significantly enhances the performance of applying large batches of data changes and can help reduce replication lag.
Use Cases for Logical Replication
Logical replication in PostgreSQL is incredibly versatile and can be used for a variety of purposes, including:
- Read-Only Replicas: Logical replication is often used to create read-only replicas that are used for reporting or analytics. By replicating only the necessary tables, you can offload heavy queries from the primary database, thus improving overall system performance.
- Upgrading PostgreSQL: When upgrading PostgreSQL to a new major version, logical replication offers a way to minimize downtime. You can set up a logical replication stream between the old and new versions, allowing data to be transferred in the background while the old system continues to operate.
- Multi-Master Replication: Using tools like EDB Postgres Distributed (PGD), logical replication can also be used to set up multi-master replication, where changes to data are propagated across multiple PostgreSQL instances. This provides high availability and horizontal scaling.
Conclusion
Logical replication in PostgreSQL is a powerful and flexible feature that enables selective, version-independent data replication. It offers significant advantages over physical replication, particularly in scenarios that require replication of specific tables or data subsets.
With enhancements in PostgreSQL 16, including the ability to replicate from physical standbys and parallel apply, logical replication has become even more efficient and adaptable to a variety of use cases.
By understanding the workings of logical replication, its advantages, and the considerations involved, you can leverage this feature to optimize data replication, support system upgrades, and enhance the scalability and availability of your PostgreSQL databases.