In PostgreSQL, logical replication allows for selective replication of database objects like tables, allowing changes in one database to be replicated to another in real-time.
One important feature in logical replication is the concept of replica identity, which defines how PostgreSQL tracks and identifies rows for replication, especially when handling DELETE operations. In this article, we’ll explore what REPLICA IDENTITY FULL
is, how it works, and provide a practical example using a test schema.
Table of Contents
What is REPLICA IDENTITY?
The replica identity in PostgreSQL is a setting that determines how the database tracks rows for the purposes of logical replication. Specifically, it defines what columns in a table will be used to uniquely identify rows during replication, particularly for DELETE
operations.
Replica Identity Options
PostgreSQL offers several options for configuring the replica identity:
Get Your Linux Course!
Join our Linux Course and discover the power of open-source technology. Enhance your skills and boost your career! Learn Linux today!REPLICA IDENTITY DEFAULT
:
This is the default behavior. PostgreSQL uses the primary key (or unique constraint) of the table to uniquely identify rows for replication. If a row is deleted, only the primary key (or the unique index) is logged for that row.REPLICA IDENTITY NOTHING
:
With this setting, PostgreSQL does not track any identity for rows, meaning thatDELETE
operations will not be logged or replicated. This is typically used for read-only tables or tables where deletions do not need to be replicated.REPLICA IDENTITY FULL
:
This option tracks all columns of a row during replication. When a row is deleted, the entire row’s data (not just the primary key) is logged and replicated. This is useful for tables without primary keys or for tables where you want the entire row to be replicated, including deletions.REPLICA IDENTITY USING INDEX
:
This allows you to specify a custom index (usually a unique index) to be used for tracking rows. It provides flexibility if the table has unique constraints or other indexes that you want to use instead of the primary key.
Why Use REPLICA IDENTITY FULL?
Tracking Deletes Without a Primary Key
The main use case for REPLICA IDENTITY FULL
is when you have a table that does not have a primary key, or the primary key is not suitable for identifying rows for replication. By using FULL
, PostgreSQL logs the entire row, ensuring that even if a row is deleted, its data can still be replicated accurately to the target system.
Tables Without Primary Keys
In some cases, tables may not have a primary key (e.g., for logging or historical tables). If logical replication is configured on such a table, using FULL
will ensure that the system replicates all changes, including deletions, without needing a unique key.
Redundant Replication of Full Row Data
If you want to replicate the entire row (not just the primary key or index) for DELETE
operations, setting the replica identity to FULL
ensures that PostgreSQL tracks every column of the table, which could be critical for systems requiring full data integrity or audits.
Example: Configuring REPLICA IDENTITY FULL
To better understand how this works, let’s look at a practical example. Suppose you have a test schema named test
with a table named user_sessions
. This table does not have a primary key but still needs to be included in a logical replication setup.
Step 1: Creating the user_sessions
Table in the test
Schema
Let’s create a table in the test
schema:
CREATE SCHEMA test;
CREATE TABLE test.user_sessions (
session_id SERIAL,
user_id INT,
session_start TIMESTAMP,
session_end TIMESTAMP,
last_activity TIMESTAMP
);
In this case, user_sessions
does not have a primary key, but we still want to track all changes, including deletions, for replication.
Step 2: Setting the Replica Identity to FULL
To configure the table for replication, we set the replica identity to FULL
, which will ensure all columns are tracked during replication:
ALTER TABLE test.user_sessions REPLICA IDENTITY FULL;
Explanation:
REPLICA IDENTITY FULL
: This ensures that when a row in theuser_sessions
table is deleted, the entire row (with all columns) will be logged and replicated to the target system, even though there is no primary key in the table.- This is particularly important for maintaining data consistency across systems when replication includes
DELETE
operations.
Step 3: Verifying the Replica Identity Setting
You can check the current replica identity setting using the following SQL command:
SELECT relname, replica_identity
FROM pg_catalog.pg_class
WHERE relnamespace = 'test'::regnamespace;
This will return the table name and its associated replica identity setting.
When to Use REPLICA IDENTITY FULL
You would typically use REPLICA IDENTITY FULL
in the following scenarios:
- No Primary Key: If your table does not have a primary key or any unique constraints.
- Full Row Replication: If you want to track the full content of the rows for replication, including
DELETE
operations. - Custom Replication Needs: If you have business logic or audit requirements that require full visibility of the rows, even during deletions.
Performance Considerations
While REPLICA IDENTITY FULL
provides comprehensive replication by tracking all column data, it can have an impact on performance:
- Increased Disk Usage: Since the entire row is logged for every change, this increases the amount of data that is stored for replication purposes.
- Higher Network Overhead: Replicating entire rows (especially if they are large) can lead to higher network usage during replication, which may affect performance.
- Slower Replication: For large tables with many columns, replication could be slower because the system must send more data over the network.
Conclusion
The REPLICA IDENTITY FULL
setting in PostgreSQL is an important feature for logical replication, especially for tables without primary keys or when you need to track and replicate all columns of a table, including during DELETE
operations. By using this feature, you ensure that even tables without a unique identifier can participate in logical replication, and all data is kept in sync across systems.
However, it’s important to carefully consider the performance implications of using REPLICA IDENTITY FULL
on large tables, as it can lead to higher storage and network usage. Always test your replication setup in a non-production environment to understand its impact on your system.