Understanding PostgreSQL's REPLICA IDENTITY FULL for Logical Replication

In PostgreSQL, logical replication allows for selective replication of database objects like tables, allowing changes in one database to be replicated to another in real-time.

One important feature in logical replication is the concept of replica identity, which defines how PostgreSQL tracks and identifies rows for replication, especially when handling DELETE operations. In this article, we’ll explore what REPLICA IDENTITY FULL is, how it works, and provide a practical example using a test schema.

Table of Contents

What is REPLICA IDENTITY?

The replica identity in PostgreSQL is a setting that determines how the database tracks rows for the purposes of logical replication. Specifically, it defines what columns in a table will be used to uniquely identify rows during replication, particularly for DELETE operations.

Replica Identity Options

PostgreSQL offers several options for configuring the replica identity:

Get Your Linux Course!

Join our Linux Course and discover the power of open-source technology. Enhance your skills and boost your career! Start learning Linux today for only $1!

REPLICA IDENTITY DEFAULT:
This is the default behavior. PostgreSQL uses the primary key (or unique constraint) of the table to uniquely identify rows for replication. If a row is deleted, only the primary key (or the unique index) is logged for that row.
REPLICA IDENTITY NOTHING:
With this setting, PostgreSQL does not track any identity for rows, meaning that DELETE operations will not be logged or replicated. This is typically used for read-only tables or tables where deletions do not need to be replicated.
REPLICA IDENTITY FULL:
This option tracks all columns of a row during replication. When a row is deleted, the entire row’s data (not just the primary key) is logged and replicated. This is useful for tables without primary keys or for tables where you want the entire row to be replicated, including deletions.
REPLICA IDENTITY USING INDEX:
This allows you to specify a custom index (usually a unique index) to be used for tracking rows. It provides flexibility if the table has unique constraints or other indexes that you want to use instead of the primary key.

Why Use REPLICA IDENTITY FULL?

Tracking Deletes Without a Primary Key

The main use case for REPLICA IDENTITY FULL is when you have a table that does not have a primary key, or the primary key is not suitable for identifying rows for replication. By using FULL, PostgreSQL logs the entire row, ensuring that even if a row is deleted, its data can still be replicated accurately to the target system.

Tables Without Primary Keys

In some cases, tables may not have a primary key (e.g., for logging or historical tables). If logical replication is configured on such a table, using FULL will ensure that the system replicates all changes, including deletions, without needing a unique key.

Redundant Replication of Full Row Data

If you want to replicate the entire row (not just the primary key or index) for DELETE operations, setting the replica identity to FULL ensures that PostgreSQL tracks every column of the table, which could be critical for systems requiring full data integrity or audits.

Example: Configuring REPLICA IDENTITY FULL

To better understand how this works, let’s look at a practical example. Suppose you have a test schema named test with a table named user_sessions. This table does not have a primary key but still needs to be included in a logical replication setup.

Step 1: Creating the `user_sessions` Table in the `test` Schema

Let’s create a table in the test schema:

CREATE SCHEMA test;

CREATE TABLE test.user_sessions (
    session_id SERIAL,
    user_id INT,
    session_start TIMESTAMP,
    session_end TIMESTAMP,
    last_activity TIMESTAMP
);

In this case, user_sessions does not have a primary key, but we still want to track all changes, including deletions, for replication.

Step 2: Setting the Replica Identity to `FULL`

To configure the table for replication, we set the replica identity to FULL, which will ensure all columns are tracked during replication:

ALTER TABLE test.user_sessions REPLICA IDENTITY FULL;

Explanation:

REPLICA IDENTITY FULL: This ensures that when a row in the user_sessions table is deleted, the entire row (with all columns) will be logged and replicated to the target system, even though there is no primary key in the table.
This is particularly important for maintaining data consistency across systems when replication includes DELETE operations.

Step 3: Verifying the Replica Identity Setting

You can check the current replica identity setting using the following SQL command:

SELECT relname, replica_identity
FROM pg_catalog.pg_class
WHERE relnamespace = 'test'::regnamespace;

This will return the table name and its associated replica identity setting.

When to Use REPLICA IDENTITY FULL

You would typically use REPLICA IDENTITY FULL in the following scenarios:

No Primary Key: If your table does not have a primary key or any unique constraints.
Full Row Replication: If you want to track the full content of the rows for replication, including DELETE operations.
Custom Replication Needs: If you have business logic or audit requirements that require full visibility of the rows, even during deletions.

Performance Considerations

While REPLICA IDENTITY FULL provides comprehensive replication by tracking all column data, it can have an impact on performance:

Increased Disk Usage: Since the entire row is logged for every change, this increases the amount of data that is stored for replication purposes.
Higher Network Overhead: Replicating entire rows (especially if they are large) can lead to higher network usage during replication, which may affect performance.
Slower Replication: For large tables with many columns, replication could be slower because the system must send more data over the network.

Conclusion

The REPLICA IDENTITY FULL setting in PostgreSQL is an important feature for logical replication, especially for tables without primary keys or when you need to track and replicate all columns of a table, including during DELETE operations. By using this feature, you ensure that even tables without a unique identifier can participate in logical replication, and all data is kept in sync across systems.

However, it’s important to carefully consider the performance implications of using REPLICA IDENTITY FULL on large tables, as it can lead to higher storage and network usage. Always test your replication setup in a non-production environment to understand its impact on your system.