In PostgreSQL, replication mechanisms—whether physical or logical—rely heavily on the Write-Ahead Logging (WAL) system.
WAL serves as a fundamental component to ensure data consistency, durability, and to support replication processes.
This article explains how WAL works in PostgreSQL, including its role in replication, file management, and system recovery.
Table of Contents
What is Write-Ahead Logging (WAL)?
Write-Ahead Logging (WAL) is a technique used in PostgreSQL to ensure data integrity and durability.
Get Your Linux Course!
Join our Linux Course and discover the power of open-source technology. Enhance your skills and boost your career! Learn Linux today!Whenever a transaction modifies the database—whether it involves an **INSERT**, **UPDATE**, **DELETE**, or a **COMMIT**—PostgreSQL first writes the changes to a log file before updating any data files on disk.
This process ensures that in the event of a system failure (such as a crash or power outage), no committed transactions are lost. The log file is referred to as the WAL log, and it plays a crucial role in the recovery and replication processes.
How Does WAL Work?
Each change made by a transaction is captured in a WAL record. These records are initially written to an in-memory buffer. Upon transaction commit, they are flushed to disk in a WAL segment file. The WAL record contains information about the transaction, such as the operation type (**INSERT**, **UPDATE**, **DELETE**), along with the data being modified.
A critical component of WAL is the Log Sequence Number (LSN). The LSN uniquely identifies the location of a WAL record within the log file.
It is a 64-bit number, represented as two 32-bit hexadecimal values separated by a “/”. For example, you can query the current WAL LSN with the following SQL command:
SELECT pg_current_wal_lsn();
The LSN helps PostgreSQL track where each change is stored in the WAL log, which is important for replication and recovery.
- The output shows the current LSN in the format <LSN High>/<LSN Low>, where both parts are hexadecimal numbers.
- In this case, 0/3000020 represents the position in the WAL log where the most recent record has been written.
Checkpoints in PostgreSQL
A checkpoint in PostgreSQL marks the point where all data files are updated to reflect the changes recorded in the WAL. The checkpoint ensures that if the system crashes, recovery can begin from a known, consistent point. Checkpoints occur periodically and are necessary for PostgreSQL to maintain a balance between system performance and data safety.
During a checkpoint, the system writes all WAL records up to that point into the data files, which reduces the recovery time after a crash. The checkpoint process also involves writing the WAL to disk in such a way that only committed transactions are included, ensuring data consistency.
WAL Segments and WAL File Management
The WAL log itself is managed through WAL segments, which are physical files where WAL records are stored. These segment files are typically 16 MB in size and are stored in the pg_wal directory of the PostgreSQL data directory. The names of these WAL segment files are 24-digit numbers that follow a sequential pattern.
For example, you can query the current WAL segment name using the following SQL:
SELECT pg_walfile_name(pg_current_wal_lsn());
WAL segment files are managed by PostgreSQL under several conditions:
- Segment file rotation: A new WAL segment is generated when the current segment file is filled.
- Manual switching: The pg_switch_wal() function can be issued to force a new WAL segment to begin.
- Archive mode: If the archive_mode setting is enabled, WAL segment files are archived after they are written.
Once a checkpoint occurs, PostgreSQL determines how many WAL segment files need to be retained. It bases this estimation on the number of files consumed in previous checkpoint cycles and ensures that between min_wal_size (default 80 MB) and max_wal_size (default 1 GB) worth of WAL files are kept.
WAL-based Replication in PostgreSQL
WAL forms the backbone of PostgreSQL’s replication features, particularly for physical replication and logical replication. These replication techniques leverage WAL to replicate data across multiple PostgreSQL instances.
WAL Archiving and Log Shipping
One common method of replication is log shipping, where WAL files are copied from the primary server to a secondary (standby) server. This is a form of file-based replication that allows the standby server to replay WAL files and mirror the primary server’s state.
To archive WAL files, PostgreSQL can be configured to execute a user-defined script that copies each WAL segment to a remote location. This process is referred to as WAL archiving. Once archived, these WAL files can be used for point-in-time recovery or to set up standby servers.
Log shipping can be divided into two types:
- File-level log shipping: WAL files are copied to the standby server, which operates in recovery mode and applies these WAL files as they arrive.
- Block-level log shipping (Streaming Replication): This is an improvement over file-level log shipping, where WAL records are streamed directly to the standby server in real-time, reducing replication lag.
Streaming replication involves the primary server sending WAL records directly to the standby server over a network connection, using the replication protocol. This method allows the standby server to be in sync with the primary server with minimal delay, making it ideal for high-availability configurations.
Replication Slots
In the context of replication, a replication slot is a mechanism that ensures WAL files are retained as long as they are needed by replication clients.
A replication slot tracks the oldest WAL segment file and oldest transaction ID that a standby server requires. This feature prevents the primary server from discarding WAL files that are still needed by a standby, avoiding replication failures.
Replication slots are especially useful when dealing with slow or intermittent connections between the primary and standby servers. PostgreSQL will retain the necessary WAL files for as long as they are required by the replication slots.
Here’s a more concise explanation of your query:
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '2113/49321D48'));
Key Components:
pg_current_wal_lsn()
: Returns the current WAL (Write-Ahead Log) Log Sequence Number.pg_wal_lsn_diff()
: Calculates the byte difference between two LSNs.pg_size_pretty()
: Converts the byte difference into a human-readable format (e.g., ‘1 MB’).
How It Works:
- The query compares the current WAL position (
pg_current_wal_lsn()
) with the LSN'2113/49321D48'
. pg_wal_lsn_diff()
gives the byte difference between these two positions.pg_size_pretty()
then converts that difference into a human-readable format (e.g., ’10 MB’).
Example Output:
If the byte difference is 1048576 bytes, the result would be:
1 MB
This query is useful for tracking WAL generation or replication progress.
Recovering from Crashes Using WAL
One of the key benefits of WAL is its role in crash recovery. In the event of a system crash, PostgreSQL uses the WAL to replay committed transactions that may not have been written to the data files. Upon restarting, PostgreSQL begins recovery from the last known redo point (or checkpoint), ensuring that no committed transactions are lost.
This recovery process uses the WAL to bring the database back to a consistent state. If the system crashes before a checkpoint, WAL records are used to replay any changes since the last checkpoint. This guarantees that the database can recover to a state where all committed transactions are present.
PostgreSQL sql related to WAL
Here are some important PostgreSQL objects related to WAL (Write-Ahead Logging), which can help you manage, monitor, and inspect WAL behavior:
1. pg_current_wal_lsn()
- Description: Returns the current WAL Log Sequence Number (LSN), representing the current position in the WAL stream.
- Usage:
SELECT pg_current_wal_lsn();
2. pg_wal_lsn_diff()
- Description: Computes the difference in bytes between two LSNs.
- Usage:
SELECT pg_wal_lsn_diff('A40/A559B428', '9F1/4F6F3A58');
3. pg_wal_replay_pause()
- Description: Pauses WAL replay on a standby server. This can be used in replication scenarios to stop the replay of WAL logs temporarily.
- Usage:
SELECT pg_wal_replay_pause();
4. pg_wal_replay_resume()
- Description: Resumes WAL replay on a standby server after it was paused.
- Usage:
SELECT pg_wal_replay_resume();
5. pg_wal_lsn_diff()
- Description: Computes the byte difference between two WAL LSNs.
- Usage:
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '2113/49321D48');
6. pg_xlog_location_diff()
(older PostgreSQL versions)
- Description: Similar to
pg_wal_lsn_diff()
but used in older PostgreSQL versions (pre-10), where WAL was referred to as XLOG. - Usage:
SELECT pg_xlog_location_diff('0/3000', '0/4000');
7. pg_last_wal_replay_lsn()
- Description: Returns the LSN of the last WAL replayed on a replica server. Useful for replication monitoring.
- Usage:
SELECT pg_last_wal_replay_lsn();
8. pg_last_wal_receive_lsn()
- Description: Returns the LSN of the last WAL segment received on the replica server.
- Usage:
SELECT pg_last_wal_receive_lsn()
9. pg_stat_wal
- Description: Provides various statistics about WAL generation and activity, such as the number of WAL files generated and their sizes.
- Usage:
SELECT * FROM pg_stat_wal;
PostgreSQL system views related to WAL
PostgreSQL provides several system views related to WAL (Write-Ahead Logging) that allow you to monitor and analyze WAL activity, replication, and archival.
These views are valuable for database administrators to track WAL generation, replication status, and performance.
Here’s a list of some key views related to WAL:
1. pg_stat_wal
- Description: Provides statistics about the WAL activity, such as the number of WAL files written and the number of bytes written to WAL.
- Key Columns:
write_lsn
: The log sequence number (LSN) at which the last write occurred.write_size
: The total number of bytes written to WAL.flush_lsn
: The LSN of the last flushed WAL record.flush_size
: The number of bytes flushed.
- Example Query:
SELECT * FROM pg_stat_wal;
2. pg_stat_bgwriter
- Description: Provides statistics related to PostgreSQL’s background writer, which writes dirty pages (including WAL data) to disk. It includes information about WAL writes, checkpoint activity, and buffers written to disk.
- Key Columns:
buffers_alloc
: Number of buffers allocated by the background writer.buffers_written
: Number of buffers written to disk.maxwritten_clean
: Number of times the background writer has written beyond the maximum dirty buffer threshold.write_time
: Time spent writing buffers (including WAL-related writes).
- Example Query:
SELECT * FROM pg_stat_bgwriter;
3. pg_stat_replication
- Description: Provides replication-related statistics, including information about the replication process, lag, and current WAL positions on the standby.
- Key Columns:
pid
: The process ID of the replication worker.application_name
: The name of the replication client (e.g., a standby server).state
: Current state of the replication (e.g.,streaming
,catchup
).sent_lsn
: The LSN up to which WAL has been sent to the replica.replay_lsn
: The LSN up to which WAL has been replayed on the replica.write_lsn
: The LSN up to which WAL has been written on the replica.flush_lsn
: The LSN up to which WAL has been flushed on the replica.
- Example Query:
SELECT * FROM pg_stat_replication;
4. pg_stat_wal_receiver
- Description: Provides information about the WAL receiver process on the standby server, including details about the WAL replication status and current LSN.
- Key Columns:
pid
: The process ID of the WAL receiver.status
: The current status of the receiver (e.g.,streaming
,catchup
).received_lsn
: The LSN of the last WAL segment received by the standby.last_msg_send_time
: The time the last message was sent from the primary to the standby.last_msg_receipt_time
: The time the last message was received by the standby.
- Example Query:
SELECT * FROM pg_stat_wal_receiver;
5. pg_stat_archiver
- Description: Provides information about the WAL archiver process, which is responsible for archiving WAL files for Point-in-Time Recovery (PITR).
- Key Columns:
last_archived_wal
: The last archived WAL file name.last_archived_time
: The timestamp when the last WAL file was archived.archived_count
: The number of WAL files archived.failed_count
: The number of failed attempts to archive WAL files.
- Example Query:
SELECT * FROM pg_stat_archiver;
6. pg_stat_database
(indirectly related to WAL)
- Description: Provides general statistics about database activity, including the number of WAL records generated as part of the transaction commit process.
- Key Columns:
xact_commit
: Number of transactions committed.xact_rollback
: Number of transactions rolled back.blks_read
: Number of disk blocks read.blks_hit
: Number of buffer hits (directly affecting WAL generation).
- Example Query:
SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit FROM pg_stat_database;
7. pg_wal_lsn_diff
(function, not a view but relevant for WAL monitoring)
- Description: While this isn’t a view, it is a function related to WAL activity. It computes the difference in bytes between two LSNs, which can be useful for tracking WAL changes over time.
- Example Query:
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '2113/49321D48');
8. pg_wal_file
- Description: Returns the name of the WAL file corresponding to a given LSN (Log Sequence Number). This function is generally used in older PostgreSQL versions (before
pg_wal_lsn_*
functions were introduced). - Example Query:
SELECT pg_wal_file(pg_current_wal_lsn());
These system views are useful for monitoring WAL-related activity, especially in setups involving streaming replication, WAL archiving, and Point-in-Time Recovery (PITR).
pg_stat_wal
: WAL activity statistics.pg_stat_bgwriter
: Background writer statistics (including WAL writing).pg_stat_replication
: Replication statistics for primary and standby servers.pg_stat_wal_receiver
: WAL receiver status (for standby servers).pg_stat_archiver
: WAL archiving statistics.pg_stat_database
: General database statistics, indirectly related to WAL.
You can query these views to gather important insights into the state of WAL activity, replication progress, and backup/archive status in PostgreSQL.
Conclusion
In PostgreSQL, Write-Ahead Logging (WAL) is a critical feature that ensures data integrity, consistency, and durability. By writing all changes to a WAL log before modifying data files, PostgreSQL can recover from system crashes and failures while ensuring that no committed transactions are lost.
Furthermore, WAL plays a central role in replication, enabling both physical and logical replication mechanisms, including log shipping and streaming replication. Understanding how WAL works is essential for managing PostgreSQL clusters, ensuring data safety, and optimizing replication performance.
Monda
Sunday 1st of December 2024
I have to say, reading this post really helped me understand Write-Ahead Logging (WAL) in PostgreSQL better! When I first started using PostgreSQL in production, I wasn’t fully aware of how crucial WAL is to the database’s overall performance and reliability. Like many newcomers, I thought of WAL mainly as a mechanism for crash recovery, but I’ve come to realize that it's much more than that!