Chapter 15. Write-Ahead Logging (WAL)

General Description

Write-Ahead Logging (WAL) is a standard approach to transaction logging. A detailed description can be found in most books on transaction processing. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after the log records describing them have been flushed to permanent storage. When we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log. Any changes that have not been applied to the data pages will first be redone from the log records (this is roll-forward recovery, also known as REDO), and then changes made by uncommitted transactions will be removed from the data pages (roll-backward recovery, also known as UNDO).
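The log-before-data rule and roll-forward recovery described above can be illustrated with a small sketch. This is a toy key-value store, not PostgreSQL's implementation: the class ToyWAL, its file names, and the JSON record format are all hypothetical. It assumes every log record is a complete, committed single-change transaction, so only the REDO pass is shown and no UNDO pass is needed.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy store illustrating the log-before-data rule (hypothetical, not PostgreSQL code)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "toy.wal")    # hypothetical log file
        self.data_path = os.path.join(directory, "toy.data")  # hypothetical data file
        self.pages = {}  # in-memory stand-in for the data pages
        self.recover()

    def recover(self):
        # Start from whatever data reached disk, then redo every logged change.
        if os.path.exists(self.data_path):
            with open(self.data_path) as f:
                self.pages = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    record = json.loads(line)
                    self.pages[record["key"]] = record["value"]

    def commit(self, key, value):
        # 1. Append the log record and flush it to permanent storage.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only now change the data page. It stays in memory; commit does not flush it.
        self.pages[key] = value

    def flush_pages(self):
        # Data pages are written later, independently of any commit.
        with open(self.data_path, "w") as f:
            json.dump(self.pages, f)
            f.flush()
            os.fsync(f.fileno())


def demo_crash_recovery():
    with tempfile.TemporaryDirectory() as directory:
        store = ToyWAL(directory)
        store.commit("a", 1)
        store.flush_pages()      # "a" reaches the data file
        store.commit("b", 2)     # "b" exists only in the log and in memory
        del store                # simulated crash: in-memory pages are lost
        return ToyWAL(directory).pages  # recovery redoes the log
```

Calling demo_crash_recovery() returns both keys, even though the change to "b" was never written to the data file before the simulated crash. Replaying the whole log is safe here only because each toy record simply overwrites a key, so applying it twice has no further effect.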

Benefits of WAL

WAL significantly reduces the number of disk writes, since only the log file needs to be flushed to disk at the time of transaction commit. In multi-user environments, commits of many transactions may be accomplished with a single fsync() of the log file. The log file is written sequentially, so the cost of syncing the log is much less than the cost of flushing the data pages.
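The point about many commits sharing one fsync() can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class GroupCommitLog, its method names, and the "COMMIT n" record format are hypothetical and do not reflect how PostgreSQL coordinates concurrent backends.

```python
import os
import tempfile


class GroupCommitLog:
    """Hypothetical log that makes several pending commits durable with one fsync()."""

    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []      # transaction ids waiting for their commit record to be durable
        self.fsync_calls = 0   # counted only to make the saving visible

    def request_commit(self, xid):
        # A transaction asks to commit; nothing is forced to disk yet.
        self.pending.append(xid)

    def flush(self):
        # Write all pending commit records sequentially, then sync the log once.
        if not self.pending:
            return []
        for xid in self.pending:
            self.file.write(f"COMMIT {xid}\n".encode())
        self.file.flush()
        os.fsync(self.file.fileno())
        self.fsync_calls += 1
        done, self.pending = self.pending, []
        return done

    def close(self):
        self.file.close()


def demo_group_commit():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "toy.wal")
        log = GroupCommitLog(path)
        for xid in (101, 102, 103):
            log.request_commit(xid)
        committed = log.flush()
        log.close()
        with open(path) as f:
            lines = f.read().splitlines()
        return committed, log.fsync_calls, lines
```

In this sketch three transactions become durable after a single fsync() call. A real system would also have to decide how long a committing transaction may wait for others to join the batch, which the sketch leaves out.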

Another benefit is consistency of the data pages. Without WAL, PostgreSQL could not guarantee consistency after a crash. A crash during a write could result in:

  1. Index tuples pointing to non-existent table rows

  2. Index tuples lost in split operations

  3. Totally corrupted table or index page content, due to partially written data pages

Problems with indexes (problems 1 and 2) could possibly have been fixed by additional fsync() calls, but it is not obvious how to handle the last case without WAL; WAL saves the entire data page content in the log if that is required to ensure page consistency for after-crash recovery.
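The third case, a partially written data page repaired from a full page image in the log, can be sketched like this. It is an illustration only: the page size, the in-memory bytearray standing in for a data file, and the function names are assumptions made for the example, and the sketch simply restores every logged image rather than modeling when PostgreSQL decides an image is required.

```python
PAGE_SIZE = 8192  # illustrative page size, an assumption for this sketch


def log_full_page(log, page_no, new_page):
    # Save the entire new page content in the log before touching the data file.
    log.append((page_no, bytes(new_page)))


def torn_write(data_file, page_no, new_page, bytes_written):
    # Simulate a crash part-way through writing a page: only a prefix reaches disk.
    start = page_no * PAGE_SIZE
    data_file[start:start + bytes_written] = new_page[:bytes_written]


def redo_full_pages(log, data_file):
    # After-crash recovery: overwrite each logged page with its saved image.
    for page_no, image in log:
        start = page_no * PAGE_SIZE
        data_file[start:start + PAGE_SIZE] = image


def demo_torn_page():
    old_page = b"A" * PAGE_SIZE
    new_page = b"B" * PAGE_SIZE
    data_file = bytearray(old_page)  # a one-page "data file" held in memory
    log = []

    log_full_page(log, 0, new_page)          # the log record comes first
    torn_write(data_file, 0, new_page, 512)  # crash after 512 bytes
    torn = bytes(data_file)

    redo_full_pages(log, data_file)
    recovered = bytes(data_file)
    return torn, recovered
```

After the simulated crash the page is a mixture of old and new bytes, matching neither version. Replaying the logged image restores a consistent page without needing to know which part of the write succeeded.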

Future Benefits

In this release of WAL, the UNDO operation is not implemented. This means that changes made by aborted transactions will still occupy disk space and that we still need a permanent pg_log file to hold the status of transactions (since we are not able to re-use transaction identifiers). Once UNDO is implemented, pg_log will no longer be required to be permanent.

With UNDO, it will also be possible to implement savepoints to allow partial rollback of invalid transaction operations (parser errors caused by mistyping commands, insertion of duplicate primary/unique keys and so on) with the ability to continue or commit valid operations made by the transaction before the error. At present, any error will invalidate the whole transaction and require a transaction abort.