Red Hat Database: Administrator and User's Guide
Chapter 15. Write-Ahead Logging (WAL)

Implementation

WAL is enabled automatically, so no action is required from the administrator, except to ensure that the additional disk-space requirements of the WAL logs are met and that any necessary tuning is done (see the section "WAL Configuration").

WAL logs are stored in the directory $PGDATA/pg_xlog as a set of segment files, each 16 MB in size. Each segment is divided into 8 KB pages. The log record headers are described in access/xlog.h; the record content depends on the type of event being logged. Segment files are given sequential numbers as names, starting at 0000000000000000. It will take a very long time to exhaust the available stock of numbers.
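
As a rough illustration of the sizes quoted above, the following Python sketch works out how many pages fit in one segment file and how much disk space a given number of segment files occupies. The constant and function names are illustrative only; they are not taken from the PostgreSQL sources.

```python
# Sizes quoted in the text above.
SEGMENT_SIZE = 16 * 1024 * 1024   # each segment file is 16 MB
PAGE_SIZE = 8 * 1024              # each page is 8 KB

# Number of 8 KB pages in one 16 MB segment file.
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE


def log_disk_usage_mb(segment_count):
    """Disk space, in MB, occupied by the given number of segment files."""
    return segment_count * SEGMENT_SIZE // (1024 * 1024)


if __name__ == "__main__":
    print(PAGES_PER_SEGMENT)       # 2048
    print(log_disk_usage_mb(4))    # 64
```

A calculation of this kind can help when estimating the additional disk space to set aside for pg_xlog.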

The WAL buffers and control structure are in shared memory and are protected by spinlocks. The demand on shared memory depends on the number of buffers; the default size of the WAL buffers is 64 KB.

It is advantageous to locate the log on a disk different from the one used for the main database files. This may be achieved by moving the pg_xlog directory to another location (while the postmaster is shut down, of course) and creating a symbolic link from the original location in $PGDATA to the new location.
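
The move-and-link procedure described above could be scripted along the following lines. This is a minimal sketch, not a supported tool: the function name relocate_pg_xlog and its arguments are hypothetical, the target parent directory is assumed to exist already, and the script does not check that the postmaster is shut down, so that remains the administrator's responsibility.

```python
import os
import shutil


def relocate_pg_xlog(pgdata, new_parent):
    """Move pgdata/pg_xlog under new_parent and leave a symbolic link behind.

    Hypothetical helper: run it only while the postmaster is shut down.
    """
    old_path = os.path.join(pgdata, "pg_xlog")
    new_path = os.path.abspath(os.path.join(new_parent, "pg_xlog"))

    if os.path.islink(old_path):
        raise RuntimeError("pg_xlog is already a symbolic link")
    if os.path.exists(new_path):
        raise RuntimeError("target already exists: " + new_path)

    # shutil.move falls back to copy-and-delete when the target is on
    # another file system, which is the usual case for a separate disk.
    shutil.move(old_path, new_path)

    # The original location in $PGDATA now points at the new location.
    os.symlink(new_path, old_path)
    return new_path
```

For example, relocate_pg_xlog("/var/lib/pgsql/data", "/mnt/waldisk") would leave /var/lib/pgsql/data/pg_xlog as a link to /mnt/waldisk/pg_xlog; both paths are placeholders for your own layout.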

The aim of WAL, which is to ensure that the log is written before database records are altered, may be subverted by disk drives that falsely report a successful write to the kernel, when, in fact, they have only cached the data and not yet stored it on the disk. A power failure in such a situation may still lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's data and log files do not make such false reports. Consult your disk drive vendor to determine how your disk drive reports successful writes to the kernel.

Database Recovery with WAL

After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file pg_control. When recovery is to be done, the backend first reads pg_control and then the checkpoint record. Next it reads the redo record, whose position is saved in the checkpoint, and begins the REDO operation. Because the entire content of the pages is saved in the log on the first page modification after a checkpoint, the pages will be restored to a consistent state.
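
The recovery sequence above can be pictured with a toy model. In the Python sketch below, the log is a plain list, a position is an index into that list, and the record layouts are invented for illustration; none of this reflects the actual on-disk formats of pg_control or the log.

```python
def recover(pg_control, log, pages):
    """Replay the log from the redo position named by the latest checkpoint."""
    # 1. pg_control gives the position of the latest checkpoint record.
    checkpoint = log[pg_control["checkpoint_pos"]]

    # 2. The checkpoint record gives the position of the redo record.
    redo_pos = checkpoint["redo_pos"]

    # 3. REDO: reapply every logged change from that position onward.
    for record in log[redo_pos:]:
        if record["type"] == "full_page":
            # The entire page content was logged, so whatever is on disk
            # (possibly a partially written page) is simply overwritten.
            pages[record["page"]] = dict(record["image"])
        elif record["type"] == "update":
            pages[record["page"]][record["key"]] = record["value"]
        # Other record types (such as the checkpoint itself) change no page.
    return pages


# A four-record toy log; positions are list indexes 0..3.
log = [
    {"type": "update", "page": 1, "key": "a", "value": 0},        # before the redo position
    {"type": "full_page", "page": 1, "image": {"a": 1, "b": 2}},  # redo record
    {"type": "checkpoint", "redo_pos": 1},
    {"type": "update", "page": 1, "key": "b", "value": 3},
]
pg_control = {"checkpoint_pos": 2}

# Page 1 on disk is in an unknown state after the crash.
pages = {1: {"a": "garbage"}}
recovered = recover(pg_control, log, pages)
print(recovered)   # {1: {'a': 1, 'b': 3}}
```

Because the first record replayed for page 1 is a full page image, the damaged on-disk content never influences the result, which is the property the last sentence above relies on.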