Chapter 17. Write Barriers
Enabling write barriers incurs a substantial performance penalty for some applications. Specifically, applications that use fsync()
heavily or create and delete many small files will likely run much slower.
17.1. Importance of Write Barriers
File systems take great care to safely update metadata, ensuring consistency. Journalled file systems bundle metadata updates into transactions and send them to persistent storage in the following manner:
First, the file system sends the body of the transaction to the storage device.
Then, the file system sends a commit block.
If the transaction and its corresponding commit block are written to disk, the file system assumes that the transaction will survive any power failure.
However, file system integrity during power failure becomes more complex for storage devices with extra caches. Storage target devices like local S-ATA or SAS drives may have write caches ranging from 32MB to 64MB in size (with modern drives). Hardware RAID controllers often contain internal write caches. Further, high end arrays, like those from NetApp, IBM, Hitachi and EMC (among others), also have large caches.
Storage devices with write caches report I/O as "complete" when the data is in cache; if the cache loses power, it loses its data as well. Worse, as the cache de-stages to persistent storage, it may change the original metadata ordering. When this occurs, the commit block may be present on disk without having the complete, associated transaction in place. As a result, the journal may replay these uninitialized transaction blocks into the file system during post-power-loss recovery; this will cause data inconsistency and corruption.
Write barriers are implemented in the Linux kernel via storage write cache flushes before and after the I/O, which is order-critical. After the transaction is written, the storage cache is flushed, the commit block is written, and the cache is flushed again. This ensures that:
With barriers enabled, an fsync()
call will also issue a storage cache flush. This guarantees that file data is persistent on disk even if power loss occurs shortly after fsync()
returns.