Replication Features of a ZFS Storage Pool

ZFS provides data redundancy in two configurations: mirrored and RAID-Z.

Mirrored Storage Pool Configuration

A mirrored storage pool configuration requires at least two disks, preferably on separate controllers. Many disks can be used in a mirrored configuration. In addition, you can create more than one mirror in each pool. Conceptually, a simple mirrored configuration would look similar to the following:

mirror c1t0d0 c2t0d0

Conceptually, a more complex mirrored configuration would look similar to the following:

mirror c1t0d0 c2t0d0 c3t0d0 mirror c4t0d0 c5t0d0 c6t0d0
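
Either conceptual layout maps directly onto a zpool create command. The following is a minimal sketch; the pool name tank is an arbitrary example, and the disk names should be replaced with your own devices:

# zpool create tank mirror c1t0d0 c2t0d0
# zpool create tank mirror c1t0d0 c2t0d0 c3t0d0 mirror c4t0d0 c5t0d0 c6t0d0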

For information about creating a mirrored storage pool, see Creating a Mirrored Storage Pool.

RAID-Z Storage Pool Configuration

In addition to a mirrored storage pool configuration, ZFS provides a RAID-Z configuration. RAID-Z is similar to RAID-5.

All traditional RAID-5-like algorithms (RAID-4, RAID-5, RAID-6, RDP, and EVEN-ODD, for example) suffer from a problem known as the “RAID-5 write hole.” If only part of a RAID-5 stripe is written, and power is lost before all blocks have made it to disk, the parity will remain out of sync with the data, and therefore useless, forever (unless a subsequent full-stripe write overwrites it). In RAID-Z, ZFS uses variable-width RAID stripes so that all writes are full-stripe writes. This design is only possible because ZFS integrates file system and device management in such a way that the file system's metadata has enough information about the underlying data replication model to handle variable-width RAID stripes. RAID-Z is the world's first software-only solution to the RAID-5 write hole.

A RAID-Z configuration requires at least two disks, but no special hardware. Currently, RAID-Z provides single parity. For example, if you have three disks in a RAID-Z configuration, parity data occupies space equal to one of the three disks.

Conceptually, a RAID-Z configuration with three disks would look similar to the following:

raidz c1t0d0 c2t0d0 c3t0d0

A more complex conceptual RAID-Z configuration would look similar to the following:

raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 raidz c8t0d0 c9t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0 c14t0d0

If you are creating a RAID-Z configuration with many disks, as in this example, it is better to split the 14 disks into two 7-disk groupings. RAID-Z configurations with single-digit groupings of disks should perform better.
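
The same mapping applies to RAID-Z. As a minimal sketch, again using the placeholder pool name tank, the three-disk layout and the split 14-disk layout could be created as follows:

# zpool create tank raidz c1t0d0 c2t0d0 c3t0d0
# zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 raidz c8t0d0 c9t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0 c14t0d0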

For information about creating a RAID-Z storage pool, see Creating a RAID-Z Storage Pool.

Self-Healing Data in a Replicated Configuration

ZFS provides for self-healing data in a mirrored or RAID-Z configuration.

When a bad data block is detected, not only does ZFS fetch the correct data from another replicated copy, but it also repairs the bad data by replacing it with the good copy.
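
Repair occurs automatically whenever bad data is read. To proactively verify, and thereby heal, every block in a replicated pool, you can initiate a scrub. A minimal sketch, again assuming a placeholder pool named tank:

# zpool scrub tank
# zpool status tank

The zpool status output reports scrub progress and any checksum errors that were detected and repaired.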

Dynamic Striping in a Storage Pool

For each virtual device that is added to the pool, ZFS dynamically stripes data across all available devices. The decision about where to place data is made at write time, so no fixed-width stripes are created at allocation time.

When virtual devices are added to a pool, ZFS gradually allocates data to the new device in order to maintain performance and space allocation policies. Each virtual device can also be a mirror or a RAID-Z device that contains other disk devices or files. This configuration provides flexibility in controlling the fault characteristics of your pool. For example, you could create the following configurations from four disks (sketched as commands after this list):

  • Four disks using dynamic striping

  • One four-way RAID-Z configuration

  • Two two-way mirrors using dynamic striping
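
As a minimal sketch, assuming the placeholder pool name tank and four disks c1t0d0 through c4t0d0, the three layouts above could be created as follows:

# zpool create tank c1t0d0 c2t0d0 c3t0d0 c4t0d0
# zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0
# zpool create tank mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0

A new top-level virtual device can later be attached to an existing pool with zpool add, after which ZFS begins striping new writes across it:

# zpool add tank mirror c5t0d0 c6t0d0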

While ZFS supports combining different types of virtual devices within the same pool, this practice is not recommended. For example, you can create a pool with a two-way mirror and a three-way RAID-Z configuration. However, your fault tolerance is only as good as your worst virtual device, RAID-Z in this case. The recommended practice is to use top-level virtual devices of the same type, with the same replication level in each.