The zfs
utility is responsible for
creating, destroying, and managing all ZFS
datasets that exist within a pool. The pool is managed using
zpool
.
Unlike traditional disks and volume managers, space in
ZFS is not
preallocated. With traditional file systems, after all of the
space is partitioned and assigned, there is no way to add an
additional file system without adding a new disk. With
ZFS, new file systems can be created at any
time. Each dataset
has properties including features like compression,
deduplication, caching, and quotas, as well as other useful
properties like readonly, case sensitivity, network file
sharing, and a mount point. Datasets can be nested inside
each other, and child datasets will inherit properties from
their parents. Each dataset can be administered,
delegated,
replicated,
snapshotted,
jailed, and destroyed as a
unit. There are many advantages to creating a separate
dataset for each different type or set of files. The only
drawbacks to having an extremely large number of datasets are
that some commands like zfs list will be slower, and that
mounting hundreds or even thousands of datasets can slow the
FreeBSD boot process.
Create a new dataset and enable LZ4 compression on it:
# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
mypool               781M  93.2G   144K  none
mypool/ROOT          777M  93.2G   144K  none
mypool/ROOT/default  777M  93.2G   777M  /
mypool/tmp           176K  93.2G   176K  /tmp
mypool/usr           616K  93.2G   144K  /usr
mypool/usr/home      184K  93.2G   184K  /usr/home
mypool/usr/ports     144K  93.2G   144K  /usr/ports
mypool/usr/src       144K  93.2G   144K  /usr/src
mypool/var          1.20M  93.2G   608K  /var
mypool/var/crash     148K  93.2G   148K  /var/crash
mypool/var/log       178K  93.2G   178K  /var/log
mypool/var/mail      144K  93.2G   144K  /var/mail
mypool/var/tmp       152K  93.2G   152K  /var/tmp
# zfs create -o compress=lz4 mypool/usr/mydataset
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
mypool                 781M  93.2G   144K  none
mypool/ROOT            777M  93.2G   144K  none
mypool/ROOT/default    777M  93.2G   777M  /
mypool/tmp             176K  93.2G   176K  /tmp
mypool/usr             704K  93.2G   144K  /usr
mypool/usr/home        184K  93.2G   184K  /usr/home
mypool/usr/mydataset  87.5K  93.2G  87.5K  /usr/mydataset
mypool/usr/ports       144K  93.2G   144K  /usr/ports
mypool/usr/src         144K  93.2G   144K  /usr/src
mypool/var            1.20M  93.2G   610K  /var
mypool/var/crash       148K  93.2G   148K  /var/crash
mypool/var/log         178K  93.2G   178K  /var/log
mypool/var/mail        144K  93.2G   144K  /var/mail
mypool/var/tmp         152K  93.2G   152K  /var/tmp
Destroying a dataset is much quicker than deleting all of the files that reside on the dataset, as it does not involve scanning all of the files and updating all of the corresponding metadata.
Destroy the previously-created dataset:
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
mypool                 880M  93.1G   144K  none
mypool/ROOT            777M  93.1G   144K  none
mypool/ROOT/default    777M  93.1G   777M  /
mypool/tmp             176K  93.1G   176K  /tmp
mypool/usr             101M  93.1G   144K  /usr
mypool/usr/home        184K  93.1G   184K  /usr/home
mypool/usr/mydataset   100M  93.1G   100M  /usr/mydataset
mypool/usr/ports       144K  93.1G   144K  /usr/ports
mypool/usr/src         144K  93.1G   144K  /usr/src
mypool/var            1.20M  93.1G   610K  /var
mypool/var/crash       148K  93.1G   148K  /var/crash
mypool/var/log         178K  93.1G   178K  /var/log
mypool/var/mail        144K  93.1G   144K  /var/mail
mypool/var/tmp         152K  93.1G   152K  /var/tmp
# zfs destroy mypool/usr/mydataset
# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
mypool               781M  93.2G   144K  none
mypool/ROOT          777M  93.2G   144K  none
mypool/ROOT/default  777M  93.2G   777M  /
mypool/tmp           176K  93.2G   176K  /tmp
mypool/usr           616K  93.2G   144K  /usr
mypool/usr/home      184K  93.2G   184K  /usr/home
mypool/usr/ports     144K  93.2G   144K  /usr/ports
mypool/usr/src       144K  93.2G   144K  /usr/src
mypool/var          1.21M  93.2G   612K  /var
mypool/var/crash     148K  93.2G   148K  /var/crash
mypool/var/log       178K  93.2G   178K  /var/log
mypool/var/mail      144K  93.2G   144K  /var/mail
mypool/var/tmp       152K  93.2G   152K  /var/tmp
In modern versions of ZFS, zfs destroy is asynchronous, and the
free space might take several minutes to appear in the pool.
Use zpool get freeing poolname to see the freeing property,
indicating how many datasets are having their blocks freed in
the background.
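For example, on the pool used in this chapter, a pool with nothing left to free might report something like this (the output shown here is illustrative):
# zpool get freeing mypool
NAME    PROPERTY  VALUE  SOURCE
mypool  freeing   0      -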
If there are child datasets, like
snapshots or other
datasets, then the parent cannot be destroyed. To destroy a
dataset and all of its children, use -r
to
recursively destroy the dataset and all of its children.
Use -n
-v
to list datasets
and snapshots that would be destroyed by this operation, but
do not actually destroy anything. Space that would be
reclaimed by destruction of snapshots is also shown.
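As an illustrative dry run on the dataset created earlier (the exact output format and the amount reported may differ):
# zfs destroy -rvn mypool/usr/mydataset
would destroy mypool/usr/mydataset
would reclaim 100M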
A volume is a special type of dataset. Rather than being
mounted as a file system, it is exposed as a block device
under /dev/zvol/poolname/dataset.
This allows the volume to be used for other file systems, to
back the disks of a virtual machine, or to be exported using
protocols like iSCSI or HAST.
A volume can be formatted with any file system, or used without a file system to store raw data. To the user, a volume appears to be a regular disk. Putting ordinary file systems on these zvols provides features that ordinary disks or file systems do not normally have. For example, using the compression property on a 250 MB volume allows creation of a compressed FAT file system.
# zfs create -V 250m -o compression=on tank/fat32
# zfs list tank
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank  258M   670M    31K  /tank
# newfs_msdos -F32 /dev/zvol/tank/fat32
# mount -t msdosfs /dev/zvol/tank/fat32 /mnt
# df -h /mnt | grep fat32
Filesystem            Size  Used  Avail  Capacity  Mounted on
/dev/zvol/tank/fat32  249M   24k   249M        0%  /mnt
# mount | grep fat32
/dev/zvol/tank/fat32 on /mnt (msdosfs, local)
Destroying a volume is much the same as destroying a regular file system dataset. The operation is nearly instantaneous, but it may take several minutes for the free space to be reclaimed in the background.
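Continuing the example above, the volume could be removed once the file system on it is no longer mounted (a sketch of the steps):
# umount /mnt
# zfs destroy tank/fat32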
The name of a dataset can be changed with
zfs rename
. The parent of a dataset can
also be changed with this command. Renaming a dataset to be
under a different parent dataset will change the value of
those properties that are inherited from the parent dataset.
When a dataset is renamed, it is unmounted and then remounted
in the new location (which is inherited from the new parent
dataset). This behavior can be prevented with
-u
.
Rename a dataset and move it to be under a different parent dataset:
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
mypool                 780M  93.2G   144K  none
mypool/ROOT            777M  93.2G   144K  none
mypool/ROOT/default    777M  93.2G   777M  /
mypool/tmp             176K  93.2G   176K  /tmp
mypool/usr             704K  93.2G   144K  /usr
mypool/usr/home        184K  93.2G   184K  /usr/home
mypool/usr/mydataset  87.5K  93.2G  87.5K  /usr/mydataset
mypool/usr/ports       144K  93.2G   144K  /usr/ports
mypool/usr/src         144K  93.2G   144K  /usr/src
mypool/var            1.21M  93.2G   614K  /var
mypool/var/crash       148K  93.2G   148K  /var/crash
mypool/var/log         178K  93.2G   178K  /var/log
mypool/var/mail        144K  93.2G   144K  /var/mail
mypool/var/tmp         152K  93.2G   152K  /var/tmp
# zfs rename mypool/usr/mydataset mypool/var/newname
# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
mypool               780M  93.2G   144K  none
mypool/ROOT          777M  93.2G   144K  none
mypool/ROOT/default  777M  93.2G   777M  /
mypool/tmp           176K  93.2G   176K  /tmp
mypool/usr           616K  93.2G   144K  /usr
mypool/usr/home      184K  93.2G   184K  /usr/home
mypool/usr/ports     144K  93.2G   144K  /usr/ports
mypool/usr/src       144K  93.2G   144K  /usr/src
mypool/var          1.29M  93.2G   614K  /var
mypool/var/crash     148K  93.2G   148K  /var/crash
mypool/var/log       178K  93.2G   178K  /var/log
mypool/var/mail      144K  93.2G   144K  /var/mail
mypool/var/newname  87.5K  93.2G  87.5K  /var/newname
mypool/var/tmp       152K  93.2G   152K  /var/tmp
Snapshots can also be renamed like this. Due to the
nature of snapshots, they cannot be renamed into a different
parent dataset. To rename a recursive snapshot, specify
-r, and all snapshots with the same name in
child datasets will also be renamed.
# zfs list -t snapshot
NAME                                USED  AVAIL  REFER  MOUNTPOINT
mypool/var/newname@first_snapshot      0      -  87.5K  -
# zfs rename mypool/var/newname@first_snapshot new_snapshot_name
# zfs list -t snapshot
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool/var/newname@new_snapshot_name      0      -  87.5K  -
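A recursive rename applies the same change to every child dataset in one step. As a hypothetical illustration (the pool and snapshot names here are placeholders, not part of the running example):
# zfs rename -r tank@old_name tank@new_name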
Each ZFS dataset has a number of
properties that control its behavior. Most properties are
automatically inherited from the parent dataset, but can be
overridden locally. Set a property on a dataset with
zfs set property=value dataset. Most properties have a
limited set of valid values; zfs get will display each
possible property and its valid values. Most properties can
be reverted to their inherited values using zfs inherit.
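For instance, overriding the compression property on a child dataset and later reverting it to the inherited value might look like this (a sketch reusing the example pool; the output shown is illustrative):
# zfs set compression=gzip mypool/usr/home
# zfs get compression mypool/usr/home
NAME             PROPERTY     VALUE  SOURCE
mypool/usr/home  compression  gzip   local
# zfs inherit compression mypool/usr/home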
User-defined properties can also be set. They become part
of the dataset configuration and can be used to provide
additional information about the dataset or its contents. To
distinguish these custom properties from the ones supplied as
part of ZFS, a colon (:
)
is used to create a custom namespace for the property.
# zfs set custom:costcenter=1234 tank
# zfs get custom:costcenter tank
NAME  PROPERTY           VALUE  SOURCE
tank  custom:costcenter  1234   local
To remove a custom property, use
zfs inherit
with -r
. If
the custom property is not defined in any of the parent
datasets, it will be removed completely (although the changes
are still recorded in the pool's history).
# zfs inherit -r custom:costcenter tank
# zfs get custom:costcenter tank
NAME  PROPERTY           VALUE  SOURCE
tank  custom:costcenter  -      -
# zfs get all tank | grep custom:costcenter
#
Snapshots are one
of the most powerful features of ZFS. A
snapshot provides a read-only, point-in-time copy of the
dataset. With Copy-On-Write (COW),
snapshots can be created quickly by preserving the older
version of the data on disk. If no snapshots exist, space is
reclaimed for future use when data is rewritten or deleted.
Snapshots preserve disk space by recording only the
differences between the current dataset and a previous
version. Snapshots are allowed only on whole datasets, not on
individual files or directories. When a snapshot is created
from a dataset, everything contained in it is duplicated.
This includes the file system properties, files, directories,
permissions, and so on. Snapshots use no additional space
when they are first created, only consuming space as the
blocks they reference are changed. Recursive snapshots taken
with -r
create a snapshot with the same name
on the dataset and all of its children, providing a consistent
moment-in-time snapshot of all of the file systems. This can
be important when an application has files on multiple
datasets that are related or dependent upon each other.
Without snapshots, a backup would have copies of the files
from different points in time.
Snapshots in ZFS provide a variety of features that even other file systems with snapshot functionality lack. A typical example of snapshot use is to have a quick way of backing up the current state of the file system when a risky action like a software installation or a system upgrade is performed. If the action fails, the snapshot can be rolled back and the system has the same state as when the snapshot was created. If the upgrade was successful, the snapshot can be deleted to free up space. Without snapshots, a failed upgrade often requires a restore from backup, which is tedious, time consuming, and may require downtime during which the system cannot be used. Snapshots can be rolled back quickly, even while the system is running in normal operation, with little or no downtime. The time savings are enormous with multi-terabyte storage systems and the time required to copy the data from backup. Snapshots are not a replacement for a complete backup of a pool, but can be used as a quick and easy way to store a copy of the dataset at a specific point in time.
Snapshots are created with
zfs snapshot dataset@snapshotname.
Adding -r creates a snapshot recursively,
with the same name on all child datasets.
Create a recursive snapshot of the entire pool:
# zfs list -t all
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool                                 780M  93.2G   144K  none
mypool/ROOT                            777M  93.2G   144K  none
mypool/ROOT/default                    777M  93.2G   777M  /
mypool/tmp                             176K  93.2G   176K  /tmp
mypool/usr                             616K  93.2G   144K  /usr
mypool/usr/home                        184K  93.2G   184K  /usr/home
mypool/usr/ports                       144K  93.2G   144K  /usr/ports
mypool/usr/src                         144K  93.2G   144K  /usr/src
mypool/var                            1.29M  93.2G   616K  /var
mypool/var/crash                       148K  93.2G   148K  /var/crash
mypool/var/log                         178K  93.2G   178K  /var/log
mypool/var/mail                        144K  93.2G   144K  /var/mail
mypool/var/newname                    87.5K  93.2G  87.5K  /var/newname
mypool/var/newname@new_snapshot_name      0      -  87.5K  -
mypool/var/tmp                         152K  93.2G   152K  /var/tmp
# zfs snapshot -r mypool@my_recursive_snapshot
# zfs list -t snapshot
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
mypool@my_recursive_snapshot                   0      -   144K  -
mypool/ROOT@my_recursive_snapshot              0      -   144K  -
mypool/ROOT/default@my_recursive_snapshot      0      -   777M  -
mypool/tmp@my_recursive_snapshot               0      -   176K  -
mypool/usr@my_recursive_snapshot               0      -   144K  -
mypool/usr/home@my_recursive_snapshot          0      -   184K  -
mypool/usr/ports@my_recursive_snapshot         0      -   144K  -
mypool/usr/src@my_recursive_snapshot           0      -   144K  -
mypool/var@my_recursive_snapshot               0      -   616K  -
mypool/var/crash@my_recursive_snapshot         0      -   148K  -
mypool/var/log@my_recursive_snapshot           0      -   178K  -
mypool/var/mail@my_recursive_snapshot          0      -   144K  -
mypool/var/newname@new_snapshot_name           0      -  87.5K  -
mypool/var/newname@my_recursive_snapshot       0      -  87.5K  -
mypool/var/tmp@my_recursive_snapshot           0      -   152K  -
Snapshots are not shown by a normal
zfs list
operation. To list snapshots,
-t snapshot
is appended to
zfs list
. -t all
displays both file systems and snapshots.
Snapshots are not mounted directly, so no path is shown in
the MOUNTPOINT column. There is no
mention of available disk space in the
AVAIL column, as snapshots cannot be
written to after they are created. Compare the snapshot
to the original dataset from which it was created:
# zfs list -rt all mypool/usr/home
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
mypool/usr/home                         184K  93.2G   184K  /usr/home
mypool/usr/home@my_recursive_snapshot      0      -   184K  -
Displaying both the dataset and the snapshot together reveals how snapshots work in COW fashion. They save only the changes (delta) that were made and not the complete file system contents all over again. This means that snapshots take little space when few changes are made. Space usage can be made even more apparent by copying a file to the dataset, then making a second snapshot:
# cp /etc/passwd /var/tmp
# zfs snapshot mypool/var/tmp@after_cp
# zfs list -rt all mypool/var/tmp
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool/var/tmp                         206K  93.2G   118K  /var/tmp
mypool/var/tmp@my_recursive_snapshot    88K      -   152K  -
mypool/var/tmp@after_cp                   0      -   118K  -
The second snapshot contains only the changes to the
dataset after the copy operation. This yields enormous
space savings. Notice that the size of the snapshot
mypool/var/tmp@my_recursive_snapshot
also changed in the USED
column to indicate the changes between itself and the
snapshot taken afterwards.
ZFS provides a built-in command to compare the
differences in content between two snapshots. This is
helpful when many snapshots were taken over time and the
user wants to see how the file system has changed.
For example, zfs diff
lets a user find
the latest snapshot that still contains a file that was
accidentally deleted. Doing this for the two snapshots that
were created in the previous section yields this
output:
# zfs list -rt all mypool/var/tmp
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool/var/tmp                         206K  93.2G   118K  /var/tmp
mypool/var/tmp@my_recursive_snapshot    88K      -   152K  -
mypool/var/tmp@after_cp                   0      -   118K  -
# zfs diff mypool/var/tmp@my_recursive_snapshot
M       /var/tmp/
+       /var/tmp/passwd
The command lists the changes between the specified
snapshot (in this case
mypool/var/tmp@my_recursive_snapshot)
and the live file system. The first column shows the
type of change:
+ | The path or file was added. |
- | The path or file was deleted. |
M | The path or file was modified. |
R | The path or file was renamed. |
Comparing the output with the table, it becomes clear
that passwd was added after the snapshot
mypool/var/tmp@my_recursive_snapshot
was created. This also resulted in a modification to the
parent directory mounted at /var/tmp.
Comparing two snapshots is helpful when using the ZFS replication feature to transfer a dataset to a different host for backup purposes.
Compare two snapshots by providing the full dataset name and snapshot name of both datasets:
# cp /var/tmp/passwd /var/tmp/passwd.copy
# zfs snapshot mypool/var/tmp@diff_snapshot
# zfs diff mypool/var/tmp@my_recursive_snapshot mypool/var/tmp@diff_snapshot
M       /var/tmp/
+       /var/tmp/passwd
+       /var/tmp/passwd.copy
# zfs diff mypool/var/tmp@my_recursive_snapshot mypool/var/tmp@after_cp
M       /var/tmp/
+       /var/tmp/passwd
A backup administrator can compare two snapshots received from the sending host and determine the actual changes in the dataset. See the Replication section for more information.
When at least one snapshot is available, it can be
rolled back to at any time. Most of the time this is the
case when the current state of the dataset is no longer
required and an older version is preferred. Scenarios where
local development tests have gone wrong, botched system
updates have hampered the system's overall functionality, or
accidentally deleted files or directories need to be restored
are all too common occurrences. Luckily, rolling back a
snapshot is just as easy as typing
zfs rollback snapshotname.
Depending on how many changes are involved, the operation
will finish in a certain amount of time. During that time,
the dataset always remains in a consistent state, much like
a database that conforms to ACID principles would while
performing a rollback. This happens while the dataset is live
and accessible without requiring downtime. Once the snapshot
has been rolled back, the dataset has the same state as it
had when the snapshot was originally taken. All other data
in that dataset that was not part of the snapshot is
discarded. Taking a snapshot of the current state of the
dataset before rolling back to a previous one is a good idea
when some data is required later. This way, the user can
roll back and forth between snapshots without losing data
that is still valuable.
In the first example, a snapshot is rolled back because
of a careless rm operation that removed
more data than intended.
# zfs list -rt all mypool/var/tmp
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool/var/tmp                         262K  93.2G   120K  /var/tmp
mypool/var/tmp@my_recursive_snapshot    88K      -   152K  -
mypool/var/tmp@after_cp               53.5K      -   118K  -
mypool/var/tmp@diff_snapshot              0      -   120K  -
% ls /var/tmp
passwd          passwd.copy
% rm /var/tmp/passwd*
% ls /var/tmp
vi.recover
%
At this point, the user realized that too many files were deleted and wants them back. ZFS provides an easy way to get them back using rollbacks, but only when snapshots of important data are performed on a regular basis. To get the files back and start over from the last snapshot, issue the command:
# zfs rollback mypool/var/tmp@diff_snapshot
% ls /var/tmp
passwd          passwd.copy     vi.recover
The rollback operation restored the dataset to the state of the last snapshot. It is also possible to roll back to a snapshot that was taken much earlier and has other snapshots that were created after it. When trying to do this, ZFS will issue this warning:
# zfs list -rt snapshot mypool/var/tmp
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
mypool/var/tmp@my_recursive_snapshot    88K      -   152K  -
mypool/var/tmp@after_cp               53.5K      -   118K  -
mypool/var/tmp@diff_snapshot              0      -   120K  -
# zfs rollback mypool/var/tmp@my_recursive_snapshot
cannot rollback to 'mypool/var/tmp@my_recursive_snapshot': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
mypool/var/tmp@after_cp
mypool/var/tmp@diff_snapshot
This warning means that snapshots exist between the
current state of the dataset and the snapshot to which the
user wants to roll back. To complete the rollback, these
snapshots must be deleted. ZFS cannot
track all the changes between different states of the
dataset, because snapshots are read-only.
ZFS will not delete the affected
snapshots unless the user specifies -r
to
indicate that this is the desired action. If that is the
intention, and the consequences of losing all intermediate
snapshots are understood, the command can be issued:
# zfs rollback -r mypool/var/tmp@my_recursive_snapshot
# zfs list -rt snapshot mypool/var/tmp
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
mypool/var/tmp@my_recursive_snapshot    8K      -   152K  -
% ls /var/tmp
vi.recover
The output from zfs list -t snapshot
confirms that the intermediate snapshots
were removed as a result of
zfs rollback -r
.
Snapshots are mounted in a hidden directory under the
parent dataset:
.zfs/snapshot/snapshotname.
By default, these directories will not be displayed even
when a standard ls -a is issued.
Although the directory is not displayed, it is there
nevertheless and can be accessed like any normal directory.
The property named snapdir
controls
whether these hidden directories show up in a directory
listing. Setting the property to visible
allows them to appear in the output of ls
and other commands that deal with directory contents.
# zfs get snapdir mypool/var/tmp
NAME            PROPERTY  VALUE   SOURCE
mypool/var/tmp  snapdir   hidden  default
% ls -a /var/tmp
.               ..              passwd          vi.recover
# zfs set snapdir=visible mypool/var/tmp
% ls -a /var/tmp
.               ..              .zfs            passwd          vi.recover
Individual files can easily be restored to a previous
state by copying them from the snapshot back to the parent
dataset. The directory structure below
.zfs/snapshot
has a directory named
exactly like the snapshots taken earlier to make it easier
to identify them. In the next example, it is assumed that a
file is to be restored from the hidden
.zfs
directory by copying it from the
snapshot that contained the latest version of the
file:
# rm /var/tmp/passwd
% ls -a /var/tmp
.               ..              .zfs            vi.recover
# ls /var/tmp/.zfs/snapshot
after_cp        my_recursive_snapshot
# ls /var/tmp/.zfs/snapshot/after_cp
passwd          vi.recover
# cp /var/tmp/.zfs/snapshot/after_cp/passwd /var/tmp
When ls .zfs/snapshot
was issued, the
snapdir
property might have been set to
hidden, but it would still be possible to list the contents
of that directory. It is up to the administrator to decide
whether these directories will be displayed. It is possible
to display these for certain datasets and prevent it for
others. Copying files or directories from this hidden
.zfs/snapshot
is simple enough. Trying
it the other way around results in this error:
# cp /etc/rc.conf /var/tmp/.zfs/snapshot/after_cp/
cp: /var/tmp/.zfs/snapshot/after_cp/rc.conf: Read-only file system
The error reminds the user that snapshots are read-only and can not be changed after creation. No files can be copied into or removed from snapshot directories because that would change the state of the dataset they represent.
Snapshots consume space based on how much the parent
file system has changed since the time of the snapshot. The
written
property of a snapshot tracks how
much space is being used by the snapshot.
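For example, the property can be queried like any other (illustrative output using the snapshot left over from the rollback example):
# zfs get written mypool/var/tmp@my_recursive_snapshot
NAME                                  PROPERTY  VALUE  SOURCE
mypool/var/tmp@my_recursive_snapshot  written   152K   -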
Snapshots are destroyed and the space reclaimed with
zfs destroy dataset@snapshot.
Adding -r recursively removes all snapshots
with the same name under the parent dataset. Adding
-n -v to the command displays a list of the
snapshots that would be deleted and an estimate of how much
space would be reclaimed without performing the actual
destroy operation.
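An illustrative dry run on the remaining snapshot from the rollback example (the output format is approximate):
# zfs destroy -nv mypool/var/tmp@my_recursive_snapshot
would destroy mypool/var/tmp@my_recursive_snapshot
would reclaim 8K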
A clone is a copy of a snapshot that is treated more like
a regular dataset. Unlike a snapshot, a clone is not read
only, is mounted, and can have its own properties. Once a
clone has been created using zfs clone
, the
snapshot it was created from cannot be destroyed. The
child/parent relationship between the clone and the snapshot
can be reversed using zfs promote
. After a
clone has been promoted, the snapshot becomes a child of the
clone, rather than of the original parent dataset. This will
change how the space is accounted, but not actually change the
amount of space consumed. The clone can be mounted at any
point within the ZFS file system hierarchy,
not just below the original location of the snapshot.
To demonstrate the clone feature, this example dataset is used:
# zfs list -rt all camino/home/joe
NAME                    USED  AVAIL  REFER  MOUNTPOINT
camino/home/joe         108K   1.3G    87K  /usr/home/joe
camino/home/joe@plans    21K      -  85.5K  -
camino/home/joe@backup    0K      -    87K  -
A typical use for clones is to experiment with a specific dataset while keeping the snapshot around to fall back to in case something goes wrong. Since snapshots can not be changed, a read/write clone of a snapshot is created. After the desired result is achieved in the clone, the clone can be promoted to a dataset and the old file system removed. This is not strictly necessary, as the clone and dataset can coexist without problems.
# zfs clone camino/home/joe@backup camino/home/joenew
# ls /usr/home/joe*
/usr/home/joe:
backup.txz     plans.txt

/usr/home/joenew:
backup.txz     plans.txt
# df -h /usr/home
Filesystem       Size  Used  Avail  Capacity  Mounted on
usr/home/joe     1.3G   31k   1.3G        0%  /usr/home/joe
usr/home/joenew  1.3G   31k   1.3G        0%  /usr/home/joenew
After a clone is created it is an exact copy of the state
the dataset was in when the snapshot was taken. The clone can
now be changed independently from its originating dataset.
The only connection between the two is the snapshot.
ZFS records this connection in the property
origin
. Once the dependency between the
snapshot and the clone has been removed by promoting the clone
using zfs promote
, the
origin
of the clone is removed as it is now
an independent dataset. This example demonstrates it:
# zfs get origin camino/home/joenew
NAME                PROPERTY  VALUE                   SOURCE
camino/home/joenew  origin    camino/home/joe@backup  -
# zfs promote camino/home/joenew
# zfs get origin camino/home/joenew
NAME                PROPERTY  VALUE  SOURCE
camino/home/joenew  origin    -      -
After making some changes, like copying
loader.conf to the promoted clone, the
old dataset becomes obsolete and the promoted clone can
replace it. This is achieved by two consecutive commands:
zfs destroy on the old dataset and zfs rename on the clone
to name it like the old dataset (it could also get an
entirely different name).
# cp /boot/defaults/loader.conf /usr/home/joenew
# zfs destroy -f camino/home/joe
# zfs rename camino/home/joenew camino/home/joe
# ls /usr/home/joe
backup.txz     loader.conf     plans.txt
# df -h /usr/home
Filesystem    Size  Used  Avail  Capacity  Mounted on
usr/home/joe  1.3G  128k   1.3G        0%  /usr/home/joe
The cloned snapshot is now handled like an ordinary
dataset. It contains all the data from the original snapshot
plus the files that were added to it like
loader.conf
. Clones can be used in
different scenarios to provide useful features to ZFS users.
For example, jails could be provided as snapshots containing
different sets of installed applications. Users can clone
these snapshots and add their own applications as they see
fit. Once they are satisfied with the changes, the clones can
be promoted to full datasets and provided to end users to work
with like they would with a real dataset. This saves time and
administrative overhead when providing these jails.
Keeping data on a single pool in one location exposes
it to risks like theft and natural or human disasters. Making
regular backups of the entire pool is vital.
ZFS provides a built-in serialization
feature that can send a stream representation of the data to
standard output. Using this technique, it is possible to not
only store the data on another pool connected to the local
system, but also to send it over a network to another system.
Snapshots are the basis for this replication (see the section
on ZFS
snapshots). The commands used for replicating data
are zfs send
and
zfs receive
.
These examples demonstrate ZFS replication with these two pools:
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  960M    77K   896M   0%  1.00x  ONLINE  -
mypool  984M  43.7M   940M   4%  1.00x  ONLINE  -
The pool named mypool
is the
primary pool where data is written to and read from on a
regular basis. A second pool, backup,
is used as a standby in case
the primary pool becomes unavailable. Note that this
fail-over is not done automatically by ZFS,
but must be manually done by a system administrator when
needed. A snapshot is used to provide a consistent version of
the file system to be replicated. Once a snapshot of
mypool
has been created, it can be
copied to the backup
pool. Only
snapshots can be replicated. Changes made since the most
recent snapshot will not be included.
# zfs snapshot mypool@backup1
# zfs list -t snapshot
NAME            USED  AVAIL  REFER  MOUNTPOINT
mypool@backup1     0      -  43.6M  -
Now that a snapshot exists, zfs send
can be used to create a stream representing the contents of
the snapshot. This stream can be stored as a file or received
by another pool. The stream is written to standard output,
but must be redirected to a file or pipe or an error is
produced:
# zfs send mypool@backup1
Error: Stream can not be written to a terminal.
You must redirect standard output.
To back up a dataset with zfs send
,
redirect to a file located on the mounted backup pool. Ensure
that the pool has enough free space to accommodate the size of
the snapshot being sent, which means all of the data contained
in the snapshot, not just the changes from the previous
snapshot.
# zfs send mypool@backup1 > /backup/backup1
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  960M  63.7M   896M   6%  1.00x  ONLINE  -
mypool  984M  43.7M   940M   4%  1.00x  ONLINE  -
The zfs send
transferred all the data
in the snapshot called backup1
to
the pool named backup
. Creating
and sending these snapshots can be done automatically with a
cron(8) job.
Instead of storing the backups as archive files,
ZFS can receive them as a live file system,
allowing the backed up data to be accessed directly. To get
to the actual data contained in those streams,
zfs receive
is used to transform the
streams back into files and directories. The example below
combines zfs send
and
zfs receive
using a pipe to copy the data
from one pool to another. The data can be used directly on
the receiving pool after the transfer is complete. A dataset
can only be replicated to an empty dataset.
# zfs snapshot mypool@replica1
# zfs send -v mypool@replica1 | zfs receive backup/mypool
send from @ to mypool@replica1 estimated size is 50.1M
total estimated size is 50.1M
TIME        SENT   SNAPSHOT
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  960M  63.7M   896M   6%  1.00x  ONLINE  -
mypool  984M  43.7M   940M   4%  1.00x  ONLINE  -
zfs send
can also determine the
difference between two snapshots and send only the
differences between the two. This saves disk space and
transfer time. For example:
# zfs snapshot mypool@replica2
# zfs list -t snapshot
NAME              USED  AVAIL  REFER  MOUNTPOINT
mypool@replica1  5.72M      -  43.6M  -
mypool@replica2      0      -  44.1M  -
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  960M  61.7M   898M   6%  1.00x  ONLINE  -
mypool  960M  50.2M   910M   5%  1.00x  ONLINE  -
A second snapshot called
replica2
was created. This
second snapshot contains only the changes that were made to
the file system between now and the previous snapshot,
replica1
. Using
zfs send -i
and indicating the pair of
snapshots generates an incremental replica stream containing
only the data that has changed. This can only succeed if
the initial snapshot already exists on the receiving
side.
# zfs send -v -i mypool@replica1 mypool@replica2 | zfs receive /backup/mypool
send from @replica1 to mypool@replica2 estimated size is 5.02M
total estimated size is 5.02M
TIME        SENT   SNAPSHOT
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  960M  80.8M   879M   8%  1.00x  ONLINE  -
mypool  960M  50.2M   910M   5%  1.00x  ONLINE  -
# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
backup         55.4M   240G   152K  /backup
backup/mypool  55.3M   240G  55.2M  /backup/mypool
mypool         55.6M  11.6G  55.0M  /mypool
# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
backup/mypool@replica1   104K      -  50.2M  -
backup/mypool@replica2      0      -  55.2M  -
mypool@replica1         29.9K      -  50.0M  -
mypool@replica2             0      -  55.0M  -
The incremental stream was successfully transferred.
Only the data that had changed was replicated, rather than
the entirety of replica1
. Only
the differences were sent, which took much less time to
transfer and saved disk space by not copying the complete
pool each time. This is useful when having to rely on slow
networks or when costs per transferred byte must be
considered.
A new file system,
backup/mypool
, is available with
all of the files and data from the pool
mypool
. If -P
is specified, the properties of the dataset will be copied,
including compression settings, quotas, and mount points.
When -R
is specified, all child datasets of
the indicated dataset will be copied, along with all of
their properties. Sending and receiving can be automated so
that regular backups are created on the second pool.
Sending streams over the network is a good way to keep a remote backup, but it does come with a drawback. Data sent over the network link is not encrypted, allowing anyone to intercept and transform the streams back into data without the knowledge of the sending user. This is undesirable, especially when sending the streams over the internet to a remote host. SSH can be used to securely encrypt data sent over a network connection. Since ZFS only requires the stream to be redirected from standard output, it is relatively easy to pipe it through SSH. To keep the contents of the file system encrypted in transit and on the remote system, consider using PEFS.
A few settings and security precautions must be
completed first. Only the necessary steps required for the
zfs send
operation are shown here. For
more information on SSH, see
Section 14.8, “OpenSSH”.
This configuration is required:
Passwordless SSH access between sending and receiving host using SSH keys
Normally, the privileges of the
root
user are
needed to send and receive streams. This requires
logging in to the receiving system as
root
.
However, logging in as
root
is
disabled by default for security reasons. The
ZFS Delegation
system can be used to allow a
non-root
user
on each system to perform the respective send and
receive operations.
On the sending system:
# zfs allow -u someuser send,snapshot mypool
To mount the pool, the unprivileged user must own the directory, and regular users must be allowed to mount file systems. On the receiving system:
# sysctl vfs.usermount=1
vfs.usermount: 0 -> 1
# echo vfs.usermount=1 >> /etc/sysctl.conf
# zfs create recvpool/backup
# zfs allow -u someuser create,mount,receive recvpool/backup
# chown someuser /recvpool/backup
The unprivileged user now has the ability to receive and
mount datasets, and the home
dataset can be replicated to the remote system:
% zfs snapshot -r mypool/home@monday
% zfs send -R mypool/home@monday | ssh someuser@backuphost zfs recv -dvu recvpool/backup
A recursive snapshot called
monday
is made of the file system
dataset home
that resides on the
pool mypool
. Then it is sent
with zfs send -R
to include the dataset,
all child datasets, snapshots, clones, and settings in the
stream. The output is piped to the waiting
zfs receive
on the remote host
backuphost
through
SSH. Using a fully qualified
domain name or IP address is recommended. The receiving
machine writes the data to the
backup
dataset on the
recvpool
pool. Adding
-d
to zfs recv
overwrites the name of the pool on the receiving side with
the name of the snapshot. -u
causes the
file systems to not be mounted on the receiving side. When
-v
is included, more detail about the
transfer is shown, including elapsed time and the amount of
data transferred.
Dataset quotas are used to restrict the amount of space that can be consumed by a particular dataset. Reference Quotas work in very much the same way, but only count the space used by the dataset itself, excluding snapshots and child datasets. Similarly, user and group quotas can be used to prevent users or groups from using all of the space in the pool or dataset.
To enforce a dataset quota of 10 GB for
storage/home/bob:
# zfs set quota=10G storage/home/bob
To enforce a reference quota of 10 GB for
storage/home/bob:
# zfs set refquota=10G storage/home/bob
To remove a quota of 10 GB for
storage/home/bob:
# zfs set quota=none storage/home/bob
The general format is
userquota@user=size,
and the user's name must be in one of these formats:
POSIX compatible name such as joe.
POSIX numeric ID such as 789.
SID name such as joe.bloggs@example.com.
SID numeric ID such as S-1-123-456-789.
For example, to enforce a user quota of 50 GB for the
user named joe
:
# zfs set userquota@joe=50G
To remove any quota:
# zfs set userquota@joe=none
User quota properties are not displayed by
zfs get all
.
Non-root
users can
only see their own quotas unless they have been granted the
userquota
privilege. Users with this
privilege are able to view and set everyone's quota.
The general format for setting a group quota is:
groupquota@group=size.
To set the quota for the group
firstgroup
to 50 GB,
use:
# zfs set groupquota@firstgroup=50G
To remove the quota for the group
firstgroup
, or to make sure that
one is not set, instead use:
# zfs set groupquota@firstgroup=none
As with the user quota property,
non-root
users can
only see the quotas associated with the groups to which they
belong. However,
root
or a user with
the groupquota
privilege can view and set
all quotas for all groups.
To display the amount of space used by each user on
a file system or snapshot along with any quotas, use
zfs userspace
. For group information, use
zfs groupspace
. For more information about
supported options or how to display only specific options,
refer to zfs(8).
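For example, a per-user listing for the dataset used above might look like this (the values shown are illustrative):
# zfs userspace storage/home/bob
TYPE        NAME  USED  QUOTA
POSIX User  bob    23G    50G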
Users with sufficient privileges, and
root
, can list the
quota for storage/home/bob
using:
# zfs get quota storage/home/bob
Reservations guarantee a minimum amount of space will always be available on a dataset. The reserved space will not be available to any other dataset. This feature can be especially useful to ensure that free space is available for an important dataset or log files.
The general format of the reservation
property is reservation=size,
so to set a reservation of 10 GB on
storage/home/bob, use:
# zfs set reservation=10G storage/home/bob
To clear any reservation:
# zfs set reservation=none storage/home/bob
The same principle can be applied to the
refreservation
property for setting a
Reference
Reservation, with the general format
refreservation=size.
This command shows any reservations or refreservations
that exist on storage/home/bob:
# zfs get reservation storage/home/bob
# zfs get refreservation storage/home/bob
ZFS provides transparent compression. Compressing data at the block level as it is written not only saves space, but can also increase disk throughput. If data is compressed by 25%, but the compressed data is written to the disk at the same rate as the uncompressed version, the effective write speed is 125%. Compression can also be a great alternative to Deduplication because it does not require additional memory.
ZFS offers several different compression algorithms, each with different trade-offs. With the introduction of LZ4 compression in ZFS v5000, it is possible to enable compression for the entire pool without the large performance trade-off of other algorithms. The biggest advantage to LZ4 is the early abort feature. If LZ4 does not achieve at least 12.5% compression in the first part of the data, the block is written uncompressed to avoid wasting CPU cycles trying to compress data that is either already compressed or uncompressible. For details about the different compression algorithms available in ZFS, see the Compression entry in the terminology section.
The administrator can monitor the effectiveness of compression using a number of dataset properties.
# zfs get used,compressratio,compression,logicalused mypool/compressed_dataset
NAME                       PROPERTY       VALUE  SOURCE
mypool/compressed_dataset  used           449G   -
mypool/compressed_dataset  compressratio  1.11x  -
mypool/compressed_dataset  compression    lz4    local
mypool/compressed_dataset  logicalused    496G   -
The dataset is currently using 449 GB of space (the
used property). Without compression, it would have taken
496 GB of space (the logicalused
property). This results in the 1.11:1 compression
ratio.
Compression can have an unexpected side effect when
combined with
User Quotas.
User quotas restrict how much space a user can consume on a
dataset, but the measurements are based on how much space is
used after compression. So if a user has
a quota of 10 GB, and writes 10 GB of compressible
data, they will still be able to store additional data. If
they later update a file, say a database, with more or less
compressible data, the amount of space available to them will
change. This can result in the odd situation where a user did
not increase the actual amount of data (the
logicalused
property), but the change in
compression caused them to reach their quota limit.
Compression can have a similar unexpected interaction with backups. Quotas are often used to limit how much data can be stored to ensure there is sufficient backup space available. However since quotas do not consider compression, more data may be written than would fit with uncompressed backups.
When enabled, deduplication uses the checksum of each block to detect duplicate blocks. When a new block is a duplicate of an existing block, ZFS writes an additional reference to the existing data instead of the whole duplicate block. Tremendous space savings are possible if the data contains many duplicated files or repeated information. Be warned: deduplication requires an extremely large amount of memory, and most of the space savings can be had without the extra cost by enabling compression instead.
To activate deduplication, set the
dedup
property on the target pool:
# zfs set dedup=on pool
Only new data being written to the pool will be deduplicated. Data that has already been written to the pool will not be deduplicated merely by activating this option. A pool with a freshly activated deduplication property will look like this example:
# zpool list
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
pool  2.84G  2.19M  2.83G   0%  1.00x  ONLINE  -
The DEDUP
column shows the actual rate
of deduplication for the pool. A value of
1.00x
shows that data has not been
deduplicated yet. In the next example, the ports tree is
copied three times into different directories on the
deduplicated pool created above.
# for d in dir1 dir2 dir3; do
for> mkdir $d && cp -R /usr/ports $d &
for> done
Redundant data is detected and deduplicated:
# zpool list
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
pool  2.84G  20.9M  2.82G   0%  3.00x  ONLINE  -
The DEDUP
column shows a factor of
3.00x
. Multiple copies of the ports tree
data were detected and deduplicated, using only a third of the
space. The potential for space savings can be enormous, but
comes at the cost of having enough memory to keep track of the
deduplicated blocks.
Deduplication is not always beneficial, especially when the data on a pool is not redundant. ZFS can show potential space savings by simulating deduplication on an existing pool:
# zdb -S pool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.58M    289G    264G    264G    2.58M    289G    264G    264G
     2     206K   12.6G   10.4G   10.4G     430K   26.4G   21.6G   21.6G
     4    37.6K    692M    276M    276M     170K   3.04G   1.26G   1.26G
     8    2.18K   45.2M   19.4M   19.4M    20.0K    425M    176M    176M
    16      174   2.83M   1.20M   1.20M    3.33K   48.4M   20.4M   20.4M
    32       40   2.17M    222K    222K    1.70K   97.2M   9.91M   9.91M
    64        9     56K   10.5K   10.5K      865   4.96M    948K    948K
   128        2   9.50K      2K      2K      419   2.11M    438K    438K
   256        5   61.5K     12K     12K    1.90K   23.0M   4.47M   4.47M
    1K        2      1K      1K      1K    2.98K   1.49M   1.49M   1.49M
 Total    2.82M    303G    275G    275G    3.20M    319G    287G    287G

dedup = 1.05, compress = 1.11, copies = 1.00, dedup * compress / copies = 1.16
After zdb -S
finishes analyzing the
pool, it shows the space reduction ratio that would be
achieved by activating deduplication. In this case,
1.16
is a very poor space saving ratio that
is mostly provided by compression. Activating deduplication
on this pool would not save any significant amount of space,
and is not worth the amount of memory required to enable
deduplication. Using the formula
ratio = dedup * compress / copies,
system administrators can plan the storage allocation,
deciding whether the workload will contain enough duplicate
blocks to justify the memory requirements. If the data is
reasonably compressible, the space savings may be very good.
Enabling compression first is recommended, and compression can
also provide greatly increased performance. Only enable
deduplication in cases where the additional savings will be
considerable and there is sufficient memory for the DDT.
zfs jail
and the corresponding
jailed
property are used to delegate a
ZFS dataset to a
Jail.
zfs jail jailid
attaches a dataset to the specified jail, and
zfs unjail detaches it. For the dataset to
be controlled from within a jail, the
jailed
property must be set. Once a
dataset is jailed, it can no longer be mounted on the
host because it may have mount points that would compromise
the security of the host.
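A minimal sketch of the workflow (the dataset and jail names here are hypothetical):
# zfs create mypool/jails/data
# zfs set jailed=on mypool/jails/data
# zfs jail myjail mypool/jails/data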