Regression tests are used to exercise a particular bit of the system to check that it works as expected, and to make sure that old bugs are not reintroduced.
The FreeBSD regression testing tools can be found in the FreeBSD source tree in the directory src/tools/regression.
This section contains hints for doing proper micro-benchmarking on FreeBSD or of FreeBSD itself.
It is not possible to use all of the suggestions below every single time, but the more of them that are used, the better the benchmark will be at resolving small differences.
Disable APM and any other kind of clock fiddling (ACPI ?).
Run in single-user mode. Daemons such as cron(8) only add noise. The sshd(8) daemon can also cause problems. If ssh access is required during testing, either disable the SSHv1 key regeneration or kill the parent sshd daemon during the tests.
Do not run ntpd(8).
If syslog(3) events are generated, run syslogd(8) with an empty /etc/syslog.conf; otherwise, do not run it.
Minimize disk I/O; avoid it entirely if possible.
Do not mount file systems that are not needed.
Mount /, /usr, and any other file system as read-only if possible. This removes atime updates to disk (etc.) from the I/O picture.
Reinitialize the read/write test file system with newfs(8) and populate it from a tar(1) or dump(8) file before every run. Unmount and mount it before starting the test. This results in a consistent file system layout. For a worldstone test this would apply to /usr/obj (just reinitialize with newfs and mount). To get 100% reproducibility, populate the file system from a dd(1) file (e.g., dd if=myimage of=/dev/ad0s1h bs=1m).
Use malloc-backed or preloaded md(4) partitions.
Reboot between individual iterations of the test; this gives a more consistent state.
Remove all non-essential device drivers from the kernel. For instance if USB is not needed for the test, do not put USB in the kernel. Drivers which attach often have timeouts ticking away.
Unconfigure hardware that is not in use. Detach disks with atacontrol(8) and camcontrol(8) if the disks are not used for the test.
Do not configure the network unless it is being tested, or wait until after the test has been performed to ship the results off to another computer.
If the system must be connected to a public network, watch out for spikes of broadcast traffic. Even though it is hardly noticeable, it will take up CPU cycles. Multicast has similar caveats.
Put each file system on its own disk. This minimizes jitter from head-seek optimizations.
Minimize output to serial or VGA consoles. Running output into files gives less jitter. (Serial consoles easily become a bottleneck.) Do not touch the keyboard while the test is running; even space or backspace shows up in the numbers.
Make sure the test is long enough, but not too long. If the test is too short, timestamping is a problem. If it is too long, temperature changes and drift will affect the frequency of the quartz crystals in the computer. Rule of thumb: more than a minute, less than an hour.
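The rule of thumb above can be checked mechanically. The following is a minimal sketch, not part of the original checklist: the helper names timed_run and duration_ok are hypothetical, and the placeholder workload only starts a Python interpreter, so it should be replaced with the real benchmark command.

```python
import subprocess
import sys
import time

MIN_SECONDS = 60      # "more than a minute"
MAX_SECONDS = 3600    # "less than an hour"


def timed_run(command):
    """Run `command` (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


def duration_ok(elapsed):
    """Return True if a run length falls inside the rule-of-thumb window."""
    return MIN_SECONDS < elapsed < MAX_SECONDS


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the real benchmark.
    elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"elapsed: {elapsed:.3f} s, within window: {duration_ok(elapsed)}")
```

A run that falls outside the window is a hint to scale the workload up or down before collecting real numbers.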
Try to keep the temperature as stable as possible around the machine. This affects both quartz crystals and disk drive algorithms. To get a really stable clock, consider stabilized clock injection. E.g., get an OCXO + PLL and inject the output into the clock circuits instead of the motherboard xtal. Contact Poul-Henning Kamp <[email protected]> for more information about this.
Run the test at least 3 times, but preferably more than 20 times, for both the “before” and “after” code. Try to interleave if possible (i.e.: do not run 20 times before then 20 times after); this makes it possible to spot environmental effects. Do not interleave 1:1, but 3:3; this makes it possible to spot interaction effects.
A good pattern is: bababa{bbbaaa}*.
This gives a hint after the first 1+1 runs (so it is possible to stop the test if it goes entirely the wrong way), a standard deviation after the first 3+3 (a good indication of whether it is going to be worth a long run), and trending and interaction numbers later on.
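The run order described above can be generated rather than typed by hand. This is a small sketch; the function name run_schedule is hypothetical, with 'b' standing for a run of the “before” code and 'a' for a run of the “after” code.

```python
def run_schedule(blocks):
    """Return 'bababa' followed by `blocks` repetitions of 'bbbaaa'."""
    if blocks < 0:
        raise ValueError("blocks must be non-negative")
    return "bababa" + "bbbaaa" * blocks


if __name__ == "__main__":
    schedule = run_schedule(6)
    print(schedule)
    print("before runs:", schedule.count("b"))
    print("after runs:", schedule.count("a"))
```

With six repetitions of the 3:3 block, each variant is run 21 times, which satisfies the "more than 20 times" suggestion.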
Use ministat(1) to see if the numbers are significant. Consider buying “Cartoon guide to statistics” ISBN: 0062731025, highly recommended, if you have forgotten or never learned about standard deviation and Student's T.
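For readers who want to see the arithmetic behind such a comparison, the sketch below computes a pooled-variance Student's t statistic for two sets of timings. It is an illustration only and does not reproduce the output or exact behavior of ministat(1); the function name students_t and the sample numbers are made up. The resulting t value still has to be compared against a Student's t table for the given degrees of freedom.

```python
import math
import statistics


def students_t(before, after):
    """Return (t, degrees_of_freedom) for two samples, using pooled variance."""
    n1, n2 = len(before), len(after)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two measurements")
    mean1, mean2 = statistics.mean(before), statistics.mean(after)
    # statistics.variance() is the sample variance (divides by n - 1).
    var1, var2 = statistics.variance(before), statistics.variance(after)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    if pooled == 0:
        raise ValueError("samples have no variance; t is undefined")
    t = (mean2 - mean1) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, dof


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    before = [10.0, 12.0, 14.0]
    after = [13.0, 15.0, 17.0]
    t, dof = students_t(before, after)
    print(f"t = {t:.4f} with {dof} degrees of freedom")
```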
Do not use background fsck(8) unless the test is a benchmark of background fsck. Also, disable background_fsck in /etc/rc.conf unless the benchmark is started at least 60+“fsck runtime” seconds after boot, because rc(8) wakes up and checks whether fsck needs to run on any file systems when background fsck is enabled. Likewise, make sure there are no snapshots lying around unless the benchmark is a test with snapshots.
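A pre-flight script can confirm that background_fsck is switched off before a run. The sketch below works on the text of an rc.conf file; the function name is hypothetical, and it assumes that the system default is enabled when no assignment is present and that only "no", "false", "off", and "0" (in any case) count as disabled. It does not source other rc.conf fragments or evaluate shell expansions.

```python
import re

# Values assumed to mean "disabled" (compared case-insensitively).
_NO_VALUES = {"no", "false", "off", "0"}
_ASSIGNMENT = re.compile(r"""^\s*background_fsck=["']?([^"'\s#]*)""")


def background_fsck_disabled(rc_conf_text):
    """Return True if the last background_fsck assignment disables it."""
    disabled = False  # assumption: enabled by default
    for line in rc_conf_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            # Later assignments override earlier ones, as in a shell script.
            disabled = match.group(1).lower() in _NO_VALUES
    return disabled


if __name__ == "__main__":
    sample = 'hostname="bench"\nbackground_fsck="NO"\n'
    print("background_fsck disabled:", background_fsck_disabled(sample))
```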
If the benchmark shows unexpectedly bad performance, check for things like high interrupt volume from an unexpected source. Some versions of ACPI have been reported to “misbehave” and generate excess interrupts. To help diagnose odd test results, take a few snapshots of vmstat -i and look for anything unusual.
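Comparing two vmstat -i snapshots by eye is tedious, so a short script can rank interrupt sources by how much their counters grew. The sketch below assumes the usual three-column layout (name, total, rate); the function names are hypothetical and the sample snapshots are made up for illustration.

```python
def parse_vmstat_i(text):
    """Map each interrupt source name to its total count."""
    counts = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name may contain spaces
        if len(parts) != 3:
            continue
        name, total, _rate = parts
        if not total.isdigit() or name.strip() == "Total":
            continue  # skip the header line and the summary line
        counts[name.strip()] = int(total)
    return counts


def interrupt_deltas(first, second):
    """Return (name, increase) pairs, largest increase first."""
    before, after = parse_vmstat_i(first), parse_vmstat_i(second)
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Made-up snapshots, for illustration only.
FIRST = """interrupt                          total       rate
irq1: atkbd0                         218          0
irq9: acpi0                         1000          1
cpu0:timer                         50000        999
Total                              51218       1000
"""
SECOND = """interrupt                          total       rate
irq1: atkbd0                         218          0
irq9: acpi0                        91000         90
cpu0:timer                         60000        999
Total                             151218       1089
"""

if __name__ == "__main__":
    for name, delta in interrupt_deltas(FIRST, SECOND):
        print(f"{name:20s} +{delta}")
```

In the made-up data the ACPI interrupt dominates the increase, which is the kind of pattern the paragraph above warns about.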
Be careful about optimization parameters for the kernel and userspace, and likewise about debugging options. It is easy to let something slip through and realize later that the test was not comparing the same thing.
Never benchmark with the WITNESS and INVARIANTS kernel options enabled unless the test is intended to benchmark those features. WITNESS can cause 400%+ drops in performance. Likewise, userspace malloc(3) parameters default differently in -CURRENT from the way they ship in production releases.
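One way to avoid benchmarking a debugging kernel by accident is to scan the kernel configuration text for these options before starting. The sketch below takes the configuration as a string; it assumes the text uses one "options NAME" entry per line, as in a kernel configuration file, and that such text is obtainable on the test machine (for example from the configuration file the kernel was built with). The function name is hypothetical and the sample configuration is made up.

```python
DEBUG_OPTIONS = {"WITNESS", "INVARIANTS"}


def debug_options_enabled(config_text):
    """Return the sorted debugging options found in a kernel config text."""
    found = set()
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] == "options":
            name = fields[1].split("=", 1)[0]   # strip any "=value" part
            if name in DEBUG_OPTIONS:
                found.add(name)
    return sorted(found)


# Made-up configuration fragment, for illustration only.
SAMPLE_CONFIG = """ident\tGENERIC
options\tSCHED_ULE
options\tINVARIANTS
options\tWITNESS
#options\tWITNESS_SKIPSPIN
"""

if __name__ == "__main__":
    enabled = debug_options_enabled(SAMPLE_CONFIG)
    if enabled:
        print("debugging options enabled:", ", ".join(enabled))
    else:
        print("no WITNESS or INVARIANTS options found")
```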
All FreeBSD documents are available for download at http://ftp.FreeBSD.org/pub/FreeBSD/doc/
Questions that are not answered by the documentation may be sent to <[email protected]>.
Send questions about this document to <[email protected]>.