In our experience, a drive that is about to fail floods `/var/log/kern.log` with error messages. The `swift-drive-audit` script can be run via cron to watch for bad drives. If it detects errors, it unmounts the bad drive so that Object Storage can work around it. The script takes a configuration file with the following settings:
Configuration option = Default value | Description |
---|---|
device_dir = /srv/node | Directory devices are mounted under |
log_facility = LOG_LOCAL0 | Syslog log facility |
log_level = INFO | Logging level |
log_address = /dev/log | Location where syslog sends the logs |
minutes = 60 | Number of minutes to look back in `/var/log/kern.log` |
error_limit = 1 | Number of errors to find before a device is unmounted |
log_file_pattern = /var/log/kern* | Glob pattern for the log files to check for device errors |
regex_pattern_1 = \berror\b.*\b(dm-[0-9]{1,2}\d?)\b | Regular expression used to locate device blocks with errors in the log file |
This script has only been tested on Ubuntu 10.04. If you are using a different distribution or operating system, test it carefully before using it in production.
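
To make the interaction between `regex_pattern_1` and `error_limit` more concrete, the following is a minimal Python sketch of the matching and counting step only. It is not the code of `swift-drive-audit` itself. The function name `devices_over_limit` and the sample log lines are hypothetical, and the sketch assumes that a device is flagged once its error count reaches `error_limit`. It does not apply the `minutes` look-back window and does not unmount anything.

```python
import re
from collections import Counter

# Defaults copied from the settings table above.
REGEX_PATTERN_1 = r"\berror\b.*\b(dm-[0-9]{1,2}\d?)\b"
ERROR_LIMIT = 1


def devices_over_limit(log_lines, pattern=REGEX_PATTERN_1, error_limit=ERROR_LIMIT):
    """Return the sorted names of devices with at least error_limit matching lines.

    Hypothetical helper for illustration; the inclusive threshold is an assumption.
    """
    matcher = re.compile(pattern)
    counts = Counter()
    for line in log_lines:
        match = matcher.search(line)
        if match:
            # Group 1 of the pattern captures the device name, e.g. "dm-3".
            counts[match.group(1)] += 1
    return sorted(dev for dev, n in counts.items() if n >= error_limit)


# Invented kernel log lines, not taken from a real system.
sample_lines = [
    "Jan  1 00:00:01 host kernel: Buffer I/O error on device dm-3, logical block 0",
    "Jan  1 00:00:02 host kernel: EXT4-fs (sda1): mounted filesystem",
    "Jan  1 00:00:03 host kernel: I/O error, dev dm-12, sector 2048",
]

print(devices_over_limit(sample_lines))
```

With the default `error_limit` of 1, a single matching line is enough to flag a device. Raising the limit (for example `devices_over_limit(sample_lines, error_limit=2)`) makes the check less sensitive to isolated errors, at the cost of reacting later to a drive that is genuinely failing.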