This chapter describes how to monitor the performance of your LEAF system in near real-time using SNMP and RRD.
The setup described here assumes that you have at least two systems: the LEAF system that you want to monitor, and a system that will collect, store and present the performance data. In the rest of this chapter these are referred to as the LEAF system and the RRD system.
The RRD system queries the LEAF system at regular intervals via SNMP. The collected data is stored in an RRD database. The performance data can be presented in a number of ways; here it is presented by a webserver running PHP scripts that call rrdtool functions.
The setup and configuration of the LEAF system is simple compared to the setup and configuration of the RRD system. All that is needed on the LEAF system is an SNMP agent. The RRD system can be made as simple or as advanced as desired. At least the following functionality must be present on the RRD system:
SNMP client to query the SNMP agent in the LEAF system
Database to store and retrieve the measured data
The SNMP client and agent functions in this sample are provided by the Net-SNMP package. The database for storing the measured data is based on RRDTool. In the next sections a short overview of these toolkits is given.
The Net-SNMP toolkit provides a suite of client and server applications that communicate with each other using the Simple Network Management Protocol (SNMP).
One of the server applications is snmpd, which is an SNMP agent. snmpd listens for SNMP requests. A typical SNMP agent allows a client to query information about the device running the SNMP agent. Some devices also allow configuration to be set via SNMP.
The Net-SNMP agent can be built to monitor things such as network traffic, disk space, disk I/O, CPU usage and more.
Next to the server part, the client part is needed. In this example the Perl libraries of Net-SNMP are used for the client part. Perl scripts on the RRD system are used to collect the performance data from the LEAF system.
RRD is the acronym for Round Robin Database. RRD is a system to store and display time-series data (e.g. network bandwidth, machine-room temperature, server load average). It stores the data in a very compact way that will not expand over time, and it presents useful graphs by processing the data to enforce a certain data density. It can be used either via simple wrapper scripts (from shell or Perl) or via front-ends that poll network devices and put a friendly user interface on it.
In the rest of this document it is assumed that you have at least read the "RRD Beginners Guide" and the "RRD Tutorial" from the RRDTool documentation page.
Edit leaf.cfg and add netsnmpd, libsnmp and libm to the packages list:
root,config,etc, ... , libm,libsnmp,netsnmpd
Either reboot the system or load the new packages manually.
Edit the configuration file /etc/snmp/snmpd.conf. A sample configuration is given below. This sample does not contain all the helpful comments from the original configuration file, so I suggest you use this to edit your existing configuration file.
#
# snmpd.conf
#
syscontact "Root <[email protected]>"
syslocation "At the end of the Universe"
sysname leafhost
sysservices 15
rocommunity zaphod default
com2sec readonly default zaphod
group RO_Group usm readonly
group RO_Group v1 readonly
group RO_Group v2c readonly
view all included .1
access RO_Group "" any noauth exact all none none
#
Now back up the netsnmpd package and (re)start snmpd with svi snmpd restart.
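Before moving on to the RRD system, it can be worth checking that the agent answers. The following is a minimal sketch, assuming the Net-SNMP command-line tools are installed on the RRD system and that the host name leafhost and the community zaphod from the sample configuration are in use. The helper name check_agent and the DRY_RUN switch are made up for this illustration.

```shell
#!/bin/sh
# check_agent HOST COMMUNITY
# Hypothetical helper: ask the agent for sysName.0 (.1.3.6.1.2.1.1.5.0)
# over SNMP v2c. With DRY_RUN=1 the command is only printed, not executed.
check_agent() {
    host=$1
    community=$2
    cmd="snmpget -v 2c -c $community $host .1.3.6.1.2.1.1.5.0"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Usage (run on the RRD system):
#   check_agent leafhost zaphod
```

With the sample configuration the reply should contain the sysname value (leafhost). If no reply arrives, check that snmpd is running and that the community string matches.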
For the examples given here the following items must be installed on the RRD system.
Perl::SNMP - Net-SNMP module for Perl (source: netsnmp.sourceforge.net)
Perl::RRDs - RRDTool module for Perl, use perl-shared not perl-piped (source: people.ee.ethz.ch/~oetiker/webtools/rrdtool)
Apache with PHP4 - Webserver for presentation of the performance data (source: www.apache.org, www.php.org)
Php4-rrdtool - RRDTool module for PHP4 (source: www.joeym.net)
For the rest of this document it is assumed that you are running Linux on your RRD system. This is not the only possible option; the necessary items are also available for other types of systems. It is beyond the scope of this document to describe where to get the above-mentioned items precompiled for your system and how to install them. Refer to the documentation of your distribution and/or the documentation of the individual sources for more information.
In this chapter the terms collector and database will be used frequently. The collector is the script that queries the LEAF system via SNMP and stores the retrieved values in a database, in this case an RRD database.
An RRD database can be defined to contain all sorts of information, datasets, in any combination you like. It is in general good practice to keep information of different types in different databases, but you'll have to find out for yourself which dataset definition will give you the most flexible solution for your situation.
In the following examples two datasets will be defined, one for network traffic statistics and one for cpuload.
Personally I like to structure the RRD related directories in such a way that there is a clear distinction between collectors and databases, and also between databases belonging to different hosts. In these examples the following directory structure is assumed:
/home/rrd/
|
+--- collectors/
|
+--- databases/
     |
     +--- leafhost/
     |
     +--- host2/
     |
     ... etc ...
After defining a database and creating the corresponding collector, the collector must be scheduled to run at regular intervals. This must be done for each collector/database. Cron is your friend here. An option that I favor myself is to have only one entry in /etc/crontab. This entry calls the overall collector script, which in turn calls each of the individual collector scripts. This avoids having to edit the system crontab file for each new collector. In this case your /etc/crontab would have the following entry:
# /etc/crontab
...
# overall collector script
*/5 * * * * rrd /home/rrd/collectors/collect-all
#
This means that the overall collector script is started every 5 minutes. The overall collector file /home/rrd/collectors/collect-all could look like:
#!/bin/sh
# Overall collector script

# Script for collecting interface statistics
/home/rrd/collectors/interface.pl

# Script for collecting cpu load
/home/rrd/collectors/cpuload.pl
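If you would rather not edit collect-all for every new collector either, the script can loop over the collectors directory. This is only a sketch of one possible approach: the function name run_collectors is hypothetical, and it assumes that every collector is an executable file ending in .pl.

```shell
#!/bin/sh
# Sketch of an alternative collect-all: run every executable *.pl file
# in a collectors directory instead of listing each script by hand.
# run_collectors is a hypothetical name, not part of any package.
run_collectors() {
    dir=$1
    status=0
    for script in "$dir"/*.pl; do
        # Skip non-executable files (and the unexpanded pattern when
        # the directory contains no *.pl files at all).
        [ -x "$script" ] || continue
        if ! "$script"; then
            echo "collector failed: $script" >&2
            status=1
        fi
    done
    return "$status"
}

# In collect-all this would be called as:
#   run_collectors /home/rrd/collectors
```

A failing collector is reported on standard error but does not stop the remaining collectors from running. A collector can be disabled temporarily by removing its execute bit.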
If the number of interfaces on the LEAF system is fixed and will never change, you may choose to keep the traffic statistics of all interfaces in one database. If not, it's probably easier to define a database per interface. This makes it easier to extend your RRD system when more interfaces are added to your LEAF system. Here a database for only one interface is created.
To create a new database, go to the data directory for the targeted host and create the dataset with the options as described below:
cd /home/rrd/databases/leafhost
rrdtool create eth0.rrd \
  --step 300 \
  DS:bytes_in:COUNTER:600:U:U \
  DS:bytes_out:COUNTER:600:U:U \
  RRA:AVERAGE:0.5:1:864 \
  RRA:AVERAGE:0.5:6:672 \
  RRA:AVERAGE:0.5:24:744 \
  RRA:AVERAGE:0.5:288:730
This has created a new database named eth0.rrd which expects new data every 300 seconds (step size). This is exactly the same as the schedule defined in the crontab file above.
The database contains two datasets, i.e. bytes_in and bytes_out, both of the type COUNTER.
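The COUNTER type tells RRDtool that the values it receives are ever-increasing counters, so it stores the rate of change per second rather than the raw reading. The arithmetic can be sketched as follows. This is a simplified illustration that assumes a 32-bit counter and at most one wrap-around between readings; it is not RRDtool's actual code, and the function name counter_rate is made up.

```shell
#!/bin/sh
# counter_rate PREV CUR SECONDS
# Illustrative only: per-second rate between two readings of a 32-bit
# SNMP counter such as ifInOctets, allowing for a single wrap-around.
counter_rate() {
    prev=$1
    cur=$2
    secs=$3
    delta=$((cur - prev))
    if [ $((delta < 0)) -eq 1 ]; then
        # Counter wrapped past 2^32 between the two readings.
        delta=$((delta + 4294967296))
    fi
    echo $((delta / secs))
}

counter_rate 1000 301000 300          # prints 1000
counter_rate 4294967000 2704 300      # prints 10
```

This is why the collector only has to hand over the raw counter values: the conversion to bytes per second happens inside the database.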
Four round robin archives are defined containing averaged values:
864 samples of 1 step (5 minutes). This is a period of 3 days. Since each sample covers a single step, the actual value is stored and no average is calculated.
672 averaged samples over 6 steps (30 minutes). This is a period of 2 weeks.
744 averaged samples over 24 steps (2 hours). This is a period of 2 months (62 days).
730 averaged samples over 288 steps (1 day). This is a period of 2 years.
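The period covered by each archive follows from the step size, the number of steps per sample and the number of samples. As a quick check when designing your own archives, the calculation can be sketched like this (rra_days is a hypothetical helper name):

```shell
#!/bin/sh
# rra_days STEP STEPS_PER_ROW ROWS
# Whole days covered by one RRA: step * steps-per-row * rows seconds.
rra_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

rra_days 300 1 864      # prints 3
rra_days 300 6 672      # prints 14
rra_days 300 24 744     # prints 62
rra_days 300 288 730    # prints 730
```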
The data that can be retrieved from an SNMP agent is defined in a Management Information Base (MIB). The objects in the MIB containing the interface traffic counters that are necessary for this example are:
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifNumber = .1.3.6.1.2.1.2.1
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr = .1.3.6.1.2.1.2.2.1.2
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets = .1.3.6.1.2.1.2.2.1.10
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifOutOctets = .1.3.6.1.2.1.2.2.1.16
In the sample script below the LEAF system is queried for the number of interfaces. The correct interface is selected based on the interface name and then the counters for bytes_in and bytes_out are read. Finally this information is stored into the database.
#!/usr/bin/perl
# interface.pl

use SNMP;
use RRDs;

$oid_ifNumber    = ".1.3.6.1.2.1.2.1";
$oid_ifDescr     = ".1.3.6.1.2.1.2.2.1.2";
$oid_ifInOctets  = ".1.3.6.1.2.1.2.2.1.10";
$oid_ifOutOctets = ".1.3.6.1.2.1.2.2.1.16";

$database = "/home/rrd/databases/leafhost/eth0.rrd";

#
# Open snmp session and get interface data
#
$session = new SNMP::Session(
    DestHost  => "leafhost",
    Community => "zaphod",
    Version   => '2');
die "SNMP session creation error: $SNMP::Session::ErrorStr"
    unless (defined $session);

$numInts = $session->get($oid_ifNumber . ".0");

for $i (1..$numInts) {
    $name = $session->get($oid_ifDescr . "." . $i);
    if ( $name eq "eth0" ) {
        $in  = $session->get($oid_ifInOctets . "." . $i);
        $out = $session->get($oid_ifOutOctets . "." . $i);
    }
}
die $session->{ErrorStr} if ($session->{ErrorStr});

#
# Update the database
#
RRDs::update ($database, "N:".$in.":".$out);
my $Err = RRDs::error;
die "Error while updating: $Err\n" if $Err;
#
Of course this is only an example. You can extend it to suit your own needs.
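One thing you may want to adapt: the loop from 1 to ifNumber assumes that the interface indexes are numbered consecutively, which an agent is not obliged to do. An alternative is to walk the ifDescr column and pick the index from the matching row. The sketch below assumes the numeric output format of the Net-SNMP snmpwalk tool with the -On option (lines of the form "OID = STRING: name"); the function name if_index_by_name is made up, and the parsing may need adjusting if your version prints the value differently.

```shell
#!/bin/sh
# if_index_by_name NAME
# Reads "snmpwalk -On" style lines for the ifDescr column on stdin and
# prints the last OID component (the ifIndex) of the row whose value is NAME.
if_index_by_name() {
    awk -v name="$1" '
        $2 == "=" && $NF == name {
            n = split($1, parts, "[.]")
            print parts[n]
            exit
        }'
}

# Usage (run on the RRD system):
#   snmpwalk -v 2c -c zaphod -On leafhost .1.3.6.1.2.1.2.2.1.2 \
#       | if_index_by_name eth0
```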
On Linux systems four types of CPU load (process time) exist, i.e. user, system, nice and idle. We will now define a database in which to store this information.
cd /home/rrd/databases/leafhost
rrdtool create cpuload.rrd \
  --step 300 \
  DS:user:COUNTER:600:0:100 \
  DS:system:COUNTER:600:0:100 \
  DS:nice:COUNTER:600:0:100 \
  DS:idle:COUNTER:600:0:100 \
  RRA:AVERAGE:0.5:1:864 \
  RRA:AVERAGE:0.5:6:672 \
  RRA:AVERAGE:0.5:24:744 \
  RRA:AVERAGE:0.5:288:730
The definition of this database has much in common with the previous database. Now four datasets have been defined instead of two. The definition of the round robin archives is the same.
The cpu load information is represented by the following objects in the MIB:
.iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawUser = .1.3.6.1.4.1.2021.11.50
.iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawNice = .1.3.6.1.4.1.2021.11.51
.iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawSystem = .1.3.6.1.4.1.2021.11.52
.iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawIdle = .1.3.6.1.4.1.2021.11.53
And this information can be retrieved and stored with the following script:
#!/usr/bin/perl
# cpuload.pl

use SNMP;
use RRDs;

# OIDs as listed in the MIB: user=.50, nice=.51, system=.52, idle=.53
$oid_ssCpuRawUser   = ".1.3.6.1.4.1.2021.11.50";
$oid_ssCpuRawNice   = ".1.3.6.1.4.1.2021.11.51";
$oid_ssCpuRawSystem = ".1.3.6.1.4.1.2021.11.52";
$oid_ssCpuRawIdle   = ".1.3.6.1.4.1.2021.11.53";

$database = "/home/rrd/databases/leafhost/cpuload.rrd";

#
# Open snmp session and get cpu load data
#
$session = new SNMP::Session(
    DestHost  => "leafhost",
    Community => "zaphod",
    Version   => '2');
die "SNMP session creation error: $SNMP::Session::ErrorStr"
    unless (defined $session);

$cpuUser   = $session->get($oid_ssCpuRawUser . ".0");
$cpuSystem = $session->get($oid_ssCpuRawSystem . ".0");
$cpuNice   = $session->get($oid_ssCpuRawNice . ".0");
$cpuIdle   = $session->get($oid_ssCpuRawIdle . ".0");

#
# Update the database (same order as the DS definitions:
# user, system, nice, idle)
#
RRDs::update ($database, "N:".$cpuUser.":".$cpuSystem.":".$cpuNice.":".$cpuIdle);
my $Err = RRDs::error;
die "Error while updating: $Err\n" if $Err;
#
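The ssCpuRaw objects count ticks, typically 1/100 of a second each. On a single-CPU system with 100 ticks per second the per-second rate stored by RRDtool therefore roughly equals a percentage, which is why the datasets are limited to the range 0 to 100. On other systems you may prefer to compute the shares yourself. The following is a rough sketch of that calculation from two successive readings; the function name cpu_percent is made up, and the arguments are in MIB order (user, nice, system, idle), not in the order of the datasets.

```shell
#!/bin/sh
# cpu_percent U0 N0 S0 I0 U1 N1 S1 I1
# Given two readings of the raw user/nice/system/idle tick counters,
# print the share of each as a whole percentage of all elapsed ticks.
cpu_percent() {
    du=$(( $5 - $1 ))
    dn=$(( $6 - $2 ))
    ds=$(( $7 - $3 ))
    di=$(( $8 - $4 ))
    total=$(( du + dn + ds + di ))
    if [ "$total" -le 0 ]; then
        echo "no ticks elapsed between samples" >&2
        return 1
    fi
    echo "$(( 100 * du / total )) $(( 100 * dn / total )) $(( 100 * ds / total )) $(( 100 * di / total ))"
}

# Two readings five minutes apart on a system with 100 ticks per second
cpu_percent 0 0 0 0 3000 0 1500 25500    # prints "10 0 5 85"
```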
After you have finished the scripts and the overall collector has been called a few times by cron, it's time to make some graphs.
The following assumptions are made with respect to the configuration of the webserver:
An alias /images/ is defined for /var/www/images/
The images directory has a subdirectory rrdimg in which the rrd graphs will be created
For ease of reuse a separate php file is used in which the generic functions for drawing graphs are defined. This file is included by the other scripts.
First a file graphs.php is defined that contains the functions to draw the graphs.
<?php
## graphs.php
##
## A set of php functions to create rrd graphs

function interface ($start) {
  $database = "/home/rrd/databases/leafhost/eth0.rrd";
  $imgfile = "eth0$start.gif";

  $opts = array(
    "--start", "$start",
    "--vertical-label", "Bytes/sec",
    "--width", "400",
    "DEF:in=$database:bytes_in:AVERAGE",
    "DEF:out=$database:bytes_out:AVERAGE",
    "LINE2:in#00ff00:In",
    "LINE2:out#ff0000:Out"
  );

  make_graph ($imgfile, $opts);
}

function make_graph ($file, $options) {
  $ret = rrd_graph("/var/www/images/rrdimg/$file", $options, count($options));

  ## if $ret is an array, then rrd_graph was successful
  ##
  if ( is_array($ret) ) {
    echo "<img src=\"/images/rrdimg/$file\" border=0>";
  } else {
    $err = rrd_error();
    echo "<p><b>$err</b></p>";
  }
}
?>
Then the actual page that contains the network traffic graphs can be created.
<html>
<head>
<title>Interface statistics</title>
</head>
<body>
<h1>Interface statistics</h1>
<?php
  require "graphs.php";

  print "<h2>Daily graph</h2>\n";
  interface ("-1d");

  print "<h2>Weekly graph</h2>\n";
  interface ("-1w");

  print "<h2>Monthly graph</h2>\n";
  interface ("-1m");
?>
</body>
</html>
Now fire up your browser and access the page that you just created. Sit back and enjoy!
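If the page shows an error instead of a graph, it can help to try the same graph from the command line, outside the webserver. The sketch below only prints the rrdtool graph command that corresponds to the options in the interface function, so that it can be inspected first. The function name graph_cmd and the output path are made up for this example.

```shell
#!/bin/sh
# graph_cmd DATABASE START OUTFILE
# Prints (does not run) an rrdtool graph command using the same options
# as the interface() function in graphs.php.
graph_cmd() {
    db=$1
    start=$2
    out=$3
    echo rrdtool graph "$out" \
        --start "$start" \
        --vertical-label "Bytes/sec" \
        --width 400 \
        "DEF:in=$db:bytes_in:AVERAGE" \
        "DEF:out=$db:bytes_out:AVERAGE" \
        "LINE2:in#00ff00:In" \
        "LINE2:out#ff0000:Out"
}

# To actually draw the graph, pipe the printed command to the shell:
#   graph_cmd /home/rrd/databases/leafhost/eth0.rrd -1d /tmp/eth0-test.gif | sh
```

Any error that rrdtool reports here (a wrong database path, an unknown dataset name) is the same error the PHP page would otherwise only show as a line of bold text.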
First we add a function to draw CPU load graphs to the file graphs.php.
<?php
## graphs.php
##
## A set of php functions to create rrd graphs

...

function cpuload ($start) {
  $database = "/home/rrd/databases/leafhost/cpuload.rrd";
  $imgfile = "cpu$start.gif";

  $opts = array(
    "--start", "$start",
    "--vertical-label", "Load (%)",
    "--width", "400",
    "DEF:user=$database:user:AVERAGE",
    "DEF:nice=$database:nice:AVERAGE",
    "DEF:system=$database:system:AVERAGE",
    "AREA:system#00ffff:System",
    "STACK:user#00ff00:User",
    "STACK:nice#0000ff:Nice"
  );

  make_graph ($imgfile, $opts);
}
?>
And then the actual CPU load page is created. This is almost too easy ;-)
<html>
<head>
<title>CPU Load statistics</title>
</head>
<body>
<h1>CPU Load statistics</h1>
<?php
  require "graphs.php";

  print "<h2>Daily graph</h2>\n";
  cpuload ("-1d");

  print "<h2>Weekly graph</h2>\n";
  cpuload ("-1w");

  print "<h2>Monthly graph</h2>\n";
  cpuload ("-1m");
?>
</body>
</html>