Chapter 11. Using SNMP and RRD to monitor your LEAF system

Revision History
Revision 0.1 2004-10-18 ET
Initial Document

Table of Contents

Introduction
Configure the LEAF system
Configure the RRD machine

Introduction

Objectives

This chapter describes how you can monitor the performance of your LEAF system in near real time using SNMP and RRD.

Overview of the setup described here

The setup described here assumes that you have at least two systems: the LEAF system that you want to monitor, and a system that collects, stores and presents the performance data. In the rest of this chapter these are referred to as the LEAF system and the RRD system.

The RRD system queries the LEAF system at regular intervals via SNMP. The collected data is stored in an RRD database. The performance data can be presented in a number of ways; here it is presented by a web server running PHP scripts that call rrdtool functions.

The setup and configuration of the LEAF system is simple compared to the setup and configuration of the RRD system. All that is needed on the LEAF system is an SNMP agent. The RRD system can be made as simple or as advanced as desired. At least the following functions must be present on the RRD system:

  • SNMP client to query the SNMP agent in the LEAF system

  • Database to store and retrieve the measured data

The SNMP client and agent functions in this sample are provided by the Net-SNMP package. The database for storing the measured data is based on RRDTool. In the next sections a short overview of these toolkits is given.

About Net-SNMP

The Net-SNMP toolkit provides a suite of client and server applications that communicate with each other using the Simple Network Management Protocol (SNMP).

One of the server applications is snmpd, which is an SNMP Agent. snmpd listens for SNMP requests. A typical SNMP agent allows a client to query information about the device running the SNMP agent. Some devices also allow configuration to be set via SNMP.

The Net-SNMP agent can be built to monitor things such as network traffic, disk space, disk I/O, CPU usage and more.

Besides the server part, a client part is needed. In this example the Perl libraries of Net-SNMP are used for the client part. Perl scripts on the RRD system collect the performance data from the LEAF system.

About RRDTool

RRD is the acronym for Round Robin Database. RRD is a system to store and display time-series data (e.g. network bandwidth, machine-room temperature, server load average). It stores the data in a very compact way that will not expand over time, and it presents useful graphs by processing the data to enforce a certain data density. It can be used either via simple wrapper scripts (from shell or Perl) or via front-ends that poll network devices and put a friendly user interface on it.

In the rest of this document it is assumed that you have at least read the "RRD Beginners Guide" and the "RRD Tutorial" from the RRDTool documentation page.

Configure the LEAF system

Load netsnmpd package

Edit leaf.cfg and add netsnmpd, libsnmp and libm to the packages list:

root,config,etc, ... , libm,libsnmp,netsnmpd

Either reboot the system or load the new packages manually.

Configure the snmp daemon

Edit the configuration file /etc/snmp/snmpd.conf. A sample configuration is given below. This sample does not contain all the helpful comments from the original configuration file, so I suggest you use this to edit your existing configuration file.

#
# snmpd.conf
#

syscontact  "Root <[email protected]>"
syslocation  "At the end of the Universe"
sysname leafhost
sysservices 15

rocommunity  zaphod     default
com2sec      readonly   default    zaphod
group        RO_Group   usm   readonly
group        RO_Group   v1    readonly
group        RO_Group   v2c   readonly
view         all    included   .1

access  RO_Group   ""       any        noauth     exact  all     none   none

# 

Now back up the netsnmpd package and (re)start snmpd with svi snmpd restart.
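
Before moving on, it can be worth checking from the RRD system that the agent answers. The sketch below is one way to do that with the Net-SNMP command line tool snmpget, assuming it is installed on the RRD system. The script name check-agent.sh and the helper snmpget_cmd are made up for this illustration; the host name and community string are the ones from the sample configuration.

```shell
#!/bin/sh
# check-agent.sh (hypothetical name): ask the LEAF system for its sysName.
# Usage: check-agent.sh <host> [community]

SYSNAME_OID=".1.3.6.1.2.1.1.5.0"    # system.sysName.0

# Print the snmpget command line for a host and community string.
# -v 2c selects SNMP version 2c, -t 2 -r 1 keep the wait short on failure.
snmpget_cmd() {
    printf 'snmpget -v 2c -c %s -t 2 -r 1 %s %s\n' "$2" "$1" "$SYSNAME_OID"
}

# Only contact the agent when a host name is given on the command line.
if [ "$#" -ge 1 ]; then
    cmd=$(snmpget_cmd "$1" "${2:-zaphod}")
    echo "Running: $cmd"
    $cmd || echo "No answer from $1, check snmpd.conf and the community string" >&2
fi
```

Called as check-agent.sh leafhost, this should return the sysname value from snmpd.conf if the agent is reachable and the community string matches.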

Configure the RRD machine

Prerequisites

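For the examples given here the following items, all of which are used by the scripts in this chapter, must be installed on the RRD system:

  • Net-SNMP, including its Perl module (SNMP)

  • RRDTool, including its Perl module (RRDs)

  • cron, to schedule the collectors

  • A web server with PHP and an rrdtool extension for PHP that provides rrd_graph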

For the rest of this document it is assumed that you are running Linux on your RRD system. This is not the only possible option; the necessary items are also available for other types of systems. It is beyond the scope of this document to describe where to get the above mentioned items precompiled for your system and how to install them. Refer to the documentation of your distribution and/or the documentation of the individual sources for more information.

Collecting and storing performance data

Introduction

In this chapter the terms collector and database will be used frequently. The collector is the script that queries the LEAF system via SNMP and stores the retrieved values in a database, in this case an RRD database.

An RRD database can be defined to contain all sorts of information, datasets, in any combination you like. It is in general good practice to keep information of different types in different databases, but you'll have to find out for yourself which dataset definition will give you the most flexible solution for your situation.

In the following examples two databases will be defined, one for network traffic statistics and one for CPU load.

Personally I like to structure the RRD related directories in such a way that there is a clear distinction between collectors and databases, and also between databases belonging to different hosts. In these examples the following directory structure is assumed:

/home/rrd/
       |
       +--- collectors/
       |
       +--- databases/
                 |
                 +--- leafhost/
                 |
                 +--- host2/
                 |
                   ... etc ...
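
This layout can be created with a few mkdir calls. The following is a minimal sketch; make_rrd_tree is a hypothetical helper, and the demonstration at the end works in a temporary directory so that nothing under /home is touched.

```shell
#!/bin/sh
# Create the directory layout for one or more monitored hosts.
# make_rrd_tree is a hypothetical helper; the base directory is an argument
# so that the layout can also be tried out in a scratch directory.
make_rrd_tree() {
    base=$1
    shift
    mkdir -p "$base/collectors" "$base/databases" || return 1
    for host in "$@"; do
        mkdir -p "$base/databases/$host" || return 1
    done
}

# Example: build the layout in a temporary directory, list it, clean up.
scratch=$(mktemp -d)
make_rrd_tree "$scratch/rrd" leafhost host2
find "$scratch/rrd" -type d | sort
rm -rf "$scratch"
```

For the real layout, the call would be make_rrd_tree /home/rrd leafhost host2, run as a user that is allowed to write there.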

After defining a database and creating the corresponding collector, the collector must be scheduled to run at regular intervals. This must be done for each collector/database pair. Cron is your friend here. An option that I favor myself is to have only one entry in /etc/crontab. This entry calls the overall collector script, which in turn calls each of the individual collector scripts. That way the system crontab file does not have to be edited for each new collector. In this case your /etc/crontab would have the following entry:

# /etc/crontab

...

# overall collector script
*/5 *   * * *   rrd    /home/rrd/collectors/collect-all

#

This means that the overall collector script is started every 5 minutes. The overall collector file /home/rrd/collectors/collect-all could look like:

#!/bin/sh
# Overall collector script

# Script for collecting interface statistics
/home/rrd/collectors/interface.pl

# Script for collecting cpu load
/home/rrd/collectors/cpuload.pl
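
If the list of collectors grows, the overall script could also discover them instead of naming each one. The following is a minimal sketch of that variation, not the script used in this chapter. It assumes that every executable *.pl file in the collectors directory is a collector, and run_collectors is a name chosen for this illustration.

```shell
#!/bin/sh
# Run every executable *.pl file in a directory; keep going when one fails.
# The return value is the number of collectors that failed.
run_collectors() {
    dir=$1
    failed=0
    for collector in "$dir"/*.pl; do
        [ -x "$collector" ] || continue      # also skips an unmatched pattern
        if ! "$collector"; then
            echo "collector failed: $collector" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Only run when a directory is given, e.g. /home/rrd/collectors
if [ "$#" -ge 1 ]; then
    run_collectors "$1"
fi
```

With this approach a collector is disabled by removing its execute permission rather than by editing the overall script.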

Example 1: network traffic

Define the RRD database

If the number of interfaces on the LEAF system is fixed and will never change, you may choose to keep the traffic statistics of all interfaces in one database. If not, it's probably easier to define a database per interface. This makes it easier to extend your RRD system when more interfaces are added to your LEAF system. Here a database for only one interface is created.

To create a new database, go to the data directory for the targeted host and create the dataset with the options as described below:

cd /home/rrd/databases/leafhost
rrdtool create eth0.rrd \
        --step 300 \
        DS:bytes_in:COUNTER:600:U:U \
        DS:bytes_out:COUNTER:600:U:U \
        RRA:AVERAGE:0.5:1:864 \
        RRA:AVERAGE:0.5:6:672 \
        RRA:AVERAGE:0.5:24:744 \
        RRA:AVERAGE:0.5:288:730

This creates a new database named eth0.rrd that expects new data every 300 seconds (the step size). This matches the schedule defined in the crontab file above.

The database contains two data sources, bytes_in and bytes_out, both of type COUNTER.
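
A data source of type COUNTER is not stored as the raw counter value. rrdtool stores the increase per second since the previous update, and treats a reading that is lower than the previous one as a counter overflow. The sketch below illustrates that calculation for a 32-bit counter such as ifInOctets; counter_rate is a hypothetical helper and the readings are made up.

```shell
#!/bin/sh
# Per-second rate between two readings of a 32-bit SNMP counter.
# counter_rate is a hypothetical helper; it uses integer arithmetic, whereas
# rrdtool itself works with floating point values.
counter_rate() {
    prev=$1 cur=$2 seconds=$3
    delta=$((cur - prev))
    # A lower reading is taken to mean that the counter wrapped at 2^32.
    if [ "$delta" -lt 0 ]; then
        delta=$((delta + 4294967296))
    fi
    echo $((delta / seconds))
}

counter_rate 1000 301000 300          # prints 1000
counter_rate 4294967196 200 300       # wrapped counter, prints 1
```

This is why the graphs later in this chapter can be labelled in bytes per second even though the collector stores plain counter readings.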

Four round robin archives are defined containing averaged values:

  • 864 samples of 1 step (5 minutes). This is a period of 3 days. Since each sample covers only one step, the actual value is stored and no average is calculated.

  • 672 averaged samples over 6 steps (30 minutes). This is a period of 2 weeks.

  • 744 averaged samples over 24 steps (2 hours). This is a period of 2 months (62 days).

  • 730 averaged samples over 288 steps (1 day). This is a period of 2 years.
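
The period covered by each archive follows from the arguments of its RRA line: the step size in seconds, multiplied by the number of steps per stored sample, multiplied by the number of samples. As a small sketch of that arithmetic (rra_days is a hypothetical helper, not part of rrdtool):

```shell
#!/bin/sh
# Days covered by one RRA: step size (s) x steps per sample x number of samples.
rra_days() {
    step=$1 steps=$2 rows=$3
    echo $((step * steps * rows / 86400))
}

rra_days 300 1 864      # prints 3
rra_days 300 6 672      # prints 14
rra_days 300 24 744     # prints 62
rra_days 300 288 730    # prints 730
```

The same calculation can be used the other way around when choosing the number of samples for a retention period of your own.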

Create the collector

The data that can be retrieved from an SNMP agent is defined in a Management Information Base (MIB). The objects in the MIB containing the interface traffic counters that are necessary for this example are:

  • .iso.org.dod.internet.mgmt.mib-2.interfaces.ifNumber = .1.3.6.1.2.1.2.1

  • .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr = .1.3.6.1.2.1.2.2.1.2

  • .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets = .1.3.6.1.2.1.2.2.1.10

  • .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifOutOctets = .1.3.6.1.2.1.2.2.1.16

In the sample script below the LEAF system is queried for the number of interfaces. The correct interface is selected based on the interface name and then the counters for bytes_in and bytes_out are read. Finally this information is stored into the database.

#!/usr/bin/perl

# interface.pl

use SNMP;
use RRDs;

$oid_ifNumber    = ".1.3.6.1.2.1.2.1";
$oid_ifDescr     = ".1.3.6.1.2.1.2.2.1.2";
$oid_ifInOctets  = ".1.3.6.1.2.1.2.2.1.10";
$oid_ifOutOctets = ".1.3.6.1.2.1.2.2.1.16";

$database = "/home/rrd/databases/leafhost/eth0.rrd";


#
# Open snmp session and get interface data
#
$session = new SNMP::Session(
                        DestHost  => "leafhost",
                        Community => "zaphod",
                        Version   => '2');
die "SNMP session creation error: $SNMP::Session::ErrorStr" unless (defined $session);

$numInts = $session->get($oid_ifNumber . ".0");

for $i (1..$numInts) {
    $name = $session->get($oid_ifDescr . "." . $i);
    if ( $name eq "eth0" ) {
        $in = $session->get($oid_ifInOctets . "." . $i);
        $out = $session->get($oid_ifOutOctets . "." . $i);
    }
}

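die $session->{ErrorStr} if ($session->{ErrorStr});

# Stop here if eth0 was not found, rather than storing empty values
die "Interface eth0 not found on leafhost\n" unless (defined $in and defined $out);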


#
# Update the database
#
RRDs::update ($database, "N:".$in.":".$out);
my $Err = RRDs::error;
die "Error while updating: $Err\n" if $Err;

#

Of course this is only an example. You can extend it to suit your own needs.
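
One possible extension concerns the interface lookup. The sample script assumes that the interface index numbers run from 1 up to ifNumber, which an agent does not guarantee. A more robust lookup walks the ifDescr column and takes the index from the matching OID. The sketch below shows that idea on the output format of snmpwalk with the -On option; find_ifindex is a hypothetical helper, and the interface names and index numbers in the sample are made up.

```shell
#!/bin/sh
# Print the ifIndex of a named interface, reading "snmpwalk -On" output of
# the ifDescr column on standard input. find_ifindex is a hypothetical helper.
find_ifindex() {
    awk -v want="$1" '
        $NF == want {
            n = split($1, parts, "[.]")   # last OID component is the ifIndex
            print parts[n]
            exit
        }'
}

# Sample input in the format printed by:
#   snmpwalk -v 2c -c zaphod -On leafhost .1.3.6.1.2.1.2.2.1.2
# The interface names and index numbers here are made up.
sample='.1.3.6.1.2.1.2.2.1.2.1 = STRING: lo
.1.3.6.1.2.1.2.2.1.2.3 = STRING: eth0
.1.3.6.1.2.1.2.2.1.2.4 = STRING: eth1'

printf '%s\n' "$sample" | find_ifindex eth0     # prints 3
```

The index found this way is the number to append to the ifInOctets and ifOutOctets OIDs.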

Example 2: cpu load

Define the RRD database

On Linux systems four types of CPU load (process time) are distinguished: user, system, nice and idle. We will now define a database in which to store this information.

cd /home/rrd/databases/leafhost
rrdtool create cpuload.rrd \
        --step 300 \
        DS:user:COUNTER:600:0:100 \
        DS:system:COUNTER:600:0:100 \
        DS:nice:COUNTER:600:0:100 \
        DS:idle:COUNTER:600:0:100 \
        RRA:AVERAGE:0.5:1:864 \
        RRA:AVERAGE:0.5:6:672 \
        RRA:AVERAGE:0.5:24:744 \
        RRA:AVERAGE:0.5:288:730

The definition of this database has much in common with the previous database. Now four data sources have been defined instead of two. The definition of the round robin archives is the same.

Create the collector

The cpu load information is represented by the following objects in the MIB:

  • .iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawUser = .1.3.6.1.4.1.2021.11.50

  • .iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawNice = .1.3.6.1.4.1.2021.11.51

  • .iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawSystem = .1.3.6.1.4.1.2021.11.52

  • .iso.org.dod.internet.private.enterprises.ucdavis.systemStats.ssCpuRawIdle = .1.3.6.1.4.1.2021.11.53

And this information can be retrieved and stored with the following script:

#!/usr/bin/perl

# cpuload.pl

use SNMP;
use RRDs;

$oid_ssCpuRawUser    = ".1.3.6.1.4.1.2021.11.50";
$oid_ssCpuRawNice    = ".1.3.6.1.4.1.2021.11.51";
$oid_ssCpuRawSystem  = ".1.3.6.1.4.1.2021.11.52";
$oid_ssCpuRawIdle    = ".1.3.6.1.4.1.2021.11.53";

$database = "/home/rrd/databases/leafhost/cpuload.rrd";


#
# Open snmp session and get cpu load data
#
$session = new SNMP::Session(
                        DestHost  => "leafhost",
                        Community => "zaphod",
                        Version   => '2');
die "SNMP session creation error: $SNMP::Session::ErrorStr" unless (defined $session);

$cpuUser   = $session->get($oid_ssCpuRawUser . ".0");
$cpuSystem = $session->get($oid_ssCpuRawSystem . ".0");
$cpuNice   = $session->get($oid_ssCpuRawNice . ".0");
$cpuIdle   = $session->get($oid_ssCpuRawIdle . ".0");


#
# Update the database
#
RRDs::update ($database, "N:".$cpuUser.":".$cpuSystem.":".$cpuNice.":".$cpuIdle);
my $Err = RRDs::error;
die "Error while updating: $Err\n" if $Err;

#
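
The ssCpuRaw objects are counters of time ticks, so what ends up in the database is a rate in ticks per second for each type of load. Assuming the usual 100 ticks per second on a single CPU, these rates add up to about 100 and can be read directly as percentages. On a system with more than one CPU the counters are cumulative, so the rates can exceed the maximum of 100 set in the database definition. In that case the share of each type can be computed from the increases between two readings. The helper cpu_share below is a hypothetical illustration of that calculation with made-up numbers.

```shell
#!/bin/sh
# Share (in whole percent) of one counter in the total of all four, given
# the increase of each counter between two readings.
# Arguments: part user nice system idle
cpu_share() {
    part=$1 user=$2 nice=$3 system=$4 idle=$5
    total=$((user + nice + system + idle))
    [ "$total" -gt 0 ] || { echo 0; return; }
    echo $((100 * part / total))
}

# Made-up increases over one 300 second interval:
# user 1500, nice 0, system 1500, idle 27000
cpu_share 1500  1500 0 1500 27000     # user, prints 5
cpu_share 27000 1500 0 1500 27000     # idle, prints 90
```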

Retrieving and presenting performance data

Introduction

After you have finished the scripts and the overall collector has been called a few times by cron, it's time to make some graphs.

The following assumptions are made with respect to the configuration of the webserver:

  • An alias /images/ is defined for /var/www/images/

  • The images directory has a subdirectory rrdimg in which the rrd graphs will be created
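
The graphs are written by the PHP scripts, so the rrdimg directory has to be writable for the user that the web server runs as. The following sketch, meant to be run as that user, creates the directory if necessary and reports whether it is writable. check_imgdir is a hypothetical helper.

```shell
#!/bin/sh
# Create the graph directory if needed and report whether the current user
# may write to it. check_imgdir is a hypothetical helper.
check_imgdir() {
    dir=$1
    mkdir -p "$dir" 2>/dev/null
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Only run when a directory is given, e.g. /var/www/images/rrdimg
if [ "$#" -ge 1 ]; then
    check_imgdir "$1"
fi
```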

For ease of reuse a separate php file is used in which the generic functions for drawing graphs are defined. This file is included by the other scripts.

Example 1: network traffic

First a file graphs.php is defined that contains the functions to draw the graphs.

<?php

## graphs.php
##
## A set of php functions to create rrd graphs


## Draw the traffic graph for eth0, starting at $start (e.g. "-1d").
## The function is named interface_graph because "interface" is a
## reserved word in PHP 5 and later.
function interface_graph ($start)
{
    $database = "/home/rrd/databases/leafhost/eth0.rrd";
    $imgfile = "eth0$start.gif";

    $opts = array( "--start", "$start",
        "--vertical-label", "Bytes/sec",
        "--width", "400",
        "DEF:in=$database:bytes_in:AVERAGE",
        "DEF:out=$database:bytes_out:AVERAGE",
        "LINE2:in#00ff00:In",
        "LINE2:out#ff0000:Out"
    );

    make_graph ($imgfile, $opts);
}


function make_graph ($file, $options)
{
    $ret = rrd_graph("/var/www/images/rrdimg/$file", $options, count($options));

    ## if $ret is an array, then rrd_graph was successful
    ##
    if ( is_array($ret) ) {
        echo "<img src=\"/images/rrdimg/$file\" border=0>";
    }
    else {
        $err = rrd_error();
        echo "<p><b>$err</b></p>";
    }
}

?>

Then the actual page that contains the network traffic graphs can be created.

<html>
    <head>
        <title>Interface statistics</title>
    </head>
    <body>
        <h1>Interface statistics</h1>
<?php
        require "graphs.php";

        print "<h2>Daily graph</h2>\n";
        interface_graph ("-1d");
        print "<h2>Weekly graph</h2>\n";
        interface_graph ("-1w");
        print "<h2>Monthly graph</h2>\n";
        interface_graph ("-1m");
?>
    </body>
</html>

Now fire up your browser and access the page that you just created. Sit back and enjoy!
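
If the page shows an error instead of a graph, it can help to draw the same graph from the command line, outside PHP. The sketch below prints an rrdtool graph command with the same options as the PHP function for the network traffic graph. graph_cmd is a hypothetical helper, and writing the image to /tmp is an assumption made for this illustration.

```shell
#!/bin/sh
# Print an rrdtool graph command equivalent to the PHP traffic graph.
# graph_cmd and the output directory /tmp are assumptions for this sketch.
DATABASE=/home/rrd/databases/leafhost/eth0.rrd

graph_cmd() {
    start=$1
    # printf repeats the format for every argument, joining them with spaces
    printf '%s ' rrdtool graph "/tmp/eth0$start.gif" \
        --start "$start" --vertical-label Bytes/sec --width 400 \
        "DEF:in=$DATABASE:bytes_in:AVERAGE" \
        "DEF:out=$DATABASE:bytes_out:AVERAGE" \
        "LINE2:in#00ff00:In"
    printf '%s\n' "LINE2:out#ff0000:Out"
}

graph_cmd -1d
```

The printed command can be inspected first and then run by piping the output to sh. Any error message from rrdtool then appears directly on the terminal.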

Example 2: cpu load

First we add a function to draw CPU load graphs to the file graphs.php.

<?php

## graphs.php
##
## A set of php functions to create rrd graphs

...

function cpuload ($start)
{
    $database = "/home/rrd/databases/leafhost/cpuload.rrd";
    $imgfile = "cpu$start.gif";

    $opts = array( "--start", "$start",
        "--vertical-label", "Load (%)",
        "--width", "400",
        "DEF:user=$database:user:AVERAGE",
        "DEF:nice=$database:nice:AVERAGE",
        "DEF:system=$database:system:AVERAGE",
        "AREA:system#00ffff:System",
        "STACK:user#00ff00:User",
        "STACK:nice#0000ff:Nice",
    );

    make_graph ($imgfile, $opts);
}

?>

And then the actual CPU load page is created. This is almost too easy ;-)

<html>
    <head>
        <title>CPU Load statistics</title>
    </head>
    <body>
        <h1>CPU Load statistics</h1>
<?php
        require "graphs.php";

        print "<h2>Daily graph</h2>\n";
        cpuload ("-1d");
        print "<h2>Weekly graph</h2>\n";
        cpuload ("-1w");
        print "<h2>Monthly graph</h2>\n";
        cpuload ("-1m");
?>
    </body>
</html>