Basic usage

Standard metrics dashboard

The CloudBees Monitoring plugin provides a standard dashboard view that displays twelve of the most important metrics for a Jenkins installation:

The system load on the master, as well as the number of CPU cores in use by the JVM on the master

This graph should be compared with the number of CPU cores on the Jenkins master.

The percentage of the master’s JVM heap memory pool that is currently in use

Under a steady-state load this graph typically shows a saw-tooth pattern. A graph that stays consistently above 80% usually indicates memory pressure.

The percentage of the master’s JVM non-heap memory pool that is currently in use

A graph that stays consistently above 80% usually indicates memory pressure.
The percentage of file descriptors in use by the Jenkins master

When the Jenkins master is running on a Unix-based operating system, this value should remain consistently low unless one of the plugins installed on your Jenkins master is leaking file handles. When Jenkins runs out of file handles, jobs can start failing at random.
The rate of web requests against the Jenkins master UI

If the web request rate is at or near the limit of your deployment architecture, web requests may start to fail. The exact limit depends on how the Jenkins master UI is deployed.

The response time distribution for web requests against the Jenkins master UI

The response status code breakdown for web requests against the Jenkins master UI

The length of the build queue

This metric is a key indicator of a mismatch between the available build capacity and the rate at which builds are being scheduled.
The amount of time a build takes to complete, including a breakdown for time spent queuing

If jobs spend a large portion of their time in the build queue, the Jenkins master needs more build slaves.

The breakdown of the build queue by the reasons why jobs are waiting in it

A job is considered stuck if any of the following conditions is true:

There are no build nodes with the labels required by the job.
The job has been waiting for more than 10 times its estimated duration to build, but all the nodes that it can build on are busy.
The job has never been built and has been waiting for more than a day to build, but all the nodes that it can build on are busy.
The executors available for building jobs
The rate at which build jobs are scheduled

If a job is already in the queue and an identical request to schedule it is received, Jenkins coalesces the two requests. It can therefore be normal for the scheduling rate to be higher than the build rate.
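The stuck-job rules above can be read as a single predicate. The Python sketch below is illustrative only: the `QueuedJob` record, its field names, and the `is_stuck` function are hypothetical and are not part of Jenkins or the Monitoring plugin. They simply encode the three conditions as this section states them.

```python
from dataclasses import dataclass
from typing import Optional

ONE_DAY_SECONDS = 24 * 60 * 60


@dataclass
class QueuedJob:
    # Hypothetical description of a job waiting in the build queue.
    has_matching_nodes: bool         # some build node carries the required labels
    all_matching_nodes_busy: bool    # every node the job can build on is busy
    waiting_seconds: float           # time spent waiting in the queue so far
    estimated_duration_seconds: Optional[float]  # None if never built


def is_stuck(job: QueuedJob) -> bool:
    """Apply the three 'stuck' conditions listed above."""
    # Condition 1: no build node has the labels the job requires.
    if not job.has_matching_nodes:
        return True
    # Conditions 2 and 3 only apply when every eligible node is busy.
    if not job.all_matching_nodes_busy:
        return False
    if job.estimated_duration_seconds is None:
        # Condition 3: never built, and waiting for more than a day.
        return job.waiting_seconds > ONE_DAY_SECONDS
    # Condition 2: waiting for more than 10 times the estimated duration.
    return job.waiting_seconds > 10 * job.estimated_duration_seconds


if __name__ == "__main__":
    print(is_stuck(QueuedJob(True, True, 700.0, 60.0)))   # True: 700 s > 10 x 60 s
    print(is_stuck(QueuedJob(True, False, 700.0, 60.0)))  # False: a node is free
```

Keeping the "all eligible nodes are busy" check separate from the two timing rules mirrors the wording of the conditions: a long wait only counts as stuck when there is no free node to build on.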

Because the standard metrics dashboard does not support customization, it requires very little effort to set up.

Creating a standard metrics dashboard

The following instructions add a standard metrics dashboard view to the root of Jenkins. You can also add the dashboard to folders or other view containers; however, because the dashboard is not contextual, it displays the same information wherever it is placed.

If your Jenkins instance already has some jobs, click the + tab at the top of the jobs list to open the new view screen.
If your Jenkins instance does not have any jobs yet, change the browser URL from JENKINS_URL/ to JENKINS_URL/newView.
Enter a name for your new view, such as Dashboard.
Select the Jenkins Enterprise Metrics Dashboard radio button.
Click the OK button.
The view configuration screen is displayed. Only two options can be configured on this screen: the view name and the view description. Click the OK button to finish creating the view.

You should now have a standard metrics dashboard view. When it is first displayed, the dashboard progressively loads the historical metrics that the Metrics plugin has maintained since Jenkins started. Once the historical data has been loaded, the graphs switch to live updating mode, in which they are updated every 10 seconds.

Metrics based alerts

This feature allows you to define metrics-based alerts and have Jenkins send emails when the alerts start and finish.

When the feature is enabled, it adds an Alerts action to the top-level Jenkins actions. The Alerts action shows the status of all the defined alerts and provides the ability to silence specific alerts.

Note

For email alerting to work, Jenkins must be configured to send emails.

Creating some basic alerts

The following instructions will create four basic alerts:

An alert that triggers if any of the health reports are failing
An alert that triggers if the file descriptor usage on the master goes above 80%
An alert that triggers if the JVM heap memory usage is over 80% for more than a minute
An alert that triggers if the 5-minute average of HTTP 400 (Bad Request) responses goes above 10 per minute for more than five minutes

These instructions assume you have configured Jenkins with the SMTP settings required for sending emails.

Log in as an administrator and navigate to the main Jenkins configuration screen.
Scroll down to the Alerts section.
Click the Add button corresponding to Conditions.
Select the Health check score option.
Specify Health checks as the Alert title. Leave the Alert after at 5 seconds. If you want additional recipients for this health check only, add them here. Emails will be sent to the Global Recipients as well as to any alert-specific Recipients.
Click the Add button corresponding to Conditions.
Select the Local metric gauge within range option.
Specify vm.file.descriptor.ratio as the Gauge. Specify 0.8 as Alert if above. Specify File descriptor usage below 80% as the Alert title. Leave the Alert after at 5 seconds.
Click the Add button corresponding to Conditions.
Select the Local metric gauge within range option.
Specify vm.memory.heap.usage as the Gauge. Specify 0.8 as Alert if above. Specify JVM heap memory usage below 80% as the Alert title. Specify the Alert after as 60 seconds.
Click the Add button corresponding to Conditions.
Select the Local metric meter within range option.
Specify http.responseCodes.badRequest as the Meter. Specify 5 minute average as the Value. Specify 0.16666666 as Alert if above (the meter rates are all reported in events per second, so 10 per minute becomes 10 / 60 per second). Specify Less than 10 bad requests per minute as the Alert title. Specify the Alert after as 300 seconds.
Click the Add button corresponding to Global Recipients.
Select the Email notifications option.
Specify the alert email recipients as a whitespace- or comma-separated list in the Email addresses text box.
Save the configuration.
The main Jenkins root page should now have an Alerts action. Click this action to view the alerts.
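Because meter rates are reported in events per second, a per-minute threshold has to be divided by 60 before it is entered as Alert if above; that is where 0.16666666 comes from. The Python sketch below shows that conversion together with a simple "alert if above" comparison for the three threshold-based conditions in the example. The helper names and the sample readings are hypothetical; they illustrate the arithmetic only, do not call the Monitoring plugin, and model neither the Health check score condition nor the Alert after duration.

```python
def per_minute_to_per_second(rate_per_minute: float) -> float:
    """Convert an events-per-minute threshold to the events-per-second unit used by meters."""
    return rate_per_minute / 60.0


# Hypothetical "Alert if above" thresholds, keyed by the metric names used above.
THRESHOLDS = {
    "vm.file.descriptor.ratio": 0.8,
    "vm.memory.heap.usage": 0.8,
    # Compared against the meter's 5 minute average rate (events per second).
    "http.responseCodes.badRequest": per_minute_to_per_second(10),
}


def exceeds_threshold(readings: dict) -> dict:
    """Return, for each metric, whether its current reading is above its threshold."""
    return {name: readings[name] > limit for name, limit in THRESHOLDS.items()}


if __name__ == "__main__":
    print(round(per_minute_to_per_second(10), 8))  # 0.16666667
    sample = {
        "vm.file.descriptor.ratio": 0.12,
        "vm.memory.heap.usage": 0.85,
        "http.responseCodes.badRequest": 0.05,  # 3 bad requests per minute
    }
    print(exceeds_threshold(sample))
```

Entering the truncated value 0.16666666 rather than the exact 10 / 60 makes the threshold a hair lower, which in practice has no visible effect.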

Managing alerts

Each alert can be in one of four states:

Table 24.1. Alert states

State         When
Failing       The alert condition has been met for less than the Alert after duration
Failed        The alert condition has been met for at least the Alert after duration
Recovering    The alert condition has been unmet for less than the Alert after duration
Recovered     The alert condition has been unmet for at least the Alert after duration

Notification emails are sent for any alert that is not silenced on either of the following transitions:

Failing to Failed
Recovering to Recovered
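The state table and notification rules above can be summarized as a small state machine driven by the periodic checks. The Python sketch below is a hypothetical model, not the plugin's implementation: the `Alert` class, its fields, and the choice to start in the Recovered state are assumptions, and sending an email is represented by appending to a list. This section does not say what happens when a Failing alert's condition clears before the Alert after duration expires; the sketch assumes it returns quietly to Recovered (and, symmetrically, that a Recovering alert whose condition returns goes quietly back to Failed), so that no notification is sent in either case.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FAILING, FAILED, RECOVERING, RECOVERED = "Failing", "Failed", "Recovering", "Recovered"


@dataclass
class Alert:
    title: str
    alert_after: float           # the "Alert after" duration, in seconds
    silenced: bool = False
    state: str = RECOVERED       # assumption: a new alert starts out healthy
    since: float = 0.0           # time at which the current state was entered
    sent: List[Tuple[float, str]] = field(default_factory=list)  # (time, new state)

    def check(self, now: float, condition_met: bool) -> None:
        """Run one periodic check of the alert condition."""
        if condition_met:
            if self.state == RECOVERED:
                self._enter(FAILING, now)
            elif self.state == RECOVERING:
                # Assumption: recovery was not confirmed, so go back to Failed quietly.
                self._enter(FAILED, now)
            elif self.state == FAILING and now - self.since >= self.alert_after:
                self._enter(FAILED, now, notify=True)
        else:
            if self.state == FAILED:
                self._enter(RECOVERING, now)
            elif self.state == FAILING:
                # Assumption: the alert never reached Failed, so go back quietly.
                self._enter(RECOVERED, now)
            elif self.state == RECOVERING and now - self.since >= self.alert_after:
                self._enter(RECOVERED, now, notify=True)

    def _enter(self, state: str, now: float, notify: bool = False) -> None:
        self.state, self.since = state, now
        if notify and not self.silenced:
            self.sent.append((now, state))  # stand-in for sending an email


if __name__ == "__main__":
    heap = Alert("JVM heap memory usage below 80%", alert_after=60)
    for t in range(0, 70, 5):  # checks every 5 seconds; heap stays above 80%
        heap.check(t, condition_met=True)
    print(heap.state, heap.sent)  # Failed [(60, 'Failed')]
```

Timing each state from the check at which it was entered means that, with 5-second checks, a notification can arrive up to one check interval later than the exact Alert after duration.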

The alerts are checked every 5 seconds. The Alerts page displays the current value of each alert condition. If the condition changes between these checks, the UI may show the alert in a mixed state, such as in Figure 24.2, “An alert where the condition has changed prior to the periodic checks running”.

Figure 24.2. An alert where the condition has changed prior to the periodic checks running

However, once the periodic check runs, the alert will enter either the Failing or the Recovering state.

Figure 24.3. An alert having entered the Failing state

If the condition changes before the condition’s Alert after time expires, no notifications will be sent.

Figure 24.4. An alert having entered the Recovering state

On the other hand, if the condition stays constant for the entire Alert after time, a notification will be sent.

Figure 24.5. An alert having entered the Failed state

The Silence button can be used to suppress notifications for specific alerts. Silenced alerts are re-enabled with the Enable button.

Figure 24.6. Some alerts having been silenced
