Basic usage

Standard metrics dashboard

The CloudBees Monitoring plugin provides a standard dashboard view that displays twelve of the most important metrics about a Jenkins installation:

  • The system load on the master as well as the number of CPU cores in use by the JVM on the master.

    gs fig1

    This graph should be compared with the number of CPU cores on the Jenkins master.

  • The percentage of the master’s JVM’s heap memory pool that is currently in use

    gs fig2

    Typically this will be a saw-tooth pattern for a steady state load. The graph being consistently above 80% is usually indicative of memory pressure.

  • The percentage of the master’s JVM’s non-heap memory pool that is currently in use

    gs fig3

    This graph being consistently above 80% is usually indicative of memory pressure.

  • The percentage of file descriptors in use by the Jenkins master

    gs fig4

    When the Jenkins master is running on a Unix-based operating system this should remain consistently low unless there is a file handle leak in one of the plugins installed in your Jenkins master. When Jenkins runs out of file handles jobs can start failing at random.

  • The rate of web requests against the Jenkins master UI

    gs fig5

    If the web request rate is at or near the limit for your deployment architecture, then web requests may start to fail. The exact limit depends on how your Jenkins master UI is deployed.

  • The response time distribution for web requests against the Jenkins master UI

    gs fig6
  • The response status code breakdown for web requests against the Jenkins master UI

    gs fig9
  • The length of the build queue

    gs fig7

    This metric is a key indicator of a mismatch between build capacity and the rate at which builds are being scheduled.

  • The amount of time a build takes to complete, including a breakdown for time spent queuing

    gs fig10

    Jobs spending a large portion of their time in the build queue is an indicator of a Jenkins master that needs more build slaves.

  • The breakdown of the build queue based on the various reasons why a job can be in the build queue

    gs fig11

    A job is considered stuck if any of the following conditions is true (a minimal sketch of this logic appears after this list):

      • There are no build nodes with the labels required by the job
      • The job is waiting for more than 10 times its estimated duration to build, but all the nodes that it can build on are busy
      • The job has never been built and is waiting for more than a day to build, but all the nodes that it can build on are busy

  • The executors available for building jobs

    gs fig8
  • The rate at which build jobs are scheduled

    gs fig12

    If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. Thus it can be normal for the scheduling rate to be higher than the build rate.
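
The "stuck" conditions listed for the build queue breakdown metric can be expressed as a small predicate. The following Python sketch is purely illustrative: the QueuedJob fields and the node dictionaries are hypothetical stand-ins, not part of any real Jenkins API; the sketch only restates the three conditions above.

    from dataclasses import dataclass
    import time

    @dataclass
    class QueuedJob:
        """Hypothetical description of a queued job; illustrative only."""
        matching_nodes: list        # node dicts whose labels satisfy the job, e.g. {"busy": True}
        in_queue_since: float       # epoch seconds when the job entered the queue
        estimated_duration: float   # estimated build duration in seconds; 0 or None if unknown
        has_ever_been_built: bool

    def is_stuck(job, now=None):
        """True if the job meets any of the three 'stuck' conditions above."""
        now = now if now is not None else time.time()
        waiting = now - job.in_queue_since

        # 1. No build nodes carry the labels the job requires.
        if not job.matching_nodes:
            return True

        all_busy = all(node.get("busy", False) for node in job.matching_nodes)

        # 2. Waiting more than 10x the estimated duration while every candidate node is busy.
        if job.estimated_duration and waiting > 10 * job.estimated_duration and all_busy:
            return True

        # 3. Never built, waiting more than a day, and every candidate node is busy.
        if not job.has_ever_been_built and waiting > 24 * 60 * 60 and all_busy:
            return True

        return False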

Because the standard metrics dashboard does not support customization, it requires very little effort to set up.

Creating a Standard metrics dashboard

The following instructions will add a standard metrics dashboard view to the root of Jenkins. You can also add the dashboard to folders or other view containers, though because the dashboard is not contextual the same information will be displayed.

  1. If your Jenkins instance has some jobs, just click on the + tab at the top of the jobs list in Jenkins to get to the new view screen.

    If your Jenkins does not have any jobs yet, change the browser URL from JENKINS_URL/ to JENKINS_URL/newView

  2. Enter a name for your new view, such as Dashboard.
  3. Select the Jenkins Enterprise Metrics Dashboard radio button.

    gs cd1
  4. Click on the Ok button.
  5. The view configuration screen should be displayed. There are only two options available for configuration on this screen: the view name and the view description. Click the Ok button to finish creating the view.

    gs cd2

You should now have a standard metrics dashboard view. When initially displayed the dashboard will progressively load the historical metrics maintained by the metrics plugin since Jenkins started. Once the historical data has been loaded the graphs will switch to live updating mode where they are updated every 10 seconds.

gs cd3
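
The graphs are fed by the metrics plugin, which also exposes the same data over HTTP. If you want to consume these metrics outside of the dashboard, the following Python sketch polls the plugin's JSON endpoint at the same 10 second interval the dashboard uses. It assumes a metrics access key has been created in the Jenkins global configuration and that the registry is served as JSON under /metrics/<key>/metrics; the URL, key handling, and JSON layout may differ in your installation, so treat this as a sketch rather than a reference.

    import json
    import time
    import urllib.request

    # Assumptions: adjust both values for your installation.
    JENKINS_URL = "https://jenkins.example.com"
    METRICS_KEY = "your-metrics-access-key"

    # Gauge names taken from the alert examples later in this chapter.
    WATCHED = ["vm.memory.heap.usage", "vm.file.descriptor.ratio"]

    def fetch_metrics():
        url = f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics"
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    for _ in range(6):  # sample for roughly a minute
        data = fetch_metrics()
        for name in WATCHED:
            gauge = data.get("gauges", {}).get(name)
            if gauge is not None:
                print(f"{name} = {gauge['value']}")
        time.sleep(10)  # the dashboard refreshes at the same rate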

Metrics-based alerts

This feature allows you to define different metrics-based alerts and have Jenkins send emails when the alerts start and finish.

When the feature is enabled it adds an Alerts action to the top level Jenkins actions. The Alerts action allows viewing the status of all the defined alerts as well as providing the ability to silence specific alerts.

Note

In order for alerting via email to function, Jenkins must be configured to send emails.

Creating some basic alerts

The following instructions will create four basic alerts:

  • An alert that triggers if any of the health reports are failing
  • An alert that triggers if the file descriptor usage on the master goes above 80%
  • An alert that triggers if the JVM heap memory usage is over 80% for more than a minute
  • An alert that triggers if the 5 minute average of HTTP/400 (bad request) responses goes above 10 per minute for more than five minutes

These instructions assume you have configured Jenkins with the SMTP settings required for sending emails.

  1. Login as an administrator and navigate to the main Jenkins configuration screen.

    gs ca01
  2. Scroll down to the Alerts section.

    gs ca02
  3. Click the Add button corresponding to Conditions.
  4. Select the Health check score option.

    gs ca03
  5. Specify Health checks as the Alert title. Leave the Alert after at 5 seconds. If you want to specify additional recipients for this health check only, you can add them here. Emails will be sent to the Global Recipients as well as any alert-specific Recipients.

    gs ca04
  6. Click the Add button corresponding to Conditions.
  7. Select the Local metric gauge within range option.

    gs ca05
  8. Specify vm.file.descriptor.ratio as the Gauge. Specify 0.8 as Alert if above. Specify File descriptor usage below 80% as the Alert title. Leave the Alert after at 5 seconds.

    gs ca06
  9. Click the Add button corresponding to Conditions.
  10. Select the Local metric gauge within range option.

    gs ca07
  11. Specify vm.memory.heap.usage as the Gauge. Specify 0.8 as Alert if above. Specify JVM heap memory usage below 80% as the Alert title. Specify the Alert after as 60 seconds.

    gs ca08
  12. Click the Add button corresponding to Conditions.
  13. Select the Local metric meter within range option.

    gs ca09
  14. Specify http.responseCodes.badRequest as the Meter. Specify 5 minute average as the Value. Specify 0.16666666 as Alert if above (meter rates are all reported in events per second, so 10 per minute is 10 / 60 ≈ 0.1667 per second; see the conversion sketch after these steps). Specify Less than 10 bad requests per minute as the Alert title. Specify the Alert after as 300 seconds.

    gs ca10
  15. Click the Add button corresponding to Global Recipients.
  16. Select the Email notifications option.

    gs ca11
  17. Specify the alert email recipients as a whitespace- or comma-separated list in the Email addresses text box.

    gs ca12
  18. Save the configuration.
  19. The main Jenkins root page should now have an Alerts action. Click on this action to view the alerts.

    gs ca13
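
A note on the threshold values used above: gauges such as vm.file.descriptor.ratio and vm.memory.heap.usage report ratios between 0 and 1, so 80% is entered as 0.8, while meter rates are reported in events per second, so a per-minute threshold has to be divided by 60. The following lines are just that arithmetic, not anything plugin-specific:

    # Meters report rates in events per second, so divide a per-minute
    # threshold by 60 before entering it as "Alert if above".
    def per_minute_to_per_second(events_per_minute):
        return events_per_minute / 60.0

    print(per_minute_to_per_second(10))  # 0.1666... as used in step 14

    # Gauges such as vm.memory.heap.usage are ratios, so 80% is entered as 0.8.
    print(80 / 100.0)                    # 0.8 as used in steps 8 and 11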

Managing alerts

Each alert can be in one of four states:

Table 24.1. Alert states

  State        When
  Failing      The alert condition has been met for less than the Alert after duration
  Failed       The alert condition has been met for at least the Alert after duration
  Recovering   The alert condition has not been met for less than the Alert after duration
  Recovered    The alert condition has not been met for at least the Alert after duration


Notification emails will be sent for any alerts that are not silenced on either of the following transitions:

  • Failing to Failed
  • Recovering to Recovered

The alerts are checked every 5 seconds. The Alerts page displays the current value of each alert condition. If the condition has changed in between these alert checks then the UI may show the alert in a mixed state such as in Figure 24.2, “An alert where the condition has changed prior to the periodic checks running”.

Figure 24.2. An alert where the condition has changed prior to the periodic checks running

gs ma01

However, once the periodic check runs, the condition will enter either the Failing or Recovering state.

Figure 24.3. An alert having entered the Failing state

gs ma02

If the condition changes before the condition’s Alert after time expires then no notifications will be sent.

Figure 24.4. An alert having entered the Recovering state

gs ma03

On the other hand, if the condition stays constant for the entire Alert after time then a notification will be sent.

Figure 24.5. An alert having entered the Failed state

gs ma04
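
The behaviour described in this section — the four states of Table 24.1, a periodic check every 5 seconds, notifications only on the Failing to Failed and Recovering to Recovered transitions, and nothing sent when the condition flips back before the Alert after duration expires — can be summarised as a small state machine. The following Python sketch is only an illustration of that description, not the plugin's implementation; the condition, check interval, Alert after duration, and notification callback are all stand-ins.

    import time

    FAILING, FAILED, RECOVERING, RECOVERED = "Failing", "Failed", "Recovering", "Recovered"

    class Alert:
        """Illustrative model of the alert lifecycle described in Table 24.1."""

        def __init__(self, condition, alert_after, notify):
            self.condition = condition      # callable: True while the alert condition is met
            self.alert_after = alert_after  # seconds the condition must hold to reach Failed/Recovered
            self.notify = notify            # callable invoked with "Failed" or "Recovered"
            self.state = RECOVERED
            self.since = time.time()        # when the condition last changed
            self.notified_failed = False    # only announce a recovery after a failure was announced

        def check(self):
            """Run one periodic check (the plugin checks every 5 seconds)."""
            met = self.condition()
            now = time.time()

            # Record when the condition last flipped.
            if met and self.state in (RECOVERING, RECOVERED):
                self.state, self.since = FAILING, now
            elif not met and self.state in (FAILING, FAILED):
                self.state, self.since = RECOVERING, now

            # Promote once the condition has held for the Alert after duration,
            # notifying on Failing -> Failed and Recovering -> Recovered.
            if self.state == FAILING and now - self.since >= self.alert_after:
                self.state = FAILED
                self.notified_failed = True
                self.notify("Failed")
            elif self.state == RECOVERING and now - self.since >= self.alert_after:
                self.state = RECOVERED
                if self.notified_failed:    # a short blip that never Failed sends nothing
                    self.notify("Recovered")
                    self.notified_failed = False

In this model, silencing an alert would simply skip the notify calls while leaving the state transitions intact, which matches the behaviour of the Silence button described below.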

The Silence button can be used to suppress the sending of notifications for specific alerts. The alerts are re-enabled using the Enable button.

Figure 24.6. Some alerts having been silenced

gs ma05