The probes in this section may be applied to the RHN Satellite Server itself to monitor its health and performance. Since these probes run locally, no specific application or transport protocols are required.
The RHN Satellite Server::Check Alive probe is useful in ensuring the viability of your Monitoring-enabled Satellite. It reports the following metrics:
Probe Count — The number of probes configured on the Satellite.
Percent OK — The percent of probes in an OK state.
Percent WARNING — The percent of probes in a WARNING state.
Percent CRITICAL — The percent of probes in a CRITICAL state.
Percent PENDING — The percent of probes in a PENDING state.
Percent UNKNOWN — The percent of probes in an UNKNOWN state.
Recent State Changes — The number of state changes in the last hour (a measure of volatility).
Imminent Probes — The number of probes scheduled to run in the next 10 minutes (a measure of scheduler efficiency).
Execution Time — The maximum, minimum, and average of the most recent probe execution times, in seconds.
Probe Latency — The average difference in seconds between when probes are scheduled to run and when they actually do run.
Satellite Latency — The amount of time in seconds since the Satellite last checked in.
If monitoring is critical to your infrastructure, consider setting this probe to run every five minutes, to alert after a single failure, and to renotify after 10 minutes, so that a failure is reported quickly.
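To make the percentage metrics concrete, the following minimal sketch computes the share of probes in each state from a list of state names. It is an illustration only, not the probe's actual implementation; the function name `state_percentages` and the input format are assumptions.

```python
from collections import Counter

# The five probe states named in the metric list above.
STATES = ("OK", "WARNING", "CRITICAL", "PENDING", "UNKNOWN")


def state_percentages(probe_states):
    """Return the percent of probes in each state (hypothetical helper).

    probe_states is assumed to be a list of state names, one per probe.
    """
    total = len(probe_states)
    if total == 0:
        # No probes configured: report zero for every state.
        return {state: 0.0 for state in STATES}
    counts = Counter(probe_states)
    return {state: 100.0 * counts[state] / total for state in STATES}


if __name__ == "__main__":
    print(state_percentages(["OK", "OK", "WARNING", "CRITICAL"]))
```

With four probes, two of them OK, the sketch reports 50 percent OK and 25 percent each for WARNING and CRITICAL.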
The RHN Satellite Server::Disk Space probe monitors the free disk space on a Satellite and collects the following metrics:
File System Used — The percent of the current filesystem now in use.
Space Used — The amount of space in use on the current filesystem.
Space Available — The amount of free space remaining on the current filesystem.
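Comparable figures can be obtained by hand for a quick cross-check. The sketch below uses Python's standard `shutil.disk_usage` as a stand-in; it is not how the probe itself gathers data, and the function name `disk_metrics` and the dictionary keys are assumptions. The percentage is computed against the total size, which may differ slightly from what `df` prints.

```python
import shutil


def disk_metrics(path="/"):
    """Return used percent, used bytes, and free bytes for a filesystem.

    Hypothetical helper: path is any directory on the filesystem to inspect.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "percent_used": percent_used,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
    }


if __name__ == "__main__":
    print(disk_metrics("/"))
```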
The RHN Satellite Server::Execution Time probe monitors the execution time for probes run from a Satellite and collects the following metric:
Probe Execution Time Average — The average time, in seconds, that it takes to fully execute a probe.
The RHN Satellite Server::Interface Traffic probe monitors the interface traffic on a Satellite and collects the following metrics:
Input Rate — The amount of traffic in bytes per second the device receives.
Output Rate — The amount of traffic in bytes per second the device sends.
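A rate in bytes per second is the difference between two byte-counter readings divided by the time between them. The sketch below illustrates that arithmetic using text in the format of the Linux `/proc/net/dev` file. Whether the probe reads that file is not stated here, so treat the data source as an assumption; the function names are hypothetical, and counter wraparound is not handled.

```python
def parse_net_dev(text):
    """Map interface name -> (received_bytes, transmitted_bytes).

    Expects text in /proc/net/dev format: two header lines, then one
    line per interface with 8 receive fields followed by 8 transmit fields.
    """
    counters = {}
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rate(previous_bytes, current_bytes, interval_seconds):
    """Return the average rate in bytes per second between two readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_bytes - previous_bytes) / interval_seconds


if __name__ == "__main__":
    # Two readings of a receive counter taken 10 seconds apart.
    print(byte_rate(5000, 8000, 10))
```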
The RHN Satellite Server::Latency probe monitors the latency of probes on a Satellite and collects the following metric:
Probe Latency Average — The lag in seconds between the time a probe becomes ready to run and the time it is actually run. Under normal conditions, this will generally be less than a second. When a Satellite is overloaded (because it has too many probes with respect to their average execution time), the number goes up.
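As an illustration of the definition above, the following sketch averages the lag between the time each probe became ready and the time it actually ran. The function name and the input format (pairs of timestamps in seconds) are assumptions made for the example.

```python
def average_latency(runs):
    """Return the mean lag in seconds for a list of probe runs.

    runs is assumed to be a list of (ready_time, actual_start_time) pairs.
    """
    lags = [start - ready for ready, start in runs]
    return sum(lags) / len(lags) if lags else 0.0


if __name__ == "__main__":
    # Two probes that each started half a second late.
    print(average_latency([(0.0, 0.5), (10.0, 10.5)]))
```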
The RHN Satellite Server::Load probe monitors the CPU load on a Satellite and collects the following metric:
Load — The load average on the CPU over 1-, 5-, and 15-minute periods.
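The same three load averages can be read manually on a Unix-like system, for example to sanity-check threshold values. This sketch uses the standard `os.getloadavg` call; the wrapper name `load_metrics` and its keys are assumptions, and the probe may obtain its values differently.

```python
import os


def load_metrics():
    """Return the 1-, 5-, and 15-minute load averages as a dictionary."""
    one, five, fifteen = os.getloadavg()  # raises OSError if unavailable
    return {"load_1min": one, "load_5min": five, "load_15min": fifteen}


if __name__ == "__main__":
    print(load_metrics())
```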
The RHN Satellite Server::Probe Count probe monitors the number of probes on a Satellite and collects the following metric:
Probes — The number of individual probes running on a Satellite.
The RHN Satellite Server::Process Counts probe monitors the number of processes on a Satellite and collects the following metrics:
Blocked — The number of processes that have been switched to the waiting queue and are in a waiting state.
Child — The number of processes spawned by another process already running on the machine.
Defunct — The number of processes that have terminated (either because they have been killed by a signal or have called exit()) and whose parent processes have not yet received notification of their termination by executing (some form of) the wait() system call.
Stopped — The number of processes that have been stopped before their executions could be completed.
Swapped — The number of processes that have been written to disk, generally due to a severe memory shortfall.
Field | Value |
---|---|
Critical Maximum Blocked Processes | |
Warning Maximum Blocked Processes | |
Critical Maximum Child Processes | |
Warning Maximum Child Processes | |
Critical Maximum Defunct Processes | |
Warning Maximum Defunct Processes | |
Critical Maximum Stopped Processes | |
Warning Maximum Stopped Processes | |
Critical Maximum Swapped Processes | |
Warning Maximum Swapped Processes | |
Table C-70. RHN Satellite Server::Process Counts settings
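Several of these counts correspond loosely to process state letters on Linux. The sketch below tallies states from lines in the format of `/proc/<pid>/stat`. The mapping of `D` to Blocked, `Z` to Defunct, and `T` to Stopped is an assumption for illustration, not a statement of how the probe classifies processes; the Child and Swapped metrics are not covered.

```python
# Assumed mapping from Linux process state letters to probe metric names.
STATE_TO_METRIC = {"D": "Blocked", "Z": "Defunct", "T": "Stopped"}


def state_from_stat(stat_line):
    """Extract the state letter from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so the line is split after the last closing parenthesis.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_process_states(stat_lines):
    """Count processes per metric name for the assumed state mapping."""
    counts = {name: 0 for name in STATE_TO_METRIC.values()}
    for line in stat_lines:
        metric = STATE_TO_METRIC.get(state_from_stat(line))
        if metric is not None:
            counts[metric] += 1
    return counts


if __name__ == "__main__":
    sample = ["1 (init) S 0 1 1", "42 (old job) Z 1 42 42"]
    print(count_process_states(sample))
```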
The RHN Satellite Server::Processes probe monitors the number of processes on a Satellite and collects the following metric:
Processes — The number of processes running simultaneously on the machine.
The RHN Satellite Server::Process Health probe monitors customer-specified processes and collects the following metrics:
CPU Usage — The CPU usage percent for a given process.
Child Process Groups — The number of child processes spawned from the specified parent process. A child process inherits most of its attributes, such as open files, from its parent.
Threads — The number of running threads for a given process. A thread is the basic unit of CPU utilization, and consists of a program counter, a register set, and a stack space. A thread is also called a lightweight process.
Physical Memory Used — The amount of physical memory in kilobytes being used by the specified process.
Virtual Memory Used — The amount of virtual memory in kilobytes being used by the specified process, that is, the size of the process in real memory plus swap.
Specify the process by either command name or process ID (PID). Entering a PID overrides the entry of a command name. If neither a command name nor a PID is entered, the error "Command not found" is displayed and the probe is set to a CRITICAL state.
Field | Value |
---|---|
Command Name | |
Process ID (PID) file | |
Timeout* | 15 |
Critical Maximum CPU Usage | |
Warning Maximum CPU Usage | |
Critical Maximum Child Process Groups | |
Warning Maximum Child Process Groups | |
Critical Maximum Threads | |
Warning Maximum Threads | |
Critical Maximum Physical Memory Used | |
Warning Maximum Physical Memory Used | |
Critical Maximum Virtual Memory Used | |
Warning Maximum Virtual Memory Used | |
Table C-72. RHN Satellite Server::Process Health settings
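The precedence between the two ways of identifying a process can be sketched as follows. This is a simplified illustration: the function name and return values are hypothetical, and the PID is passed in directly rather than read from a PID file.

```python
def select_process(command_name=None, pid=None):
    """Decide how to identify the process to monitor (hypothetical helper).

    A PID takes precedence over a command name. If neither is supplied,
    the result is a CRITICAL state with the "Command not found" error.
    """
    if pid:
        return ("pid", pid)
    if command_name:
        return ("command", command_name)
    return ("CRITICAL", "Command not found")


if __name__ == "__main__":
    print(select_process(command_name="httpd", pid=1234))
    print(select_process())
```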
The RHN Satellite Server::Process Running probe verifies that the specified process is running. Specify the process by either command name or process ID (PID). Entering a PID overrides the entry of a command name. A CRITICAL status results if the probe cannot verify the command or PID.
The RHN Satellite Server::Swap probe monitors the percent of free swap space available on a Satellite. A CRITICAL status results if the value falls below the Critical threshold. A WARNING status results if the value falls below the Warning threshold.
The RHN Satellite Server::Users probe monitors the number of users currently logged into a Satellite. A CRITICAL status results if the value exceeds the Critical threshold. A WARNING status results if the value exceeds the Warning threshold.
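The Swap and Users probes apply thresholds in opposite directions: Swap alerts when free space falls below a threshold, while Users alerts when the count exceeds one. A minimal sketch of that logic follows; the function name, the `direction` parameter, and the treatment of unset thresholds are assumptions.

```python
def evaluate(value, warning=None, critical=None, direction="max"):
    """Return "CRITICAL", "WARNING", or "OK" for a metric value.

    direction="max": a value above a threshold breaches it (Users).
    direction="min": a value below a threshold breaches it (Swap).
    Thresholds left as None are assumed to be unset and are skipped.
    """
    if direction not in ("max", "min"):
        raise ValueError("direction must be 'max' or 'min'")

    def breached(threshold):
        if threshold is None:
            return False
        return value > threshold if direction == "max" else value < threshold

    # CRITICAL is checked first so it wins when both thresholds are breached.
    if breached(critical):
        return "CRITICAL"
    if breached(warning):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(evaluate(3, warning=10, critical=5, direction="min"))   # percent free swap
    print(evaluate(12, warning=10, critical=20, direction="max"))  # logged-in users
```

Checking the Critical threshold before the Warning threshold means the more severe state is reported when a value breaches both.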