System Health

System Health provides insight into the well-being of the cluster from both a broker and topic-centric perspective. The two sections share many common features with each other, which will be outlined below.

UI Commonalities

Chart Tooltips
Each chart displays a similarly styled tooltip when hovering. These tooltips can display multiple metrics at the same time which are each paired with an icon. The icon will either be a (good) check mark or a (bad) X symbol at each point.
Table Metric Validations
Similar to the chart tooltips, many of the table metrics will visually change to indiciate potential issues via a red underline. Hovering the mouse over text with a red underline will display an explanatory tooltip.

Produced and Fetched Charts

../../_images/brokers-produced-fetched-charts.png
Produced (left hand side)
  • Bytes – total number of bytes per second produced to this cluster
  • Requests – total number of successful / failed produce requests to this cluster
  • Latency – produce request latency across all brokers, at the median, 95th, 99th, or 99.9th percentile
Fetched (right hand side):
  • Bytes – total number of bytes per second fetched from this cluster
  • Requests – total number of successful / failed fetch requests to this cluster
  • Latency – fetch request latency across all brokers, at the median, 95th, 99th, or 99.9th percentile
Per broker or topic breakdown
Hovering the mouse cursor over an individual row of the broker or topic table will overlay the request statistics for that individual broker or topic in the chart.
Request latency (broker view only)

In the broker section, the request latency for each broker will appear as its own line in the bottom-most chart.

Hovering the mouse cursor over a specific line will highlight the corresponding row for that particular broker in the table.

Request lifecycle (broker view only)

Clicking a line inside the latency chart will display the breakdown of produce or fetch latency throughout the entire request lifecycle.

The request latency profile can shown at the median, 95th, 99th, or 99.9th percentile by selecting the corresponding header.

Broker Aggregate Metrics

Broker count
Number of brokers currently online
Zookeeper

ZooKeeper status - Up or Down

Expires

rate at which brokers are experiencing ZooKeeper session expirations (number of expirations per second)

Leader elections

Number of partition leader elections

Active controller
Number of brokers reporting as the active controller
Unclean elections

Number of unclean partition leader elections

When unclean leader election is held among out-of-sync replicas, there is a possibility of data loss if any messages were not synced prior to the loss of the former leader.

Network pool usage
Average network pool capacity usage across all brokers
Request pool usage
Average request handler capacity usage across all brokers (i.e. the fraction of time request handler threads are not sitting idle)
Disk usage

Disk usage distribution - indicates whether disk usage distribution is even or skewed across all brokers in a cluster

Disk usage is determined to be skewed if the relative mean absolute difference of all broker sizes exceeds 10%.

Online topic partitions
Total number of online topic partitions
Under replicated topic partitions
Total number of topic partitions that are under-replicated (i.e. partition with in-sync replicas < replication factor)
Offline topic partitions

Total number of partitions that are offline

A partition could be offline if for instance the number of available replicas for the partition is less than the configured minimum in-sync replicas.

Broker Metrics Table

Id
Id for this broker
Throughput
Bytes In / Bytes Out – Number of bytes per second produced to, or fetched from this broker (including from other brokers as part of replication)
Latency (produce)
Latency of produce requests to this broker at the median, 95th, 99th, or 99.9th percentile (in milliseconds)
Latency (fetched)
Latency of fetch requests to this broker at the median, 95th, 99th, or 99.9th percentile (in milliseconds)
Partition replicas
Total number of partition replicas served by this broker
Segment
Total size in bytes of the log segments served by this broker (excluding index size)
Rack
Rack Id for this broker

Topic Aggregate Metrics

Topic count
Total number of topics
In sync replicas
Total number of partition replicas that are in-sync
Out of sync replicas
Total number of partition replicas that are out of sync

Topic Metrics Table

Name
Topic name
Throughput
Bytes In / Bytes Out – Number of bytes per second produced to, or fetched from this topic (including from replicas)
Partition replicas

Total – Total number of partition replicas for this topic

In Sync – Total number of partition replicas that are in sync

Out of Sync – Total number of partition replicas that are in sync

Partitions

Total – Number of partitions for this topic

Under replicated – Number of partitions that are under replicated (i.e. partitions with in-sync replicas < replication factor)

Segment

Count – Number of log segments for this topic across all partition leaders.

Size – Size in bytes of the log for this topic (does not include replicas)

Offset

Start – Minimum offset across all partitions for this topic

End – Maximum offset across all partitions for this topic