.. _controlcenter_userguide_systemhealth:

System Health
=============

System Health provides insight into the well-being of the cluster from both a broker-centric and a topic-centric perspective. The two views share many features, which are outlined below.

Navigation
----------

To open System Health, select the item labeled "System Health" in the main navigation. You can adjust the data shown in the following ways:

1. Use the sub navigation menu to jump between broker and topic views:

   .. figure:: images/systemhealth/main-nav.png
      :scale: 50%
      :align: center

2. Click the KPI menu items on the left hand side of the page to see more detailed information:

   .. figure:: images/systemhealth/topics-menu-open.gif
      :scale: 50%
      :align: center

3. Use the table view selector to choose between raw, trend, and bullet chart views:

   .. figure:: images/systemhealth/table-view-selector.gif
      :scale: 50%
      :align: center

4. Select *"View details"* from the "..." contextual menu on any row of the topics table to navigate to Topic Management:

   .. figure:: images/systemhealth/topics-table-menu-open.gif
      :scale: 50%
      :align: center

5. Select a percentile from the request latency percentile dropdown (broker view only):

   .. figure:: images/systemhealth/brokers-produced-fetched-charts-percentile.gif
      :scale: 50%
      :align: center

.. note:: Request latency percentiles are only available in the brokers section of System Health.

UI Commonalities
----------------

Chart Tooltips
    Each chart displays a similarly styled tooltip on hover. A tooltip can display multiple metrics at the same time, each paired with an icon. At each point, the icon is either a check mark (good) or an X symbol (bad).

Table Metric Validations
    Similar to the chart tooltips, many of the table metrics indicate potential issues with a red underline. Hovering the mouse over text with a red underline displays an explanatory tooltip.
Produced and Fetched Charts
---------------------------

.. figure:: images/systemhealth/brokers-produced-fetched-charts.png
    :scale: 50%
    :align: center

Produced (left hand side):

- **Bytes** – total number of bytes per second produced to this cluster
- **Requests** – total number of successful / failed produce requests to this cluster
- **Latency** – produce request latency across all brokers, at the median, 95th, 99th, or 99.9th percentile

Fetched (right hand side):

- **Bytes** – total number of bytes per second fetched from this cluster
- **Requests** – total number of successful / failed fetch requests to this cluster
- **Latency** – fetch request latency across all brokers, at the median, 95th, 99th, or 99.9th percentile

Per broker or topic breakdown
    Hovering the mouse cursor over an individual row of the broker or topic table overlays the request statistics for that broker or topic in the chart.

Request latency (broker view only)
    In the broker section, the request latency for each broker appears as its own line in the bottom-most chart. Hovering the mouse cursor over a specific line highlights the corresponding row for that broker in the table.

Request lifecycle (broker view only)
    Clicking a line inside the latency chart displays the breakdown of produce or fetch latency throughout the entire request lifecycle. The request latency profile can be shown at the median, 95th, 99th, or 99.9th percentile by selecting the corresponding header.

Broker Aggregate Metrics
------------------------

Broker count
    Number of brokers currently online

Zookeeper
    ZooKeeper status - *Up* or *Down*

    **Expires**

    .. include:: fragments/brokerClusterZooKeeperExpires.rst

    **Leader elections**

    .. include:: fragments/brokerClusterLeaderElection.rst

Active controller
    .. include:: fragments/brokerClusterActiveController.rst

Unclean elections
    .. include:: fragments/brokerClusterUncleanCount.rst

Network pool usage
    Average network pool capacity usage across all brokers

Request pool usage
    Average request handler capacity usage across all brokers (i.e. the fraction of time request handler threads are not sitting idle)

Disk usage
    Disk usage distribution - indicates whether disk usage is evenly distributed or skewed across all brokers in a cluster. Disk usage is considered skewed if the relative mean absolute difference of all broker sizes exceeds 10%.

Online topic partitions
    Total number of online topic partitions

Under replicated topic partitions
    .. include:: fragments/brokerClusterUnderReplicated.rst

Offline topic partitions
    .. include:: fragments/brokerClusterOfflineTopicPartitions.rst

Broker Metrics Table
--------------------

Id
    Id for this broker

Throughput
    **Bytes In** / **Bytes Out** – Number of bytes per second produced to, or fetched from, this broker (including from other brokers as part of replication)

Latency (produce)
    .. include:: fragments/brokerProductionRequestLatency.rst

Latency (fetched)
    .. include:: fragments/brokerFetchRequestLatency.rst

Partition replicas
    Total number of partition replicas served by this broker

Segment
    Total size in bytes of the log segments served by this broker (excluding index size)

Rack
    Rack Id for this broker

Topic Aggregate Metrics
-----------------------

Topic count
    Total number of topics

In sync replicas
    .. include:: fragments/topicInSyncReplica.rst

Out of sync replicas
    Total number of partition replicas that are out of sync

Topic Metrics Table
-------------------

Name
    Topic name

Throughput
    **Bytes In** / **Bytes Out** – Number of bytes per second produced to, or fetched from, this topic (including from replicas)

Partition replicas
    **Total** – Total number of partition replicas for this topic

    **In Sync** – Total number of partition replicas that are in sync

    **Out of Sync** – Total number of partition replicas that are out of sync

Partitions
    **Total** – Number of partitions for this topic

    **Under replicated** – Number of partitions that are under replicated (i.e. partitions with in-sync replicas < replication factor)

Segment
    **Count** – Number of log segments for this topic across all partition leaders

    **Size** – Size in bytes of the log for this topic (does not include replicas)

Offset
    **Start** – Minimum offset across all partitions for this topic

    **End** – Maximum offset across all partitions for this topic
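
To make the replica and partition columns of the topics table concrete, the following minimal sketch derives them from per-partition data. It is an illustration only, not how Control Center computes these values: the ``Partition`` class and ``summarize_topic`` function are hypothetical names. The sketch assumes that the total replica count is the sum of the replication factors and that out-of-sync replicas are the total minus the in-sync replicas. The under replicated test is the one stated above (in-sync replicas < replication factor).

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Partition:
        """Hypothetical per-partition snapshot; not a Control Center type."""

        replication_factor: int  # replicas assigned to this partition
        in_sync_replicas: int    # assigned replicas that are currently in sync


    def summarize_topic(partitions):
        """Return replica and partition counts for one topic."""
        # Assumption: total replicas is the sum of the replication factors.
        total_replicas = sum(p.replication_factor for p in partitions)
        in_sync = sum(p.in_sync_replicas for p in partitions)
        # Under replicated: in-sync replicas < replication factor.
        under_replicated = sum(
            1 for p in partitions if p.in_sync_replicas < p.replication_factor
        )
        return {
            "replicas_total": total_replicas,
            "replicas_in_sync": in_sync,
            # Assumption: out of sync is whatever is not in sync.
            "replicas_out_of_sync": total_replicas - in_sync,
            "partitions_total": len(partitions),
            "partitions_under_replicated": under_replicated,
        }

For example, a topic with three partitions, each with a replication factor of 3 and with 3, 2, and 1 in-sync replicas, has 9 replicas in total, 6 in sync, 3 out of sync, and 2 under replicated partitions.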