3.3. Cluster Status (Ganglia)

The web pages available from this link provide a graphical interface to live cluster information collected by Ganglia monitors running on each cluster node. The monitors gather values for metrics such as CPU load, free memory, disk usage, network I/O, and operating system version. These metrics are sent over the private cluster network, and the frontend node uses them to generate the historical graphs shown on this page.
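The sketch below illustrates how per-node metric values can be read from Ganglia's monitoring data. It assumes that the Ganglia monitor daemon (gmond) answers TCP connections on its stock default port 8649 by sending an XML dump of cluster state, with HOST elements containing METRIC elements. Your Rocks configuration may use a different port. The names read_gmond_xml, metrics_by_host and SAMPLE_XML, and the host names and values in the sample, are hypothetical illustrations.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on gmond's XML dump; real output carries
# many more attributes and metrics per host.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
<CLUSTER NAME="example" LOCALTIME="1000">
<HOST NAME="compute-0-0" REPORTED="995" TN="5" TMAX="20">
<METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
<METRIC NAME="mem_free" VAL="1024000" TYPE="uint32" UNITS="KB"/>
</HOST>
<HOST NAME="compute-0-1" REPORTED="990" TN="10" TMAX="20">
<METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=" "/>
<METRIC NAME="mem_free" VAL="512000" TYPE="uint32" UNITS="KB"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the raw XML a gmond daemon sends on connect (port 8649 is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the connection after the dump
                break
            chunks.append(data)
    # Return bytes so the XML parser can honor the document's declared encoding.
    return b"".join(chunks)


def metrics_by_host(xml_text):
    """Map each HOST name to a dict of {metric name: value string}."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result
```

On a frontend where gmond listens locally, `metrics_by_host(read_gmond_xml())` would return the current metric values per node, while `metrics_by_host(SAMPLE_XML)` works offline against the sample. Values are kept as strings because the XML carries a separate TYPE attribute that a caller may use to convert them.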

In addition to metric values, the Ganglia monitors collect a heartbeat message from each node. When several heartbeats from a node are missed, this page declares that node "dead". Dead nodes often have problems that require attention, and they are marked with a Skull-and-Crossbones icon or a red background.
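The dead-node rule can be sketched as a comparison between how long ago a node was last heard from and how often it is expected to report. This sketch assumes that each HOST element in the monitoring XML carries TN (seconds since the node last reported) and TMAX (its expected maximum reporting interval). It treats a node as dead once TN exceeds four times TMAX. That factor is an illustrative assumption modeled on a common Ganglia heuristic and is not a threshold stated in this guide. The names classify_hosts, DEAD_FACTOR and SAMPLE_XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed threshold: a host is "dead" once it has been silent for more
# than DEAD_FACTOR times its expected reporting interval (TMAX).
DEAD_FACTOR = 4

# Hypothetical sample: compute-0-1 has not reported for 200 seconds.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
<CLUSTER NAME="example" LOCALTIME="1000">
<HOST NAME="compute-0-0" REPORTED="995" TN="5" TMAX="20"/>
<HOST NAME="compute-0-1" REPORTED="800" TN="200" TMAX="20"/>
</CLUSTER>
</GANGLIA_XML>"""


def classify_hosts(xml_text, factor=DEAD_FACTOR):
    """Split hosts into (alive, dead) lists based on missed heartbeats."""
    alive, dead = [], []
    for host in ET.fromstring(xml_text).iter("HOST"):
        tn = int(host.get("TN"))      # seconds since the host last reported
        tmax = int(host.get("TMAX"))  # expected maximum reporting interval
        if tn > factor * tmax:
            dead.append(host.get("NAME"))
        else:
            alive.append(host.get("NAME"))
    return alive, dead
```

Using a multiple of TMAX rather than a single missed interval tolerates an occasional lost packet on the private network before a node is flagged, which matches the idea that several heartbeats must be missed.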

This page has many options, most of which should be self-explanatory. There are numerous links and each page shows a wealth of information, so be sure to explore the site carefully. The data is very fresh (usually only a few seconds old) and is updated with each page load. See the Ganglia website for more information about this powerful tool.

Tip

The Rocks Cluster Group maintains a similar web page called Meta that collects Ganglia information from many clusters built with Rocks software. It may give you a glimpse of the power and scalability of the Ganglia monitors. The Meta page is available at http://meta.rocksclusters.org/.

Ganglia was designed at Berkeley by Matt Massie in 2000 and is currently developed by an open source partnership between Berkeley, SDSC, and others. It is distributed through SourceForge.net under the GPL software license.