Rocks has introduced a simple alert system called Ganglia News that generates RSS items (blog-style) for significant events in the cluster.
Each type of event has a Python module called a Journalist that can detect it. This section describes how to write your own journalist.
Journalists are run by a nightly cron job. In this example, we set up a journalist that writes an item when the fan speed for a node falls below a threshold level. It relies on the gmetric shown in the previous section.
Derive the news module from the Rocks class gmon.journalist.Journalist. The run() method is called for you during the cron job. For each relevant event, call the recordItem() method with an item name and an RSS snippet. The item name is generally a host IP address.
The item name specifies a filename for the event, which is placed in:
/var/ganglia/news/[year]/[month]/[journalist-name]/[item-name].rss
The RSS news report is generated by concatenating all *.rss files in the current year and month. Therefore, news items will remain visible for the current month, then reset.
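The file layout and monthly concatenation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ganglia News implementation: the helper names item_path and build_report are hypothetical, the root directory is a parameter so the sketch can be tried outside a cluster, and the four-digit year and two-digit month directory names are a guess at the on-disk format.

```python
import glob
import os
import time


def item_path(root, journalist, item, when=None):
    """Return root/[year]/[month]/[journalist]/[item].rss (hypothetical helper)."""
    if when is None:
        when = time.localtime()
    # Assumption: directories are named with a 4-digit year and 2-digit month.
    year = time.strftime('%Y', when)
    month = time.strftime('%m', when)
    return os.path.join(root, year, month, journalist, item + '.rss')


def build_report(root, when=None):
    """Concatenate every *.rss file recorded for the given year and month."""
    if when is None:
        when = time.localtime()
    pattern = os.path.join(root, time.strftime('%Y', when),
                           time.strftime('%m', when), '*', '*.rss')
    parts = []
    # Sort the matches so the output order is stable between runs.
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            parts.append(f.read())
    return ''.join(parts)
```

For example, item_path('/var/ganglia/news', 'fan1-speed', '10.1.1.5') gives a path under the current year and month. Because build_report only globs the current month's directory, items from earlier months drop out of the report without being deleted.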
import gmon.journalist


class FanSpeed(gmon.journalist.Journalist):
    """A news collector that detects low fan speeds."""

    # When fan speed falls below this RPM level, we call it news.
    fanthresh = 500.0

    def name(self):
        return "fan1-speed"

    def run(self):
        c = self.getGanglia().getCluster()
        for h in c.getHosts():
            try:
                fanspeed = float(h.getMetricValue('fan1-speed'))
                if fanspeed < self.fanthresh:
                    self.item(c, h, fanspeed)
            except Exception:
                # Skip hosts with a missing or unreadable metric.
                continue

    def item(self, cluster, host, fanspeed):
        # Build the RSS <item> snippet for one host.
        s = '<item>\n' \
            + ' <description>\n' \
            + '%s. Node %s fan is malfunctioning: its speed is %s rpm.\n' \
            % (self.getDate(), host.getName(), fanspeed) \
            + '(Threshold is %s)\n' % self.fanthresh \
            + ' </description>\n' \
            + ' <link>%s</link>\n' % self.getHostPage(cluster, host) \
            + ' <pubDate>%s</pubDate>\n' % self.getTime() \
            + '</item>\n'
        # The host IP address serves as the item name.
        self.recordItem(host.getIP(), s)


def initEvents():
    return FanSpeed
We iterate over all nodes in the cluster, looking for bad fans. The recordItem(), getDate(), getTime(), and getHostPage() methods are provided by the Journalist class. The RSS conforms to the 2.0 specification.
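To try the threshold logic without a cluster, one option is to substitute small stand-ins for the Ganglia objects. The sketch below is only a test harness built on assumptions: FakeHost, FakeCluster, and find_slow_fans are hypothetical names, and the stand-ins imitate only getHosts(), getMetricValue(), and getIP(). In particular, returning None for a missing metric is an assumption; the real API may behave differently.

```python
class FakeHost:
    """Hypothetical stand-in for a Ganglia host object."""

    def __init__(self, ip, metrics):
        self._ip = ip
        self._metrics = metrics

    def getIP(self):
        return self._ip

    def getMetricValue(self, name):
        # Assumption: a host that does not report the metric yields None.
        return self._metrics.get(name)


class FakeCluster:
    """Hypothetical stand-in for a Ganglia cluster object."""

    def __init__(self, hosts):
        self._hosts = hosts

    def getHosts(self):
        return self._hosts


def find_slow_fans(cluster, threshold=500.0, metric='fan1-speed'):
    """Return (ip, speed) pairs for hosts whose fan speed is below threshold."""
    slow = []
    for host in cluster.getHosts():
        try:
            speed = float(host.getMetricValue(metric))
        except (TypeError, ValueError):
            # Missing or non-numeric metric: nothing to report for this host.
            continue
        if speed < threshold:
            slow.append((host.getIP(), speed))
    return slow
```

Separating the detection from the RSS formatting in this way makes the comparison easy to check in isolation before the journalist is packaged.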
Make an RPM that places this file in:
/opt/ganglia/lib/python/gmon/news/fanspeed.py
Any filename ending in .py will work. Add the RPM to your roll and reinstall your compute nodes. See the ganglia-news package in Rocks Base for an example.
Point your RSS browser to the "News" link of your cluster (available on the cluster homepage). You should see a News item for each node that has a slow fan.
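If you prefer to check the feed from a script, a minimal sketch using Python's standard xml.etree.ElementTree module is shown below. The sample document and its URL are made up for illustration, and summarize_items is a hypothetical helper; on a real cluster you would pass it the text fetched from the News link.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real feed comes from the cluster's News link.
SAMPLE = """<rss version="2.0">
<channel>
<title>Example cluster news</title>
<item>
 <description>
Node compute-0-0 fan is malfunctioning: its speed is 120.0 rpm.
 </description>
 <link>http://example.org/compute-0-0</link>
</item>
</channel>
</rss>"""


def summarize_items(rss_text):
    """Return a (description, link) pair for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter('item'):
        # findtext returns None when the child element is absent.
        description = (item.findtext('description') or '').strip()
        link = (item.findtext('link') or '').strip()
        results.append((description, link))
    return results
```

Calling summarize_items(SAMPLE) yields one pair per news item, so an empty list means no node currently has a recorded slow fan.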