System Administration

7.3. System Administration

7.3.1. What security patches should I be running with Rocks?
7.3.2. Why can't I re-kickstart compute nodes? Shoot-node fails, and power cycling the machine doesn't force a re-install.
7.3.3. I see IP addresses not names in my Ganglia graphs. Why is this?
7.3.4. When looking at the Ganglia page, I don't see graphs, just the error:
7.3.5. How do I use user accounts from an external NIS server on my cluster?

7.3.1. What security patches should I be running with Rocks?

Every Rocks release is a snapshot of the most up-to-date software packages from Red Hat for whatever release Rocks is based on. For example, Rocks 2.2 included all the Red Hat 7.2 updates available as of our release. If you use rocks-dist to mirror your distribution from us, you will get updates to our software and some Red Hat updates as we see fit to push them out. There is no automated method of updating the software packages on your frontend machine. This means you will have to treat it like any other machine on the Internet: track security updates yourself and update vulnerable services. Only you can decide what level of security is appropriate for your site. Some sites prefer to secure only network services, while others secure both network and local host services.

The default Rocks configuration sets up a firewall on the frontend and permits only SSH traffic into the cluster. The minimum security requirement is to track SSH updates and apply them to your frontend when needed.

7.3.2. Why can't I re-kickstart compute nodes? Shoot-node fails, and power cycling the machine doesn't force a re-install.

Older BIOS versions required the boot image to reside in the first GB of the hard disk. If the boot image, in this case the kickstart kernel image, resides after the first GB of the disk, the image will not be loaded by LILO. Updating your BIOS should fix this problem. Our long-term boot loader strategy is to move to GRUB, which should fix this problem even on older machines.

7.3.3. I see IP addresses not names in my Ganglia graphs. Why is this?

The DNS system in the cluster sometimes causes Ganglia to record bogus node names (usually their IP addresses). To clear this situation, restart the "gmond" and "gmetad" services on the frontend. This action may be useful later, as it will flush any dead nodes from the Ganglia output.

# service gmond restart
# service gmetad restart

This method is also useful when replacing or renaming nodes in your cluster.
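
Before restarting the services, it can help to see which hosts are currently reported under an IP address. The following is a minimal sketch, assuming a copy of the Ganglia XML output has already been saved to a file and that hosts appear in it as `<HOST NAME="...">` elements. The function name list_ip_named_hosts is hypothetical and not part of Rocks or Ganglia.

```shell
#!/bin/sh
# Hypothetical helper: print HOST names that look like dotted-quad IP addresses.
# Reads Ganglia XML on standard input and prints one name per line.
list_ip_named_hosts() {
    grep -oE '<HOST NAME="[0-9]{1,3}(\.[0-9]{1,3}){3}"' |
        sed -e 's/^<HOST NAME="//' -e 's/"$//' |
        sort -u
}
```

For example, `list_ip_named_hosts < ganglia.xml`, where ganglia.xml is a hypothetical saved copy of the XML. Empty output suggests that no host is being recorded by IP address.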

7.3.4. When looking at the Ganglia page, I don't see graphs, just the error:

There was an error collecting ganglia data (127.0.0.1:8652): XML error: not well-formed (invalid token) at xxx

This indicates a parse error in the Ganglia gmond XML output. It is generally caused by non-XML characters (especially &) in the cluster name or cluster owner fields, although any Ganglia field (including node names) containing these characters will cause this problem.

We hope future versions of Ganglia will correctly escape all names to make them XML safe. If you have a bad name, edit /etc/gmond.conf on the frontend node, remove the offending characters, then restart gmond.
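
To locate candidate lines quickly, one option is to search the configuration file for characters that have special meaning in XML. This is a minimal sketch, not a Ganglia tool: the function name find_xml_unsafe is hypothetical, it checks only for &, < and >, and it assumes that comment lines begin with #. A flagged line is only a candidate and should be inspected by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the lines (prefixed with line numbers) of the
# file given as $1 that contain characters special to XML: & < >
# Lines whose first non-blank character is '#' are skipped as comments.
find_xml_unsafe() {
    grep -n '[&<>]' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}
```

For example, `find_xml_unsafe /etc/gmond.conf` prints each suspicious line together with its line number, and prints nothing if no such characters are found.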

7.3.5. How do I use user accounts from an external NIS server on my cluster?

While there is no certain method to do this correctly, if necessary we recommend you use "ypcat" to periodically gather external NIS user accounts on the frontend, and let the default 411 system distribute the information inside the cluster.

The following cron script will collect NIS information from your external network onto the frontend. The login files created here will be automatically distributed to cluster nodes via 411. This code courtesy of Chris Dwan at the University of Minnesota.

(in /etc/cron.hourly/get-NIS on frontend)

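#!/bin/sh
# Copy the automounter maps from the external NIS server.
ypcat -k auto.master > /etc/auto.master
ypcat -k auto.home   > /etc/auto.home
ypcat -k auto.net    > /etc/auto.net
ypcat -k auto.web    > /etc/auto.web

# Append the NIS accounts to the local ones and install the combined file.
# /etc/passwd.local must already exist.
ypcat passwd      > /etc/passwd.nis
cat   /etc/passwd.local /etc/passwd.nis > /etc/passwd.combined
cp    /etc/passwd.combined /etc/passwd

# Do the same for groups. /etc/group.local must already exist.
ypcat group       > /etc/group.nis
cat   /etc/group.local /etc/group.nis > /etc/group.combined
cp    /etc/group.combined /etc/group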

Caution

There is no way to ensure that UIDs and GIDs from NIS will not conflict with those already present in the cluster. You must always be careful that such collisions do not occur, as unpredictable and undefined behavior will result.
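
One way to watch for such collisions is to compare the numeric ID field of the local and NIS files before they are combined. The sketch below is illustrative only and is not part of the script above. The function name find_id_collisions is hypothetical, and it relies on the standard passwd and group file layout, in which the third colon-separated field holds the UID or GID.

```shell
#!/bin/sh
# Hypothetical helper: print every numeric ID (field 3) that appears in both
# the file given as $1 and the file given as $2, one ID per line.
# Works for passwd-format files (UIDs) and group-format files (GIDs).
find_id_collisions() {
    awk -F: 'NR == FNR { seen[$3] = 1; next }
             ($3 in seen) { print $3 }' "$1" "$2" | sort -un
}
```

For example, `find_id_collisions /etc/passwd.local /etc/passwd.nis` lists UIDs used in both files, and `find_id_collisions /etc/group.local /etc/group.nis` does the same for GIDs. An account that is deliberately defined with the same ID in both places is also reported, so the output needs review by hand.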