Installation

7.1. Installation

7.1.1. Insert-ethers never sees new compute nodes. I also don't see any DHCP messages from compute nodes on the frontend. What is wrong?
7.1.2. While trying to bring up a compute node, I boot it from the Rocks Boot CD, and when I plug a monitor into the compute node, I see the error message 'Error opening kickstart file /tmp/ks.cfg. No such file or directory' or I see a screen on the compute node asking me to select a language. What went wrong?
7.1.3. I successfully installed all the Rolls, but during the last stage after the machine reboots, the system hangs with the error: GRUB Loading Stage2.... What went wrong?
7.1.4. When I try to install a compute node, the error message on the compute node says, "Can't mount /tmp. Please press OK to restart". What should I do?
7.1.5. My compute nodes don't have a CD drive and my network cards don't PXE boot, but my compute nodes do have a floppy drive. How can I install the compute nodes?

7.1.1. Insert-ethers never sees new compute nodes. I also don't see any DHCP messages from compute nodes on the frontend. What is wrong?

Try bypassing the network switch that connects your nodes to the frontend. The switch may be configured to suppress broadcast messages from unknown IP addresses, which drops the DHCP messages from the nodes. To verify that the switch is the problem:

  1. Connect a crossover cable (or a normal cable if you use Gigabit Ethernet) between a single compute node and the frontend's "eth0" interface.

  2. Install the compute node normally (see Install Compute Nodes). You should see the DHCP messages from the node arrive at the frontend.
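One way to confirm that the DHCP messages reach the frontend is to scan the system log for DHCPDISCOVER entries. The sketch below assumes that dhcpd logs through syslog to /var/log/messages and writes lines of the form "DHCPDISCOVER from <MAC> via <interface>". Both the log path and the helper name dhcp_discover_macs are assumptions for illustration, not part of Rocks.

```shell
# dhcp_discover_macs: hypothetical helper, not a Rocks command.
# Reads syslog lines on standard input and prints each distinct MAC
# address that sent a DHCPDISCOVER.
dhcp_discover_macs() {
    sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]\{17\}\).*/\1/p' | sort -u
}

# On the frontend (the log path is an assumption):
#   dhcp_discover_macs < /var/log/messages
```

If the node's MAC address shows up with the crossover cable in place but not through the switch, the switch is the likely culprit.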

7.1.2. While trying to bring up a compute node, I boot it from the Rocks Boot CD, and when I plug a monitor into the compute node, I see the error message 'Error opening kickstart file /tmp/ks.cfg. No such file or directory' or I see a screen on the compute node asking me to select a language. What went wrong?

A compute node kickstart requires the following services to be running on the frontend:

  1. dhcpd

  2. httpd

  3. mysqld

  4. autofs

To check if httpd and mysqld are running:

# ps auwx | grep httpd
# ps auwx | grep mysqld

If either one is not running, restart it with:

# /etc/rc.d/init.d/httpd restart

and/or

# /etc/rc.d/init.d/mysqld restart

The autofs service runs as a daemon named 'automount'. To check if it is running:

# ps auwx | grep automount

If it isn't, restart it:

# /etc/rc.d/init.d/autofs restart
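The individual checks above can be combined into a single pass over the process list. The following is a minimal sketch: the function name missing_daemons is hypothetical, and it assumes that the daemons appear in the process list under exactly the names dhcpd, httpd, mysqld and automount.

```shell
# Daemons the kickstart path depends on, by process name
# (the autofs service runs as "automount").
REQUIRED="dhcpd httpd mysqld automount"

# missing_daemons: hypothetical helper, not a Rocks command.
# Reads process names on standard input (one per line) and prints
# every required daemon that is absent from that list.
missing_daemons() {
    names=$(cat)
    for d in $REQUIRED; do
        printf '%s\n' "$names" | grep -qx "$d" || echo "$d"
    done
}

# On the frontend:
#   ps -e -o comm= | missing_daemons
```

No output means all four daemons were found. Any name that is printed can be restarted with the matching init script shown above.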

Finally, to test if the Rocks installation infrastructure is working:

# cd /home/install
# ./sbin/kickstart.cgi --client="compute-0-0"

This should return a kickstart file.

To see only the errors associated with kickstart.cgi, discard the kickstart file it writes to standard output:

# ./sbin/kickstart.cgi --client="compute-0-0" > /dev/null
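Both checks can be done in one run by saving the kickstart file and the error messages to separate files. This is a sketch under assumptions: the helper name capture_kickstart and the output paths are hypothetical, and it treats a non-zero exit status, any text on standard error, or an empty kickstart file as a failure, which may be stricter than kickstart.cgi warrants.

```shell
# capture_kickstart: hypothetical helper, not a Rocks command.
# $1 = file for the generated kickstart text
# $2 = file for error messages
# remaining arguments = the command to run
capture_kickstart() {
    ks_out=$1
    ks_err=$2
    shift 2
    "$@" > "$ks_out" 2> "$ks_err"
    status=$?
    # Assumption: any of these three conditions indicates a problem.
    if [ "$status" -ne 0 ] || [ -s "$ks_err" ] || [ ! -s "$ks_out" ]; then
        echo "FAILED"
        return 1
    fi
    echo "OK"
}

# On the frontend (output paths are assumptions):
#   cd /home/install
#   capture_kickstart /tmp/ks.test /tmp/ks.err ./sbin/kickstart.cgi --client="compute-0-0"
```

After a FAILED result, the error file holds whatever kickstart.cgi wrote to standard error.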

7.1.3. I successfully installed all the Rolls, but during the last stage after the machine reboots, the system hangs with the error: GRUB Loading Stage2.... What went wrong?

This is an intermittent problem we've seen in the lab as well. The installation itself is fine; for an unknown reason, the grub installation program did not run correctly.

Here is a workaround:

  • Put the Rocks Boot Roll CD in the frontend and boot the frontend.

  • At the boot prompt, type:

    frontend rescue

  • When a screen appears, click the Continue button.

  • When you see the shell prompt, execute:

    # chroot /mnt/sysimage

  • Run the grub installation program:

    # /sbin/grub-install `awk -F= '/^#boot/ { print $2 }' /boot/grub/grub.conf`

    This should output something similar to:

    Installation finished. No error reported.
    This is the contents of the device map /boot/grub/device.map.
    Check if this is correct or not. If any of the lines is incorrect,
    fix it and re-run the script `grub-install'.
    
    # this device map was generated by anaconda
    (fd0)     /dev/fd0
    (hd0)     /dev/hda

  • Exit the chroot environment:

    # exit

  • Reboot the frontend.

  • Take the CD out of the drive and the frontend should come up cleanly.
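The awk command inside the grub-install step supplies the device on which grub is installed. It prints the value after "=" on any line of grub.conf that starts with "#boot". The sketch below runs the same awk command against an illustrative file. The function name boot_device and the sample file contents are assumptions, and a real grub.conf may differ.

```shell
# boot_device: hypothetical wrapper around the awk command from the workaround.
# Prints the value after "=" on any line that starts with "#boot".
boot_device() {
    awk -F= '/^#boot/ { print $2 }' "$1"
}

# Illustrative grub.conf fragment (an assumption, not copied from a real system).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# grub.conf generated by anaconda
#boot=/dev/hda
default=0
timeout=10
EOF

boot_device "$sample"    # prints /dev/hda
rm -f "$sample"
```

If the command prints nothing on your frontend, grub.conf has no "#boot" line and grub-install needs the device named explicitly.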

7.1.4. When I try to install a compute node, the error message on the compute node says, "Can't mount /tmp. Please press OK to restart". What should I do?

Most likely, the cause is the size of the disk drive on the compute node. When Rocks has never been installed on a compute node before, the installation procedure formats the disk on that node.

The fix requires changing the way Rocks partitions disk drives. See Partitioning for details.

7.1.5. My compute nodes don't have a CD drive and my network cards don't PXE boot, but my compute nodes do have a floppy drive. How can I install the compute nodes?

You can create a boot floppy that emulates the PXE protocol. To build the floppy image, go to the web site:

ROM-o-matic.net

Then click on the version number under the Latest Production Release (as of this writing, this is version 5.4.0).

Select your device driver in item 1. Keep the default setting in item 2 (Floppy bootable ROM Image). Then click "Get ROM" in item 4.

We suggest using dd to copy the downloaded floppy image to the floppy media. For example:

# dd if=eb-5.4.0-pcnet32.zdsk of=/dev/fd0

Then run insert-ethers on your frontend and boot your compute node with the floppy.
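Floppy media is error-prone, so it can help to read the image back after the dd step and compare it with the downloaded file. The following is a minimal sketch: the helper name write_image is hypothetical and belongs to neither Rocks nor ROM-o-matic. Note that the read-back may be served from the kernel's cache rather than from the media, so a match is a useful but not conclusive check.

```shell
# write_image: hypothetical helper for illustration.
# Copies an image to a target with dd, then reads back the same number of
# bytes and compares them with the image. Returns 0 only if they match.
write_image() {
    img=$1
    target=$2
    size=$(( $(wc -c < "$img") ))
    dd if="$img" of="$target" conv=notrunc 2>/dev/null || return 1
    head -c "$size" "$target" | cmp -s - "$img"
}

# On the machine with the floppy drive (image name as in the dd example):
#   write_image eb-5.4.0-pcnet32.zdsk /dev/fd0 && echo "floppy verified"
```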