
Setting up an FOS Cluster

This section discusses the steps necessary to configure an FOS cluster.

Minimal Prerequisite Information

The information needed to set up an FOS cluster falls into two categories: cluster-wide information and per-service information.

The following tables list the minimal information needed in each category to set up an FOS cluster:

Table 7-3. Required Cluster-Wide Information

Ability to gain root access: Required to edit and copy the configuration file, and to start/stop FOS (via the pulse daemon).

Host IP address: Each cluster node's IP address must be included in the configuration file.

rsh (or optionally, ssh): Each time the configuration file is set up or changed, it must be copied to all nodes in the cluster. Either rcp (in the case of rsh) or scp (in the case of ssh) will be used to perform the actual copy. Note that this copy is done as root. You must decide whether to use rsh or ssh.

Proper configuration of rsh or ssh: After deciding whether to use rsh or ssh, you must then configure the primary and backup nodes such that the root account may copy files between the two nodes without being prompted for a password.

Table 7-4. Required Per-Service Information

Service name: A general name for the service (such as "ftp1"). This name will be used to identify the service in the configuration and system log files.

Port number: The TCP/IP port number used by this service. The port number will be used by FOS to test whether the service is functional. For http web services, this port is usually 80. For ftp, port 21 is normally used. The service monitoring tests are described in the section called Service Monitoring.

Start command: The command or script used to start the service. The full path must be specified, and parameters are allowed. For most Linux services, the start command will be /etc/rc.d/init.d/xxxxx start (where xxxxx is the name of a service script, such as httpd, or inetd for services, such as ftp, that are controlled by inetd). In some cases, you may want to create your own scripts and reference them.

Stop command: Similar in concept to the start command referenced above, except that this is the command or script to execute to stop the service. The full path must be specified, and parameters are allowed.

VIP address (and device): The virtual IP address used by clients on the public network to access this service. This address must be different from the IP address of either of the cluster nodes. This address will be manipulated and broadcast on the public network as part of the failover process. Services do not fail over individually, so typically you would define all of the services to use the same VIP address. Also needed is the network device name (visible to the public network) to be used by the VIP address. Typically, the cluster nodes are connected using device eth0, so the first VIP address on that interface would be placed on eth0:1, a second VIP address would use eth0:2, and so on.

In addition, the following optional parameters can be specified to increase the level of service testing:

Table 7-5. Optional Per-Service Information

Send string: A string of printable characters to send to the service's port as part of service monitoring. The string can contain spaces, \n, and \r. During service testing, if a connection attempt to the service's port is successful, the send string will then be sent to that port. If an error occurs, the service is considered dysfunctional.

Expect string: A string of printable characters that are expected from the service's port [a]. The string can contain spaces, \n, and \r. The string can also be a single asterisk ("*") character, indicating that a response is mandatory but can consist of any characters. If the number of characters received is less than the length of the expect string, or if any of those characters do not match the corresponding character in the expect string, the service is considered dysfunctional.

Timeout: The maximum number of seconds to wait for a connection attempt to complete before considering the attempt a failure. Also used as the number of seconds to wait for a read attempt to complete (for comparison to the service's expect string) before considering the attempt a failure. The default value is 10 seconds.
Notes:
a. Note that a service may send out a message in response to a connection, or in response to a specific message written to the service's port once a connection has been made.

The Piranha Configuration File (lvs.cf)

The Piranha configuration file is used in several different cluster configurations, so some portions of the file do not apply to FOS. The file is broken down into three major sections: global settings, virtual server (LVS) definitions, and failover service (FOS) definitions.

Only the first and last sections are applicable to FOS. The default name and location of the configuration file is /etc/lvs.cf.

Each time the configuration file is modified, it must be copied to all nodes in the cluster; all nodes must use the same configuration data. Mismatched configuration files are one of the easiest ways to cause a dysfunctional cluster.
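
A quick way to confirm that two nodes' files match is to copy the remote file locally and compare the two (a minimal sketch, assuming rcp and a second node named node2):

    rcp node2:/etc/lvs.cf /tmp/lvs.cf.node2
    diff /etc/lvs.cf /tmp/lvs.cf.node2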

Each line in the configuration file must follow strict formatting rules. Any violation of these formatting rules will cause the pulse program to fail (sometimes with a core dump). Using the Piranha Web Interface is highly recommended and will ensure a properly-formatted configuration file.

The piranha-docs RPM provides a sample configuration file. It can be found in /usr/doc/piranha-docs*/sample.cf. The lvs.cf man page provides details on every possible entry in the configuration file.

The following tables list the configuration file entries that are applicable to setting up FOS. All items are required unless otherwise noted.

Global Cluster Entries

This table lists the global settings needed to define the cluster:

Table 7-6. Piranha Configuration File Settings — GLOBAL

service = fos: Indicates that this configuration file is for FOS rather than lvs.

primary = nnn.nnn.nnn.nnn: Host IP address of the primary node in the cluster.

backup_active = 1: Indicates that the backup node will be a member of the cluster.

heartbeat = 1: Indicates that heartbeats will be used in the cluster.

heartbeat_port = nnnnn: UDP port number to use for heartbeat. This need only be changed if multiple clusters exist and you need to prevent conflicts.

keepalive = nnnnn: Number of seconds between heartbeats.

deadtime = nnnnn: Number of seconds that must elapse without seeing a heartbeat from the partner system before declaring that it has failed. This number should be a multiple of the keepalive value.

rsh_command = xxx: Specifies the command to use to copy files between systems. Must be either rsh or ssh.

network = nat: This entry is not used by FOS but must be set to a valid value. Leave as nat.

nat_router = nnn.nnn.nnn.nnn ethn:n (should be on one line): This entry is not used by FOS, but must be set to a valid value. Place any IP address you want here, as long as it does not conflict with the host IP address of the cluster nodes or the VIP addresses of any services. Also specify a valid device (such as eth0), but make sure the virtual interface number (:n) does not conflict with any of the device definitions for the FOS services.
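
Putting these entries together, the global section of a configuration file might look like the following sketch (all addresses, port numbers, and timing values are examples only):

    service = fos
    primary = 192.168.1.10
    backup_active = 1
    heartbeat = 1
    heartbeat_port = 539
    keepalive = 6
    deadtime = 18
    rsh_command = rsh
    network = nat
    nat_router = 192.168.1.100 eth0:10

Note that the example deadtime (18) is a multiple of the keepalive value (6), as recommended above.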

Per-Service Settings

For each service to include in FOS, there must be a block of data, enclosed in braces ({}). Each service block begins with the word failover, followed by a user-defined service name. The name is only used to tell the services apart, and only appears in the log files.

Table 7-7. Piranha Configuration File Settings — PER-SERVICE

failover xxxx {: Starts a block definition for a failover service. xxxx can be any name desired, as long as it contains no whitespace or quotes. NOTE THE OPENING BRACE ON THIS LINE!

active = 1: Indicates that this service is to be included in FOS. A 0 means that this definition block is to be ignored.

address = nnn.nnn.nnn.nnn ethn:n (should be on one line): Specifies the VIP address and device for this service. The VIP address is usually the same for all services (and if so, the device entry must also be the same), but it can be different if desired. The device consists of two parts: the ethn part indicates the Ethernet interface to use (the first interface would be named eth0, the second would be named eth1, and so on), while the :n part is a number indicating the number of the VIP address for the specified interface (:1 for the first VIP, :2 for the second, and so on).

port = nnnnn: The TCP/IP port number of this service. http is usually 80, ftp is usually 21, etc. The port number is used to test whether the service is responding or failing.

send = "xxxxx": OPTIONAL. If specified, this string will be sent to the port as part of service monitoring. See the lvs.cf and nanny man pages for details.

expect = "xxxxx": OPTIONAL. If specified, this string is expected in response to connecting to the service's port and/or sending the send string. The expect string must follow the same rules as the send string; it may also contain a single asterisk ("*"), which means that any character(s) received will be considered as matching the expect string. The response from the port is compared (up to the length of the expect string), and a match indicates that the service is functional. See the lvs.cf and nanny man pages for details.

timeout = nn: OPTIONAL. If an expect string is specified, this entry indicates the number of seconds to await a response from the port before assuming a timeout failure. Default is 10 seconds.

start_cmd = "xxxx": The script or command FOS executes to start this service. The command must include the full pathname. Parameters to the command/script may also be included, but must be separated by single spaces only.

stop_cmd = "xxxx": The script or command FOS executes to stop this service. The command must include the full pathname. Parameters to the command/script may also be included, but must be separated by single spaces only.

}: The closing brace indicates the end of this service definition.
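
As an example, a complete service block for an ftp service controlled by inetd might look like the following sketch (the service name, VIP address, and device are examples only):

    failover ftp1 {
        active = 1
        address = 192.168.1.200 eth0:1
        port = 21
        send = "\n"
        start_cmd = "/etc/rc.d/init.d/inetd start"
        stop_cmd = "/etc/rc.d/init.d/inetd stop"
    }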

The start_cmd and stop_cmd Entries

Both the start_cmd and the stop_cmd entries are mandatory, and must be enclosed in double quotes. The specified value is the command or script to be executed that will start or stop the service. The full path must be specified. Parameters may also be specified; each one must be separated by a single space. The command used should be repeatable, meaning that no problems (other than a possible error return value) should occur if the command is executed several times in a row.
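
If a service needs extra preparation before starting, you can reference a script of your own instead. The following sketch (a hypothetical /usr/local/sbin/httpd-ctl) simply wraps the standard init script and meets the repeatability requirement:

    #!/bin/sh
    # Hypothetical wrapper script; pass "start" or "stop" as the only parameter.
    # Repeatable: running it several times in a row causes no problems other
    # than a possible error return value from the init script.
    /etc/rc.d/init.d/httpd "$1"
    exit $?

It would then be referenced in the configuration file as start_cmd = "/usr/local/sbin/httpd-ctl start" and stop_cmd = "/usr/local/sbin/httpd-ctl stop".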

The send and expect Strings

The send string (send = "xxxx") is a text string to send to the service port in order to test whether the service is really functioning. The length is limited to 255 printable, quotable text characters. Also permitted are \n, \r, \t, and \'.

The expect string is similar to the send string. If it is specified, a read will be attempted on the service port, and the resulting response compared to the expect string. If the read attempt times out, or the number of characters read is less than the length of the expect string, then the service has failed. If the read is successful, the expect string is compared to the response; if the characters match up to the length of the expect string, the service is considered functional, otherwise it is considered to have failed. If an expect string is not specified, no read attempt will occur.

The expect string must follow the same rules as the send string, except for one additional feature. If expect = "*" is specified, then the service must send a response, but that response can consist of any characters and be any length.
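
For example, using the sample strings from Table 7-8 below, an smtp service could be tested by sending a newline and expecting the standard "220" greeting code in response:

    send = "\n"
    expect = "220"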

Please Note

If the send string is omitted, then no attempt to send data to the service will occur EXCEPT if port = 80, and the expect string is also omitted. In this case, a web service is assumed and an internal http test string will be sent and expected. This condition exists for backwards compatibility with previous releases of Piranha.

If both send and expect strings are specified, the send string will always be sent first.

The following table lists samples of send and expect strings for some of the most common IP services.

Please Note

These strings are samples only. They may not be the best choices for all circumstances, and may not work on all systems.

Table 7-8. Sample send and expect Strings

Service         Port  send String          expect String
http/www        80    HEAD / HTTP/1.0\n\n  HTTP
ftp             21    \n
telnet          23    \n                   *
lpd             515   \n                   lpd:
ssh             22                         SSH
inetd           98    \n                   500
login           513   \n
bind            952   \n
smtp/sendmail   25    \n                   220

File Copying with rsh and ssh

Piranha (as well as the system administrator) needs to be able to copy files between the cluster nodes (usually as root). There are also non-FOS situations where Piranha needs to execute commands on cluster nodes as part of statistics gathering. These tasks can be accomplished using either rsh or ssh.

One of these two applications must be specified in the configuration file. Each application also has specific configuration requirements (software installation, creation/modification of .rhosts files, etc.) that must be met in order for it to work. Consult the rsh or ssh man pages for further information.
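
For example, when using rsh, the ~root/.rhosts file on each node might contain entries like the following (the hostnames are examples only):

    node1.example.com root
    node2.example.com root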

Setting up an FOS Cluster Step-by-Step

Here is a sample step-by-step procedure for installing, configuring, and starting a new Piranha FOS cluster.

  1. Make sure you have the basic resources for setting up the cluster readily available. This includes:

    • The TCP/IP addresses and hostnames for the network interfaces, and/or a functioning DHCP server environment.

    • A list of the TCP/IP services, and their port numbers, that will be set up to failover. Some examples might include http on port 80, ftp on port 21, etc.

    • Access to a nearby client system with a running web browser for setting up the Piranha software, and for testing the http service. You can also set up the server as a functioning workstation with a web browser, but you will need to select CUSTOM during the product installation in order to install those components.

    • The ability to use a text editor such as vi or emacs to edit configuration files.

  2. Install the product by following the on-screen displays, and by following the instructions in the installation section of this document.

  3. Log into both nodes as root and perform the following basic operations:

    • Execute the /usr/sbin/piranha-passwd script to set an access password for the Piranha Web Interface.

    • Edit the /etc/hosts files and add entries for each of the cluster nodes (see the example following this list).

    • Edit /etc/hosts.allow, and enable access for the appropriate service(s) for all cluster nodes. Use the commented examples in the file as a guideline.

    • Edit the ~root/.rhosts files and add entries for each of the cluster nodes so that the root account can be used with rsh and rcp.

    • If desired, you may also want to set up ssh and scp in addition to (or instead of) using rsh and rcp. Follow the appropriate instructions for that software.
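
    For example, the /etc/hosts entries on each node might look like the following (the addresses and names are examples only):

    192.168.1.10    node1.example.com    node1
    192.168.1.11    node2.example.com    node2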

  4. Make sure that each system can ping the other by IP address and name. At this point, copying files between the systems using rcp (or scp if set up) when logged in as root should also work. As an example, the following command should work (assuming you will be using rcp):
    rcp myfile node2:/tmp/myfile
                  

  5. Configure Apache on both nodes by editing /etc/httpd/conf/httpd.conf, and setting the ServerName parameter appropriately. Start Apache by using the /etc/rc.d/init.d/httpd script, and passing the start or restart parameter as appropriate.
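
    For example, on the primary node the ServerName entry might read (the hostname is an example only):

    ServerName node1.example.com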

    Although this should have been done for you when Red Hat High Availability Server was installed, the following configuration sections must be set in order for the Piranha Web Interface to work properly. First, this entry should be present in the /etc/httpd/conf/httpd.conf file:

    <Directory /home/httpd/html/piranha>
      Options All
      AllowOverride All
    </Directory>
                

    You should also find the following entries in the same file:

    LoadModule php3_module        modules/libphp3.so
    AddModule mod_php3.c
    …
    <IfModule mod_php3.c>
      AddType application/x-httpd-php3 .php3
      AddType application/x-httpd-php3-source .phps
    </IfModule>
                
  6. Log into the client system and start up the web browser. Access the URL http://xxxxx/piranha/ (where xxxxx is the hostname or IP address of the PRIMARY node in the cluster). You should see the configuration page for the Piranha software.

    Configure the software as needed for your setup, following the information detailed in other sections of this document. Your changes should be present in the file /etc/lvs.cf on that cluster node.

  7. Make sure the configuration files on all nodes are identical by copying the new file on the primary node to the other node by using the rcp or scp command (for example):
    rcp /etc/lvs.cf node:/etc/lvs.cf
                  
    If this does not work, you will have to investigate the configuration changes you made earlier for the rsh or ssh software.

    Changes to the configuration file will require that the file be re-copied to all the nodes, and that Piranha be stopped and restarted. (Note: A future release of Piranha may automate this process.)

  8. On each node, one at a time, disconnect the node from the network and try the following tests (detailed in later sections of this chapter).

    • Start and stop the IP services you intend to use in FOS by typing the exact commands specified in the services' start and stop lines, as listed in the cluster configuration file.

    • With the service running, use telnet and attempt to connect to that service's TCP/IP port (for example):
      telnet localhost 80
                        
      If this connection is refused, then that service might not be working or the port number is incorrect. Note that if telnet does connect to the service, the telnet session may be "hung" afterwards and you may have to terminate the telnet process in order to disconnect.
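
      For example, for an http service you can type the sample send string from Table 7-8 by hand once the connection is made, followed by a blank line. A functional web server should reply with a status line beginning with "HTTP" (the exact response shown is an example only):

      HEAD / HTTP/1.0

      HTTP/1.1 200 OK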

  9. With the services down and the system connected to the network, start the pulse program on the primary node by executing the following command:

    /etc/rc.d/init.d/pulse start
                

    After a time, the Piranha software and all the services should be running. Keep watch on this by using the ps -ax command, and by examining the tail end of the /var/log/messages log file.
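
    For example, the following commands list the running Piranha processes and display the most recent log entries (the grep pattern simply narrows the process listing):

    ps -ax | grep -E 'pulse|nanny'
    tail /var/log/messages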

  10. Start the pulse program on the other cluster node. Both nodes should now be running and monitoring each other.

  11. You can test failover by disconnecting the network connection on the active node, shutting down a monitored TCP/IP service on the active node, or by terminating a nanny process on the inactive node by using the command kill -s SIGTERM nnn (where nnn is the pid of a nanny process).
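
    For example, to find the pid of a nanny process and terminate it (the pid shown is an example only):

    ps -ax | grep nanny
    kill -s SIGTERM 1234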

Your FOS cluster should now be completely operational.