9. LVS: LocalNode

We rarely hear of anyone using LocalNode simply to make the director function as a normal realserver; however, more specialised roles have been found for it.

With localnode, the director machine can be a realserver too. This is convenient when only a small number of machines are available as servers.

To use localnode, you use ipvsadm to add a realserver with the IP 127.0.0.1 (or any local IP on your director). You then set up the service to listen on the VIP on the director, so that when the service replies to the client, the src_addr of the reply packets is the VIP. The client is not connecting to a service on 127.0.0.1 (or a local IP on the director), despite ipvsadm installing a service with RIP=127.0.0.1.
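As a minimal sketch (assuming the VIP 192.168.1.110 and an httpd on the director, the same addresses used in the testing section below), the whole localnode setup is just

#ipvsadm -A -t 192.168.1.110:80 -s rr          # add the virtual service (round robin)
#ipvsadm -a -t 192.168.1.110:80 -r 127.0.0.1   # add the director itself as a realserver

together with an httpd configured to listen on 192.168.1.110:80.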

Some services, e.g. telnet, listen on all IPs on the machine and you won't have to do anything special for them; they will already be listening on the VIP. Other services, e.g. http and sshd, have to be specifically configured to listen on each IP.

Note
Configuring the service to listen on an IP which is not the VIP is the most common mistake made by people reporting problems setting up LocalNode.
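For example (a sketch only, assuming Apache httpd and OpenSSH; adjust the config file locations for your distribution), the relevant directives are:

# httpd.conf: make apache listen on the VIP rather than only on a local IP
Listen 192.168.1.110:80

# sshd_config: make sshd listen on the VIP as well as the director's own IP
ListenAddress 192.168.1.110
ListenAddress 192.168.1.1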

LocalNode operates independently of the NAT, TUN or DR forwarding methods (i.e. you can have LocalNode running on a director that is forwarding packets to other realservers by any of the forwarding methods).

Horms 04 Mar 2003

From memory, this is what is going to happen: the connection will come in for the VIP. LVS will pick this up and send it to the realserver (which happens to be a local address on the director, e.g. 192.168.0.1). As this address is a local IP address, the packet will be sent directly to the local port without any modification. That is, the destination IP address will still be the VIP, not 192.168.0.1. So I am guessing that an application that is only bound to 192.168.0.1 will not get this connection.
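You can check which addresses a daemon is bound to on the director with something like (the port is an assumption; use whatever service you are LVS'ing)

#netstat -tln | grep ':80'

A daemon bound to 0.0.0.0:80 (all addresses) or to the VIP will pick up localnode connections; one bound only to 192.168.0.1:80 will not, for the reason Horms gives above.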

9.1. Two Box LVS

It's possible to have an LVS with full failover using just two boxes. The machine which is acting as director also acts as a realserver using localnode. The second box is a normal realserver. The two boxes run failover code to allow them to swap roles as director. The two box LVS is the minimal setup in which both director and realserver functions are protected by failover.

An example two box LVS setup can be found at http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-eg.html. UltraMonkey uses LVS so this setup should be applicable to anyone else using LVS.

Salvatore D. Tepedino sal (at) tepedino (dot) org 21 Jan 2004

I've set one up before and it works well. Here's a page http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-overview.html that explains how it's done. You do not have to use the ultramonkey packages if you don't want to. I didn't and it worked fine.

In practice, having the director also function as a realserver complicates failover. The realserver, which had a connection on VIP:port, will have to release it before it can function as the director, which only forwards connections on VIP:port (but doesn't accept them). If, after failover, the new active director is still listening on the LVS'ed port, it won't be able to forward connections.

Karl Kopper karl (at) gardengrown (dot) org 22 Jan 2004

At failover time, the open sockets on the backup Director may survive when the backup Director acquires the (now arp-able) VIP (of course the localnode connections to the primary director are dropped anyway), but that's not going to happen at failback time automatically. You may be able to rig something up with ipvsadm using the --start-daemon master/backup, but it is not supported "out-of-the-box" with Heartbeat+ldirectord. (I think this might be easier on the 2.6 kernel btw). Perhaps what you want to achieve is only possible with dedicated Directors not using LocalNode mode.
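The connection synchronisation Karl mentions is started by hand with something like this (a sketch; the multicast interface eth0 is an assumption, and your failover scripts would have to swap the master/backup roles at failover, which is the part that isn't supported out-of-the-box):

#ipvsadm --start-daemon master --mcast-interface eth0    # on the active director
#ipvsadm --start-daemon backup --mcast-interface eth0    # on the backup director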

The "Two Box LVS" is only suitable for low loads and is more difficult to manage than a standard (non localnode) LVS.

Horms horms (at) verge (dot) net (dot) au 23 Jan 2004

The only thing that you really need to consider is capacity. If you have 2 nodes and one goes down, will that be sufficient until you can bring the failed node back up again? If so, go for it. Obviously the more nodes you have the more capacity you have - though this also depends on the capacity of each node.

My thinking is that for smallish sites having the linux director as a machine which is also a realserver is fine. The overhead in being a linux director is typically much smaller than that of a realserver. But once you start pushing a lot of traffic you really want a dedicated pair of linux directors.

Also once you have a bunch of nodes it is probably easier to manage things if you know that these servers are realservers and those ones are linux directors, and spec out the hardware as appropriate for each task - e.g. linux directors don't need much in the way of storage, just CPU and memory.

Horms horms (at) verge (dot) net (dot) au 26 Aug 2003

The discussion revolves around using LVS where Linux Directors are also realservers. To complicate matters more there are usually two such Linux Directors that may be active or standby at any point in time, but will be Real Servers as long as they are available.

The key problem that I think you have is that unless you are using a fwmark virtual service then the VIP on the _active_ Linux Director must be on an interface that will answer ARP requests.

To complicate things, this setup really requires the use of LVS-DR and thus, unless you use an iptables redirect of some sort, the VIP needs to be on an interface that will not answer ARP on all the realservers. In this setup that means the stand-by Linux Director.

Thus, when using this type of setup with the constraints outlined above, when a Linux Director goes from stand-by to active the VIP must move from an interface that does not answer ARP to an interface that does answer ARP. The opposite is true when a Linux Director goes from active to stand-by.
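In terms of commands, the transition amounts to something like this (a sketch, assuming the VIP is held on lo and hidden while the box is stand-by, and on eth0 while it is active):

# stand-by -> active: the VIP must start answering ARP
#ip addr del 192.168.1.110/32 dev lo
#ip addr add 192.168.1.110/32 dev eth0

# active -> stand-by: the reverse
#ip addr del 192.168.1.110/32 dev eth0
#ip addr add 192.168.1.110/32 dev lo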

In the example on ultramonkey.org the fail-over is controlled by heartbeat (as opposed to Keepalived, which I believe you are using). As part of the fail-over process heartbeat can move the VIP from lo:0 to ethX:Y and reverse this change as need be. This fits the requirement above. Unfortunately I don't think that Keepalived does this, though I would imagine that it would be trivial to implement.

Another option would be to change the hidden status of lo as fail-over occurs. This should be easily scriptable.
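For example (assuming the hidden patch is compiled in and enabled with net.ipv4.conf.all.hidden=1, as in the sysctl.conf fragment further down), the toggle at fail-over would be something like

# becoming active: stop hiding lo, so this box answers ARP for the VIP held on lo
echo 0 > /proc/sys/net/ipv4/conf/lo/hidden
# going back to stand-by: hide lo again, so only the active director answers ARP
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden

You would still need to update the ARP caches of the other machines on the network (see Joe's comment about gratuitous ARP below).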

There are some more options too: use a fwmark service and be rid of your VIP on an interface altogether. Unfortunately this probably won't solve your problem though, as you really need one VIP in there somewhere. Or, instead of using hidden interfaces, just use an iptables REDIRECT rule. I have heard good reports of people getting this to work on redhat kernels. I still haven't had time to chase up whether this works on stock kernels or not (sorry, I know it has been a while).
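As sketches of those two options (the fwmark value 5, the VIP 192.168.1.110 and port 80 are arbitrary): a fwmark virtual service marks the packets in the mangle table on the director and points ipvsadm at the mark rather than at VIP:port, while the REDIRECT alternative is run on each realserver to grab packets addressed to the VIP without holding the VIP on a hidden interface.

# fwmark virtual service, on the director
#iptables -t mangle -A PREROUTING -d 192.168.1.110 -p tcp --dport 80 -j MARK --set-mark 5
#ipvsadm -A -f 5 -s rr
#ipvsadm -a -f 5 -r 192.168.1.2 -g

# REDIRECT rule, on a realserver, instead of a hidden lo:0
#iptables -t nat -A PREROUTING -d 192.168.1.110 -p tcp --dport 80 -j REDIRECT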

(For other postings on this thread see the mailing list archive http://marc.theaimsgroup.com/?l=linux-virtual-server&m=103612116901768&w=2.)

9.2. Testing LocalNode

If you want to explore installing localnode by hand, try this. With an httpd listening on the VIP (192.168.1.110:80) of the director (192.168.1.1) AND with _no_ entries in the ipvsadm table, the director appears as a normal non-LVS node and you can connect to this service at 192.168.1.110:80 from an outside client.

Now add the virtual service at the director (this command adds round robin scheduling for the VIP; the forwarding method is set per realserver, and is ignored for local realservers anyway)

#ipvsadm -A -t 192.168.1.110:80 -s rr

and then add an external realserver to the ipvsadm table in the normal manner with

#/sbin/ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.2

then connecting to 192.168.1.110:80 will display the webpage at the realserver 192.168.1.2:80 and not the one at the director. This is easier to see if the pages are different (e.g. put the real IP of each machine at the top of its webpage).

Now comes the LocalNode part -

You can now add the director back into the ipvsadm table with

/sbin/ipvsadm -a -t 192.168.1.110:80 -r 127.0.0.1

(or replace 127.0.0.1 by another IP on the director)

Note, the port is the same for LocalNode. LocalNode is independent of the LVS forwarding method (LVS-NAT/Tun/DR) that you are using for the other IP:ports.

Shift-reloading the webpage at 192.168.1.110:80 will alternately display the webpages at the realserver 192.168.1.2 and at the director 192.168.1.1 (if the scheduling is unweighted round robin). If you remove the (external) realserver with

/sbin/ipvsadm -d -t 192.168.1.110:80 -r 192.168.1.2

you will connect to the LVS only at the director's port. The ipvsadm table on the director will then look like

Protocol Local Addr:Port ==>
                        Remote Addr           Weight ActiveConns TotalConns
                        ...
TCP      192.168.1.110:80 ==>
                        127.0.0.1             2      3           3

From the client, you cannot tell whether you are connecting directly to the 192.168.1.110:80 socket or through the LVS code.
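From the director, however, you can check that a connection really did go through the LVS code: it shows up against the 127.0.0.1 entry in the ActiveConn/InActConn counters of

#ipvsadm -L -n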

9.3. Localnode on the backup director

With dual directors in active/backup mode, some people are interested in running services in localnode, so that the backup director can function as a normal realserver rather than just sit idle. This should be do-able. There will be extra complexity in setting up the scripts to do this, so make sure that robustness is not compromised. The cost of another server is small compared to the penalties for downtime if you have tight SLAs.

Jan Klopper janklopper (at) gmail (dot) com 2005/03/02

I have 2 directors running heartbeat and 3 realservers to process the requests. I use LVS-DR and want the load balancers to also be realservers. Both directors are set up with localnode to serve requests when they are the active director, but the inactive director sits idle.

If I add the VIP with noarp to the director, heartbeat would not be able to set up the VIP when it becomes the active director. Is there any way to tell heartbeat to toggle the noarp switch on the load balancers instead of adding/removing the VIP?

The ideal solution would be like this: the secondary loadbalancer carries the VIP with noarp (through noarp2.0/noarpctl) and can thus be used to process queries like any realserver. If the primary loadbalancer fails, the secondary loadbalancer disables the noarp program and thus starts arping for the VIP, becoming the load balancer and using the localnode feature to continue processing requests. If the primary load balancer comes back up, it either takes the role of secondary server (and adds the VIP with noarp to become a realserver), or becomes the primary load balancer again, which would trigger the secondary load balancer to add the noarp entry again (making it behave like a realserver again).

I figured we could just do the following:

  • replace the line that says ifconfig eth0 add VIP netmask ... with noarpctl del VIP RIP
  • and the other way around: replace the line ifconfig eth0:0 del VIP netmask ... with noarpctl add VIP RIP (see the sketch after this list)
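Put together as a heartbeat-style resource script, that would look something like this (a sketch only; the VIP and RIP values are placeholders, and it assumes heartbeat calls the script with start/stop like its other resource scripts):

#!/bin/bash
# sketch: swap between answering ARP for the VIP (active director)
# and hiding it with noarpctl (stand-by director acting as a localnode realserver)
VIP=192.168.1.110
RIP=192.168.1.1

case "$1" in
start)
        # becoming the active director: stop hiding the VIP, start answering ARP
        noarpctl del $VIP $RIP
        ;;
stop)
        # becoming stand-by: hide the VIP again, keep serving requests as a realserver
        noarpctl add $VIP $RIP
        ;;
esac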

The only point I don't know for sure: will the new server begin replying to ARP requests as soon as the noarp entry has been deleted?

Joe

Yes. However the ARP caches on the other nodes will still have the old MAC address for the VIP and these take about 90 secs to expire. Until the ARP cache entry expires and the node makes another ARP request, the node will have the wrong MAC address. Heartbeat handles this situation by sending 5 gratuitous ARPs (ARP broadcasts) using send_arp, just to make sure everyone on the net knows the new MAC address for the VIP.
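send_arp ships with heartbeat. If you need to do the same thing from your own scripts, the arping utility from iputils can send the gratuitous ARPs (a sketch; the interface, count and VIP here are assumptions):

#arping -U -c 5 -I eth0 192.168.1.110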

Graeme Fowler graeme (at) graemef (dot) net (making the point that the extra complexity is not a problem in practice)

I've got a 3-node DNS system using LVS-DR, where all 3 nodes are directors and realservers simultaneously. I'm using keepalived to manage it all and do the failover, with a single script running when keepalived transitions from MASTER to BACKUP or FAULT and back again. It uses iptables to add an fwmark to the incoming requests, then uses a fwmark virtual service for the LVS. Basic configuration is as follows:

global_defs {
<snipped notifications>
lvs_id DNS02
}

static_routes {
# backend management LAN
1.2.0.0/16 via 1.2.0.126 dev eth0
}

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! VRRP synchronisation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
vrrp_sync_group SYNC1 {
group {
DNS_OUT
GW_IN
}
}
vrrp_instance DNS_1 {
state MASTER
interface eth0
track_interface {
eth1
}
lvs_sync_daemon_interface eth0
virtual_router_id 111
priority 100
advert_int 5
smtp_alert
virtual_ipaddress {
5.6.7.1 dev eth1
5.6.7.2 dev eth1
}
virtual_ipaddress_excluded {
5.6.7.8 dev eth1
5.6.7.9 dev eth1
}
virtual_routes {
}
notify_master "/usr/local/bin/transitions MASTER"
notify_backup "/usr/local/bin/transitions BACKUP"
notify_fault  "/usr/local/bin/transitions FAULT"
}
vrrp_instance GW_IN {
state MASTER
garp_master_delay 10
interface eth0
track_interface {
eth0
}
lvs_sync_interface eth0
virtual_router_id 11
priority 100
advert_int 5
smtp_alert
virtual_ipaddress {
1.2.0.125 dev eth0
}
virtual_routes {
}
}
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! DNS TCP
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
virtual_server fwmark 5 {
smtp_alert
delay_loop 30
lb_algo wlc
lb_kind DR
persistence_timeout 0
protocol TCP
real_server 1.2.0.2 53 {
weight 10
inhibit_on_failure
TCP_CHECK {
connect_timeout 10
connect_port 53
}
MISC_CHECK {
misc_path "/usr/bin/dig @1.2.0.2 -p 53 known_zone soa"
misc_timeout 10
}
}
<snip other realservers>
<snip UDP realservers>

...Where /usr/local/bin/transitions is:

#!/bin/bash

IPLIST="/etc/resolver_ips"
IPCMD="/sbin/ip addr"

if [ ! -f $IPLIST ]
then
echo "No resolver list found, exiting"
exit 127
fi

if [ $1 ]
then
SWITCH=$1
else
# No command, quit
echo "No command given, exiting"
exit 126
fi


if [ $SWITCH = "MASTER" ]
then
DO="del"
elif [ $SWITCH = "BACKUP" -o $SWITCH = "FAULT" ]
then
DO="add"
else
# No command, quit
echo "Invalid command given, exiting"
exit 126
fi

if [ $DO = "add" ]
then
# we cycle through and make the IPs in /etc/resolver_ips loopback live
# We're in BACKUP or FAULT here
for addr in `cat $IPLIST`
do
$IPCMD $DO $addr dev lo
done
/sbin/route del -net 5.6.7.0 netmask 255.255.255.0 dev eth1
/usr/bin/killall -HUP named
elif [ $DO = "del" ]
then
# we do the reverse
# We're in MASTER here
for addr in `cat $IPLIST`
do
$IPCMD $DO $addr dev lo
done
/sbin/route add -net 5.6.7.0 netmask 255.255.255.0 dev eth1
/usr/bin/killall -HUP named
else
echo "Something is wrong, exiting"
exit 125
fi

### EOF /usr/local/bin/transitions

...and /etc/resolver_ips contains:

5.6.7.1/32
5.6.7.2/32
5.6.7.3/32
5.6.7.4/32

...and in /etc/sysctl.conf we have (amongst other things):

# Don't hide mac addresses from arping out of each interface
net.ipv4.conf.all.arp_filter = 0
# Enable configuration of hidden devices
net.ipv4.conf.all.hidden = 1
# Make the loopback device hidden
net.ipv4.conf.lo.hidden = 1

So we have a single MASTER and two BACKUP directors in normal operation, where the MASTER has "resolver" IP addresses on its "external" NIC, and the BACKUP directors have them on the loopback adapter. Upon failover, the transitions script moves them from loopback to NIC or vice-versa. The DNS server processes themselves are serving in excess of 880,000 zones using the DLZ patch to BIND, so startup times for the cluster as a whole are really very short (it can be cold-started in a matter of minutes). In practice the system can cope with many thousands of queries per minute without breaking a sweat, and fails over from server to server without a problem. You might think that this is an unmanageable methodology and is impossible to understand, but I think it works rather well :)

9.4. rewriting, re-mapping, translating ports with Localnode

see Re-mapping ports in LVS-DR with iptables

9.5. One Box LVS

Note
This is NOT Local Node. It's experimental code from Horms to allow running realservers on the director (ask Horms for the code off-list). This allows you to test LVS on one box.

Dave Whitla 20 Jun 2005

I am trying to load balance to two "real" servers which are actually listening on virtual IPs on the load-balancing host. Why would I want to do this? To build a test environment for a web application which usually runs on an IPVS cluster. The purpose of the test environment is to test for database cache contention issues before we deploy to our production cluster. The catch is that I must make the test environment (lvs director + 2 x application server instances) run on one physical host (each developer's development machine).

The man page for ipvsadm makes specific mention of forwarding to realservers which are in fact running on local interfaces, stating that the load balancing "forward" mechanism specified for a virtual service is completely ignored if the kernel detects that a realserver's IP is actually a local interface. The "Local Node" page describes a configuration in which I could load balance between a "real" server running on the director's virtual service IP and a real server running on another host. This does not solve my problem, however, as I must bind each instance of my application to a different IP address on the same physical box.

You may be thinking "Why not run the two instances on different ports on the same IP (the virtual service IP)?". Sadly the application is not a simple web-site, and source code and deployment container dependencies on certain port numbers exist, e.g. RMI-IIOP listeners.

Does anyone know of some config or kernel hack, patch or whatever which might make my ipvs present forwarded packets to the virtual interfaces as though they had appeared on the wire so that my forward directives are not ignored and the packets are not simply presented to the TCP stack for the virtual service IP? I guess this is like NAT to local destination addresses (as opposed to NAT of locally originated connections which is supported in the kernel).

Horms

This is a pretty interesting problem that crops up all the time. I have often wondered how hard it would be to make NAT work locally (not that LVS-Tun and LVS-DR don't/can't support portmapping anyway). I have a patch for 2.6.12; it is a hack that allows NAT to work locally by:

  • Not marking local real-servers as local
  • Passing nat-mangled packets down to local processes instead of passing them out onto the network
  • Reversing nat in the LOCAL_IN chain

Please note that this completely breaks NAT for non-Local hosts. It could be cleaned up and made so it doesn't do that. But I thought I'd throw it onto the list before putting any more time into it.

Horms 21 Jun 2005

Here is my second attempt. This should automatically switch local real-servers to Local unless the requested forwarding method is Masq and the real port differs from the virtual port. That is, if you want to do portmapping on a local service it will use Masq, otherwise it will use Local. It seems to work, but there are probably a few gotchas in there and I haven't tested it a whole lot.