15. LVS: Services: multi-port

15.1. Introduction

While single-port services all use the same scheme (server listens, client connects), multi-port services each have their own scheme (ftp has two schemes, active and passive). For multi-port services, the initial connection is the standard single-port connection, but the setup of the 2nd (or more) port occurs through information sent in the payload of the connection to the first port. The director does not inspect the payload of packets and has no information about subsequent connection(s) that the client and realserver are attempting to set up. Approaches used to load balance multi-port services are

  • Use persistence to all ports at once on the realserver. (Persistence can also be set for a single port, but this is not used here).

    This is a brute force approach. Once the initial connection is made from the client to the first port on the realserver, then any packet from any port on the client is forwarded to any port that the client requests on the realserver. This has been the approach historically used for ftp on LVS-DR or LVS-Tun. While it works, it is not secure, since any packets are allowed between the client and the LVS, and not just the packets required for the ftp transfer. For ftp, where no state is maintained on the realserver and where idle timeouts are just a matter of the client reconnecting, then persistence is a satisfactory solution for LVS/ftp. It would be nice if we could do better than this, but currently this is the state of the art for LVS with ftp.

  • Use other code to inspect the payload of packets that pass through the first port opened. Since this code must talk to ip_vs, it must run on the director. All packets in the first connection must then pass through the director, so this approach only works for LVS-NAT (or LVS-DR with Julian's forward shared patch). For LVS-DR and LVS-Tun, the packets returning to the client from the realservers go directly to the client and not through the director. Code which inspects packets passing through the director to aid setup of other ports includes
    • Helper modules: ftp is the only multi-port service for which a helper module has been written (see LVS-NAT ftp helper module).
    • fwmark: for ftp this requires the conntrack module ip_conntrack_ftp to look for packets which are RELATED.
  • e-commerce sites with fwmark listening on ports 80 and 443: This is not a multi-port tcpip protocol. A multi-port tcpip protocol requires one demon running on the realserver sending packets on two ports. For an e-commerce site, the connections are independent at the tcpip level and are serviced by different demons. For LVS, it is convenient to think of an e-commerce site as multi-port, because following the initial connection to port 80, you want the client's subsequent connection to 443 to go to the same realserver. This is handled by persistence or by persistent fwmark.

15.2. ftp general, active tcp 20,21; passive 21,high_port

ftp is a 2 port service in both active and passive modes. For a description of active and passive ftp see Active FTP vs. Passive FTP, a Definitive Explanation on Slacksite. Also see RFC 1579 for passive ftp and RFC 959 for ftp (where ftp is referred to as just "ftp", but with the arrival of passive ftp, is now called "active ftp"). The usual resource for this sort of information, "TCP/IP Illustrated Vol 1", by W. Richard Stevens (Chapter 27 on FTP), only discusses what is now called active FTP.

Useful links (from Ratz 30 Nov 2003) http://www.ssh.com/support/documentation/online/ssh/winhelp/32/Forwarding_FTP.html Forwarding ftp. (port forwarded ftp is not the same as sftp or ftps, ssl based ftp).

Because of the problems securing ftp, Ratz suggests that you use a single ftp server that is not part of your LVS and secure it separately.

15.3. ftp helper modules: ip_vs_ftp/ip_masq_ftp

The ip_vs build produces the modules ip_masq_ftp (2.2.x) or ip_vs_ftp (2.4.x and later, written as a netfilter module). The ip_masq_ftp module is a patched version of the file which allowed ftp through a NAT box. The patch stopped the original function (at least in early versions of LVS) and is probably why it has a new name in 2.4.x kernels.

The ip_vs_ftp module will autoload (Nov 2003) when ipvsadm is invoked - check that the module is loaded by running lsmod.

The ipvs ftp helper module needed for LVS-NAT has resulted in a disproportionate number of problems on the LVS mailing list (presumably this will continue). In Dec 2006, Eric Robinson eric (dot) robinson (at) pmcipa (dot) com was the unwitting guinea pig in straightening some of this out.

Problems include:

  • Few people are using LVS-NAT with ftp, so we wouldn't hear any problems even if the helper module was completely broken. When we hear of a problem we don't know whether to believe the poster, since we haven't heard a problem with ftp for ... oh you know, years.
  • Bugs have affected other services and we get reports of problems with (say) http when no-one is using the LVS'ed ftp service and the poster doesn't tell us that they have LVS'ed ftp (and doesn't realise that it's relevant).
  • Different ftpd demons give different responses to calls from the client and listen on different ports. Unless you take appropriate action, ftp demons listening on non-standard ports stop working when put behind an LVS director.
  • The docs and functionality were out of step for quite a while.

    Tony Clarke sam (at) palamon (dot) ie found (Sep 2002) that the ftp helper module ip_masq_ftp had not been patched for LVS for 2.2.19 at least a year after its release.

    I was testing ftp with its default settings (without being terribly aware that I was using active ftp) and found that I didn't need the helper module. It took at least a year before anyone else (Wensong 17 Sep 2002) would agree with me. The conventional wisdom from 2002-2006 was that the ftp helper module wasn't needed for active ftp. I thought the helper function for active ftp must have been in ip_vs. A possible explanation is Mark de Vries' comment immediately below, although not having the setup around any more I don't know for sure.

  • Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 23 Dec 2006.

    ftp-clients don't care which IP the connection originates from.

    Joe - the ftp-data connection then would originate on the RIP, rather than the VIP. With the ftp helper, the ftp-data connections would be nat'ed to src_addr=VIP. In my test setup, with no ftp helper and two private networks (which I routed locally), the packets src_addr=RIP:ftp-data would have been routed directly through the director to the CIP. Complicating matters, I don't remember whether the ftpd was listening to the VIP or 0.0.0.0.

Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 23 Dec 2006

from ip_vs_ftp.c:

/*
 * Look at incoming ftp packets to catch the PASV/PORT command
 * (outside-to-inside).
 *
 * The incoming packet having the PORT command should be something like
 *      "PORT xxx,xxx,xxx,xxx,ppp,ppp\n".
 * xxx,xxx,xxx,xxx is the client address, ppp,ppp is the client port number.
 * In this case, we create a connection entry using the client address and
 * port, so that the active ftp data connection from the server can reach
 * the client.
 */

So that would suggest (to me) that you do need the ip_vs_ftp helper module, to do the src address translation in the active connection from server to client.
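To make the comment quoted from ip_vs_ftp.c more concrete, here is a minimal Python sketch of pulling the client address and port out of a PORT command. It is not the kernel code: the function name parse_port_command and the regular expression are illustrative assumptions, and the sketch assumes the command arrives as one complete line on the control connection.

```python
import re

# Assumption: one complete command per line; FTP commands are case-insensitive.
PORT_RE = re.compile(
    r"^PORT (\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\r?\n?$",
    re.IGNORECASE,
)


def parse_port_command(line):
    """Return (client_ip, client_port) from an FTP PORT command, or None."""
    m = PORT_RE.match(line)
    if m is None:
        return None
    fields = [int(x) for x in m.groups()]
    if any(f > 255 for f in fields):
        return None  # each field is a single byte
    ip = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]  # high byte, low byte
    return ip, port


print(parse_port_command("PORT 192,168,1,254,4,6\r\n"))
```

With the address and port in hand, a helper can create the connection entry that lets the data connection from the server reach the client, which is the step the comment describes.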

Horms 27 Dec 2006

I just skimmed through the code, and the helper seems to listen for both the PASV and PORT command. My FTP knowledge is a bit rusty, but I think the latter is for non-passive ftp, so yes it seems to be needed for both.

The auto-loading is just a hack for the convenience of most people. Basically, in recent versions of ipvsadm, if you're setting up a virtual service on port 21, it guesses that there is a good chance that it is ftp and tries to load ip_vs_ftp. The ftp helper auto-load went in on 9 Oct 2003 - look at the date of your ipvsadm (due to a release procedure that is beyond my control, it seems that ipvsadm has been released multiple times with the version number of 1.24. Indeed, the version only seems to denote that it is the ipvsadm that works with the 2.6 kernels, or perhaps a revision of the ABI, rather than a release of the utility itself. Grrr. - i.e. the version number doesn't mean anything.)

If you are using a port other than 21, then you will need to set the ports argument to the module when it is loaded

insmod ip_vs_ftp.ko ports=8021

The default is 21. You can have up to IP_VS_APP_MAX_PORTS (8). They are comma delimited

insmod ip_vs_ftp.ko ports=21,8021,9021

If the ftp helper module doesn't load, maybe you have an old version of ipvsadm? ftp is running on a port other than 21? The module couldn't be found by modprobe for some reason?
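Since the ports list is limited in length and is easy to mistype, a small check before loading the module can help. The following Python sketch only builds the "ports=" string shown above and enforces the limit of 8 quoted in the text; the function name ip_vs_ftp_ports_option is hypothetical, and the script does not load the module itself.

```python
# Limit quoted in the text above (IP_VS_APP_MAX_PORTS).
MAX_PORTS = 8


def ip_vs_ftp_ports_option(ports):
    """Build the 'ports=...' argument for ip_vs_ftp from a list of port numbers."""
    if not ports:
        raise ValueError("at least one port is required")
    if len(ports) > MAX_PORTS:
        raise ValueError("too many ports: %d > %d" % (len(ports), MAX_PORTS))
    for p in ports:
        if not 1 <= p <= 65535:
            raise ValueError("invalid port: %r" % (p,))
    # Comma delimited, no spaces, as on the insmod command line.
    return "ports=" + ",".join(str(p) for p in ports)


print(ip_vs_ftp_ports_option([21, 8021, 9021]))
```

The returned string is what you would append to the insmod command line.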

Eric: with the ftp helper loaded, the ftp-data packets arriving at the client have src_addr=VIP (the expected behaviour).

Joe - The 2.2.x ftp module is only available as a module (i.e. it can't be built into the kernel).

Juri Haberland juri (at) koschikode (dot) com 30 Apr 2001

AFAIK the IP_MASQ_* parts can only be built as modules. They are automagically selected if you select CONFIG_IP_MASQUERADE.

Julian Anastasov May 01, 2001

Starting from 2.2.19 the following module parameter is required:

modprobe ip_masq_ftp in_ports=21

Joe

I don't see this mentioned in /usr/src/linux/Documentation, ipvs-1.0.7-2.2.19/Changelog, google or dejanews. Is this an ip_vs feature or is it a new kernel feature?

ratz

I see info only in the source. This is a new 2.2.19 feature. It's /usr/src/linux/net/ipv4/ip_masq_ftp.c:

 * Multiple Port Support
 *      The helper can be made to handle up to MAX_MASQ_APP_PORTS (normally 12)
 *      with the port numbers being defined at module load time.  The module
 *      uses the symbol "ports" to define a list of monitored ports, which can
 *      be specified on the insmod command line as
 *              ports=x1,x2,x3...
 *      where x[n] are integer port numbers.  This option can be put into
 *      /etc/conf.modules (or /etc/modules.conf depending on your config)
 *      where modload will pick it up should you use modload to load your
 *      modules.
 * Additional portfw Port Support
 *      Module parameter "in_ports" specifies the list of forwarded ports
 *      at firewall (portfw and friends) that must be hooked to allow
 *      PASV connections to inside servers.
 *      Same as before:
 *              in_ports=fw1,fw2,...
 *      Eg:
 *              ipmasqadm portfw -a -P tcp -L a.b.c.d 2021 -R 192.168.1.1 21
 *              ipmasqadm portfw -a -P tcp -L a.b.c.d 8021 -R 192.168.1.1 21
 *              modprobe ip_masq_ftp in_ports=2021,8021

And it is a new kernel feature, not LVS feature.

What are these modules for? From ipvsadm(8) (ipvs 0.2.11):

If a virtual service is to handle FTP connections then persistence must be set for the virtual service if Direct Routing or Tunnelling is used as the forwarding mechanism. If Masquerading is used in conjunction with an FTP service then persistence is not necessary, but the ip_vs_ftp kernel module must be used. This module may be manually inserted into the kernel using insmod(8).

The modules are NOT used for LVS-DR or LVS-Tun: in these cases persistence is used (or fwmarks version of persistence).

Joe 23 May 2001:

I run these rules on the director (without the ftp module) and ftp works fine

$ ipchains -A forward -p tcp -j MASQ -s RIP ftp -d 0.0.0.0/0
$ ipchains -A forward -p tcp -j MASQ -s RIP ftp-data -d 0.0.0.0/0
$ ipchains -A forward -p tcp -j MASQ -s RIP 1024:65535 -d 0.0.0.0/0

Julian - these rules are risky. What happens with ICMP? It is not masqueraded. I hope there is a similar rule for ICMP.

Note
Joe Dec 2006 - We're a little more careful nat'ing out clients running on the realservers now. We'd at least make sure the packets came out with src_addr=VIP.

Stephane Klein

I've tried to use your example to set up active and passive FTP. I can authenticate, but I can't list or send data. I can see a packet in the conntrack file with dport=20, but the ftp server sends a SYN (state SYN_SENT) and gets no reply.

ip_vs_ftp is loaded as module, ip_nat_ftp and ip_conntrack_ftp are in the kernel. I used iptables rules of your example in the HOWTO.

I saw this article where you said it's necessary to patch the kernel to work with ip_nat_ftp (http://www.in-addr.de/pipermail/lvs-users/2004-June/011955.html) That patch is for kernel 2.6.5. Is this patch included in your nfct patch or is it necessary to apply this patch?

Julian 29 Aug 2004

Yes, it is needed if you are loading ip_nat_ftp. I didn't receive any replies from the netfilter coreteam about this patch, so I just linked it to the web site: ip_nat_ftp-2.6.5-1.diff

There are problems with the helper module approach for ftp, since there is no agreement amongst ftpd code authors about the responses given. To help passive ftp, the ip_vs_ftp module looks for the response

227 Entering Passive Mode

from the ftpd. Postings to the LVS mailing list (starting with a posting by Tom Cronin on LVS-NAT ftp) show that this response is not universal for ftpds. Rutger van Oosten also found that, for passive ftp, the ftpd must be set to listen on the correct IP.

15.3.1. For active ftp, the helper module expects ftp-data=20 (problems with vsftp)

Mark de Vries found that his ftp LVS-NAT didn't work, the reason being that the ftp helper module wasn't forwarding the reply packets from the ftp-data port (usually port 20). On further exploration, Mark found that the ftpd (the GPL'ed vsftp) wasn't using the standard ftp-data port, but was using a high (>1024) port, thus allowing the ftpd to run with lower privileges (vsftp can be setup to run with the standard ftp-data port). Currently the ftp helper expects ftp-data=20. We're working on a fix for this. Here's the discussion so far.

Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 25 Nov 2005

Problem found... The thing is that ip_vs(_ftp) seems to assume that the ftp-data connection will be initiated from port 20. Seems like a valid assumption... But unfortunately this is not always the case... the vsftpd I was testing with was configured to "connect_from_port_20=NO" by default. Once I switched to "=YES" active FTP worked fine. Otherwise I just used some SNAT rules on the director. So.... Now the question is: is this a vsftpd 'problem'? MUST ftp-data connections originate from port 20? Or should this assumption be relaxed?

Apparently the iptables conntrack_ftp module does not assume it; connections from ports other than 20 are considered "RELATED". (I have not checked the src or debugged anything, I just observed that this type of connection is indeed matched by a "RELATED" rule in my own iptables setup.)

I don't think adding an option --data-port="some_number" to the ftp helper would get us anywhere - the src port is not always the same. vsftpd (probably) just connects without binding to a specific port, just getting a random one in the ip_local_port_range... Is there anything against not matching on the src port like the ip_conntrack(_ftp) stuff, i.e. matching/finding the source port on the fly?

vsftp has passive ftp (pasv_enable = YES). A lot of clients will default to passive mode or fall back to it if active does not seem to be working, which is probably the main reason I've had relatively few complaints about active ftp not working.

As far as I understand, the RFC leaves no room for a different src port for the data connection. It's not fixed at 20 but should be 1 below the control port. Which is what ip_vs uses literally IIRC.

ip_vs_ftp and ip_conntrack_ftp do much of the same thing. The only difference is that in iptables you need an explicit rule to handle the connection entries created, when in ipvs they are always used. The real difference is only in the details of the connection entry they create. In ipvs there is the assumption/requirement that the connection will originate from port 20 (assuming the ftpd is listening on port 21). The ip_conntrack_ftp module (apparently) does not make this assumption. Taking the RFC as a guide the assumption is of course valid.
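The difference Mark describes can be illustrated with a small Python model. This is only a sketch of the two behaviours as reported in the posting, not code from either module; the function names are made up for illustration. It relies on the RFC 959 default that the server's data port is the control port minus one.

```python
def expected_data_port(control_port=21):
    """RFC 959 default: the server's data port is the control port minus one."""
    return control_port - 1


def strict_match(src_port, control_port=21):
    """Model of the assumption reported for ip_vs_ftp: source port must be L-1."""
    return src_port == expected_data_port(control_port)


def relaxed_match(src_port, control_port=21):
    """Model of the behaviour observed with conntrack: source port is not checked."""
    return 0 < src_port <= 65535


# vsftpd with connect_from_port_20=YES uses port 20; with NO it uses some
# high port (32768 here is just an example value).
for src in (20, 32768):
    print(src, strict_match(src), relaxed_match(src))
```

In this model a data connection from a high port passes the relaxed check but fails the strict one, which matches the symptom seen with vsftpd's connect_from_port_20=NO.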

15.3.2. Graeme Fowler's checklist for ftp

Graeme Fowler graeme (at) graemef (dot) net 23 Aug 2006

  • Ensure the LVS FTP helper is loaded.
  • Make sure that you define (or make a note of) the range of ports your FTP server uses for data connections (this varies from server to server).
  • Ensure that you will accept traffic to those ports on your director. If the packets are rejected by netfilter/iptables on the director, the FTP helper never sees them so the connections will almost never work.

15.3.3. LVS-NAT, 2.2.x director

I found that ftp worked just fine without the module for 2.2.x (1.0.3-2.2.18 kernel). (See the discussion following Mark de Vries' comments in the ftp helper section above for a possible explanation.)

15.3.4. LVS-NAT, 2.4.x director

For 2.4.x you can connect with ftp without any extra modules, but you can't "ls" the contents of the ftp directory. For that you need to load the ip_vs_ftp module. Without this module, your client's screen won't lock up, it just does nothing. If you then load the module, you can list the contents of the directory.

15.3.5. LVS-DR, LVS-Tun

For LVS-DR, LVS-Tun active ftp needs persistence. Otherwise it does not work, with or without ip_masq_ftp loaded. You can login, but attempting to do a `ls` will lock up the client screen. Checking the realserver shows connections on ports 20,21 to paired ports on the client.

15.4. ftp (active) - the classic command line ftp

This is a 2 port service.

  • port 20 calls - data (files transferred in either direction, and the output of the listing from ls command)
  • port 21 listens - commands (e.g. user, pass, ls)

Here's part of my /etc/services

ftp-data         20/tcp    #File Transfer [Default Data]
ftp-data         20/udp    #File Transfer [Default Data]
ftp              21/tcp    #File Transfer [Control]
ftp              21/udp    #File Transfer [Control]

To set up ftp with LVS, you schedule only port 21 for forwarding. While the realserver is listening on port 21, it calls the client from port 20 (i.e. it's not listening on port 20) rather than the client calling the realserver (through the director). You do not add entries for port 20 with ipvsadm. Port 20 is handled by persistence for LVS-DR and LVS-Tun. For active ftp with LVS-NAT, you don't need the ipvs ftp helper module (the ftp helper module is only needed for passive ftp, Wensong 17 Sep 2002) (however see ftp helper module).

15.4.1. session: active ftp (no LVS)

Here's a standard non-LVS active ftp session using phatcat. The ftp "client" machine (192.168.1.254) connects to the ftp server machine "sneezy" (192.168.1.11). Since two ports are involved, phatcat is run from two windows, xterm_1, xterm_2.

xterm_1:

client:~# phatcat sneezy 21
sneezy.mack.net [192.168.1.11] 21 (ftp) open
220 sneezy.mack.net FTP server (Version wu-2.4.2-academ[BETA-15](1) Wed May 20 13:45:04 CDT 1998) ready.
help
214-The following commands are recognized (* =>'s unimplemented).
   USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP
   PASS    PASV    APPE    MRSQ*   ABOR    SITE    XMKD    XCUP
   ACCT*   TYPE    MLFL*   MRCP*   DELE    SYST    RMD     STOU
   SMNT*   STRU    MAIL*   ALLO    CWD     STAT    XRMD    SIZE
   REIN*   MODE    MSND*   REST    XCWD    HELP    PWD     MDTM
   QUIT    RETR    MSOM*   RNFR    LIST    NOOP    XPWD
214 Direct comments to [email protected].
user ftp
331 Guest login ok, send your complete e-mail address as password.
pass mack
230 Guest login ok, access restrictions apply.

On the client, use netstat -an to find the highest unprivileged port in use (in this case port 1029).

xterm_2: tell the client to listen on the first unused port (here 1030).

client:~# phatcat -l -p 1030

xterm_1: tell the ftpserver to connect to client:1030 (192,168,1,254,4,6) (1030 = 256 x 4 + 6), and then list the contents of the directory

port 192,168,1,254,4,6
200 PORT command successful.
list
150 Opening ASCII mode data connection for /bin/ls.
226 Transfer complete.

xterm_2: receives the output of list.

connect to [192.168.1.254] from (UNKNOWN) [192.168.1.11] 20
total 9
drwxr-xr-x   8 root     root        1024 Nov  6 20:15 .
drwxr-xr-x   8 root     root        1024 Nov  6 20:15 ..
drwxr-xr-x   2 root     root        1024 Apr  7  1998 bin
drwxr-xr-x   2 root     root        1024 Aug 30  1993 etc
drwxr-xr-x   2 root     root        1024 Dec  3  1993 incoming
drwxr-xr-x   2 root     root        1024 Nov 17  1993 lib
drwxr-xr-x   2 root     root        1024 Jun  4  2001 pub
-rw-r--r--   1 root     root           0 Oct 24 13:24 this_is_sneezy
drwxr-xr-x   3 root     root        1024 Aug 30  1993 usr
-rw-r--r--   1 root     root         312 Aug  1  1994 welcome.msg

The ftpserver then closes the data connection from port 20 (i.e. you can't do a second listing over it).

xterm_1:

list
425 Can't build data connection: Connection refused.

xterm_2: on the ftp client, initiate another listener (on the next unused port).

client:~# phatcat -l -p 1033

xterm_1: tell the ftp server to connect to client:1033 (1033 = 256 x 4 + 9), prepare for download of an ascii file (type a), check the size of the file about to be downloaded (size welcome.msg), then retrieve it (retr welcome.msg). (The ftp server will then close the connection from port 20.)

port 192,168,1,254,4,9
200 PORT command successful.
type a
200 Type set to A.
size welcome.msg
213 317
retr welcome.msg
150 Opening ASCII mode data connection for welcome.msg (312 bytes).
226 Transfer complete.

xterm_2: watch welcome.msg being delivered.

connect to [192.168.1.254] from (UNKNOWN) [192.168.1.11] 20
Welcome, archive user!  This is an experimental FTP server.  If have any
unusual problems, please report them via e-mail to root@%L
If you do have problems, please try using a dash (-) as the first character
of your password -- this will turn off the continuation messages that may
be confusing your ftp client.

xterm_1:say goodbye (the data connection has closed, so you can't list using the same connection).

list
425 Can't build data connection: Connection refused.
quit
221 Goodbye.
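The PORT arguments typed in the session above (4,6 for port 1030 and 4,9 for port 1033) come from splitting the port number into a high and a low byte. A minimal Python sketch of the arithmetic follows; port_argument is a hypothetical helper name, not part of any ftp client.

```python
def port_argument(ip, port):
    """Return the h1,h2,h3,h4,p1,p2 argument of an FTP PORT command."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    if not 0 < port <= 65535:
        raise ValueError("invalid port: %r" % (port,))
    high, low = divmod(port, 256)  # port = 256 * high + low
    return ",".join(octets + [str(high), str(low)])


print("port " + port_argument("192.168.1.254", 1030))  # port 192,168,1,254,4,6
print("port " + port_argument("192.168.1.254", 1033))  # port 192,168,1,254,4,9
```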

15.4.2. session: active ftp, one network LVS-DR with no persistence (this is NOT going to work)

The example illustrates what happens with active ftp on LVS-DR without persistence (it is not going to work). Set up a working one network LVS-DR (i.e. all IPs are in the same network), add rules to forward ftp

Note
Here you are only running commands to forward port 21. You have not handled the data port 20 in any way.

pip:/etc/lvs# ipvsadm -A -t lvs.mack.net:ftp -s rr
pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r bashfull.mack.net -g -w 1
pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r sneezy.mack.net -g -w 1
pip:/etc/lvs# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  lvs.mack.net:ftp rr
  -> sneezy.mack.net:ftp          Route   1      0          0
  -> bashfull.mack.net:ftp        Route   1      0          0

Use phatcat (as above) to attempt to set up an ftp session with the VIP.

xterm_1:connect to VIP:ftp

client:~# phatcat lvs 21
lvs.mack.net [192.168.1.110] 21 (ftp) open
220 sneezy.mack.net FTP server (Version wu-2.4.2-academ[BETA-15](1) Wed May 20 13:45:04 CDT 1998) ready.
user ftp
331 Guest login ok, send your complete e-mail address as password.
pass mack
230 Guest login ok, access restrictions apply.

With netstat -an on the realserver, note that the client is connected to VIP:21, not to RIP:21.

xterm_2:listen on the next available port

client:~# phatcat -l -p 1036

xterm_1:tell the realserver to connect to client:1036, and then list the contents of /home/ftp. (The connection hangs for a while - eventually you'll get the 425 message).

port 192,168,1,254,4,12
200 PORT command successful.
list
425 Can't build data connection: Connection timed out.

On the realserver, netstat -an shows

sneezy:/home/ftp# netstat -an | grep 103
tcp        0      1 192.168.1.110:20        192.168.1.254:1036      SYN_SENT
tcp        5      0 192.168.1.110:21        192.168.1.254:1035      ESTABLISHED

On the client, netstat -an shows that client is listening, but not connecting

client:~# netstat -an | grep 103
tcp        0      0 0.0.0.0:1036            0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.254:1035      192.168.1.110:21        ESTABLISHED

If you run tcpdump on the realserver when you run the list command, you'll see that the realserver is sending SYN packets from VIP:20->client:1036 but not receiving any replies. The problem is that the SYN/ACK reply from the client is sent to VIP:20, which is routed to the director, which has no forwarding rules for VIP:20. Even if the director had forwarding rules for VIP:20, it requires the first packet in a connection to be a SYN, to start the process of making an entry in the ipvsadm table for packets to port 20. Thus the director will reject the reply from the client to VIP:20 and no connection will be made.
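The reasoning in the preceding paragraph can be written down as a toy model. The Python sketch below is not how ip_vs is implemented; it only encodes the two rules stated above (a packet needs a matching virtual service, and a new table entry is only made for a SYN). The function name and data layout are assumptions made for illustration, and the client's reply is modelled as a SYN/ACK, which is what TCP sends in answer to a SYN.

```python
def director_forwards(packet, services, connections):
    """Toy model of the director's decision as described in the text.

    packet:      dict with 'src' and 'dst' as (ip, port) tuples and
                 'flags' as a set of TCP flag names.
    services:    set of (vip, port) virtual services.
    connections: set of (src, dst) pairs already in the connection table.
    """
    key = (packet["src"], packet["dst"])
    if key in connections:
        return True                    # existing entry: forward
    if packet["dst"] not in services:
        return False                   # no forwarding rule for this VIP:port
    if packet["flags"] != {"SYN"}:
        return False                   # a new entry is only made for a SYN
    connections.add(key)
    return True


VIP, CLIENT = "192.168.1.110", "192.168.1.254"
services, connections = {(VIP, 21)}, set()

control_syn = {"src": (CLIENT, 1035), "dst": (VIP, 21), "flags": {"SYN"}}
data_reply = {"src": (CLIENT, 1036), "dst": (VIP, 20), "flags": {"SYN", "ACK"}}

print(director_forwards(control_syn, services, connections))  # True
print(director_forwards(data_reply, services, connections))   # False

# Adding a rule for VIP:20 does not help: the reply is still not a plain SYN.
services.add((VIP, 20))
print(director_forwards(data_reply, services, connections))   # False
```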

15.4.3. session: active ftp, one network LVS-DR with persistence

This is the normal method of setting up LVS-DR for ftp.

pip:/etc/lvs# ipvsadm -A -t lvs.mack.net:ftp -s rr -p 600
pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r  bashfull.mack.net -g -w 1
pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r  sneezy.mack.net -g -w 1
pip:/etc/lvs# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  lvs.mack.net:ftp rr persistent 600
  -> sneezy.mack.net:ftp          Route   1      0          0
  -> bashfull.mack.net:ftp        Route   1      0          0

15.5. ftp (passive)

Passive ftp is used by netscape to get files from an ftp url like ftp://ftp.domain.com/pub/ . Here's an explanation of passive ftp from http://www.tm.net.my/learning/technotes/960513-36.html

If you can't open connections from Netscape Navigator through a firewall to ftp servers outside your site, then try configuring the firewall to allow outgoing connections on high-numbered ports.

Usually, ftp'ing involves opening a connection to an ftp server and then accepting a connection from the ftp server back to your computer on a randomly-chosen high-numbered port. The connection from your computer is called the "control" connection, and the one from the ftp server is known as the "data" connection. All commands you send and the ftp server's responses to those commands will go over the control connection, but any data sent back (such as "ls" directory lists or actual file data in either direction) will go over the data connection.

However, this approach usually doesn't work through a firewall, which typically doesn't let any connections come in at all. In this case you might see your ftp connection appear to work, but then as soon as you do an "ls" or a "dir" or a "get", the connection will appear to hang.

Netscape Navigator uses a different method, known as "PASV" ("passive ftp"), to retrieve files from an ftp site. This means it opens a control connection to the ftp server, tells the ftp server to expect a second connection, then opens the data connection to the ftp server itself on a randomly-chosen high-numbered port. This works with most firewalls, unless your firewall restricts outgoing connections on high-numbered ports too, in which case you're out of luck (and you should tell your sysadmins about this).

"Passive FTP" is described as part of the ftp protocol specification in RFC 959 ("http://www.cis.ohio-state.edu/htbin/rfc/rfc959.html").

If you are setting up an LVS ftp farm, it is likely that users will retrieve files with a browser and you will need to setup the LVS to handle passive ftp. You will need the ftp helper module or persistent connection (also see on the LVS website under documentation; persistence handling in LVS) or fwmark persistent connection for ftp.

For passive ftp, the ftpd sets up a listener on a high port for the data transfer. The problem for LVS is that the IP for the listener is the RIP and not the VIP.

Wenzhuo Zhang 1 May 2001

I've been using 2.2.19 on my dialup masquerading box for quite some time. It doesn't seem to me that the option is required, whether in PASV or PORT mode. We can actually get ftp to work in NAT mode without using the ip_masq_ftp module. The trick is to tell the real ftp servers to use the VIP as the passive address for connections from outside; e.g. in wu-ftpd, add the following lines to the /etc/ftpaccess:

passive address RIP <localnet>
passive address 127.0.0.1 127.0.0.0/8
passive address VIP 0.0.0.0/0

Of course, the ftp virtual service has to be persistent port 0.

Alois Treindl, 3 May 2001

I found (with kernel 2.2.19) that I needed the command

modprobe ip_masq_ftp in_ports=21

so that (passive mode) ftp from Netscape would work. Without the in_ports=21 it did not work.

Julian Anastasov ja (at) ssi (dot) bg 03 May 2001

Yes, it seems this option is not useful for the active FTP transfers because if the data connection is not created while the client's PORT command is detected in the command stream, then it is created later when the internal realserver creates normal in->out connection to the client. So, it is not a fatal problem for active FTP to avoid this option. The only problem is that these two connections are independent and the command connection can die before the data connection, for long transfers. With the in_ports option used this can not happen.

Note
Joe - in previous HOWTOs I had a comment from Julian saying that the ftp helper was "recommended" for active ftp (presumably not required). Presumably this is what he's talking about.

The fatal problems come for the passive transfers when the data connection from the client must hit the LVS service. For this, the ip_masq_ftp module must detect the 227 response from the realserver in the in->out packets and open a hole for the client's data connection. And the "good" news is that this works only with in_ports/in_mark options used.

Alois

I found no option that lets me configure the server to give the VIP to clients making a PASV request; it always gives the realserver IP address in replies to such requests.

Bad ftpd :) It seems the following rules are valid:

  • active ftp always works through stupid balancers (for external clients) that have minimum support for masquerading, with some drops in the command connection
  • passive ftp always works through stupid masq boxes (for internal clients). The passive ftp setup is useful because the data connection can be marked as a slave to the command connection and in this way avoid connection reconnects.

15.5.1. passive ftp client/server mismatch with LVS-NAT

Jeremy Kusnetz:

Although Julian says that all you need for ftp with LVS-NAT is the ip_masq_ftp module, it doesn't work for me (director 2.2.19-1.0.7 with ip_masq_ftp in_ports=21): my ftp client just hangs.

Julian

The Netfilter guys use another approach when detecting the 227 message in Linux 2.4, i.e. they try to ignore the message and to use only the code (I'm not sure what is the final status of this handling there). But in Linux 2.2 the word "Entering" may be a requirement :( You have to select another FTPd, IMO.

Jeremy Kusnetz JKusnetz (at) nrtc (dot) org 24 May 2001

It was my ftp server. When going into passive mode it said:

   Passive mode on (x,x,x,x,x,x)

instead of:

   Entering Passive Mode (x,x,x,x,x,x)
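The mismatch Jeremy ran into can be reproduced with a small Python sketch. It contrasts a matcher that insists on the exact wording "Entering Passive Mode" with one that looks only at the 227 code and the parenthesised address, as Julian describes for Linux 2.4. Neither pattern is taken from the kernel source; both regular expressions, the function name parse_227, the sample address, and the assumption that Jeremy's server prefixed its reply with the 227 code are illustrative.

```python
import re

ADDR = r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)"
# Strict: the exact wording must be present (the 2.2-style behaviour described).
STRICT = re.compile(r"^227 Entering Passive Mode " + ADDR)
# Code only: any text between the 227 code and the address is ignored.
CODE_ONLY = re.compile(r"^227 .*?" + ADDR)


def parse_227(line, pattern):
    """Return (ip, port) announced in a 227 reply, or None if it does not match."""
    m = pattern.match(line)
    if m is None:
        return None
    n = [int(x) for x in m.groups()]
    return "%d.%d.%d.%d" % tuple(n[:4]), n[4] * 256 + n[5]


standard = "227 Entering Passive Mode (192,168,1,11,4,6)"
variant = "227 Passive mode on (192,168,1,11,4,6)"

print(parse_227(standard, STRICT))     # matches
print(parse_227(variant, STRICT))      # None: wording differs
print(parse_227(variant, CODE_ONLY))   # matches on the code alone
```

A helper that behaves like the strict matcher never sees the address in the variant reply, so no hole is opened for the data connection and the client hangs.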

15.6. ftp helper bug(s)

In early 2005 Johan van den Berg and Simon Schwendemann sent a report of a problem with LVS-NAT (2.4.x) where the SYN+ACK reply to a SYN would not be source-NAT'ed and so would emerge with src_addr=RIP and not src_addr=VIP (http://archive.linuxvirtualserver.org/html/lvs-users/2005-02/msg00299.html). Johan van den Berg switched to using LVS-DR.

Even to figure this out took a while. Initially only one in 60 or so SYNs would have the problem. No-one had any idea what the problem was and cries for help were greeted by silence. Then Jari Takkala Jari (dot) Takkala (at) Q9 (dot) com 15 Aug 2005 found it only occurred when the LVS-NAT was forwarding ftp, but the problem occurred on all VIPs, not just the VIP that had the ftp service.

With Jari's posting, other people started to recognise the problem too.

Graeme Fowler graeme (at) graemef (dot) net 16 Aug 2005

This is very interesting; I have a number of clusters behind LVS-NAT and hadn't managed to observe myself that the one having problems - which I posted about sometime in the last year - is the only one of the whole lot which has ip_vs_ftp loaded. It's also a 2.4.x kernel, and can't be in-service upgraded.

Julian Anastasov ja (at) ssi (dot) bg 26 Aug 2005

I can not reproduce it, I tried with 2.4.32-pre3 as it contains some changes. Can you show your vs settings?:

grep . /proc/sys/net/ipv4/vs/*

So, you don't have any iptables rules, fwmarking, NAT or linux ethernet bridging? Any extra patches for IPVS?

From your explanation ip_vs_ftp leads to problems where SYN creates web connection, it is hashed in table, DNAT-ed to RS, then RS replies SYN+ACK which can not match the connection in table. It looks like this connection is not present (may be removed, do you see something in debug logs from the SYN to the SYN+ACK) or the hash table is damaged. Do you still think it is caused by ip_vs_ftp? About your tests, is the client IP on lan? Do you think this client IP has many connections to the director?

Jari (data dumps omitted)

The client IP is not on the LAN. The problem occurs from any source IP trying to visit a load balanced VIP. Whenever we add the FTP service to ipvsadm, and begin load balancing to it, the problem begins to occur on all services. However, it is not consistent. Some outgoing SYN+ACK packets will get translated correctly for a certain period of time, then after awhile some packets will not be translated. I do not think it is load related. We have other load balancers built from the same image handling many more connections.

There were various discussions (under the title "LVS bugs") between Julian and Agostino di Salle a (dot) disalle (at) fineco (dot) it that you can find in the archives if you want to know more.

Julian

As reported from some users, the ip_nat_ftp module causes some problems with other virtual services. ip_nat_ftp can keep ip_vs_conn_no_cport_cnt > 0 for the time it expects connections from unknown client ports. This is fatal for the persistence services as the normal packets start to hit persistence templates instead of valid connections. Such packets are correctly forwarded to real servers but the reply packets do not see connections as they are not created. As result, the reply packets are not SNAT-ed by the IPVS code.

It is enough to have passive FTP connection that waits to learn its client port to trigger problems with non-ftp persistent services. The used VIPs do not matter.

I tried to fix this problem with the following patch: Linux 2.6.13: http://www.ssi.bg/~ja/tmp/ipvs-2.6/ct-2.6.13-1.diff, Linux 2.4.32-pre3: http://www.ssi.bg/~ja/tmp/ipvs-2.4/ct-2.4.32-pre3-1.diff

These patches do the following:

  • introduce IP_VS_CONN_F_TEMPLATE connection flag to mark the connection as template
  • create new connection lookup function just for templates: ip_vs_ct_in_get
  • make sure ip_vs_conn_in_get hits only connections with IP_VS_CONN_F_NO_CPORT flag set when s_port is 0. By this way we avoid returning template when looking for cport=0 (ftp)
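
The effect of these changes can be illustrated with a toy model of the connection table. This is a sketch in Python based only on the description above, not a translation of the IPVS code: the Conn class, the boolean fields standing in for the two flags, the helper names and all addresses are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    caddr: str
    cport: int
    vaddr: str
    vport: int
    no_cport: bool = False  # stands in for IP_VS_CONN_F_NO_CPORT
    template: bool = False  # stands in for IP_VS_CONN_F_TEMPLATE


def _find(table, key, accept):
    """Return the first entry whose address/port tuple equals key and passes accept."""
    for conn in table:
        if (conn.caddr, conn.cport, conn.vaddr, conn.vport) == key and accept(conn):
            return conn
    return None


def conn_in_get_old(table, caddr, cport, vaddr, vport):
    """Toy pre-patch lookup: the cport=0 retry can return a template."""
    conn = _find(table, (caddr, cport, vaddr, vport), lambda c: True)
    if conn is None and any(c.no_cport for c in table):
        conn = _find(table, (caddr, 0, vaddr, vport), lambda c: True)
    return conn


def conn_in_get_new(table, caddr, cport, vaddr, vport):
    """Toy patched lookup: a cport=0 match must carry the no_cport flag."""
    first = (lambda c: c.no_cport) if cport == 0 else (lambda c: True)
    conn = _find(table, (caddr, cport, vaddr, vport), first)
    if conn is None and any(c.no_cport for c in table):
        conn = _find(table, (caddr, 0, vaddr, vport), lambda c: c.no_cport)
    return conn


def ct_in_get(table, caddr, cport, vaddr, vport):
    """Toy template-only lookup, named after ip_vs_ct_in_get."""
    return _find(table, (caddr, cport, vaddr, vport), lambda c: c.template)


table = [
    # persistence template for a web client (hypothetical addresses)
    Conn("10.0.0.1", 0, "192.168.1.110", 80, template=True),
    # passive ftp data connection still waiting to learn its client port
    Conn("10.0.0.2", 0, "192.168.1.111", 2121, no_cport=True),
]

packet = ("10.0.0.1", 40000, "192.168.1.110", 80)  # new SYN from the web client
print(conn_in_get_old(table, *packet))  # returns the template: the bug
print(conn_in_get_new(table, *packet))  # returns None, so a real connection is created
```

In the old lookup a single waiting ftp connection is enough for ordinary packets on an unrelated VIP to match a persistence template, which is the behaviour Julian describes.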

There is a second patch that properly invalidates the templates as Agostino di Salle noticed: Linux 2.6.13: http://www.ssi.bg/~ja/tmp/ipvs-2.6/invct-2.6.13-1.diff Linux 2.4.32-pre3: http://www.ssi.bg/~ja/tmp/ipvs-2.4/invct-2.4.32-pre3-1.diff

I performed simple tests, so please test these patches, for example, persistence+ip_nat_ftp, the ip_vs_sync code is changed too. If there is a better solution please speak before including them in next kernel releases. I'm expecting confirmation from people with the problem that reply packets were not translated from IPVS.

Jari Takkala Jari (dot) Takkala (at) Q9 (dot) com 9 Sep 2005

We applied these patches to a production load balancer on kernel 2.4.26. Our IPVS code is one version behind, however the patches applied cleanly. We began load balancing FTP last night, and so far everything is working properly. Thanks very much for your help!

The patches worked for Graeme Fowler too.

Julian thinks this problem has been affecting people for a while.

Julian Anastasov ja (at) ssi (dot) bg 12 Sep 2005

thanks to Graeme and to Jari for the tests. It seems the problems reported from many users in last 2 years and more are now fixed.

15.7. ftp is difficult to secure

Roberto Nibali ratz (at) tac (dot) ch 06 May 2001

If you are trying to secure the LVS by using the LVS as a packetfilter, you will not have much success with the ftp protocol, because it is so open. You can do a lot to minimize full breaches. At least put the ftp daemon in a chroot environment.

We have multiple choices if we want to narrow down the input ipchains rules on the front interface of the director:

  • Use ftp via LVS. (this is not a solution actually, we still need special input rules on the EXT_IF for 1024:65535)
  • Use ftp without LVS but with SNAT. (difficult to setup)
  • Use SuSE ftp proxy suite
  • Use 2.4 kernel and ip_conntrack_ftp (don't know much about this, ask Rusty)
  • Don't use ftp at all (this is what we want)
  • The ftpfs project. I haven't fully tested it and it's a very dangerous approach, but it is worth a look.

The biggest problem is with the ip_masq_ftp module. It should create an ip_fw entry in the masq_table for the PORT port. It doesn't do this and we have to open the whole port range. For PASV we have to DNAT the range.

ipchains -A forward -i $EXT_IF -s $INTERNAL_NET $UNPRIV_PORTS -d $DEP -j MASQ

FTP is made up of two connections, the Control- and the Data- Connection.

  • ftp Control Connection

    The Client contacts the Servers port 21 from an UNPRIV Port. No trouble, standard, plain, vanilla TCP-Connection, we all love it. Over this connection the client sends commands to the server. We will see examples later.

  • FTP Data Connection

    "Data" can be either the content of a file (sent as e.g. the result of a "get" or "put" command) or the content of a directory-listing (i.e. the result of a "ls" or "dir" command).

    The data connection is where the trouble starts. To transfer data, a second connection is opened.

    Usually the client opens this second connection to the server. But for active ftp, the server opens this second connection, using the well-known port 20 (called ftp-data) as sourceport. But which port on the client should it connect to? The client announces the port via a "port"-command over the control connection. This is nasty: ports are negotiated at the application level, where L4 switches like LVS cannot see what's going on.

    For passive ftp, the server announces the port the client should connect to in its reply to the client's "pasv"-command (this command starts passive FTP, active is the default). The client then opens the data-connection to the server. The port that the server listens on is an unprivileged port (rather than a privileged port as is normal for internet services). A passive ftp transfer then requires that connections be allowed between all of the roughly 64,500 unprivileged ports (1024-65535) on both the client and realservers rather than just one. A passive ftp server is difficult to secure with packet filter rules.

If we have to protect a client, we would like to only allow passive ftp, because then we do not have to allow incoming connections. If we have to protect a server, we would like to only allow active ftp, because then we only have to allow the incoming control-connection. This is a deadlock.

15.7.1. Example ftp sessions with phatcat

We need 2 xterms (x1, x2), phatcat and an ftp-server (here "ftpserver" 172.23.2.30).

First passive mode (because it is conceptually easier)

#x1: Open the control-connection to the server,
#and send the command "pasv" to the server.
$ phatcat ftpserver 21
220 ftpserver.terreactive.ch FTP server (Version 6.4/OpenBSD/Linux-ftpd-0.16) ready.
user ftp
331 Guest login ok, send your complete e-mail address as password.
pass ftp
230 Guest login ok, access restrictions apply.
pasv
227 Entering Passive Mode (172,23,2,30,169,29)

The server replied with 6 numbers:

  • 172,23,2,30 is the IP I have to connect to
  • (169*256+29=43293) is the Port
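
The arithmetic for the six numbers can be written as a small helper. This is a minimal Python sketch; the function name decode_pasv is made up for the example.

```python
def decode_pasv(numbers):
    """Turn 'h1,h2,h3,h4,p1,p2' into (ip_address, port)."""
    fields = [int(part) for part in numbers.split(",")]
    if len(fields) != 6 or not all(0 <= f <= 255 for f in fields):
        raise ValueError("expected six numbers in the range 0-255")
    ip_address = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]  # high byte first, then low byte
    return ip_address, port


print(decode_pasv("172,23,2,30,169,29"))  # ('172.23.2.30', 43293)
```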

In x2 I open a second connection with a second phatcat

$ phatcat 172.23.2.30 43293
# x2 will now display output from this connection

Now in x1 (the control-connection)

list
150 Opening ASCII mode data connection for '/bin/ls'.
226 Transfer complete.

and in x2 the listing appears.

Active ftp

I use the same control-connection in x1 as above, but I want the server to open a connection. Therefore I first need a listener. I do it with phatcat in x2:

$ phatcat -l -p 2560

Now I tell the server on the control connection to connect (2560=10*256+0)

port 172,23,2,8,10,0
200 PORT command successful.
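
The argument of the "port" command is the reverse of the arithmetic used for the PASV reply: the port is split into a high and a low byte. A minimal Python sketch (the function name encode_port is made up for the example):

```python
def encode_port(ip_address, port):
    """Build the argument of a PORT command from an IP address and a port."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("need a dotted-quad address and a 16-bit port")
    high, low = divmod(port, 256)  # 2560 -> (10, 0)
    return ",".join(octets + [str(high), str(low)])


print("port " + encode_port("172.23.2.8", 2560))  # port 172,23,2,8,10,0
```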

Now you see why I used port 2560. 172.23.2.8 is, of course, my own IP-address. And now, using x1, I ask for a directory-listing with the list command, and it appears in x2. For completeness' sake, here is the full in/output.

First the xterm 1:

phatcat ftpserver 21
220 ftpserver.terreactive.ch FTP server (Version 6.4/OpenBSD/Linux-ftpd-0.16) ready.
user ftp
331 Guest login ok, send your complete e-mail address as password.
pass ftp
230 Guest login ok, access restrictions apply.
pasv
227 Entering Passive Mode (172,23,2,30,169,29)
list
150 Opening ASCII mode data connection for '/bin/ls'.
226 Transfer complete.
port 172,23,2,8,10,0
200 PORT command successful.
list
150 Opening ASCII mode data connection for '/bin/ls'.
226 Transfer complete.
quit
221 Goodbye.

xterm 2:

phatcat 172.23.2.30 43293
total 7
dr-x--x--x   2 root     root         1024 Jul 26  2000 bin
drwxr-xr-x   2 root     root         1024 Jul 26  2000 dev
dr-x--x--x   2 root     root         1024 Aug 20  2000 etc
drwxr-xr-x   2 root     root         1024 Jul 26  2000 lib
drwxr-xr-x   2 root     root         1024 Jul 26  2000 msgs
dr-xr-xr-x  11 root     root         1024 Mar 15 14:26 pub
drwxr-xr-x   3 root     root         1024 Mar 11  2000 usr
phatcat -l -p 2560
total 7
dr-x--x--x   2 root     root         1024 Jul 26  2000 bin
drwxr-xr-x   2 root     root         1024 Jul 26  2000 dev
dr-x--x--x   2 root     root         1024 Aug 20  2000 etc
drwxr-xr-x   2 root     root         1024 Jul 26  2000 lib
drwxr-xr-x   2 root     root         1024 Jul 26  2000 msgs
dr-xr-xr-x  11 root     root         1024 Mar 15 14:26 pub
drwxr-xr-x   3 root     root         1024 Mar 11  2000 usr

15.7.2. mail on securing ftp

Joe

I see that ftp is hard to make secure and your prime recommendation is to have an ftp server isolated from all other machines. Do you recommend that people not use ftp and say instead use http for LVSs that are delivering files? I don't like http for file download. At home (28k phone ppp link) if I do anything else over the line (like load a webpage) while doing a download, the download stalls and doesn't start up again. This is pain as a 10M file takes 2hrs and I have to start again.

Joe Cooper joe (at) swelltech (dot) com 07 May 2001

wget -c http://url

will solve that problem.

sftp is now available as part of the openssh packages I believe, but requires clients to have a recent version of openssh -- probably not what folks want if they have enough clients to justify an LVS cluster. I don't think LVS really has anything to do with whether someone should use ftp for security reasons or not. Securing ftp is a separate issue from securing LVS.

15.8. ftps (ssl based ftp), tcp 21, 22?

Note
This is not ftp port forwarded through ssh (see port forwarded ftp), nor is it sftp.

From Ratz, 30 Nov 2003, see http://www.stunnel.org/examples/ftp.html FTP+SSL, FTP+TLS. There are two deprecated methods of doing SSL+FTP. Make sure that what you're doing and talking about is http://www.ietf.org/rfc/rfc2228.txt RFC2228 ftps. The session starts by the client connecting to port 21 and issuing the "PROT P" command. Quite what happens after that I don't know (which ports, are the packets encrypted?).

Kai

I am using LVS/NAT with ssl based ftp. I can ftp via realserver by using either port mode or passive mode.

ratz 29 Nov 2003

Over the director, correct?

For security reasons SSL based ftp was required. After adding ssl based ftp auth to the realservers, the client computers cannot connect to the realserver with passive mode, but port mode works well.

IIRC you need to load balance port 22 too.

I think the problem is that the data which the ftp server sends to the client, including the server's passive port, is encrypted by ssl, so the LVS doesn't know which port should be translated and opened.

AFAICR this isn't the issue. The client receives the PASV command and then translates the PORT into a local ssh tunnel forward. So I think you have to also load balance port 22 TCP. You can use the port 0 feature :).

Kai reposted this on 19 Feb 2004

I think the problem is that the data which the ftp server sends to the client, which includes the server's passive port, was encrypted by SSL. So the LVS doesn't know which port should be translated and opened.

Horms 20 Feb 2004

Yes, that sounds likely. Try tracing the traffic using something like ngrep.

Does LVS support the SSL based FTP? If not, is there any solution?

If your guess is correct, then no. Well, not unless you get the linux director to handle the ssl and just talk plain-text to the real-servers, but then that isn't LVS.

15.9. dns, tcp/udp 53 (and dhcpd server 67, dhcp client 68)

(from the IPCHAINS-HOWTO) DNS doesn't always use UDP; if the reply from the server exceeds 512 bytes, the client uses a TCP connection to port number 53, to get the data. Usually this is for a zone transfer.
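
The difference between the two transports is visible in the message framing. The sketch below, in Python, builds a minimal query, shows the two-byte length prefix that DNS uses over TCP, and checks the TC (truncated) bit that tells a client to retry over TCP. The function names are made up for the example, and nothing here is specific to LVS.

```python
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: id, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message):
    """DNS over TCP prefixes each message with a two-byte length field."""
    return struct.pack("!H", len(message)) + message


def is_truncated(response):
    """True if the TC bit is set, i.e. the client should retry over TCP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


query = build_query("www.example.com")
print(len(query), len(frame_for_tcp(query)))
```

Because a single client may use both transports for the same question, both the udp and the tcp service have to be balanced.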

The name resolution process (see LVS-HOWTO.services.general.html#name_resolution) is broken. It's possible for a client (resolver) to get a reply from a hung nameserver which it interprets as "no resolution for that name", rather than allowing the client to go on to the next nameserver in the list. This is a design flaw that will take some fixing (all the clients and all the nameservers must be fixed). DNS should have its own failover mechanism, but it doesn't. In the meantime, some other failover mechanism will have to present a perfect nameserver to the clients.

There is no consensus amongst people running LVS as to whether it's best to have named/bind LVS'ed or just to have a set of machines in a failover setup (Horms, 2 Oct 2006, is of the opinion that named can't be load balanced by nature of the protocol). If you're running DNS in a failover setup, you might think that you could have one primary machine and a secondary machine and that on failover of the primary you could promote the secondary to be the primary. By design of DNS, there can only be one primary machine. The primary and secondary have different config files and it's not simple to programmatically switch the secondary into the primary role (it can be done, it just requires some thinking). A failover DNS setup then requires two machines with identical config files, one as the master and one as the backup. However if you have dhcpd running on the network, the primary name server machine will be updated continuously with the addresses from the dhcpd, which the backup primary will not get. On failover, you will lose name resolution on these addresses until they renew their lease.

If you are going to LVS named, and are running LVS-DR or LVS-Tun, as usual make sure your named is listening on the VIP (not the RIP).

dhcpd has its own failover/redundancy mechanism. You can't LVS a dhcpd server - it has a database of its leases and no other machine can have the same list. dhcpd can be setup with multiple dhcpd servers on the same network and they pass the updates to each other. Unfortunately it doesn't work - you get to a stage where one machine will mistakenly think that another machine is in charge of all the IPs and both machines refuse to answer requests. The problem has been posted to the dhcpd mailing list for several years without any answers from the dhcpd authors. The only thing to do when this happens is to kill all the dhcpd servers, erase the lease table files, touch new ones, and start the servers again. I went back to having only one dhcpd server and left the other one turned off waiting as a backup.

Simon Pearce sp (at) http (dot) net, 27 Nov 2006, reported that on adding more VIPs to his LVS'ed DNS server (up to 250 VIPs), name resolution would slow down, time out, or stop for some domains. The load average was 1-2 and the individual realservers gave valid replies to direct queries even when they were inaccessible via the LVS/VIP (so the realservers were still working). Wayne wayne (at) compute-aid.com, who has a financial interest in (works for?) Webmux, posted that only Webmux has solved this problem, but didn't give any details. At the moment this stands as an uncorroborated statement by Wayne, but Simon didn't get his multi-VIP DNS server to work. We didn't find out if Simon was correctly nat'ing out all his calls from the realserver (see masquerading clients on realservers through multiple VIPs). Presumably the 53-tcp/udp replies and calls to forwarders (from the RIP?) were being correctly nat'ed. A solution was posted by Graeme Fowler where the IPs are fwmark'ed and the LVS balances the fwmark, but this was for a smaller number of IPs and Graeme didn't know if it would work for 250 IPs.

It should be possible to serve DNS using only 1 IP, for any number of domains, so why was Simon using 250 IPs? Simon explained that it was so that each customer would think they had their own DNS machine.

This setup for LVS'ing named is from Ted Pavlic. Two (independent) connections, tcp and udp to port 53, are needed.

Here is part of an lvs.conf file which has dns on two realservers.

#dns, note: need both udp and tcp
#A realserver must be able to determine its own name.
#(log onto machine from console and use nslookup
# to see if it knows who it is)
# and to do DNS on the VIP and name associated with the VIP
#To test a running LVS, on client machine, run nslookup and set server = VIP.
SERVICE=t dns wlc 192.168.1.1 192.168.1.8
SERVICE=u dns wlc 192.168.1.1 192.168.1.8

If the LVS is run without mon, then any setup that allows the realservers to resolve names is fine (ie if you can sit at the console of each realserver and run nslookup, you're OK).

If the LVS is run with mon (e.g. for production), then dns needs to be setup in a way that dns.monitor can tell if the LVS'ed form of dns is working. When dns.monitor tests a realserver for valid dns service, it first asks for the zone serial number from the authoritative (SOA) nameserver of the virtualserver's domain. This is compared with the serialnumber for the zone returned from the realserver. If these match then dns.monitor declares that the realserver's dns is working.
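
The serial number comparison can be sketched as follows. This is not the code of dns.monitor; it is a minimal Python illustration that assumes the SOA records have already been fetched as text in the form printed by "dig +short SOA zone @server" (seven fields, with the serial in third place). The function names and the sample records are made up for the example.

```python
def soa_serial(soa_rdata):
    """Extract the serial from SOA rdata.

    Field order: mname rname serial refresh retry expire minimum.
    """
    fields = soa_rdata.split()
    if len(fields) != 7:
        raise ValueError("not SOA rdata: %r" % soa_rdata)
    return int(fields[2])


def realserver_in_sync(master_soa, realserver_soa):
    """True if the realserver serves the same zone serial as the master."""
    return soa_serial(master_soa) == soa_serial(realserver_soa)


# Hypothetical sample records for a zone example.com.
master = "ns.example.com. hostmaster.example.com. 2006112701 7200 3600 1209600 3600"
stale = "ns.example.com. hostmaster.example.com. 2006112601 7200 3600 1209600 3600"
print(realserver_in_sync(master, master))  # True
print(realserver_in_sync(master, stale))   # False
```

A realserver that is answering queries but still holds an old serial fails this check, so it is treated the same as a realserver whose dns is down.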

The simplest way of setting up an LVS dns server is for the realservers to be secondaries (writing their secondary zone info to local files, so that you can look at the date and contents of the files) and some other machine (e.g. the director) to be the authoritative nameserver. Any changes to the authoritative nameserver (say the director) will have to be propagated to the secondaries (here the realservers) (delete the secondary's zone files and HUP named on the realservers). After the HUP, new files will be created on the secondary nameservers (the realservers) with the time of the HUP and with the new serial numbers. If the files on the secondary nameservers are not deleted before the HUP, then they will not be updated till the refresh/expire time in the zonefile and the secondary nameservers will appear to dns.monitor to not be working.

LVS is no better than DNS for the same number of working DNS servers. However if a DNS server fails...

Nick Burrett nick (at) dsvr (dot) net 20 Jan 2004

Consider a client with a resolv.conf with IPs:

10.0.0.10
10.0.0.11

If 10.0.0.10 is taken offline, then the client application's speed at getting domains resolved is drastically reduced, because the resolver library will always query 10.0.0.10 before querying 10.0.0.11. Sticking DNS behind LVS alleviates this. Monitoring software will failout the dead DNS realserver.

anon

I'm planning to put my company's dns on lvs with ha.

Greg Woods woods (at) ucar (dot) edu 30 Aug 2002

Unless you have a really unusual situation, I think using LVS for DNS is massive overkill. There is no way that DNS load should overwhelm a single server. If it does, you probably are in dire need of some subdomains. What I do here is just use the heartbeat code so that the hot spare backup machine will take over if the primary goes down, and I do have a restart script that uses scp to move the data files that have been modified over to the backup machine. scp is called out of a script that will keep trying the scp until it succeeds, in case the backup machine is down at the time a change is made. This seems to work for us.

I do use LVS for our mail system, but then, the mail system does anti-spam IP address blacklist checking, and virus scanning. That means the overhead of establishing a connection through LVS is small compared to the load on the server to process a connection. I don't think this is the case for DNS.

Jeff Kilbride

Does anybody else agree that load balancing DNS servers with LVS is not worthwhile?

Peter Mueller pmueller (at) sidestep (dot) com 2005/04/18

Yes, for authoritative. ISC-bind has some kind of response-latency measurement built-in. For client side, LVS is useful. In the event of the first server in /etc/resolv.conf failing, there's a 2 second timeout that can be avoided.

If you're LVS'ing named, you may wind up with many VIP's on your director.

Simon Pearce sp (at) http (dot) net 27 Nov 2006

I am running a dns cluster (Gentoo) with two directors active/active and 4 realservers running powerdns. Each server has a 3GHz Pentium 4 and 1 Gig of RAM. I have about 250 VIPs. I could do it all with one VIP of course, but quite a few of our customers require their own dns servers with their own ip address. A lot of them don't really need it, but it looks good to them.

Simon had problems at high load, which we never figured out

Every time the dns cluster exceeds a certain limit some of the ip addresses stop working properly. It affects the system in a way that for certain domains you get a timeout when querying the cluster. Some of the transferred IP's seem to stop working or slow down to such an extent that other dns servers stop querying us. Load average is 1-2. Even though queries don't get through the director (reply in 4000ms), the realservers answer direct requests. The only iptables rule is on the director to masquerade out calls to the internet.

Joe: Is the problem load or the number of IPs (if you can tell)? There is another problem with failover of large numbers of IPs, just in case you want to read more on the topic (it may not be related to your problem). http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.failover.html#1024_failover

Can you setup ipvsadm with a single fwmark instead of all the IPs? That would shift the responsibility for handling all the IPs to iptables, rather than ipvsadm.

Graeme Fowler graeme (at) graemef (dot) net 27 Nov 2006

I know it was LVS-DR, and that it didn't have 250+ IP addresses, but the DNS system I built for my previous employer used LVS with keepalived. The last time I had access to the statistics, it was running at something like 1200 queries/sec (which will have risen now by something like 25% if memory serves), 99% of which were UDP, without a glitch.

However - as Joe mentioned - I built it to balance on fwmarks, not on TCP or UDP. Incoming packets were marked in the netfilter 'mangle' table according to protocol and port, and the LVS was then built up from the corresponding fwmarks.

There was one network "race" we never bottomed, which has affected the system once or twice since I left, where an unmarked packet somehow slipped through to the "inside" (ie. realserver-facing rather than client-facing) LAN and then caused massive traffic amplification. That however isn't related in any way to the OP's problem.
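
A setup along the lines Joe and Graeme describe could be generated roughly as follows. This is a Python sketch that only prints command lines, and it is not Graeme's actual configuration: the function name, the mark values (one per protocol), the scheduler and all addresses are assumptions, and -g (LVS-DR) is used because Graeme's system was LVS-DR (an LVS-NAT setup would use -m instead).

```python
def fwmark_commands(vips, realservers, marks=None, scheduler="wlc"):
    """Return command lines that mark DNS packets and balance on the marks."""
    marks = marks or {"udp": 1, "tcp": 2}  # hypothetical mark values
    commands = []
    # One mangle rule per VIP and protocol; many VIPs share the same mark.
    for vip in vips:
        for proto, mark in sorted(marks.items()):
            commands.append(
                "iptables -t mangle -A PREROUTING -d %s -p %s --dport 53 "
                "-j MARK --set-mark %d" % (vip, proto, mark)
            )
    # One virtual service per mark, however many VIPs there are.
    for proto, mark in sorted(marks.items()):
        commands.append("ipvsadm -A -f %d -s %s" % (mark, scheduler))
        for rip in realservers:
            commands.append("ipvsadm -a -f %d -r %s -g" % (mark, rip))
    return commands


for line in fwmark_commands(["192.168.1.110", "192.168.1.111"],
                            ["192.168.1.8", "192.168.1.9"]):
    print(line)
```

The number of ipvsadm entries stays constant as VIPs are added; only the iptables rules grow. Whether this helps at 250 VIPs is not established by the thread.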

15.10. samba, udp 137, udp 138, tcp 139, tcp 445

The problems to be solved with setting up an LVS'ed samba are

  • it's peer-to-peer (rather than client-server)
  • you have to authenticate users
  • if clients can write to the samba'ed disks, you have to propagate the updates to the other realservers.

15.10.1. Fred Lacombe's LVS-Samba HOWTO

Lapin(c) lapin (at) linagora (dot) com 04 Mar 2004

Here is a draft for an LVS-Samba HOWTO (http://www.lapinux.org/howto/) that load balances Samba with LVS-NAT. There are still modifications to add and some tricks to point out, but all feedback will be helpful.

I just tried to make the samba realservers invisible to each other with iptables rules. The only visible machines are an LDAP server and the director. It still has some (undocumented) drawbacks, but I can authenticate against 2 samba realservers and I can access shares on each of them (directly in their filesystem). Unsolved is the problem of sync for the shares: I've thought about a SAN, or some DRBD cross definition. I still have to solve this.

Joe: This is big news. I haven't read all of Fred's docs yet, or set one of these up, but Fred seems to have solved the many reader/ single writer problem by having a single LDAP database for all Samba servers and by having (or assuming) a single file system for the shares.

15.10.2. Will McDonald's setup

Will McDonald wmcdonald (at) gmail (dot) com 21 Mar 2006

We have a simple Samba share available on some systems sat behind a pair of LVSs. We have 2 directors in Active/Passive NATing through to two realservers running Heartbeat in Active/Passive. So only one of the realservers has the Heartbeat managed VIP the LVSs NAT through to at any one time. I know for our purposes the realservers could just sit on the same subnet as our other servers but this is an inherited setup and there are other reasonable reasons for it to be like this. Samba's not the realservers *primary* role, there are other services too. The reason they're Active/Passive is because DRBD devices can only be mounted on one node at any one time.

The LVSs are running CentOS4 and the repackaged Ultramonkey packages out of the CentOS Extras repository

heartbeat-ldirectord-1.2.3.cvs.20050927-1.centos4
heartbeat-stonith-1.2.3.cvs.20050927-1.centos4
heartbeat-pils-1.2.3.cvs.20050927-1.centos4
heartbeat-1.2.3.cvs.20050927-1.centos4

ipvsadm: #

TCP  192.168.24.45:445 rr persistent 600
  -> 192.168.25.10:445            Masq    1      2          0
TCP  192.168.24.45:139 rr persistent 600
  -> 192.168.25.10:139            Masq    1      0          0
UDP  192.168.24.45:137 rr persistent 600
  -> 192.168.25.10:137            Masq    1      0          0
UDP  192.168.24.45:138 rr persistent 600
  -> 192.168.25.10:138            Masq    1      0          0

The ldirectord.cf on the LVSs looks as follows...

# TEST SAMBA THROUGH TO DBVIP
virtual=192.168.24.45:137
        real=192.168.25.10:137 masq
        service=none
        scheduler=rr
        persistent=600
        protocol=udp
# TEST SAMBA THROUGH TO DBVIP
virtual=192.168.24.45:138
        real=192.168.25.10:138 masq
        service=none
        scheduler=rr
        persistent=600
        protocol=udp
# TEST SAMBA THROUGH TO DBVIP
virtual=192.168.24.45:139
        real=192.168.25.10:139 masq
        service=none
        scheduler=rr
        persistent=600
        protocol=tcp
# TEST SAMBA THROUGH TO DBVIP
virtual=192.168.24.45:445
        real=192.168.25.10:445 masq
        service=none
        scheduler=rr
        persistent=600
        protocol=tcp

The back-end boxes are FC3 running Ultramonkey packages again, and DRBD for disk replication.

heartbeat-stonith-1.2.3-2.fr.c.1
heartbeat-pils-1.2.3-2.fr.c.1
heartbeat-1.2.3-2.fr.c.1

Samba startup is handled from /etc/ha.d/haresources by simply including "smb" as a resource which starts/stops on failover. The smb.conf's very simple too...

[global]
        server string = Samba on %h
        hosts allow = 192.168.24.
        log file = /var/log/samba/%m.log
        max log size = 5000
        security = share
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
        interfaces = 192.168.25.6/32 192.168.25.10/32
        dns proxy = no
[ftp]
        comment = Test FTP Homes
        browseable = yes
        writeable = yes
        guest ok = yes
        path = /mnt/sharedhomes/

This has been pretty reliable but it's not high volume by any stretch of the imagination. Nor is it attached to a domain so I'm not sure how you'll get on with browser mastering etc.

15.10.3. early attempts to LVS samba

The topic of serving samba on an LVS was first raised by John Rodkey rodkey (at) wesmont (dot) edu who wondered if he could serve 300 w2k machines with samba/LVS. Not knowing much about SMB I put out a request for help on samba-technical (at) lists (dot) samba (dot) org. I got replies from several people, including the samba developer Chris Hertal, and from Ryan Fox (who had setup an LVS and who had even read the LVS-HOWTO). I also got a free 2hr phone tutorial by John Terpstra jht (at) samba (dot) org on 26 Oct 2001. One of the big problems is that samba is peer-peer, while LVS works with server-client connections. Wensong has said on the mailing list that you can use samba in read-only mode over LVS, but this will not be of much use to a bunch of windows boxes.

Apparently there's a lot of interest in the commercial world in highly available samba clustering and some effort has been put into making LVS work with samba, by people who don't come up on the LVS mailing list. No-one has succeeded and now it's generally thought that LVS is not the way to go.

Here's John's tutorial as I copied it down over the phone. Thanks John for your help and time.

15.10.4. Attitude adjustment zone for unix people.

  • cifs==smb (two names for the same thing)
  • For more information on cifs/smb see ftp://ftp.samba.org/pub/samba/specs "samba ftp docs" (Sep 2002 link is dead). You'll need to go to a mirror site.

Microsoft's clustering service is derived from the DEC Wolfpack, which was originally a bulletproof, all things to all men, industrial strength, cluster and failout framework. Microsoft is using the part of Wolfpack that corresponds to Linux-HA.

Most communication between windows machines in setting up logins, finding resources (printers, disks, network) is between peers, rather than server/client as for unix. Any machine then will be able to find the resources on the network, whereas in unix, the clients have to find out by some mechanism external to the host (e.g. phone up the sysadmin). The same ports are used at both ends and communication is usually by broadcast (at least initially). Thus there is no distinction between a samba server and a samba client. One host may have the files the user wants and to unix people, this host would be the server. But in windows there are two peers: one machine has a file and the other machine may want it - the role can in principle be reversed without any change in the setup. Election amongst peers is used to determine who will have the role of knowing the location of other resources (e.g. becoming the domain node controller). Unlike unix where you setup a machine deliberately to be a server, by setting up daemons listening on a socket, with windows you cannot guarantee that a certain machine will assume a particular role. You can bias the election (e.g. machines which have been up longer have more weight), but you can't rig the election. If you have to bring a machine down, it's down and it's less likely to win any new elections (possibly for a long time). In a LVS clustered samba setup (if such a thing could be made to exist), a long running client machine out on the internet might win the election and assume the role of locator service.

Communication between machines using IP is in fact encapsulating netware (if running Novell) or NetBEUI datagrams inside IP. Samba uses netbios over IP.

To unix people the network is sometimes thought of as a hardware layer. To windows, the network is a netbios messaging layer. Two windows machines could be connected by several protocols (NetBEUI, netbios, tcpip) over the same piece of wire (ethernet). These connections are regarded as being separate and independent networks - i.e. they use different names for the machines at each end.

15.10.5. the kernel resources are a cloud which receives broadcast messages

Here is an application talking to the kernel in windows.

 ----------
|          |
|   app    |
|          |
 ----------
     |
     |   user space
__________________
     |
     |   kernel space
     |
 ----------
|          |
| WIN32API |  communicates with cloud by broadcast messages
|          |
 ----------
     |
/--------------------------------------------\
| CLOUD                                      |
| replies to bcast messages from WIN32API    |
|                                            |
|  ---------   ---------   ------------      |
| |         | |         | |            |     |
| | locator | | SMB API | | redirector |     |
| | service | |         | |            |     |
|  ---------   ---------   ------------      |
|                                            |
|                                            |
|  ---------   -----------   -----------     |
| | file    | | local     | | remote    |    |
| | system  | | procedure | | procedure |    |
| | drivers | | calls     | | calls     |    |
|  ---------   -----------   -----------     |
|                                            |
\--------------------------------------------/

WIN32API - all communication with cloud is by broadcast. The appropriate box from the cloud will reply. There is no direct connection to drivers as in unix, where the kernel asks the disk driver to "open file X" (on behalf of the application).

SMB API - nothing happens in windows without SMB being involved.

locator - knows where resources (printers, disks, network connections) are

redirector - sends services.

resolver - uses SMB messages to find out where to go.

netware - messes everything up, it's incompatible with the rest of the kernel (e.g. as you'll find if you try to connect by netware _and_ tcpip).

let's look a little more closely

 ----------
| win32api |
 ----------
     |
 ---------
| SMB api |
 ---------
     |
 ---------
| Netbios |
|   api   |
 ---------

SMB has to decide if request is local or remote

  • local - passes call to local PC. anything local has a name like /DEVICE/xxx
  • remote - these have UNC names \\SERVER\share\path\filename

Netbios converts the SMB message to a netbios datagram and puts it on the wire as a NetBEUI or netware message when running IP.

SMB uses netbios over tcpip (not NetBEUI or netware). It uses 3 ports

  • udp 137 - netbios nameservice (==WINS). WINS namespace is flat, rather than hierarchical like DNS.
  • udp 138 - browse list (i.e. network neighbourhood)
  • tcp 139 - persistent connection to other machine: session traffic, printing, filesharing.

Every client has to be able to find the local master browser, (domain master browser != local master browser). This could be any machine. Election is conducted by broadcast over udp 137,138 (the election can be biased, but the outcome cannot be forced/guaranteed). What we think of as the samba server, may not win. Broadcast udp will not go over a router, so if the network is routed, then tcp unicast is used for the election (as well as udp broadcasts), telling client to use WINserver (which will be a samba machine or NTWINserver).

15.10.6. connecting to the cluster, windows style

When a new windows machine comes on the net (e.g. an smb client or our samba server), it needs to establish that it has a unique name. Name space is handled by contest. The machine udp broadcasts its name (e.g. JACK) 4 times at 200msec interval and asks "who is local master browser?". A samba server will announce that it is "JILL". If there is another machine of the same name already, it will send back a <NACK>. If there are no <NACK>s, the local master browser will accept the name. The client will register its name by udp broadcast (or possibly tcp unicast) with the WINserver, into the workgroup or domain.

The user will then see something in "network neighborhood". The client machine will do a udp 138 unicast to the local master browser "give me browse list enumeration" (the local master browser has information from the domain master browser too).

On a multisegmented, routed network, each segment has its own local master browser. One machine will be both a local master browser and a domain master browser.

If the user clicks on a machine in "network neighbourhood" (and is using WINserver), the client machine will send a "name lookup request" (like a DNS request) - a netbios unicast request to udp 137 on the local master browser and get the IP of the machine. The client registers (includes services available) with the other machine.

The client machine will then send a tcp 139 "session setup request", and then sets up a netbios connection over tcp to IPC$share on the machine. This setup involves an SMB "net_prot" (negotiate protocol) exchange to setup protocol(s) and establish whether the client can use long filename support and UC/lc letters.

The client has connected with an empty username and passwd at this stage. The client now authenticates and receives back a list of printers, files and is given a persistent connection. The original (passwdless) connection is pulled down.

After 10-15mins of inactivity, the client kernel may elect to drop its session (even if an application is in the middle of editing an open file on the remote machine). The application has no knowledge of this disconnect. When something happens in the application again (or you click on network neighbourhood etc), the session will be renegotiated.

If the remote machine has gone down in the meantime or the client is connected to our hypothetical samba LVS and is redirected to a new samba server (which doesn't know anything about the client's original connection), the user will get a message that the connection cannot be re-established and that the user will have to exit from the application (without saving the edits). Ha-ha, just kidding - that's what you should get - you'll actually get the BSOD.

15.10.7. Samba using a distributed filesystem on the realservers

Kai Suchomel1 KAISUCH (at) de (dot) ibm (dot) com 12 Jun 2006

The Samba Service uses a SAN Filesystem, here GPFS. This File system is shared among all the Samba Services on the RS. When I connect to VIP and the SAN Filesystem, the client can connect to any realserver. When the RS fails, after doing a reconnect, the Client can access the SAN Filesystem over another RS.

15.11. xdmcp, X-window, udp 177 (xdmcp), tcp 6000 (and ssh X-forwarding)

Note

Multiple ports are involved here. However you don't have to LVS all the ports. As far as LVS is concerned, only port 177 needs to be LVS'ed. However you have to know about all the ports to get xdmcp to work, so it's in the multi-port services section.

Not so long ago, a common practice was to serve all applications from a central server to a diskless X-terminal (like an NCD) which ran an X-server from ROM. Users' files were backed up centrally. Upgrades/fixes to applications for 100s of clients were a matter of writing the new files to the single, large, reliable central server. The fixes appeared simultaneously for all clients. We've all realised the fundamental flaw in this setup and now we have the applications running on several 100 desktop machines, where upgrades and fixes can take weeks to propagate. The fix to the fix is to run thin clients on the desktops (no wait, didn't we already do that one).

15.11.1. X - attempt 1, connecting directly to X being served by the LVS

Warning
This method does not work

X window is another client-server protocol. The X-client asks for a connection to the X-server by connecting to ports starting at 6000 on the X-server, and the server will start displaying X images on its display. If you don't think about it too much, it seems that X should work through LVS. However

Lidsa lidsa (at) legend (dot) com (dot) cn 24 Apr 2002

..but the realserver is the X-client and the X-server does not reside on the realserver. So I think it impossible for LVS to forward X-window.

If you connect from your LVS client to VIP:telnet on an LVS-DR (you will now be connected to one of the realservers) and start xclock on the realserver, you'll get the xclock image on the lvs client (provided that you have a direct connection also between the realserver and the client). If you look with netstat -an you'll find that RIP:1025 is ESTABLISHED with CIP:6000. Yes, the LVS client is the X-server - it is not the X-client. The realserver is the X-client. You can't use LVS to forward X-sessions.

15.11.2. X - attempt 2, connecting to LVS:xdmcp

This method is most like the login from a diskless xterm.

Severin Olloz showed that you can use an LVS to serve X-sessions by running xdmcpd on the realservers. Severin had problems initially with some logins locking up, but this apparently was due to a misconfiguration of one of his realservers.

When I tried it, I was still able to login from the lvs client after leaving the connection idle for a few hours at the xdm login screen. After leaving the nodes idle overnight I couldn't get a login at the LVS client anymore. On one occasion xdm was running on the node corresponding to the login shown on the client. I restarted xdm on that node and could connect again. On another occasion xdm had died on one of the realservers and the client was just showing the background color for the X-window and a functional mouse, but no xdm login screen. The connections to port 6000 on the lvs client were also gone. I restarted xdm on all the realservers and restarted the client ("X :1 -query VIP") but did not get the xdm login screen. I could connect after running ipvsadm again.

Presumably timeouts will need to be explored to make a working xdmcp LVS.

Severin Olloz S (dot) Olloz (at) soid (dot) ch 30 Apr 2002

I have set up an LVS-DR X11-Server. The LVS client makes a XDMCP-Query with a command like this:

X :1 -query VIP

and the director of the cluster sends the UDP packets on port 177 (XDMCP) to the realserver. The realserver accepts the request and opens a X11-session for the user. (Note: the realserver is opening a direct connection to CIP:600x - this is not under control of the LVS. The LVS client and the realserver must be able to exchange packets directly.) My ipvsadm table looks like this:

director:~# grep xdmcp /etc/services
xdmcp           177/tcp                         # X Display Manager Control Protocol
xdmcp           177/udp

IP Virtual Server version 1.0.2 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port        Forward Weight ActiveConn InActConn
UDP  VIP:xdmcp wlc persistent 360
  -> node1:xdmcp                  Local   100    0          0
  -> node2:xdmcp                  Route   100    0          0

The director is a realserver too, using localnode.
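
A table like the one above could be built with commands along the following lines. This is only a sketch, not Severin's actual script: the helper name `xdmcp_lvs_cmds` and the addresses are invented for illustration, and the function just prints the ipvsadm commands so that you can look at them before running them as root on the director. The scheduler (wlc), persistence timeout (360) and weight (100) are taken from the table.

```shell
#!/bin/sh
# Print (do not run) ipvsadm commands for a persistent XDMCP virtual service.
# Usage: xdmcp_lvs_cmds VIP RIP [RIP...]   (helper name and addresses are illustrative)
xdmcp_lvs_cmds() {
    vip=$1; shift
    # -A add virtual service, -u UDP, -s scheduler, -p persistence timeout in seconds
    echo "ipvsadm -A -u $vip:177 -s wlc -p 360"
    for rip in "$@"; do
        # -a add realserver, -g direct routing (LVS-DR), -w weight
        echo "ipvsadm -a -u $vip:177 -r $rip:177 -g -w 100"
    done
}

xdmcp_lvs_cmds 192.0.2.10 192.0.2.11 192.0.2.12
```

To apply the commands rather than inspect them, pipe the output to `sh` on the director.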

Here's more details I discovered when I reproduced Severin's setup.

For info on XDMCP, see the Linux XDMCP HOWTO and the many links provided therein.

In this method, X-clients on the realservers connect directly to the X-server on the LVS client. The LVS is only used for xdmcp authentication. Once this step has been accomplished, the LVS steps out of the way and the X-session is between the realserver selected and the client. The client then must be able to send packets directly to the RIP on the realserver. In a normal LVS-DR, the RIP is not routable from the lvs client. The RIPs will have to be routable or public IPs.

For a test, first connect directly from your lvs client box to a realserver (no director or LVS involved yet). Setup your xdm-config, Xaccess files on the realserver(s) as described in the XDMCP HOWTO and check the permissions of Xservers and Xsetup_0. Make sure xdm is running on the realserver (the XDMCP-HOWTO does this via the inittab file, but you can just fire it up from the command line for a test). Check that xdm is running

RS1:/etc# ps -auxw | grep xdm
root       329  0.0  1.7  2892 1088 ?        S    11:40   0:00 xdm
root       331  0.5  3.9  5612 2456 ?        S    11:40   0:01 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-Z

Run the next command. If you don't have X running, it will be started for you. If your LVS client is displaying an X-window (i.e. you ran `startx`) then the client at the other end will overwrite your current X-session.

client# X :1 -query RIP

The original window manager screen should disappear on your client box to be replaced by the xdm login from the realserver. If you just have a blank screen on the client, with a mouse X but no login box, check that xdm is running on the realserver. After you login, you get the window manager set in /etc/X11/xdm/Xsession. In the default Xsession file, xsm is used which defaults (see man xsm) to running twm with smproxy and an xterm. This is pretty gruesome, so I substituted xsm with my window manager, fvwm2. Here's part of Xsession.

if [ -f "$startup" ]; then
        exec "$startup"
else
        if [ -f "$resources" ]; then
                xrdb -load "$resources"
        fi
        #exec xsm
        exec /usr/X11/bin/fvwm2
fi

From the console on the client, you can see the connections from the realserver back to the X-server on the client.

client# netstat -an
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0  client:6001            realserver:1067         ESTABLISHED
tcp        0      0  client:6001            realserver:1066         ESTABLISHED
tcp        0      0  client:6001            realserver:1065         ESTABLISHED
tcp        0      0  client:6001            realserver:1063         ESTABLISHED
tcp        0      0  client:6001            realserver:1059         ESTABLISHED
tcp        0      0 0.0.0.0:6001            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:6000            0.0.0.0:*               LISTEN
.
.
Active UNIX domain sockets (including servers)
Proto RefCnt Flags       Type       State         I-Node Path
unix  1      [ ACC ]     STREAM     LISTENING     399263 /tmp/.X11-unix/X1
.

Now set up the director to forward xdmcp/udp and connect to VIP:xdmcp. Note: I'm not using persistence, while Severin is. Non-persistence seems to work for an LVS of 4 realservers.

client# X :1 -query VIP

Here's the output of ipvsadm after connecting

director:~# ipvsadm
IP Virtual Server version 0.9.4 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  lvs.mack.net:xdmcp rr
  -> RS4.mack.net:xdmcp           Route   1      0          1
  -> RS3.mack.net:xdmcp           Route   1      0          0
  -> RS2.mack.net:xdmcp           Route   1      0          0
  -> RS1.mack.net:xdmcp           Route   1      0          0

`netstat -an` doesn't show any connections to the VIP (it's udp after all), but the connections to the X-server ports on the client are seen along with the entry for X1. Once you have logged in via xdm, the client and realserver are connected directly and LVS is not involved anymore. After you logout from the X-session at the client, and return to the XDMCP login screen, the connections to port 600x are gone.

After exiting from an X-session the client will be presented with a new xdm login screen. Watching with tcpdump shows the following steps following the termination of an X-session.

  • many packets are exchanged between RIP:high port and CIP:6001, presumably repainting the login screen.
  • single udp packet from CIP:high_port to VIP:xdmcp is forwarded to realserver. (The director has no further role after this).
  • 2.5 secs later, an exchange of two pairs of udp packets between RIP:xdmcp and CIP:high_port.
  • the login screen appears on the LVS client and the InActConn counter is incremented in ipvsadm.
  • if you wait here, the InActConn counter (ipvsadm) decrements in about 3mins. If you login and out before the timer decrements, you are returned to the same realserver. If you disconnect after the timeout, you are reconnected with the next realserver.
  • Whether you wait or not, when you enter your name/passwd, no further packets are passed by udp/xdmcp. Instead, there is another flood of tcp packets from RIP:high_port to CIP:6001.
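
The interval after which the InActConn entry expires is presumably the director's UDP timeout (this setup is not persistent), so that is the first knob to try when exploring timeouts. The sketch below only prints the commands; it assumes an ipvsadm that supports `--set` and `-L --timeout`, and the helper name `udp_timeout_cmd` and the value 600 are illustrative.

```shell
#!/bin/sh
# Print the ipvsadm command that changes only the UDP timeout.
# Usage: udp_timeout_cmd SECONDS   (helper name is illustrative)
udp_timeout_cmd() {
    case $1 in
        ''|*[!0-9]*) echo "udp_timeout_cmd: need a whole number of seconds" >&2; return 1 ;;
    esac
    # --set takes the tcp, tcpfin and udp timeouts; 0 leaves an entry unchanged
    echo "ipvsadm --set 0 0 $1"
}

echo "ipvsadm -L --timeout"   # command to show the current values first
udp_timeout_cmd 600
```

With a persistent service such as Severin's, the persistence timeout given to `-p` is the more relevant value.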

It appears that xdmcp only presents the login screen and that login occurs via the X connection later. If both painting the login screen initially and (after the timeout on the director, about 3mins) sending the name/passwd used xdmcp, then it's possible that the login data could be sent to a different host from the one that painted the screen. This apparently can't happen. (Joe, May 2004: I have no idea why I said that this "apparently can't happen".)

You could setup an xterm farm with this using a bunch of diskless 486 PCs with 16M memory.

15.11.3. X - attempt 3, X-forwarding with ssh and connecting to LVS:sshd

In this method, you have your window manager running on your LVS client and you are displaying realserver X-clients on the X-server running on your LVS client.

To set up, on the realserver, have the entry "X11Forwarding yes" in sshd_config (re-HUP sshd if necessary). On the client, have the entry "ForwardX11 yes" in ssh_config. If you like, as a test, ssh (with `ssh -v`) directly from the client box to the realserver (not to the VIP) as if you were doing a regular ssh login. After login, look to see that X-forwarding is turned on by looking at the DISPLAY variable.

.
- verbose output from login with `ssh -v remote_node` -
.
debug1: channel_free: channel 1: status: The following connections are open:
  #0 client-session (t4 r0 i1/0 o16/0 fd 4/5)
  #1 x11 (t4 r2 i8/0 o128/0 fd 7/7)
realserver:~# echo $DISPLAY
realserver:10.0
realserver:~#

Note
"realserver" is the name of the machine you have logged into (it might be "localhost"). The name you get will NOT be the name of the machine with the X-server that will be displaying the X (as would normally happen with non-forwarded X connections). Here your X-server is the lvs client.

The $DISPLAY variable is showing where X-clients running on the realserver will send their output. In this case "realserver:10.0" is a proxy X-server running on the remote machine, which will forward the X-calls to the X-server running on the lvs client machine (the output will not go to the realserver). If you now run `xclock` on the realserver, it will be displayed on the lvs client machine.
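
A quick way to check this from a script on the realserver is to look at the display number. The sketch below is a heuristic only: it relies on sshd's X11DisplayOffset defaulting to 10, so it will give the wrong answer if that setting has been changed or if the machine really runs ten or more local X-servers. The function names are invented for this example.

```shell
#!/bin/sh
# Extract the display number from a DISPLAY value such as "realserver:10.0".
display_number() {
    d=${1##*:}        # drop "host:"    -> "10.0"
    echo "${d%%.*}"   # drop ".screen"  -> "10"
}

# Heuristic: a display number of 10 or more suggests an ssh proxy display
# rather than a local X-server (sshd's X11DisplayOffset defaults to 10).
looks_forwarded() {
    n=$(display_number "$1")
    case $n in ''|*[!0-9]*) return 1 ;; esac
    [ "$n" -ge 10 ]
}

if looks_forwarded "realserver:10.0"; then
    echo "forwarded"
else
    echo "not forwarded"
fi
```

In a login script you would call `looks_forwarded "$DISPLAY"` instead of passing a literal value.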

Next setup your director to forward ssh. For more info see the section on sshd. In particular make sure all the host keys on the realservers are identical. Connect to VIP:sshd. You should now be able to start X-clients apparently running on the VIP (but really running on the realserver).

15.12. r commands; rsh, rcp, and their ssh replacements, tcp 513 (,514) and another connection

An example of using rsh to copy files is in performance data for single realserver LVS Sect 5.2.

Note
Caution: The matter of rsh came up in a private e-mail exchange. The person had found that rshd, operating as an LVS'ed service, initiated a call (rsh client request) to the rshd running on the LVS client. (See Stevens "Unix Network Programming" Chapter 14, which explains rsh.) This call will come from the RIP rather than the VIP. This will require rsh to be run under LVS-NAT or else the realservers must be able to contact the client directly. Similar requests from the authd/identd client and passive ftp on realservers cause problems for LVS.

David Lambe david (dot) lambe (at) netunlimited (dot) com Mon, 13 Nov 2000

I've recently completed "construction" of a LVS cluster consisting of 1 LVS and 3 realservers. Everything seems to work OK with the setup except for rcp. All it ever gives is "Permission Denied" when running rcp blahfile node2:/tmp/blahfile from a console on node1. Both rsh and rlogin function, BUT require the password to be entered twice.

Joe

sounds like you are running RedHat. You have to fix the pam files. The beowulf people have been through all of this. You can either recompile the r* executables without pam (my solution), or you can fiddle with the pam files. For suggestions, go to the beowulf mailing archives - you have to download the whole archive and grep through it.

If you go to the beowulf site, you'll find people are moving to replace rsh etc with ssh etc on sites which could be attacked from outside (and turning off telnet, r* etc). For example setup files for ssh see the section on sshd.

15.13. Streaming Media: RealNetworks, Quicktime, Windows Media Server, tcp/udp 554 (and other ports)

15.13.1. RealNetworks streaming protocols, tcp 554, many ports

Jerry Glomph Black black (at) real (dot) com August 25, 2000

RealNetworks' streaming protocols are

  • PNM (TCP on port 7070, UDP from server -> player on ports 6970-7170). PNM was the original protocol in version 1 through 5. It's now mostly legacy.
  • RTSP (TCP on port 554, similar UDP as above, but often on multiple ports). With the G2 release, we adopted the RTSP delivery standard. The current version, RealPlayer 8 came out about two weeks ago. A free one is available to run on just about any platform in common use today. The Linux versions are great.
  • There's also a HTTP/TCP-only fallback mode which is (usually) on port 8080.

The server configuration can be altered to run on any port, but the above numbers are the customary, and almost universally-used ones.

Mark Winter, a network/system engineer in my group wrote up the following detailed recipe on how we do it with LVS:

add IP binding in the G2 server config file

<List Name="IPBindings">
     <Var Address_1="<real ip address>"/>
     <Var Address_2="127.0.0.1"/>
     <Var Address_3="<virtual ip address>"/>
</List>

On the LVS side
./ipvsadm -A -u <VIP>:0  -p
./ipvsadm -A -t <VIP>:554  -p
./ipvsadm -A -t <VIP>:7070  -p
./ipvsadm -A -t <VIP>:8080  -p

./ipvsadm -a -u <VIP>:0 -r <REAL IP ADDRESS>
./ipvsadm -a -t <VIP>:554 -r <REAL IP ADDRESS>
./ipvsadm -a -t <VIP>:7070 -r <REAL IP ADDRESS>
./ipvsadm -a -t <VIP>:8080 -r <REAL IP ADDRESS>

(Ted)

I just wanted to add that if you use FWMARK, you might be able to make it a little simpler and not have to worry about forwarding EVERY UDP port.

# Mark packets with FWMARK1
ipchains -A input -d <VIP>/32 7070 -p tcp -m 1
ipchains -A input -d <VIP>/32 554 -p tcp -m 1
ipchains -A input -d <VIP>/32 8080 -p tcp -m 1
ipchains -A input -d <VIP>/32 6970:7170 -p udp -m 1

# Setup the LVS to listen to FWMARK1
director:/etc/lvs# ipvsadm -A -f 1 -p

# Setup the realserver
director:/etc/lvs# ipvsadm -a -f 1 -r <RIP>

Not only is this only six lines rather than eight, but now you've setup a persistent port grouping. You do not have to forward EVERY UDP port, and you're still free to setup non-persistent services (or other persistent services that are persistent based on other ports).

When you want to remove a realserver, you now do not have to remove FOUR realservers, you just remove one. Same thing with adding. Plus, if you want to change what's forwarded to each realserver, you can do so with ipchains and not bother with taking up and down the LVS. ALSO... if you have an entire network of VIPs, you can setup IPCHAINS rules which will forward the entire network automatically rather than each VIP one by one.
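
Ted's rules use ipchains. On a director running iptables the same grouping can be written with the mangle table, in the style of the Quicktime example in 15.13.3. The following is a sketch under that assumption: the helper name `fwmark_group_cmds`, its argument layout and the addresses are invented for illustration, and it only prints the commands instead of running them.

```shell
#!/bin/sh
# Print iptables/ipvsadm commands for one persistent fwmark group.
# Usage: fwmark_group_cmds VIP MARK RIP PROTO:PORTS [PROTO:PORTS...]
# (helper name and argument layout are illustrative)
fwmark_group_cmds() {
    vip=$1 mark=$2 rip=$3; shift 3
    for spec in "$@"; do
        proto=${spec%%:*}   # "udp:6970:7170" -> "udp"
        ports=${spec#*:}    # "udp:6970:7170" -> "6970:7170"
        echo "iptables -t mangle -A PREROUTING -d $vip/32 -p $proto --dport $ports -j MARK --set-mark $mark"
    done
    # one persistent virtual service for everything carrying the mark
    echo "ipvsadm -A -f $mark -p"
    echo "ipvsadm -a -f $mark -r $rip"
}

fwmark_group_cmds 192.0.2.10 1 192.0.2.11 tcp:7070 tcp:554 tcp:8080 udp:6970:7170
```

Adding a port to the group is then one more PROTO:PORTS argument, and the ipvsadm side does not change.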

Jerry Glomph Black black (at) prognet (dot) com 07 Jun 2001

Following is a currently-operational configuration for LVS balancing of a set of 3 RealServers (or Real Servers, in LVS-terminology). It has been running at very high loads (thousands of simultaneous connections) for months, in addition to numerous conventional LVS setups for more familiar web load-balancing at massive loads.

#!/bin/sh
# LVS initialization for RealNetworks streaming.
#
# client connects on TCP ports 554 (rtsp) or 7070 (pnm, deprecated)
# data returns to client either as UDP on port-range 6970-7170, or
# via the initial TCP socket, if the client cannot receive the UDP stream.

# written and tested to very high (several thousand simultaneous) client load by
# Mark Winter, network department, RealNetworks
# additional LVS work by Rodney Rutherford and Glen Raynor, internet operations
# with random comments by Jerry Black, former Director of Internet Operations
# supplied with no warranty, support, or sympathy, but it works great for us

# Setup IP Addresses
VIP="publicly-advertised-IP-number.mynet.com"
RIP_1="RealServer-1.mynet.com"
RIP_2="RealServer-2.mynet.com"
RIP_3="RealServer-3.mynet.com"

# Load needed modules
BALANCE="wrr"
# Load LVS fwmark module
/sbin/modprobe ip_masq_mfw
# Load appropriate LVS load-balance algorithm module
/sbin/modprobe ip_vs_$BALANCE

# Mark packets with FWMARK1
/sbin/ipchains -F
/sbin/ipchains -A input -d ${VIP}/32 7070 -p tcp -m 1
/sbin/ipchains -A input -d ${VIP}/32 554 -p tcp -m 1
/sbin/ipchains -A input -d ${VIP}/32 8080 -p tcp -m 1
/sbin/ipchains -A input -s 0.0.0.0/0 6970:7170 -d ${VIP}/32 -p udp -m 1

# Setup the LVS to listen to FWMARK1
/sbin/ipvsadm -C
/sbin/ipvsadm -A -f 1 -p -s $BALANCE

# Setup the realservers
/sbin/ipvsadm -a -f 1 -r ${RIP_1}
/sbin/ipvsadm -a -f 1 -r ${RIP_2}
/sbin/ipvsadm -a -f 1 -r ${RIP_3}

Roberto Nibali ratz (at) tac (dot) ch 08 Jun 2001

there is no fwmark module, and the ip_vs module is loaded by ipvsadm now. Why do you need persistence?

15.13.2. RealNetworks g2 server

philz (at) testengeer (dot) com 3 Apr 2000

A realnetworks g2 server is the daemon that serves up real audio/video streams (http://real.com). I'm using LVS-Tun. When I tried to set up a realnetworks g2 server I could not get it to accept the connection (tcp port 7070). A telnet to port 7070 on the VIP yields a connection refused, while telnet to the realserver IP yields a "connect" (it also serves video and audio if you use the proper client).

Joe

Is the service listening on the VIP (a common thing to forget when setting up LVS-DR or LVS-Tun)?

That's it. Success! Here is what has to be done:

  • The real real audio/video daemon must be configured to listen/respond to _BOTH_ the VIP and its RIP (see Configure->General Setup->IP Binding on the RealAdministrator web page).
  • Both the 7070 and 554 (PNAPort and RTSPPort respectively) must be redirected. You might have to do more ports for other features of the real audio/video daemon.

er OK. The demon listening on the RIP never hears from anyone though ;-\

You actually need the RIP to respond so that you can manage/monitor it.

congratulations. You've got a realserver to be a RealServer. Is this the thing that costs $2995 with RedHat?

Nope. This is the free one that supports 25 sessions per server ;-)

What's on each of 7070 and 554? Is one video and the other audio? What does PNAPort and RTSPPort stand for? What happens if the client gets 7070 from one realserver and 554 from another? Did you have to link the 2 services with persistent connection?

15.13.3. Quicktime, tcp 554, many ports

First, a quicktime primer from Andy Wettstein:

It is similar to Real. 554 is rtsp, and there is an option on the quicktime server to stream over port 80 to avoid firewall problems. The ports 6970:7170 are what the client will actually send/receive the stream on (if not blocked by firewall rules, etc). The udp stuff is why you need persistence. The stream would try to switch between servers without persistence enabled (since udp is really connectionless).

Andy Wettstein awettstein (at) cait (dot) org 20 Dec 2002

I'm trying to set up the quicktime (darwin) streaming server through lvs. It kind of works, but it is very slow, much slower than just accessing the stream without going through lvs. I have set it up exactly the same as the Real rtsp examples. I am using lvs-dr with fwmark on ports. Here are the iptables commands I used:

# iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d 209.174.123.48 --dport 80 -j MARK --set-mark 1
# iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d 209.174.123.48 --dport 554 -j MARK --set-mark 1
# iptables -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d 209.174.123.48 --dport 6970:7170 -j MARK --set-mark 1

Then I added the lvs-dr like the examples:

# ipvsadm -A -f 1 -s rr
# ipvsadm -a -f 1 -r 209.174.123.45
# ipvsadm -a -f 1 -r 209.174.123.47

And I get this with ipvsadm:

FWM  1 rr
  -> lead.web.cait.org:0          Route   1      0          1
  -> tin.web.cait.org:0           Route   1      1          1

I am also unable to access the stream on port 80 through lvs. If anyone has experience with quicktime please let me know if there is anything further that I need to do.

I figured it out. It needs persistence (or streaming movies will fail) i.e. the ipvsadm -A command needs a "-p".
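
If the fwmark service is already in place, persistence can be switched on without deleting and re-adding it, since ipvsadm can edit a virtual service with `-E`. The sketch below prints the command for the service shown above (mark 1, scheduler rr); the helper name `persist_fwmark_cmd` and the optional timeout argument are illustrative.

```shell
#!/bin/sh
# Print the ipvsadm command that turns on persistence for an existing
# fwmark service. Usage: persist_fwmark_cmd MARK SCHEDULER [TIMEOUT]
persist_fwmark_cmd() {
    # -E edits a virtual service in place; -p with no value uses ipvsadm's default timeout
    echo "ipvsadm -E -f $1 -s $2 -p${3:+ $3}"
}

persist_fwmark_cmd 1 rr
persist_fwmark_cmd 1 rr 600
```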

Here's the mon.cf

watch tin
   service rtsp
      interval 30s
      monitor tcp.monitor -p 554
      period wd {Sun-Sat}
         startupalert qtss.alert -u -V caittv.cait.org -R tin.cait.org -W 3 -m 1 -S wlc
         upalert qtss.alert -R tin.cait.org -W 3 -m 1 -F dr -s wlc
         alert qtss.alert -R tin.cait.org -m 1

and the qtss.alert

#!/bin/bash
#


IPTABLES="/sbin/iptables"
IPVSADM="/sbin/ipvsadm"

while getopts ":s:g:h:l:t:V:m:o:W:R:S:u" Option
do
  case $Option in
    V)	VIRTUALSERVER="$OPTARG";;
    m)	MARK="$OPTARG";;
    o)	OPTION="$OPTARG";;
    W)	WEIGHT="$OPTARG";;
    R)	REALSERVER="$OPTARG";;
    S)	SCHEDULER="$OPTARG";;
    u)	UP=1;;
  esac
done

shift $(($OPTIND - 1))

if [ $UP ]; then
	# won't add more iptables MARK rules after the initial go
	# so we don't clog up the rules
	# you'll have to resolve problems if you need to add more to the marked service
	if ! $IPTABLES -L -t mangle | grep "MARK set 0x$MARK" > /dev/null; then
		$IPTABLES -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 80 -j MARK --set-mark $MARK
		$IPTABLES -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 554 -j MARK --set-mark $MARK
		$IPTABLES -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 6970:7170 -j MARK --set-mark $MARK
	fi
   # set up the virtual server
	$IPVSADM -A -f $MARK -s $SCHEDULER -p
   # add the realserver
  	$IPVSADM -a -f $MARK -w $WEIGHT -r $REALSERVER
else
	# remove
	$IPVSADM -d -f $MARK -r $REALSERVER

fi
exit 0

15.13.4. Windows Media Server, tcp/udp 554, tcp 1755, udp 1024:5000

Mark Weaver mark (at) npsl (dot) co (dot) uk 23 Mar 2004

Here's how to setup Windows Media Server. This information is not easy to come across as I can't find a simple published document which lists what WMS actually does. There is also some attempt here at WMS9 support, but that's untested and is just based on what the player tries to do (the player connects more quickly, however, if you reject rather than drop those connection attempts, which I'm letting the server do).

# WMS: we want to group TCP 1755 and UDP 1024-5000
# Also uses 554/tcp + 554/udp for WMS9.
# You might also want to add port 80 if serving up via http as well.
# To do this, set an fw mark on such connections, and use LVS fwmark
# balancing (will forward matching IP+fwmark to the same server).
# Just what we need.
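# Paths to the tools (assumed here, adjust for your system)
IPTABLES="/sbin/iptables"
IPVSADM="/sbin/ipvsadm"
EXT_IP="1.2.3.4"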
EXT_IF="eth0"
WMS_MARK="1"
RS1_IP="192.168.1.2"
RS2_IP="192.168.1.3"

# Allow appropriate ports in...
$IPTABLES -A INPUT -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 1755 -j ACCEPT
$IPTABLES -A INPUT -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 554 -j ACCEPT
$IPTABLES -A INPUT -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 554 -j ACCEPT
$IPTABLES -A INPUT -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 1024:5000 -j ACCEPT

# Group with fwmark...
$IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 1755 -j MARK --set-mark $WMS_MARK
$IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 554 -j MARK --set-mark $WMS_MARK
$IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 554 -j MARK --set-mark $WMS_MARK
$IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 1024:5000 -j MARK --set-mark $WMS_MARK

# Tell LVS to do the load balancing
$IPVSADM -D -f $WMS_MARK
$IPVSADM -A -f $WMS_MARK -s rr -p 600
$IPVSADM -a -f $WMS_MARK -r $RS1_IP:0 -m
$IPVSADM -a -f $WMS_MARK -r $RS2_IP:0 -m
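
The grouping step above repeats one iptables rule per protocol/port pair. As a minimal sketch (not part of the original posting), the same rules can be generated from a list of proto:port specs and printed for review before being run. The function name print_mark_rules and the default value of IPTABLES are assumptions made for this illustration; the rule text follows the script above, minus the redundant "-s 0/0".

```shell
#!/bin/sh
# Sketch only: print (do not run) the mangle rules that place several
# proto:port specs under one fwmark.
# IPTABLES defaults to "iptables" here; that default is an assumption,
# the original script expects the variable to be set by the caller.
IPTABLES="${IPTABLES:-iptables}"

# print_mark_rules is a hypothetical helper, not an LVS or iptables tool.
# usage: print_mark_rules <interface> <VIP> <mark> <proto:port[:port]>...
print_mark_rules() {
    ext_if=$1; ext_ip=$2; mark=$3
    shift 3
    for spec in "$@"; do
        proto=${spec%%:*}   # text before the first colon, e.g. "udp"
        ports=${spec#*:}    # text after the first colon, e.g. "1024:5000"
        echo "$IPTABLES -t mangle -A PREROUTING -i $ext_if -p $proto -d $ext_ip --dport $ports -j MARK --set-mark $mark"
    done
}

# The four WMS specs used in the script above.
print_mark_rules eth0 1.2.3.4 1 tcp:1755 tcp:554 udp:554 udp:1024:5000
```

Once the printed rules look right, the output can be piped to sh to apply them. Keeping the port list in one place makes it easier to add port 80 later without forgetting the matching mark rule.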

15.14. Radius, udp 1645,1646

Francois Baligant 2000-05-10

We have a very weird problem load-balancing UDP-based RADIUS packets.

UDP 195.74.212.37:16450 rr
      -> 195.74.212.26:16450   Route   1      0          0
      -> 195.74.212.34:16450   Route   1      0          0
UDP 195.74.212.31:1646 wlc
      -> 195.74.212.26:1646    Route   1      0          106
      -> 195.74.212.10:1646    Route   1      0          106
UDP 195.74.212.31:1645 wlc
      -> 195.74.212.26:1645    Route   1      0          1
      -> 195.74.212.10:1645    Route   1      0          0

I have a series of NASes (Network Access Servers) sending Authentication Requests to a single central Proxy Radius server (packets sometimes arrive at 5 packets/sec). This Proxy Radius Server then forwards the Authentication Requests to the load-balancer, which should normally dispatch them to several nodes for processing (check with DB etc.).

We want to load-balance 3 ports: 1645 (authentication), 1646 (accounting) and 16450 (authentication for another kind of service).

The rule for port 1646 load-balances. However, for rules 16450 and 1645, all UDP requests go to only one realserver. (Rule 16450 is not used at the moment; 1645 is. You can see the strange little "1" for 195.74.212.26.) What's weird is that 1646 works really fine but the 2 other rules just do not load-balance. Packets are always sent to the same host (in fact the first that was added to the VS IP).

Joe

Someone had a similar sounding problem with udp ntp. All packets would go to one host and then after a little while to another. In the short term the load balancing was bad, but over the long term (>15mins) the loadbalancing was fine. The udp LVS code sends all udp packets to one realserver, till a timeout is reached, and then sends the next packets to another realserver.

(See also Scheduling TCP/UDP.)

Julian

Single Radius Server? Does that mean that all packets come from a single IP:port too?

Don't forget that for UDP the autobind ports are not rotated. For TCP you have ports selected in the 1024..4999 range, but it is possible for all your client UDP packets to come from the same port on the client. This can be a good reason for them to be redirected to the same realserver if the UDP entry has not expired. Show a tcpdump session or try to set the UDP timeout to a small value:

ipchains -M -S 0 0 2

Any difference? How many clients (UDP sockets) do you have? If you have one, it can't be balanced. There is persistence according to the default UDP timeout value.

14:06:36.277177 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.277205 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.430549 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.430575 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.639869 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.639894 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:38.040246 195.74.193.40.60774 > 195.74.212.31.1645: udp 246 (DF)
14:06:38.040276 195.74.193.40.60774 > 195.74.212.31.1645: udp 246 (DF)
14:06:38.117694 195.74.193.40.60774 > 195.74.212.31.1645: udp 243 (DF)

14:06:49.899222 195.74.193.40.40190 > 195.74.212.31.1646: udp 349 (DF)
14:06:49.899256 195.74.193.40.40190 > 195.74.212.31.1646: udp 349 (DF)
14:06:50.358085 195.74.193.40.40223 > 195.74.212.31.1646: udp 349 (DF)
14:06:50.358114 195.74.193.40.40223 > 195.74.212.31.1646: udp 349 (DF)
14:06:51.494628 195.74.193.40.40346 > 195.74.212.31.1646: udp 349 (DF)
14:06:51.494656 195.74.193.40.40346 > 195.74.212.31.1646: udp 349 (DF)
14:06:51.810022 195.74.193.40.40381 > 195.74.212.31.1646: udp 349 (DF)
14:06:51.810051 195.74.193.40.40381 > 195.74.212.31.1646: udp 349 (DF)
14:06:52.351541 195.74.193.40.40485 > 195.74.212.31.1646: udp 199 (DF)

I think you just helped me to understand what the problem was. Port 1645 is not load-balancing. I will patch the radius to increment the port number for authentication requests too.
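
The diagnosis above rests on counting how many distinct client sockets (source IP.port) send to each virtual service: one socket means one connection entry in the director, and so one realserver until the UDP entry times out. A minimal sketch of that count, assuming tcpdump lines in the old "time src > dst: udp len" format shown above; the function name count_src_sockets is made up for this example.

```shell
#!/bin/sh
# Sketch only: for each destination ip.port seen in tcpdump output on
# stdin, print the number of distinct source ip.port pairs sending to it.
# Assumes the line layout shown in the dump above:
#   <time> <src ip.port> > <dst ip.port>: udp <len> ...
count_src_sockets() {
    awk '$3 == ">" {
        dst = $4
        sub(/:$/, "", dst)            # drop the trailing colon after the destination
        key = dst SUBSEP $2           # (destination, source) pair
        if (!(key in seen)) {         # count each source socket once per destination
            seen[key] = 1
            n[dst]++
        }
    }
    END { for (d in n) print d, n[d] }' | sort
}

# Four lines taken from the session above.
count_src_sockets <<'EOF'
14:06:36.277177 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:36.430549 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
14:06:49.899222 195.74.193.40.40190 > 195.74.212.31.1646: udp 349 (DF)
14:06:50.358085 195.74.193.40.40223 > 195.74.212.31.1646: udp 349 (DF)
EOF
# prints:
# 195.74.212.31.1645 1
# 195.74.212.31.1646 2
```

A count of 1 for a service (as for port 1645 here) means every packet matches the same connection entry, so the scheduler is never consulted again until that entry expires.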