Horms worked out that transparent proxy could be used in an LVS.
Transparent proxy is a piece of Linux kernel code which allows a packet destined for an IP _not_ on the host to be accepted locally, as if the IP were on the host. Transparent proxy is the mechanism by which you can make a director (or realserver) work without having the VIP configured on it.
Transparent proxy allows the realserver to solve The Arp Problem. The director sends the packets to the MAC address of the realserver; transparent proxy tells the realserver to accept the packet with dst_addr=VIP (even though this IP is not on the realserver); since there is no VIP on the realserver, it does not reply to arp queries for the VIP.
Without the VIP on a machine, methods other than the normal IP routing are required to deliver packets with dst_addr=VIP (see routing to a director without a VIP).
A VIP-less way of setting up an LVS is firewall mark (fwmark): an incoming packet is marked and the mark (rather than a VIP) is used to forward the packet. When a fwmark is used on a director, the packet still has to be accepted on the director; for this you need the VIP on the outside NIC, or you need TP. It would be nice for an LVS if the LVS code were modified so that a packet with a fwmark that is in the ipvsadm table (i.e. a packet to be forwarded by ip_vs) could be accepted by the node without having to also put the VIP on the node (and without using TP). In principle this is possible and Julian would write it if he thought it was going to be used. At the moment I'm the only one asking for it.
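As a minimal sketch of the fwmark approach on a 2.4 director (hypothetical addresses; note that, as described above, the director still needs the VIP on a NIC or TP to accept the packet in the first place):

```shell
# Mark packets addressed to the (hypothetical) VIP 192.168.1.110:80
# with fwmark 1 in the mangle table (2.4 kernel, iptables).
iptables -t mangle -A PREROUTING -d 192.168.1.110 -p tcp --dport 80 \
    -j MARK --set-mark 1

# Schedule on the fwmark instead of on VIP:port, then add a realserver.
ipvsadm -A -f 1 -s rr
ipvsadm -a -f 1 -r 192.168.1.2 -g
```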
Note | |
---|---|
Feb 2003: The TP implementation for stock 2.4.x kernels behaves differently than the 2.2 one. For 2.4, the packet is accepted locally with the primary IP of the NIC, rather than with the VIP as for 2.2 kernels. This makes 2.4 TP unusable for LVS directors, although it still works fine for web-caches (i.e. squids), its original purpose. On talking to Harald Welte at the 2001 Ottawa Linux Symposium, I learned that there had been much discussion on the netfilter mailing lists as to whether to preserve the original behaviour. Since no-one (that they knew about) needed the original behaviour, that functionality was dropped. It seems too late to restore the functionality to netfilter now. Some of the functionality that LVS wants out of TP is available via firewall mark (fwmark), so the issue is probably moot now, and we're not going to ask the netfilter people to restore the original TP functionality for 2.4 kernels. It's possible to patch the code for LVS, but this would require someone to keep track of the netfilter code for each version of the kernel. RedHat has patched its kernels to restore the TP functionality and patches are maintained for the standard kernel (see below). Most of the writeup for 2.4 kernels in this section was my effort to find out what was happening with 2.4 TP. This section of the HOWTO will have to be rewritten when people start using the 2.4 TP patches. |
Take-home lesson: TP only works for LVS (directors and realservers) on 2.0 and 2.2 kernels. For 2.4 (and higher) TP only works on realservers for LVS to handle the Arp problem.
Note: web caches (proxies) can operate in transparent mode, in which they cache all IPs on the internet. In this mode, requests are received and transmitted without changing the port numbers (i.e. port 80 in and port 80 out). With a normal web cache, the clients are asked to reconfigure their browsers to use the proxy, some_IP:3128. It is difficult to get clients to do this, and the solution is transparent caching. This is more difficult to set up, but all clients will then use the cache.
In the web caching world, transparent caching is often called "transparent proxy" because it is implemented with transparent proxy. In the future, it is conceivable that transparent web caching will be implemented by another feature of the tcpip layer, so it would be nice if the functionality of transparent web caching had a name separate from the mechanism used to implement it.
To use TP in an LVS, packets from the client have to be delivered to a machine which does not have the IP of the dst_addr of the client's packets (i.e. the VIP). Read the part of the section on routing and delivery concerned with routing packets to machines without the dst_addr.
This is Horms' (horms (at) vergenet (dot) net) method (also called the transparent proxy or TP method). It uses the transparent proxy feature of ipchains to let the host (director or realservers) accept packets with dst=VIP when it doesn't have the IP (eg the VIP) on a device. It can be used on the realservers (where it handles The Arp Problem) or on the director to accept packets for the VIP. When used on the director, TP allows the director to be the default gw for LVS-DR (see martian modification).
Unfortunately the 2.2 and 2.4 versions of transparent proxy are as different as chalk and cheese in an LVS. Presumably the functionality has been maintained for transparent web caching, but the effect on LVS has not been considered.
You can use transparent proxy for
(Historical note from Horms:) From memory I was getting a cluster ready for a demo at Internet World, New York, which was held in October 1999. The cluster was to demo all sorts of services that Linux could run that were relevant to ISPs: Apache, Sendmail, Squid, Bind and Radius, I believe. As part of this I was playing with LVS-DR and spotted that the realservers couldn't accept traffic for the VIP. I had used Transparent Proxying in the past so I tried it and it worked. That cluster was pretty cool; it took me a week to put it together and it was an ISP in a single, albeit very large, box.
Transparent proxy is only implemented in Linux.
Julian
Transparent proxy support calls ip_local_deliver from where the LVS code is reached. One of the advantages of this method is that it is easy for a director and realserver to exchange roles in a failover setup.
This is a demonstration of TP using 2 machines: a realserver (which will accept packets by TP) and a client (i.e. this is not an LVS).
On the realserver: ipv4 forwarding must be on.
echo "1" > /proc/sys/net/ipv4/ip_forward
You want your realserver to accept telnet requests on an IP that is not on the network (say 192.168.1.111). Here's the result of commands run at the server console before running the TP code, confirming that you can't ping or telnet to the IP.
realserver:# ping 192.168.1.111
PING 192.168.1.111 (192.168.1.111) from 192.168.1.11 : 56(84) bytes of data.
From realserver.mack.net (192.168.1.11): Destination Host Unreachable

realserver:# telnet 192.168.1.111
Trying 192.168.1.111...
telnet: Unable to connect to remote host: No route to host
so add a route and try again (lo works here, eth0 doesn't)
realserver:# route add -host 192.168.1.111 lo
realserver:# telnet 192.168.1.111
Trying 192.168.1.111...
Connected to 192.168.1.111.
Escape character is '^]'.

Welcome to Linux 2.2.16.
realserver login:
This shows that you can connect to the new IP from the localhost. No transparent proxy involved yet.
If you go to another machine on the same network and add a route to the new IP.
client:# route add -host 192.168.1.111 gw 192.168.1.11
client:# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.111   192.168.1.11    255.255.255.255 UGH       0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
raw sockets work between the client and server -
client:# traceroute 192.168.1.111
traceroute to 192.168.1.111 (192.168.1.111), 30 hops max, 40 byte packets
 1  server.mack.net (192.168.1.11)  0.634 ms  0.433 ms  0.561 ms
however you can't ping (i.e. icmp doesn't work) or telnet to that IP from the other machine.
client:# ping 192.168.1.111
PING 192.168.1.111 (192.168.1.111) from 192.168.1.9 : 56(84) bytes of data.
From realserver.mack.net (192.168.1.11): Time to live exceeded

client:# telnet 192.168.1.111
Trying 192.168.1.111...
telnet: Unable to connect to remote host: No route to host
Here's the output of tcpdump running on the target host
14:09:09.789132 client.mack.net.1101 > tip.mack.net.telnet: S 1088013012:1088013012(0) win 32120 <mss 1460,sackOK,timestamp 7632700[|tcp]> (DF) [tos 0x10]
14:09:09.791205 realserver.mack.net > client.mack.net: icmp: time exceeded in-transit [tos 0xd0]
(Anyone have an explanation for this, apart from the fact that icmp is not working? Is the lack of icmp the only thing stopping the telnet connect?)
The route to 192.168.1.111 is not needed for the next part.
realserver:# route del -host 192.168.1.111
Now add transparent proxy to the server to allow the realserver to accept connects to 192.168.1.111:telnet
This is the command for 2.2.x kernels
realserver:# ipchains -A input -j REDIRECT telnet -d 192.168.1.111 telnet -p tcp
realserver:# ipchains -L
Chain input (policy ACCEPT):
target     prot opt     source       destination     ports
REDIRECT   tcp  ------  anywhere     192.168.1.111   any ->   telnet => telnet
Chain forward (policy ACCEPT):
Chain output (policy ACCEPT):
In the normal functioning of an LVS, once the packet has been redirected, the director steps in and sends it to the realservers, and the reply comes from the realservers. However you can use the REDIRECT to connect with a socket on a different port independently of the LVS function.
Joe, 4 Jun 2001
If I have 2 boxes (not part of an LVS) and on the server box I run
$ipchains -A input -j REDIRECT telnet serverIP 81 -p tcp
then I can telnet to port 81 on the realserver box and have a normal telnet session. I watched with tcpdump on the server and all I see is a normal exchange of packets with dest-port=81.
I thought with REDIRECT that the packet with dest-port=81 was delivered to the listener on realserverIP:telnet. How does the telnetd know to return a packet with source-port=telnet?
Julian
This is handled by the protocol, TCP in this case:
grep redirport net/ipv4/*.c

The higher layer (telnet in this case) can obtain the two dest addr/ports by using getsockname(). In 2.4 this is handled additionally by using getsockopt(...SO_ORIGINAL_DST...).
The netfilter mailing list contains examples on this issue. You can search for "getsockname".
server:# iptables -t nat -A PREROUTING -p tcp -d 192.168.1.111 --dport telnet -j REDIRECT
server:# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
REDIRECT   tcp  --  anywhere     192.168.1.111   tcp dpt:telnet

Chain POSTROUTING (policy ACCEPT)
target     prot opt source       destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source       destination
You still can't ping the transparent proxy IP on the server from the client
client:# ping 192.168.1.111
PING 192.168.1.111 (192.168.1.111) from 192.168.1.9 : 56(84) bytes of data.
From server.mack.net (192.168.1.11): Time to live exceeded
The transparent proxy IP on the server will accept telnet connects
client:# telnet 192.168.1.111
Trying 192.168.1.111...
Connected to 192.168.1.111.
Escape character is '^]'.

Welcome to Linux 2.2.16.
server login:
but not requests to other services
client:# ftp 192.168.1.111
ftp: connect: No route to host
ftp>
Conclusion: The new IP will only accept packets for the specified service. It won't ping and it won't accept packets for other services.
        ________
       |        |
       | client |
       |________|
       CIP=192.168.1.254
           |
        (router)
           |
       VIP=192.168.1.110 (eth0, arps)
        __________
       |          |
       | director |
       |__________|
       DIP=192.168.1.1 (eth1, arps)
           |
           |
   -------------------------------------
     |                |                |
RIP1=192.168.1.2  RIP2=192.168.1.3  RIP3=192.168.1.4 (eth0)
 _____________     _____________     _____________
|             |   |             |   |             |
| realserver  |   | realserver  |   | realserver  |
|_____________|   |_____________|   |_____________|
     |                |                |
  (router)         (router)         (router)
     |                |                |
   ----------------------------------------------> to client
Here's a script to run on 2.2.x realservers/directors to setup Horms' method. This is incorporated into the configure script.
#!/bin/sh
#rc.horms
#script by Joseph Mack and Horms (C) 1999, released under GPL.
#Joseph Mack jmack (at) wm7d (dot) net, Horms horms (at) vergenet (dot) net
#This code is part of the Linux Virtual Server project
#http://www.linuxvirtualserver.org
#
#
#Horms' method for solving the LVS arp problem for a LVS-DR LVS.
#Uses ipchains to redirect a packet destined for an external
#machine (in this case the VIP) to the local device.
#-----------------------------------------------------
#Instructions:
#
#1. Director: Setup normally (eg turn on LVS services there with ipvsadm).
#2. Realservers: Must be running 2.2.x kernel.
#   2.1 recompile the kernel (and reboot) after turning on the following
#       under "Networking options"
#         Network firewalls
#         IP: firewalling
#         IP: transparent proxy support
#         IP: masquerading
#   2.2 Setup the realserver as if it were a regular leaf node on the
#       network, i.e. with the same gateway and IP as if it were in the
#       LVS, but DO NOT put the VIP on the realserver. The realserver
#       will only have its regular IP (called the RIP in the HOWTO).
#3. Edit "user configurable" stuff below
#4. Run this script
#-----------------------------------------------------
#user configurable stuff
IPCHAINS="/sbin/ipchains"
VIP="192.168.1.110"
#services can be represented by their name (in /etc/services) or a number
#SERVICES is a quoted list of space separated strings
# eg SERVICES="telnet"
#    SERVICES="telnet 80"
#    SERVICES="telnet http"
#Since the service is redirected to the local device,
#make sure you have SERVICE listening on 127.0.0.1
SERVICES="telnet http"
#
#----------------------------------------------------
#main:
#turn on IP forwarding (off by default in 2.2.x kernels)
echo "1" > /proc/sys/net/ipv4/ip_forward
#flush ipchains input chain
$IPCHAINS -F input
#install SERVICES
for SERVICE in $SERVICES
do
        echo "redirecting ${VIP}:${SERVICE} to local:${SERVICE}"
        $IPCHAINS -A input -j REDIRECT $SERVICE -d $VIP $SERVICE -p tcp
done
#list ipchains rules
$IPCHAINS -L input
#rc.horms----------------------------------------------
Here's the conf file for a LVS-DR LVS using TP on both the director and the realservers. This is for a 2.2.x kernel director. (For a 2.4.x director, the VIP device can't be TP - TP doesn't work on a 2.4.x director).
#-------------------------------------
#lvs_dr.conf for TP on director and realserver
#you will have to add a host route or equivalent on the client/router
#so that packets for the VIP are routed to the director
LVS_TYPE=VS_DR
INITIAL_STATE=on
#note director VIP device is TP
VIP=TP lvs 255.255.255.255 lvs
DIP=eth0 dip 192.168.1.0 255.255.255.0 192.168.1.255
DIRECTOR_DEFAULT_GW=client
SERVICE=t telnet rr realserver1 realserver2
#note realserver VIP device is TP
SERVER_VIP_DEVICE=TP
SERVER_NET_DEVICE=eth0
SERVER_DEFAULT_GW=client
#----------end lvs_dr.conf------------------------------------
Here's the output from ipchains -L showing the redirects for just the 2.2.x director
Chain input (policy ACCEPT):
target     prot opt     source       destination     ports
REDIRECT   tcp  ------  anywhere     lvs2.mack.net   any ->   telnet => telnet
REDIRECT   tcp  ------  anywhere     lvs2.mack.net   any ->   telnet => telnet
REDIRECT   tcp  ------  anywhere     lvs2.mack.net   any ->   www => www
REDIRECT   tcp  ------  anywhere     lvs2.mack.net   any ->   www => www
Chain forward (policy ACCEPT):
Chain output (policy ACCEPT):
For 2.4.x kernels transparent proxy is built on netfilter and is installed with ip_tables (not ipchains as with 2.2.x kernels).
Note | |
---|---|
You need ip_tables support in the kernel and the ip_tables module must be loaded. The ip_tables module is incompatible with the ipchains module (which in 2.4.x is available for compatibility with scripts written for 2.2.x kernels). If present, the ipchains module must be unloaded. You shouldn't be running ipchains on 2.4.x kernels anymore and you should have changed over to ip_tables. |
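A sketch of checking for and removing a stale ipchains module before loading ip_tables (module names as in stock 2.4 kernels):

```shell
# The ipchains and ip_tables modules are mutually exclusive in 2.4.
# Unload ipchains if it is loaded, then load ip_tables.
if lsmod | grep -q '^ipchains'; then
    rmmod ipchains
fi
modprobe ip_tables
```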
Unfortunately the transparent proxy that comes with 2.4 kernels does not work for LVS. The packet arrives locally with the IP of the NIC which accepts the packet, rather than with an unchanged IP (the VIP). This still allows a squid to work, but is useless for LVS. The netfilter people didn't realise that someone (i.e. LVS) had found a use for the original behaviour and it was dropped from the 2.4 code.
Balazs Scheidler bazsi (at) balabit (dot) hu has written a netfilter patch which restores the original functionality of tproxy, for the firewall Zorp (note: no-one has tested it with LVS yet). Here is Balazs' 2.4 transparent proxy patches README. (In previous HOWTO's, I incorrectly attributed the patch to Ratz. My apologies to Balazs. Ratz has written a tproxy patch for LVS as part of his job, but he is not allowed to release the code - it seems I confused the two patches.)
Mike McLean mikem (at) redhat (dot) com 04 Dec 2002
The patch for 2.4 kernels should be shipped by RedHat. If not please file a bug at bugzilla.redhat.com.
If RedHat is patched with Balazs' code, then it is possible that it has been tested with LVS (RedHat doesn't necessarily test their released code).
(Dec 2002). Nearly all of the following section is me figuring out that TP for 2.4 doesn't work for LVS. It will have to be rewritten as Balazs's patches are incorporated into LVS. (Mar 2006: it seems no-one is using them.)
The command for installing transparent proxy with iptables for 2.4.x came from looking in Daniel Kiracofe's drk (at) unxsoft (dot) com Transparent Proxy with Squid mini-HOWTO and guessing the likely command. It turns out to be
director:# iptables -t nat -A PREROUTING [-i $SERVER_NET_DEVICE] -d $VIP -p tcp \
        --dport $SERVICE -j REDIRECT
(where $SERVICE = telnet, $SERVER_NET_DEVICE = eth0).
Here's the result of installing the VIP by transparent proxy on one of the realservers.
realserver:~# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
REDIRECT   tcp  --  anywhere     lvs2.mack.net   tcp dpt:telnet
REDIRECT   tcp  --  anywhere     lvs2.mack.net   tcp dpt:http

Chain POSTROUTING (policy ACCEPT)
target     prot opt source       destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source       destination
This works fine for the realserver allowing it to accept packets for the VIP, without having the VIP on an ethernet device (eg lo, eth0).
With the problems of 2.4 kernel TP for the VIP on the director, people seem to have forgotten that TP will still allow the realserver to accept packets for the VIP, solving the arp problem. Bill Omer rediscovered this a few years later.
Bill Omer bill (dot) omer (at) gmail (dot) com 2 Mar 2006
Here's my setup with all the nitty gritty. I'm using rhel3as, I have all of the lvs portions of the kernel compiled as modules.
the director part is straight forward:
ifconfig eth0:0 cvg1-lvs-vip netmask 255.255.255.255 broadcast cvg1-lvs-vip up
ipvsadm -A -t cvg1-lvs-vip:0 -s wlc -p
ipvsadm -a -t cvg1-lvs-vip:0 -r cvg1-app-101 -g
ipvsadm -a -t cvg1-lvs-vip:0 -r cvg1-app-102 -g
ipvsadm -a -t cvg1-lvs-vip:0 -r cvg1-app-103 -g
ipvsadm -a -t cvg1-lvs-vip:0 -r cvg1-app-104 -g
ipvsadm -a -t cvg1-lvs-vip:0 -r cvg1-app-105 -g

The realserver(s):
RIP:

iptables -t nat -F
iptables -t nat -A PREROUTING -d cvg1-lvs-vip -p tcp --dport 0:65535 -j REDIRECT
echo 1 > /proc/sys/net/ipv4/ip_forward

As far as ports 0:65535 goes, I know it's a security risk. It's as secure as the RIPs themselves. I plan on having about 30-40 thin clients boot up over the network (PXE, which I'd like in time to be lvs'd) to an xdm. After I get some stress testing done and pinpoint some more bugs here and there, I'll narrow down the port range to be a little more compliant with rudimentary security measures. However, everything is being run over a local lan and nothing is exposed to the wild wild web.
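Narrowing that rule might look like this (a sketch; the ports redirected are hypothetical examples, and only the services the realserver actually provides should be listed):

```shell
# Redirect only the services this realserver serves for the VIP
# (here http and https are assumed as examples), instead of 0:65535.
iptables -t nat -F
iptables -t nat -A PREROUTING -d cvg1-lvs-vip -p tcp --dport 80 -j REDIRECT
iptables -t nat -A PREROUTING -d cvg1-lvs-vip -p tcp --dport 443 -j REDIRECT
echo 1 > /proc/sys/net/ipv4/ip_forward
```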
If you do the same with TP on the director, setup for an LVS with (say) telnet forwarded in the ipvsadm tables, then the telnet connect request from the client is accepted by the director, rather than forwarded by ipvs to the realservers (tcpdump sees a normal telnet login to the director). Apparently ipchains is sending the packets to a place where ipvs can't get at them.
Joe
I have got TP to work on a LVS-DR telnet 2.4 realserver with the command
#iptables -t nat -A PREROUTING -p tcp -d $VIP --dport telnet -j REDIRECT

When I put the VIP onto the director this way, the LVS doesn't work. I connect to the director instead of the realservers. ipvsadm doesn't show any connections (active or otherwise).
If I run the same command on the director, with ipvsadm blank (ie no LVS configured), then I connect to the director from the client (as expected) getting the director's telnet login.
I presume that I'm coming in at the wrong place in the input chain of the director and ipvsadm is not seeing the packets?
Julian
I haven't tried tproxy in 2.4 but in theory it can't work. The problem is that netfilter implements tproxy by mangling the destination address in the prerouting. All LVS requires of a tproxy implementation is to deliver the packet locally and not alter the header. So, I assume LVS detects the packets with daddr=local_addr and refuses to work.
Netfilter maintains a sockopt SO_ORIGINAL_DST that can be used from the user processes to obtain the original dest addr/port before they are mangled in the pre routing nat place. This can be used from the squids, for example, to obtain these original values.
If LVS wants to support this broken tproxy in netfilter, we must make a lookup in netfilter to retrieve the original dst and then mangle the dst addr/port again (for a second time). IMO, this is very bad: LVS would then always depend on netfilter NAT, because it would be compiled to call netfilter functions from its modules.
So, the only alternative remains to receive packets with advanced routing with fwmark rules. There is one problem in 2.2 and 2.4 when the tproxy setups must return ICMP to the clients (they are internal in such setup), for example, when there is no realserver LVS returns ICMP DEST_UNREACH:PORT_UNREACH. In this case both kernels mute and don't return the ICMP. icmp_send() drops it. I contacted Alexey Kuznetsov, the net maintainer, but he claims there are more such places that must be fixed and "ip route add table 100 local 0/0 dev lo" is not a good command to use. But in my tests I don't have any problems, only the problem with dropped ICMP replies from the director.
So, for TP, I'm not sure if we can support it in the director. Maybe it can work for the realservers, and even when the packet is mangled I don't expect performance problems, but who knows.
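The fwmark-based local delivery Julian describes can be sketched like this (table number 100 and the local route are from his example above; the VIP address is a hypothetical placeholder):

```shell
# Mark packets for the (hypothetical) VIP 192.168.1.110...
iptables -t mangle -A PREROUTING -d 192.168.1.110 -j MARK --set-mark 1
# ...then route marked packets through table 100, which delivers
# everything locally without the VIP being on any device.
ip rule add prio 100 fwmark 1 table 100
ip route add table 100 local 0/0 dev lo
```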
These experiments were conducted with 2.2 or 2.4 kernel realservers accepting packets for the VIP by TP. I initially noticed that the connection to 2.4 realservers was not delayed by identd (which is running on my realservers). What was happening was that the realserver was accepting the packet at the RIP and generating the reply from the RIP, rather than the VIP. On my setup, the RIP is routable to the client and the client probably received the identd request directly from the realserver (I didn't figure out what was going on for a while after I did this. I originally thought this had something to do with identd).
Here's the data showing that TP behaves differently for 2.2 and 2.4 kernels. If you want to skip ahead, the piece of information you need is that the IP of the packet when it arrives on the target machine by TP, is different for 2.2 and 2.4 TP.
As we shall see, for 2.2.x the TP'ed packets arrive on the VIP, while for 2.4.x, the TP'ed packets arrive on the RIP.
Here's the tcpdump on the realserver (RS2) for a telnet request delayed by authd (the normal result for LVS). Realserver 2.4.2 with Julian's hidden patch, director 0.2.5-2.4.1. The VIP on the realserver is on lo:110.
Note: all packets on the realserver are originating and arriving on the VIP (lvs2) as expected for a LVS-DR LVS.
initial telnet request
21:04:46.602568 client2.1174 > lvs2.telnet: S 461063207:461063207(0) win 32120 <mss 1460,sackOK,timestamp 17832675[|tcp]> (DF) [tos 0x10]
21:04:46.611841 lvs2.telnet > client2.1174: S 3724125196:3724125196(0) ack 461063208 win 5792 <mss 1460,sackOK,timestamp 514409[|tcp]> (DF)
21:04:46.612272 client2.1174 > lvs2.telnet: . ack 1 win 32120 <nop,nop,timestamp 17832676 514409> (DF) [tos 0x10]
21:04:46.613965 client2.1174 > lvs2.telnet: P 1:28(27) ack 1 win 32120 <nop,nop,timestamp 17832676 514409> (DF) [tos 0x10]
21:04:46.614225 lvs2.telnet > client2.1174: . ack 28 win 5792 <nop,nop,timestamp 514409 17832676> (DF)

realserver makes authd request to client
21:04:46.651500 lvs2.1061 > client2.auth: S 3738365114:3738365114(0) win 5840 <mss 1460,sackOK,timestamp 514413[|tcp]> (DF)
21:04:49.651162 lvs2.1061 > client2.auth: S 3738365114:3738365114(0) win 5840 <mss 1460,sackOK,timestamp 514713[|tcp]> (DF)
21:04:55.651924 lvs2.1061 > client2.auth: S 3738365114:3738365114(0) win 5840 <mss 1460,sackOK,timestamp 515313[|tcp]> (DF)

after delay of 10secs, telnet request continues
21:04:56.687334 lvs2.telnet > client2.1174: P 1:13(12) ack 28 win 5792 <nop,nop,timestamp 515416 17832676> (DF)
21:04:56.687796 client2.1174 > lvs2.telnet: . ack 13 win 32120 <nop,nop,timestamp 17833684 515416> (DF) [tos 0x10]
Here's the tcpdump on the realserver (RS2) for a telnet request which connects immediately. This is not the normal result for LVS. Realserver 2.4.2 with Julian's hidden patch (not used), director 0.2.5-2.4.1. Packets on the VIP are being accepted by TP rather than on lo:0 (the only difference).
Note: some packets on the realserver (RS2) are arriving and originating on the VIP (lvs2) and some on the RIP (RS2). In particular all telnet packets from the CIP are arriving on the RIP, while all telnet packets from the realserver are originating on the VIP. For authd, all packets to and from the realserver are using the RIP.
initial telnet request
20:56:43.638602 client2.1169 > RS2.telnet: S 4245054245:4245054245(0) win 32120 <mss 1460,sackOK,timestamp 17784379[|tcp]> (DF) [tos 0x10]
20:56:43.639209 lvs2.telnet > client2.1169: S 3234171121:3234171121(0) ack 4245054246 win 5792 <mss 1460,sackOK,timestamp 466118[|tcp]> (DF)
20:56:43.639654 client2.1169 > RS2.telnet: . ack 3234171122 win 32120 <nop,nop,timestamp 17784380 466118> (DF) [tos 0x10]
20:56:43.641370 client2.1169 > RS2.telnet: P 0:27(27) ack 1 win 32120 <nop,nop,timestamp 17784380 466118> (DF) [tos 0x10]
20:56:43.641740 lvs2.telnet > client2.1169: . ack 28 win 5792 <nop,nop,timestamp 466118 17784380> (DF)

realserver makes authd request to client
20:56:43.690523 RS2.1057 > client2.auth: S 3231319041:3231319041(0) win 5840 <mss 1460,sackOK,timestamp 466123[|tcp]> (DF)
20:56:43.690785 client2.auth > RS2.1057: S 4243940839:4243940839(0) ack 3231319042 win 32120 <mss 1460,sackOK,timestamp 17784385[|tcp]> (DF)
20:56:43.691125 RS2.1057 > client2.auth: . ack 1 win 5840 <nop,nop,timestamp 466123 17784385> (DF)
20:56:43.692638 RS2.1057 > client2.auth: P 1:10(9) ack 1 win 5840 <nop,nop,timestamp 466123 17784385> (DF)
20:56:43.692904 client2.auth > RS2.1057: . ack 10 win 32120 <nop,nop,timestamp 17784385 466123> (DF)
20:56:43.797085 client2.auth > RS2.1057: P 1:30(29) ack 10 win 32120 <nop,nop,timestamp 17784395 466123> (DF)
20:56:43.797453 client2.auth > RS2.1057: F 30:30(0) ack 10 win 32120 <nop,nop,timestamp 17784395 466123> (DF)
20:56:43.798336 RS2.1057 > client2.auth: . ack 30 win 5840 <nop,nop,timestamp 466134 17784395> (DF)
20:56:43.799519 RS2.1057 > client2.auth: F 10:10(0) ack 31 win 5840 <nop,nop,timestamp 466134 17784395> (DF)
20:56:43.799738 client2.auth > RS2.1057: . ack 11 win 32120 <nop,nop,timestamp 17784396 466134> (DF)

telnet connect continues, no delay
20:56:43.835153 lvs2.telnet > client2.1169: P 1:13(12) ack 28 win 5792 <nop,nop,timestamp 466137 17784380> (DF)
20:56:43.835587 client2.1169 > RS2.telnet: . ack 13 win 32120 <nop,nop,timestamp 17784399 466137> (DF) [tos 0x10]
Evidently TP on the realserver is making the realserver think that the packets arrived on the RIP, hence the authd call is made from the RIP.
As it happens in my test setup, the client can connect directly to the RIP. (In a LVS-DR LVS, the client doesn't exchange packets with the RIP, so I haven't blocked this connection. In production, the router would not allow these packets to pass). Since the authd packets are between the RIP and CIP, the authd exchange can proceed to completion.
Here's the tcpdump on the realserver (RS2) for a telnet request which connects immediately. This is not the normal result for LVS. Realserver 2.2.14, director 0.2.5-2.4.1. Packets on the VIP are being accepted by TP rather than on lo:0.
Note: TP is different in 2.2 and 2.4 kernels. Unlike the case for the 2.4.2 realserver, the packets all arrive at the VIP.
initial telnet request
22:16:23.407607 client2.1177 > lvs2.telnet: S 707028448:707028448(0) win 32120 <mss 1460,sackOK,timestamp 18262396[|tcp]> (DF) [tos 0x10]
22:16:23.407955 lvs2.telnet > client2.1177: S 3961823491:3961823491(0) ack 707028449 win 32120 <mss 1460,sackOK,timestamp 21648[|tcp]> (DF)
22:16:23.408385 client2.1177 > lvs2.telnet: . ack 1 win 32120 <nop,nop,timestamp 18262396 21648> (DF) [tos 0x10]
22:16:23.410096 client2.1177 > lvs2.telnet: P 1:28(27) ack 1 win 32120 <nop,nop,timestamp 18262396 21648> (DF) [tos 0x10]
22:16:23.410343 lvs2.telnet > client2.1177: . ack 28 win 32120 <nop,nop,timestamp 21648 18262396> (DF)

authd request from realserver
22:16:23.446286 lvs2.1028 > client2.auth: S 3966896438:3966896438(0) win 32120 <mss 1460,sackOK,timestamp 21652[|tcp]> (DF)
22:16:26.445701 lvs2.1028 > client2.auth: S 3966896438:3966896438(0) win 32120 <mss 1460,sackOK,timestamp 21952[|tcp]> (DF)
22:16:32.446212 lvs2.1028 > client2.auth: S 3966896438:3966896438(0) win 32120 <mss 1460,sackOK,timestamp 22552[|tcp]> (DF)

after delay of 10secs, telnet proceeds
22:16:33.481936 lvs2.telnet > client2.1177: P 1:13(12) ack 28 win 32120 <nop,nop,timestamp 22655 18262396> (DF)
22:16:33.482414 client2.1177 > lvs2.telnet: . ack 13 win 32120 <nop,nop,timestamp 18263404 22655> (DF) [tos 0x10]
Note: for TP, there is no VIP on the realservers as seen by ifconfig.
Since telnetd on the realservers listens on 0.0.0.0, we can't tell which IP the packets have on the realserver after being TP'ed. tcpdump only tells you the src_addr after the packets have left the sending host.
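One way around this (a sketch; assumes netstat from net-tools is on the realserver) is to look at the established socket on the realserver rather than at the wire:

```shell
# While a client telnet connection is open, list established TCP
# connections on the realserver; the "Local Address" column shows
# which IP the TP'ed connection was actually accepted on
# (the VIP for 2.2 TP, the RIP for 2.4 TP).
netstat -tn | grep ':23 '
```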
Here's the setup for the test.
The IP of the packets after arriving by TP was tested by varying the IP (localhost, RIP or VIP) that the httpd listens to on the realservers. At the same time the base address of the web page was changed to be the same as the IP that the httpd was listening to. The nodes on each network link can route to and ping each other (eg 192.168.1.254 and 192.168.1.12).
 ____________
|            |192.168.1.254 (eth1)
|   client   |----------------------
|____________|                     |
CIP=192.168.2.254 (eth0)           |
      |                            |
      |                            |
VIP=192.168.2.110 (eth0)           |
 ____________                      |
|            |                     |
|  director  |                     |
|____________|                     |
DIP=192.168.1.9 (eth1, arps)       |
      |                            |
   (switch)------------------------
      |
RIP=192.168.1.12 (eth0)
VIP=192.168.2.110 (LVS-DR, lo:0, hidden)
 _____________
|             |
| realserver  |
|_____________|
The results (LVS-DR LVS) are
For 2.2.x realservers
For 2.4.x realservers
During tests, the browser says "connecting to VIP", then says "transferring from..."
Some of these connections are problematic. The client in a LVS-DR LVS isn't supposed to be getting packets from the RIP. What is happening is
The way to prevent this is to remove the route on the client to the RIP network (eg see removing routes not needed for LVS-DR). Doing so when the httpd is listening to the RIP and the base address is the RIP causes the browser on the client to hang. This shows that the client is really retrieving packets directly from the RIP. Changing the base address of the webpage back to the VIP allows the webpage to be delivered to the client, showing that the client is now retrieving packets by making requests to the VIP via the director.
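On the client, removing such a route might look like this (a sketch only; the network address is the hypothetical realserver network from the test setup above, and which route is safe to delete depends on the client's other connections):

```shell
# On the client: delete the direct route to the realserver network, so
# the only path to the realserver is via the VIP through the director.
route del -net 192.168.1.0 netmask 255.255.255.0
```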
It would seem then that with 2.4 TP, the realserver is receiving packets on the RIP, rather than the VIP as it does with 2.2 TP. With a service listening to only 1 port (eg httpd) then the httpd has to
The client will then ask for the webpage at the VIP. The realserver will accept this request on the RIP and return a webpage full of references to the VIP (eg gifs). The client will then ask for the gifs from the VIP. The realserver will accept the requests on the RIP and return the gifs.
Since the identd request is coming from the RIP (rather than the VIP) on the realserver, you can use Julian's method for NAT'ing client requests from realservers.
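On a 2.4 director, NAT'ing such realserver-originated requests amounts to source NAT (a sketch of the general idea, not necessarily Julian's exact rules; addresses are from the test setup):

```shell
# On the director: rewrite the source address of connections
# originating from the realserver's RIP, so the outside world
# sees them coming from the VIP.
iptables -t nat -A POSTROUTING -s 192.168.1.12 -j SNAT --to-source 192.168.2.110
```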
Using transparent proxy instead of a regular ethernet device has slightly higher latency, but the same maximum throughput.
For performance of transparent proxy compared to accepting packets on an ethernet device see the performance page.
Transparent proxy requires reprocessing of incoming packets, and so could incur a speed penalty similar to that of LVS-NAT. However only the incoming packets are reprocessed. Initial results (before the performance tests above) were not encouraging.
Doug Bagley doug (at) deja (dot) com
Subject: [lvs-users] chosen arp problem solution can apparently affect performance
I was interested in seeing if the linux/ipchains workaround for the arp problem would perform just as well as the arp_invisible kernel patch. It is apparently much worse.
I ran a test with one client running ab ("apache benchmark"), one director, and one realserver running Apache. They are all various levels of pentium desktop machines running 2.2.13.
Using the arp_invisible patch/dummy0 interface, I get 226 HTTP requests/second. Using the ipchains redirect method, I get 70 requests per second. All other things remained the same during the test.
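The ipchains redirect method being compared here would have looked something like this on a 2.2.x realserver (a sketch; the VIP and port are assumptions based on an httpd test):

```shell
# On the realserver (2.2.x): transparently accept packets addressed
# to the VIP on port 80, without putting the VIP on any interface.
ipchains -A input -p tcp -d 192.168.2.110 80 -j REDIRECT 80
```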
See the performance page for discussion and sample graphs of hits/sec for http servers. Hits/sec can increase to high levels as the payload decreases in size. While large numbers for hits/sec may be impressive, they only indicate one aspect of a web server's performance. If large (> 1 packet) files are transferred/hit or computation is involved, then hits/sec is not a useful measure of web performance.
Here's the current explanation for the increased latency of transparent proxy.
Kyle Sparger ksparger (at) dialtoneinternet (dot) net
Logically, it's just a function of the way the redirect code operates.
Without redirect:
  Ethernet -> TCP/IP -> Application -> TCP/IP -> Ethernet
With redirect:
  Ethernet -> TCP/IP -> Firewall/Redirect Code -> TCP/IP -> Application -> TCP/IP -> Ethernet
That would definitely explain the slowdown, since _every single packet_ received is going to go through these extra steps.
Other people are happy with TP
Jerry Glomph Black black (at) real (dot) com Nov 99 (or thereabouts)
The revival of Horms' posting, which I overlooked a month ago, was a lifesaver for us. We had a monster load distribution problem, and spread 4 virtual IP numbers across 10 'real' boxes (running Roxen, a fantastic web platform). The ipchains-REDIRECT feature works perfectly, without any of that arp aggravation! A PII_450 held up just fine at 20 megabits/s of HTTP -REQUEST- TRAFFIC!
Here's Jerry 18 months later.
Jerry Glomph Black black (at) prognet (dot) com 06 Jul 2001
The ipchains/iptables REDIRECT method (introduced to this list by Mr Horms a long time ago) works fine, we've used it in production in the past.
However, at -very- high packet loads it is far less CPU-efficient than getting the ARP settings correctly working. The REDIRECT method was bogging down our LVS boxes during peak traffic, something which does not happen with doing it the 'right way' with LVS-DR and silent arp-less interfaces on the real servers.
Horms
REDIRECT works by changing the destination IP address to a local address so that it ends up in the LOCAL_IN chain.
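With 2.4 iptables, the REDIRECT target lives in the nat table (a sketch; the VIP and port are assumptions):

```shell
# 2.4.x: REDIRECT in the nat table's PREROUTING chain rewrites the
# destination to a local address, so the packet is accepted in
# LOCAL_IN -- with the NIC's primary IP rather than the VIP,
# which is why stock 2.4 TP is unusable for LVS directors.
iptables -t nat -A PREROUTING -p tcp -d 192.168.2.110 --dport 80 -j REDIRECT --to-ports 80
```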
Note | |
---|---|
REDIRECT with 2.2 kernels was the original basis for "Horms' method" |
Joe Oct 03, 2003
is the original 2.4.x REDIRECT disaster (see TP_2.4_problems) fixed now?
TPROXY looks like it would work because it is completely different from REDIRECT: it uses its own connection tracking. REDIRECT uses netfilter's internal connection tracking routines, which, because of the way LVS is implemented, do not work for packets handled by LVS. So the connection tracking for REDIRECT does not work, the return packets from the realservers are not modified, and the connection fails. From my reading, TPROXY uses its own connection tracking routines (though for what reason I am not sure). These routines probably aren't affected by LVS, and thus TPROXY should work.
N.B: I have not verified this.
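If the TPROXY patch for 2.4 kernels were used, the rule might look something like this (unverified, like the reasoning above; the tproxy patch adds its own table, and the addresses and ports here are assumptions):

```shell
# TPROXY patch for 2.4.x: redirect connections to the VIP using
# tproxy's own connection tracking, rather than netfilter's
# (which LVS bypasses).
iptables -t tproxy -A PREROUTING -p tcp -d 192.168.2.110 --dport 80 -j TPROXY --on-port 80
```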