These topics were too short or not central enough to LVS operation to have their own section.
Multiple VIPs (and their associated services) can co-exist independently on an LVS. On the director, add the extra IPs to a device facing the internet. On the realservers, for LVS-DR/LVS-Tun, add the VIPs to a device and set up services listening on the ports. On the realservers, for LVS-NAT, add the extra services to the RIP.
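A minimal sketch of adding a second VIP to an LVS-DR setup (the addresses here are hypothetical):

# on the director: add the extra VIP to the internet-facing NIC
ip addr add 192.168.2.110/32 dev eth0

# on each LVS-DR realserver: add the extra VIP to a non-arping device
# (lo, dummy0 or tunl0, with the usual ARP handling for your kernel)
ip addr add 192.168.2.110/32 dev lo

# on the director: create the virtual service for the new VIP and add the realservers
ipvsadm -A -t 192.168.2.110:80 -s wrr
ipvsadm -a -t 192.168.2.110:80 -r 192.168.1.11 -g -w 1
ipvsadm -a -t 192.168.2.110:80 -r 192.168.1.12 -g -w 1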
Keith Rowland wrote:
Can I use Virtual Server to host multiple domains on the cluster? Can VS be set up to respond to 10-20 different IP addresses and use the cluster to respond to any one of them with the proper web directory?
James CE Johnson jjohnson (at) mobsec (dot) com
If I understand the question correctly, then the answer is yes :-) I have one system that has two IP addresses and responds to two names:
foo.mydomain.com    A.B.C.foo    eth1
bar.mydomain.com    A.B.C.bar    eth1:0
On that system (kernel 2.0.36 BTW) I have LVS setup as:
ippfvsadm -A -t A.B.C.foo:80 -R 192.168.42.50:80
ippfvsadm -A -t A.B.C.bar:80 -R 192.168.42.100:80
To make matters even more confusing, 192.168.42.(50|100) are actually one system where eth0 is 192.168.42.100 and eth0:0 is 192.168.42.50. We'll call that 'node'.
Apache on 'node' is setup to serve foo.mydomain.com on ...100 and bar.mydomain.com on ...50.
It took me a while to sort it out but it all works quite nicely. I can easily move bar.mydomain.com to another node within the cluster by simply changing the ippfvsadm setup on the externally addressable node.
Tao Zhao 6 Nov 2001
what if I need multiple VIPs on the realserver?
Julian Anastasov ja (at) ssi (dot) bg 06 Nov 2001
for i in 180 182 182
do
    ip addr add X.Y.Z.$i dev dummy0
done
There is also an example for setting up multiple VIPs on HA.
Ratz ratz (at) tac (dot) ch
We're going to set up an LVS cluster from scratch.
The goal is to set up a load-balanced TCP application. The application will consist of a shell script of our own, invoked by inetd. As you might have guessed, security is a very low priority here; you should just get the idea. Of course I should use xinetd, and of course I should use a tcpwrapper and maybe even SecurID authentication, but here the goal is to understand the fundamental design principles of an LVS cluster and its deployment. All instructions are run as root.
Setting up the realserver
Edit /etc/inetd.conf and add the following line:

lvs-test stream tcp nowait root /usr/bin/lvs-info lvs-info

Edit /etc/services and add the following line:

lvs-test 31337/tcp # supersecure lvs-test port
Now you need to get inetd running. This is different for every Unix, so please look it up yourself. Verify that it's running with 'ps ax|grep [i]netd'. To verify that it is really listening on this port, do a 'netstat -an|grep LISTEN', and if there is a line:
tcp        0      0 0.0.0.0:31337           0.0.0.0:*               LISTEN
you're one step closer to the truth. Now we have to supply the script that will be called when you connect to the realserver on port 31337. Simply do this on your command line (copy 'n' paste):
cat > /usr/bin/lvs-info << 'EOF' && chmod 755 /usr/bin/lvs-info
#!/bin/sh
echo "This is a test of machine `ifconfig -a | grep HWaddr | awk '{print $1}'`"
echo
EOF
Now you can test if it really works with telnet or phatcat:
telnet localhost 31337
phatcat localhost 31337
This should spill out something like:
hog:/ # phatcat localhost 31337
This is a test of machine 192.168.1.11
hog:/ #
If it worked, do the same procedure to set up the second realserver. Now we're ready to set up the load balancer. These are the required commands to set it up for our example:
director:/etc/lvs# ipvsadm -A -t 192.168.1.100:31337 -s wrr
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.11 -g -w 1
director:/etc/lvs# ipvsadm -a -t 192.168.1.100:31337 -r 192.168.1.12 -g -w 1
Check it with ipvsadm -L -n:
hog:~ # ipvsadm -L -n
IP Virtual Server version 0.9.14 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
TCP  192.168.1.100:31337 wrr
  -> 192.168.1.12:31337             Route   1      0          0
  -> 192.168.1.11:31337             Route   1      0          0
hog:~ #
Now if you connect from outside with the client node to the VIP=192.168.1.100 you should get to one of the two realservers (presumably to ~.12). Reconnect to the VIP again and you should get to the other realserver. If so, be happy; if not, go back and check netstat -an, ifconfig -a, the arp problem, routing tables and so on ...
I want to use virtual server functionality to allow switching over from one pool of server processes to another without an interruption in service to clients.
Michael Sparks sparks (at) mcc (dot) ac (dot) uk
current realservers:               A, B, C
servers to swap into the system:   D, E, F
Joe
A planned feature for ipvsadm (now implemented) is to give a realserver a weight of 0. Such a realserver will not be sent any new connections and will continue serving its current connections till they close. You may have to wait a while if a user is downloading a 40M file from the realserver.
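As a sketch using the example service above (VIP 192.168.1.100:31337, realserver 192.168.1.11), quiescing and later restoring a realserver looks like:

ipvsadm -e -t 192.168.1.100:31337 -r 192.168.1.11 -g -w 0   # quiesce: no new connections
ipvsadm -L -n                                               # watch ActiveConn/InActConn drain
ipvsadm -e -t 192.168.1.100:31337 -r 192.168.1.11 -g -w 1   # restore (or ipvsadm -d ... to remove it)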
"Duran, Richard" RDuran (at) dallasairmotive (dot) com 01 Oct 2004
Is it possible to take a realserver offline in such a way that existing connections are immediately redirected to another realserver? We've had the need to do this before and don't know what else to do beyond either (1) setting the weight to "0" and iterating through a process of disconnecting "inactive" users/sessions and hoping that they don't reconnect within the 5 minute persistence_timeout, or (2) removing the host-specific entry from keepalived.conf (brutally disconnecting everyone).
Joe
If you're talking about transferring an existing tcpip connection: no
Malcolm Turnbull
A brutal disconnect is the usual way to go. ldirectord handles it cleanly.
Jacob Smullyan 2006-02-13
It is a tribute to lvs that I've forgotten most of what I once knew about it, because I set up LVS-DR with keepalived about three years ago and it has run without a hiccup ever since. As a result, I'm rusty, so forgive me if this is a stupid or frequently asked question.
How should I go about taking a server temporarily out of rotation? Since I am using keepalived, I know I can simply turn off the service it depends on -- but in fact I want to replace that service with a new application on the same port (which will go live a few minutes/hours later). I am aware of some alternatives:
- configure keepalived's health check to rely on some aspect of the old application, then swap the configuration when I want to go live with the new application.
- temporarily run the new application on a different ip.
- simply comment out that realserver in keepalived.conf temporarily, or add a healthcheck that will never be satisfied.
- directly delete the ipvsadm config for that server (but what about the backup director, and how if at all will keepalived interfere with that?)
- get a job more suitable for a dim-witted person like myself.
But all these are workarounds; what I really want is to tell the director (or keepalived), "retain all configuration, but temporarily drop this realserver until I remove the block". Is there a way to do that?
Graeme Fowler graeme (at) graemef (dot) net 13 Feb 2006
On the director(s), assuming you use eth0 as the interface forwarding the packets to the realservers...
iptables -I OUTPUT -o eth0 -s $VIP -d $RIP -j REJECT
That'll stop keepalived doing any healthchecks whatsoever on the realserver you need to work on. Simply replace -I with -D when you're done.
The same thing can be achieved by "null" routing the RIP on the director too, but I'll leave that as an exercise :)
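For the curious, a minimal sketch of the null-route approach, using the same $RIP variable as above:

ip route add blackhole $RIP/32        # director silently drops packets to the RIP; healthchecks fail
# or, with the older tools: route add -host $RIP reject

ip route del blackhole $RIP/32        # undo when maintenance is finished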
Mark msalists (at) gmx (dot) net 15 Feb 2006
I usually do one of these two things:
You can use the ipvsadm command line tool to get a list of all nodes and to manipulate (add/remove) nodes. Use it to remove the node and later add it again. This will not influence any other nodes in the configuration. If you get totally lost, just restart ldirectord and it will come back up with the regular configuration.
Or, as a second option, use the http negotiate check, which compares the content of a certain URL against an expected string pattern. Have it check a dummy html page and put a flag in there that ldirectord checks to determine whether the host is supposed to be in the pool or not. Modify the flag manually to take the node out; see the sketch below.
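A minimal ldirectord.cf sketch of that idea (the page name in_pool.html and the token IN_POOL are made up for illustration):

virtual=192.168.1.100:80
        real=192.168.1.11:80 gate
        real=192.168.1.12:80 gate
        service=http
        checktype=negotiate
        request="in_pool.html"
        receive="IN_POOL"
        scheduler=wrr

Edit in_pool.html on a realserver so that it no longer contains IN_POOL, and ldirectord will take that realserver out of the pool on its next check.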
e.g. if you want to test LVS with your big Sun server, and then restore the LVS to a single-node server again.
current ftp server:            standalone A
planned LVS (using LVS-DR):    realserver A, director Z
Setup the LVS in the normal way with the director's VIP being a new IP for the network. The IP of the standalone server will now also be the IP for the realserver. You can access the realserver via the VIP while the outside users continue to connect to the original IP of A. When you are happy that the VIP gives the right service, change the DNS IP of your ftp site to the VIP. Over the next 24hrs as the new DNS information is propagated to the outside world, users will change over to the VIP to access the server.
To expand the number of servers (to A, B, ...), add another server with duplicated files and add an extra entry into the director's tables with ipvsadm (a sketch follows).
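As a sketch with hypothetical addresses (VIP 192.168.1.110, new realserver B at 192.168.1.12):

ipvsadm -a -t 192.168.1.110:21 -r 192.168.1.12 -g -w 1

(For ftp in production you'd also want persistence, so that passive-mode data connections land on the same realserver.)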
To restore - in your DNS, change the IP for the service to the realserver IP. When no-one is accessing the VIP anymore, unplug the director.
You can't shut down an LVS. However you can stop it forwarding by clearing the ipvsadm table (ipvsadm -C), then allow all connections to expire (check the active connections with ipvsadm) and then remove the ipvs modules (rmmod). Since ip_vs_rr.o etc depend on ip_vs.o, you'll have to remove ip_vs_rr.o first.
Do you know how to shutdown LVS? I tried rmmod but it keeps saying that the device is busy.
Kjetil Torgrim Homme kjetilho (at) linpro (dot) no 18 Aug 2001
Run ipvsadm -C. You also need to remove the module(s) for the balancing algorithm(s) before rmmod ip_vs. Run lsmod to see which modules these are.
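Putting it together, a clean shutdown looks something like this (a sketch, assuming the wrr scheduler from the earlier example was the only one loaded):

ipvsadm -C            # clear the virtual service table; no new connections are scheduled
ipvsadm -L -c -n      # on recent ipvsadm, watch the remaining connection entries expire
rmmod ip_vs_wrr       # remove the scheduler module(s) shown by lsmod first ...
rmmod ip_vs           # ... then the core module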
Roy Walker Roy (dot) Walker (at) GEZWM (dot) com 18 Mar 2002 could not cleanly shut down his director (LVS 1.0, 2.4.18), which hung at "Send TERM signal". The suggested cure was to bring down the LVS first (we haven't heard back whether it works).
The difference between a beowulf and an LVS:
The Beowulf project has to do with processor clustering over a network -- parallel computing... Basically putting 64 nodes up and running that all are a part of a collective of resources. Like SMP -- but between a whole bunch of machines with a fast ethernet as a backplane.
LVS, however, is about load-balancing on a network. Someone puts up a load balancer in front of a cluster of servers. Each one of those servers is independent and knows nothing about the rest of the servers in the farm. All requests for services go to the load balancer first. That load balancer then distributes requests to each server. Those servers respond as if the request came straight to them in the first place. So the more servers one adds, the less load goes to each server.
A person might go to a web site that is load balanced, and their requests would be balanced between four different machines. (Or perhaps all of their requests would go to one machine, and the next person's request would go to another machine)
However, a person who used a Beowulf system would actually be using one processing collaborative that was made up of multiple computers...
I know that's not the best explanation of each, and I apologize for that, but I hope it at least starts to make things a little clearer. Both projects could be expanded on to a great extent, but that might just confuse things further.
(Joe) -
both use several (or a lot of) nodes.
A beowulf is a collection of nodes working on a single computation. The computation is broken into small pieces and passed to a node, which replies with the result. Eventually the whole computation is done. The beowulf usually has a single user and the computations can run for weeks.
An LVS is a group of machines offering a service to a client. A dispatcher connects the client to a particular server for the request. When the request is completed, the dispatcher removes the connection between the client and server. The next request from the same client may go to a different server, but the client cannot tell which server it has connected to. The connection between client and server may be only seconds long.
from a posting to the beowulf mailing list by Alan Heirich -
Thomas Sterling and Donald Becker made "Beowulf" a registered service mark with specific requirements for use:
-- Beowulf is a cluster
-- the cluster runs Linux
-- the O/S and driver software are open source
-- the CPU is multiple sourced (currently, Intel and Alpha)
I assume they did this to prevent profit-hungry vendors from abusing this term; can't you just imagine Micro$oft pushing a "Beowulf" NT-cluster?
(Joe - I looked up the Registered Service Marks on the internet and Beowulf is not one of them.)
(Wensong) Beowulf is for parallel computing, Linux Virtual Server is for scalable network services.
They are quite different now. However, I think they may be unified under a "single system image" some day. In a "single system image", every node sees a single system image (the same memory space, the same process space, the same external storage), and processes/threads can be transparently migrated to other nodes in order to achieve load balance in the cluster. All the processes are checkpointed, so they can be restarted on the same node or on others if they fail; full fault tolerance can be achieved here. It will be easy for programmers to code because of the single space: they don't need to statically partition jobs to different nodes and have them communicate through PVM or MPI. They just need to identify the parallelism of their scientific application and fork processes or generate threads, because the processes/threads will be automatically load balanced onto different nodes. For network services, the service daemons just need to fork processes or generate threads; it is quite simple. I think it needs a lot of investigation into how to implement these mechanisms and keep the overhead as low as possible.
What Linux Virtual Server has done is very simple, Single IP Address, in which parallel services on different nodes appear as a virtual service on a single IP address. The different nodes have their own spaces; it is far from a "single system image". It means that we have a long way to run. :)
Eddie http://www.eddieware.org
(Jacek Kujawa blady (at) cnt (dot) pl) Eddie is load balancing software for webservers, using NAT (only NAT), written in the Erlang language. Eddie includes an intelligent HTTP gateway and an enhanced DNS server.
(Joe) Erlang is a language for writing distributed applications.
Shain Miley 4 Jun 2001
any recommendations for Level 5 SCSI RAID?
Matthew S. Crocker matthew (at) crocker (dot) com 04 Jun 2001
I have had very good luck with Mylex. We use the DAC960, which is a bit old now, but if the newer stuff works as well as what I have, I would highly recommend it. You might also want to think about putting your data on a NAS and separating your CPU from your hard drives.
Don Hinshaw dwh (at) openrecording (dot) com 04 Jun 2001
Mylex works well. I use ICP-Vortex (http://www.icp-vortex.com/index_e.html, link dead Jan 2003), which are supported by the Linux kernel. I've also had good luck with the Adaptec 3200s and 3400si.
(this must have been solved, no-one is complaining about memory leaks now :-)
Jerry Glomph Black black (at) real (dot) com
We have successfully used 2.0.36-vs (direct routing method), but it does fail at extremely high loads. It seems like a cumulative effect, after about a billion or so packets forwarded. Some kind of kernel memory leak, I'd guess.
Note: This is no longer a problem if you use the new Policy Routing.
(without bringing them all down)
Problem: if you down/delete an aliased device (e.g. eth0:1) you also bring down the other eth0 devices. This means that you can't bring down an alias remotely, as you lose your connection (eth0) to that machine. You then have to go to the console of the remote machine to fix it, by rmmod'ing the device driver for the device and bringing it up again.
The configure script handles this for you and will exit (with instructions on what to do next) if it finds that an aliased device needs to be removed by rmmod'ing the module for the NIC.
(I'm not sure that all of the following is accurate, please test yourself first).
(Stephen D. Williams sdw (at) lig (dot) net) Whenever you want to down/delete an alias, first set its netmask to 255.255.255.255. This avoids also automatically downing aliases that are on the same netmask and are considered 'secondaries' by the kernel.
(Joe) To bring up an aliased device
$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.0
to bring eth0:1 down without taking out eth0, you do it in 2 steps, first change the netmask
$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255
then down it
$ifconfig eth0:1 192.168.1.10 netmask 255.255.255.255 down
The eth0 device should be unaffected, but the eth0:1 device will be gone.
This works on one of my machines but not on another (both with 2.2.13 kernels). I will have to look into this. Here's the output from the machine for which this procedure doesn't work.
Examples: Starting setup. The realserver's regular IP/24 on eth0, the VIP/32 on eth0:1 and another IP/24 for illustration on eth0:2. Machine is SMP 2.2.13 net-tools 1.49
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071219 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317319 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000
eth0:1    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.110  Bcast:192.168.1.110  Mask:255.255.255.255
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000
eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

chuck:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.110   0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0

Deleting eth0:1 with netmask /32

chuck:~# ifconfig eth0:1 192.168.1.110 netmask 255.255.255.255 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317335 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000
eth0:2    Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.240  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x6000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

If you do the same thing with eth0:2 with the /24 netmask

chuck:~# ifconfig eth0:2 192.168.1.240 netmask 255.255.255.0 down
chuck:~# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:90:27:71:46:B1
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6071237 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6317343 errors:0 dropped:0 overruns:4 carrier:0
          collisions:757453 txqueuelen:100
          Interrupt:18 Base address:0x6000
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:299 errors:0 dropped:0 overruns:0 frame:0
          TX packets:299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
tunl0     Link encap:IPIP Tunnel  HWaddr
          unspec addr:[NONE SET]  Mask:[NONE SET]
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
Michael Sparks
It's useful for the director to have 3 IP addresses: one which is the real machine's base IP address, one which is the virtual service IP address, and another virtual IP address for servicing the director. The reason for this is associated with director failover.
Suppose:
X realservers pinging the director on real IP A (assume a heartbeat-style monitor), serving pages off virtual IP V. (IP A would be in place of hostip above.)
The director on IP A fails, and the backup director (*) on IP B comes online, taking over the virtual IP V. By not taking over IP A, IP B can watch for IP A to come back online via the network, rather than via a serial link (etc).
The problem is that the realservers are still sending to IP A; for the heartbeat code to be valid on IP B, the realservers need to send their pings to IP B instead. IMO the easiest solution is to allocate a "heartbeat"/monitor virtual IP (this is the vhostip).
This isn't particularly comprehensive. We don't pester people for testimonials, as we don't want to scare people away from posting to the mailing list and we don't want inflated praise. People seem to understand this and don't pester us with their performance data either. The quotes below aren't scientific data, but they are nice to hear. The people who don't like LVS presumably go somewhere else, and we don't hear any complaints from them.
"Daniel Erdös" 2 Feb 2000
How many connections did you really handle? What are your impressions and experiences in "real life"? What are the problems?
Michael Sparks zathras (at) epsilon3 (dot) mcc (dot) ac (dot) uk
Problems - LVS provides a load balancing mechanism, nothing more, nothing less, and does it *extremely* well. If your back end realservers are flaky in any way, then unless you have monitoring systems in place to take those machines out of service as soon as there are problems, users will experience glitches in service.
NB, this is essentially a realserver stability issue, not an LVS issue - you'd need good monitoring in place anyway if you weren't using LVS!
Another plus in LVS's favour over the commercial boxes in something like this is that the load balancer is a Unix-type box - meaning your monitoring can be as complex or as simple as you like. For example, load balancing based on wlc could be supplemented by server info sent to the director.
Drew Streib ds (at) varesearch (dot) com 23 Mar 2000
I can vouch for all sorts of good performance from lvs. I've had single processor boxes handle thousands of simultaneous connections without problems, and yes, the 50,000 connections per second number from the VA cluster is true.
lvs powers SourceForge.net, Linux.com, Themes.org, and VALinux.com. SourceForge uses a single lvs server to support 22 machines, multiple types of load balancing, and an average 25Mbit/sec traffic. With 60Mbit/sec of traffic flowing through the director (and more than 1000 concurrent connections), the box was having no problems whatsoever, and in fact was using very little cpu.
Using DR mode, I've sent request traffic to a director box resulting in near-gigabit traffic from the realservers. (Request traffic was on the order of 40Mbit.)
I can say without a doubt that lvs toasts F5/BigIP solutions, at least in our real world implementations. I wouldn't trade a good lvs box for a Cisco Local Director either.
The 50,000 figure is unsubstantiated and was _not_ claimed by anyone at VA Linux Systems. A cluster with 16 apache servers and 2 LVS servers was configured for Linux World New York, but due to interconnect problems the performance was never measured - we weren't happy with the throughput of the NICs so there didn't seem to be a lot of point. This problem has been resolved and there should be an opportunity to test this again soon.
In recent tests, I've taken multinode clusters to tens of thousands of connections per second. Sorry for any confusion here. The exact 50,000 number from LWCE NY is unsubstantiated.
Jerry Glomph Black black (at) real (dot) com 23 Mar 2000
We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.
Now, at more terrestrial, but quite high real-world loads, the systems run just fine, for months on end. (using the weighted-least-connection algorithm, usually).
We tried virtually all of the commercial load balancers, LVS beats them all for reliability, cost, manageability, you-name-it.
Noma wrote Nov 2000
Are you going to implement TLS(Transport Layer Security) Ver1.0 on LVS?
Wensong
I haven't read the TLS protocol, so I don't know if TLS transmits the IP address and/or port number in the payload. In most cases it should not, because SSL doesn't.
If it doesn't, you can use any of the three methods: LVS-NAT, LVS-Tun and LVS-DR. If it does, LVS-Tun and LVS-DR can still work.
Ted Pavlic tpavlic (at) netwalk (dot) com, Nov 2000
I don't see any reason why LVS would have any bearing on TLS. As far as LVS is concerned, TLS connections are just like any other connections.
Perhaps you are referring to HTTPS over TLS? Such a protocol has not been completed yet in general, and when it does it still will not need any extra work to be done in the LVS code.
The whole point of TLS is that one connects to the same port as usual and then "upgrades" to a higher level of security on that port. All the secure logic happens at a level so high that LVS wouldn't even notice a change. Things would still work as usual.
Julian Anastasov ja (at) ssi (dot) bg
This is an end-to-end protocol layered on another transport protocol. I'm not a TLS expert, but as I understand it, TLS 1.0 is handled just like SSL 3.0 and 2.0 are, i.e. they require only support for persistent connections.
Mark Miller markm (at) cravetechnology (dot) com 09 May 2001
We want a configuration where two Solaris-based web servers are set up in a primary/secondary configuration. Rather than load balancing between the two, we really want the secondary to act as a hot spare for the primary.
Here is a quick diagram to help illustrate this question:
        Internet          LD1,LD2 - Linux 2.4 kernel
           |              RS1,RS2 - Solaris
         Router
           |
    -------+-------
    |             |
  -----         -----
  |LD1|         |LD2|
  -----         -----
    |             |
    -------+-------
           |
         Switch
    ---------------
    |             |
  -----         -----
  |RS1|         |RS2|
  -----         -----
Paul Baker pbaker (at) where2getit (dot) com 09 May 2001
Just use heartbeat on the two firewall machines and heartbeat on the two Solaris machines.
Horms horms (at) vergenet (dot) net 09 May 2001
You can either add and remove servers from the virtual service (using ipvsadm) or toggle the weights of the servers from zero to non-zero values.
Alexandre Cassen alexandre (dot) cassen (at) wanadoo (dot) fr 10 May 2001
For your 2 LDs you need to run a hot-standby protocol. Heartbeat can be used; you can also use vrrp or hsrp. I am currently working on the IPSEC-AH implementation for vrrp. That kind of protocol can be useful because your backup LD can be used even while it is in the backup state (you simply create 2 LD VIPs and point the default gateway of half your server pool to LD1 and the other half to LD2).
For your webserver hot-spare needs, you can use the next keepalived [17], in which there will be a "sorry server" facility. This means exactly what you need => you have a RS server pool, and if all the servers in this pool are down then the sorry server is placed into the ipvsadm table automatically. If you use keepalived, keep in mind that you will use the NAT topology (a configuration sketch follows).
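A keepalived.conf sketch of the sorry_server idea (the addresses and timeouts are made up; check the keepalived documentation for your version):

virtual_server 192.168.1.100 80 {
    delay_loop 6
    lb_algo wrr
    lb_kind NAT
    protocol TCP

    # used only while every real_server below is failing its healthcheck
    sorry_server 192.168.10.50 80

    real_server 192.168.10.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 192.168.10.12 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}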
Joe 11 May 2001
Unless there's something else going on that I don't know about, I expect this isn't a great idea. The hot spare is going to degrade (depreciate, disks wear out - although not quite as fast - and software needs upgrading) just as fast sitting idle as doing work.
You may as well have both working all the time and for the few hours of down time a year that you'll need for planned maintenance, you can make do with one machine. If you only need the capacity of 1 machine, then you can use two smaller machines instead.
Since an LVS obeys unix client/server semantics, an LVS can replace a realserver (at least in principle, no-one has done this yet). Each LVS layer could have its own forwarding method, independently of the other LVSs. The LVS of LVSs would look like this, with realserver3 being in fact the director of another LVS and having no services running on it.
                        ________
                       |        |
                       | client |
                       |________|
                           |
                           |
                        (router)
                           |
                           |         ____________
                           |  DIP   |            |
                           |--------| director_1 |
                           |  VIP   |____________|
                           |
                           |
          ------------------------------------
          |                |                 |
          |                |                 |
      RIP1, VIP        RIP2, VIP         RIP3, VIP
    ______________   ______________    _____________
   |              | |              |  |             |
   | realserver1  | | realserver2  |  | realserver3 |
   |              | |              |  | =director_2 |
   |______________| |______________|  |_____________|
                                             |
                                             |
                            ------------------------------------
                            |                |                 |
                            |                |                 |
                        RIP4, VIP        RIP5, VIP         RIP6, VIP
                      ______________   ______________    ______________
                     |              | |              |  |              |
                     | realserver4  | | realserver5  |  | realserver6  |
                     |              | |              |  |              |
                     |______________| |______________|  |______________|
If all realservers were offering http and only realservers1..4 were offering ftp, then you would (presumably) setup the directors with the following weights for each service:
director_1: realserver1 http,ftp=1; realserver2 http,ftp=1; realserver3 http=3, ftp=1
director_2: realserver4 http,ftp=1; realserver5 http=1 (no ftp); realserver6 http=1 (no ftp)
You might want to do this if realservers4..6 were on a different network (i.e. geographically remote). In this case director_1 would be forwarding by LVS-Tun, while director_2 could use any forwarding method.
This is the sort of ideas we were having in the early days. It turns out that not many people are using LVS-Tun, most people are using Linux realservers, and not many people are using geographically distributed LVSs.
Joe, Jun 99
For the foreseeable future many of the servers that could benefit from LVS will be Microsoft or Solaris. The problem is that they don't have tunneling. A solution would be to have a linux box in front of each realserver on the link from the director to the realserver. The linux box appears to be the server to the director (it has the real IP, e.g. 192.168.1.2) but does not have the VIP (e.g. 192.168.1.110). The linux box decapsulates the packet from the director and now has a packet from the client to the VIP. Can the linux box route this packet to the realserver (presumably to an lo device on the realserver)?
The linux box could be a diskless 486 machine booting off a floppy with a patched kernel, like the machines in the Linux router project.
Wensong 29 Jun 1999
We can use a nested (hybrid) LinuxDirector approach. For example,
LVS-Tun ----> LVS-NAT ----> RealServer1
   |             |
   |             -----> RealServer2
   |             ...
   |
   --------> LVS-NAT ....

The realservers can run any OS. An LVS-NAT load balancer can usually schedule over 10 general servers, and these LVS-NATs can be geographically distributed.
By the way, LinuxDirector in kernel 2.2 can use LVS-NAT, LVS-Tun and LVS-DR together for servers in a single configuration.
Kyle Sparger ksparger (at) dialtoneinternet (dot) net 18 Sep 2001
I'm familiar with the s/390; the zSeries 900 will be similar, but on a 'next-gen' scale -- It's 64-bit and I expect 2-3 times the maximum capacity.
The s/390 is ONLY, at most, a 12-way machine in a single frame, 24-way in a two-frame configuration. The CPUs are not super-powered; they're normal CPUs, so imagine a normal 12-24 way machine and you have a good idea. It does have special crypto-processors built in, if you can find a way to use them.
The s/390, however, has an obnoxiously fast bus -- 24GByte/s. Yes, I did mean gigabytes. Also, I/O takes up almost no CPU time, as the machines have sub-processors to take care of it.
The s/390 is a 31bit machine -- yes, 31. One bit defines whether the code is 16 or 31 bit code. The z/900 is a 64-bit machine. Note that the s/390, afaik, suffers when attempting to access memory over a certain amount, like any 31/32 bit machine would -- 2 gigs can be addressed in a single clock cycle; greater than that takes longer to process, since it requires more than 32 bits to address.
From top to bottom, the entire machine is redundant. There is no single point of failure anywhere in the machine. According to IBM's docs, the MTBF is 30 years. It calls IBM when it's broken, and they come out and fix it. The refrigerator ad was no joke ;) Of course, this doesn't protect you from power outages, but interestingly enough, if I recall correctly, all RAM is either SRAM, or battery backed -- the machine will come back up and continue right where it left off when it lost power. No restarting instances or apps required. No data lost.
There are five premises for the cost-savings:
You don't have to design a redundant system -- it's already built in.
One machine is easier to manage than n separate servers.
One machine uses fewer facilities than n separate servers.
A single machine, split many ways, can result in higher utilization.
Linux, Linux, Linux. All the free software you can shake a stick at.
On the flip-side, there are some constraints:
If you have 500 servers, all at 80% CPU usage, there's no way you're going to cram them all onto the mainframe. Part of the premise is that most servers sit at only a fraction of their maximum capacity.
The software must be architecture compatible.
Mainframe administrators and programmers are rare and expensive.
The ideal situation for an s/390 or z/Series is an application which is not very CPU intensive, but is highly I/O intensive, that must _NEVER_ go down. Could that be why many companies do databases on them? Think airline ticketing systems, financial systems, inventory, etc :) Realize, however, that your cost of entry is probably going to be well over a million dollars, unless you want a crippled entry-level box. You probably don't want to buy this server to run your web site. You probably want to buy it to run your database. That being said, if you happen to order more than you really need -- a reasonably common phenomenon in IT shops -- you can now run Linux instances with that extra capacity. :)
Can I load both the ipvs code and the failover code in a single stand alone machine?
Joe 09 Jul 2001
VMWare?
Henrik Nordstrom hno (at) marasystems (dot) com
user-mode-linux works beautifully for simulating a network of Linux boxes on a single CPU. I use it extensively when hacking on netfilter/iptables, or when testing our patches on new kernels and/or ipvs versions. It also has the added benefit that you can run the kernel under the full control of gdb, which greatly simplifies tracking down kernel bugs if you get into kernel hacking.
Joe
I attended a talk by the UML author at OLS 2001. It's pretty smart software. You can have virtual CPUs, NICs... - you can have a virtual 64-way SMP machine running on your 75MHz pentium I. The performance will be terrible, but you can at least test your application on it.
Apparently linux running under VMWare doesn't keep time.
Todd Lyons tlyons (at) ivenue (dot) com 16 Nov 2005
ntp under vmware causes major problems. The longer it runs, the more it "compensates" for things. It jumps back and forth, further and further each time, until after a few days it is jumping back and forth by *hours*, wreaking all kinds of havoc on a linux system that's using nfs or samba. I've seen a vmware system that exhibited this with both RedHat and Gentoo. The original poster is correct to be using ntpdate instead of the ntp daemon. It's the only way to keep the time reasonably close. Personally, I'd tell him to do it more often, such as:
* * * * * /usr/sbin/ntpdate time.server.com >/dev/null 2>&1
substitute your own internal time server for "time.server.com".
Sebastiaan Veldhuisen seppo (at) omaclan (dot) nl 16 Nov 2005
This has nothing to do with LVS and/or heartbeat. I guess you are running a Linux guest within a Linux host vmware server (or ESX)? If so, there are known problems with clock fluctuations in guest VMs. We run our development servers on VMWare ESX and GSX and had large clock fluctuations. The VMWare TIDs weren't directly much help in solving the problem.
How we fixed it:
On the linux guest machine:
- Run ntpd on both the host and the guest OS, and don't use the vmware-tools clock sync. This should fix your problem.
More info on this issue (not appropriate fix though):
http://www.vmware.com/support/kb/enduser/stdadp.php?pfaqid=1339
http://www.vmware.com/support/kb/enduser/stdadp.php?pfaqid=1420
https://www.vmware.com/community/thread.jspa?forumID21&threadID13498&messageID=138110#138110
https://www.vmware.com/community/thread.jspa?forumID21&threadID16921&messageID=185408#185408
http://www.vmware.com/support/kb/enduser/stdadp.php?pfaqid=1518
Bunpot Thanaboonsombut bunpotth (at) gmail (dot) com 18 Nov 2005
The VMware KB is erroneous. Add "clock=pit" to the "kernel" line in grub.conf, like this:
kernel /vmlinuz-2.6.9-22.EL ro root=/LABEL=/ rhgb quiet clock=pit
The LVS worked for a client connected directly to the director, but not from a client on the internet.
Carlos J. Ramos cjramos (at) genasys (dot) es 12 Mar 2002
Now, it seems to be solved by using static routes to hosts instead of using static routes to networks.
There is also another important note. The directors use MQSeries from IBM; the starting sequence in haresources was mqseries then masq.lvs (the script for NAT), and it looks like the 1 minute needed by mqseries to come up was confusing(!?) masq.lvs or ldirectord. We have just changed the startup order of mqseries and masq.lvs, bringing up masq.lvs first and mqseries last.
With these two changes it works perfectly.
Chris Ruegger
Does LVS maintain a log file or can I configure it to use one so I can see a history of the requests that came in and how it forwarded them?
Joe 1 Apr 2002
It doesn't but it could. LVS does make statistics available.
Another question is whether logging is a good idea. The director is a router with slightly different rules than a regular router. It is designed to handle 1000's of requests/sec and to operate with no spinning media (e.g. on a flash card). There's no way you can log all connections to a disk and maintain throughput. You couldn't even review the contents of the logs. People do write filter rules, looking for likely problems and logging suspicious packets. Even reviewing those files overwhelms most people.
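The statistics Joe mentions can be read with ipvsadm (these options exist in current ipvsadm versions):

ipvsadm -L -n --stats      # total connections, packets and bytes, per service and realserver
ipvsadm -L -n --rate       # current conns/s, packets/s and bytes/s
cat /proc/net/ip_vs_stats  # global totals as the kernel exports them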
Ratz 2 Apr 2002
LVS works on L4. Maybe the following command will make you happy:
echo 666 > /proc/sys/net/ipv4/vs/debug_level
Joe - is 666 the logging level of the beast?
Horms 06/20/2005
LVS is part of the kernel. And as such any logging is done through the kernel. If LVS was compiled with support for debugging information, then /proc/sys/net/ipv4/vs/debug_level will exist. If you run
echo 0 > /proc/sys/net/ipv4/vs/debug_level
then it will turn logging off. If you echo any value greater than 0, it will increase the verbosity of logging. I believe the useful range of values is from 1 - 12. That is, once you get to 12, you have as much debugging information as you will get, and increasing the value won't give you any more. For a running server, I'd suggest a value of 3 or less.
Graeme
The debug logs above are (at the higher levels) hugely detailed, far more so than most people would require, and (oddly enough) are better suited to debugging problems with the LVS module code itself than anything else.
If what you want is heartbeat or healthcheck monitoring, there are a number of applications which do this; the most common approaches are (in no particular order):
Matt Stockdale
Does the current LVS code work in conjunction with the linux vlan code? We'd like to have a central load balancing device that connects into our core switch with a dot1q trunk, and can have virtual interfaces on any of our many netblocks/vlans.
Benoit Gaussen bgaussen (at) fr (dot) colt (dot) net 20 Mar 2002
I tested it and it works. The only problem I encountered was an MTU problem with the eepro100 driver and the 8021q code. However there is a small patch on the 8021q website. My config was linux 2.4.18 / lvs 1.0.0 configured with LVS-NAT.
Matthew S. Crocker matthew (at) crocker (dot) com 29 Oct 2002
I use LVS in a multi-homed, multi-router HSRP setup.
Each LVS is connected to a separate switch. Each router is connected to each switch and to my upstream providers. We use BGP4 to talk with our upstream providers. The routers use HSRP failover for an IP address that the LVS boxes use as a gateway address.
The LVS setup is pretty much a standard LVS-NAT install using keepalived. Each LVS has a default route pointing to an IP address which is a virtual IP and part of the HSRP router failover system.
The routers are standard cisco 7500 series running BGP4 between themselves and my providers. They also run HSRP (Hot Standby Router Protocol) between their ethernet interfaces.
With my setup I can lose a link, a router, a switch or an LVS box and not go down.
This was a long thread. The poster's application worked fine on a single server, but suffered intermittent freezes when moved to an LVS. Although many suggestions were offered, none helped and the poster had to figure it out by himself. In the meantime the poster changed from LVS-NAT to LVS-DR and rearranged his setup several times over a period of about 3 weeks.
Jan Abraham jan_abraham (at) gmx (dot) net 11 Nov 2003
I've solved the issue this morning. Combination of two independent problems:
Problem A:
Use of ext3 with the default data mode (ordered). I should be beaten for this. Still, it's an unknown issue why it worked well when running on a single server (without LVS).
I'm not an expert in filesystems, but I can imagine that the ext3 journal ran out of space and stalled the system until all entries were written to their respective places. Just an idea. "man mount" suggests using "writeback" as the data mode to improve performance, with the risk of having some files contain old data fragments after a crash.
The reason I chose a journaling file system was to minimize downtime after a crash. For now, I've switched back to ext2, but I'll make some new attempts with the suggested writeback mode on ext3.
I think it should be written in bright red letters: "do not use a journaling filesystem without noatime,nodiratime on a high traffic website".
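For reference, a hypothetical /etc/fstab entry with those options (the device and mount point are made up):

/dev/sda3   /var/www   ext3   defaults,noatime,nodiratime,data=writeback   0   2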
Problem B:
A switch that mysteriously sends packets to the wrong servers. We've replaced the switch two times, now all packets arrive where they should. I'll try to get back to LVS-DR tomorrow.
Jacob Coby jcoby (at) listingbook (dot) com 12 Nov 2003
If you aren't using it already, take a look at the PHP Accelerator (http://www.php-accelerator.co.uk/). It made a HUGE difference in our ability to serve dynamic content quickly. Our site is made up of about 75k loc of PHP (plus an additional 20k loc of support code in php), of which about ~35k is used per page, including at least 8 includes per page. We serve up some 7 million pages/month (~110gb). We aren't a huge site, but we're able to support this with a dual PIII 733 running at a maximum of 60%.
Using PHPA reduced the server load by about 50%, improved latency and page rendering times by anywhere from 50 - 300%, and allowed us to continue using our single web server for at least another 2 years without moving to multiple, load balanced servers. Load balancing is still in the future, but more for redundancy than anything else.
Son Nguyen, 8 Jul 2005
root@realserver [~]# ip route get from CIP to VIP iif tunl0
RTNETLINK answers: Invalid argument
Horms
I suspect that the route in question is unknown to the kernel. e.g. my box is 172.16.4.222 and the gateway is 172.16.0.1.
# ip route get from 172.16.4.222 to 172.16.0.1
172.16.0.1 from 172.16.4.222 dev eth0
    cache  mtu 1500 rtt 3ms rttvar 5ms cwnd 2 advmss 1460 hoplimit 64
# ip route get from 172.16.4.223 to 172.16.0.1
RTNETLINK answers: Invalid argument
Reid Sutherland mofino (at) gmail (dot) com 4 Aug 2005
Enable IP Advanced Routing and then whatever else you need under that.
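In kernel configuration terms that corresponds to something like the following (option names from the 2.4/2.6 configs; a sketch, not a complete .config):

CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_MULTIPLE_TABLES=y     # "IP: policy routing", needed for extra routing tables and 'ip rule'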
Andrei Taranchenko andrei (at) towerdata (dot) com 25 Aug 2005
We have an active director with 256 MB RAM and two nodes. When I do a stress test, the client starts getting timeouts when the number of *inactive* connections hits 600 or so. If I take out a node, the same number of inactive connections is easily handled by the remaining node, but it is still a problem for the director. The nodes and the director are connected on a hub, and the director is the default gateway. The nodes are also connected to the rest of the network on the other interface (they need to see the database, etc).
Horms
Perhaps there is some issue with the kernel's network stack, and it could be resolved by trying a different version. 256Mb of ram should be able to handle a lot more than 600 connections (they consume 100 or so bytes each). That is, unless your box is very low on memory for other reasons. I've done tests with LVS going up to 3,000,000 connections on boxes with around 512Mb of ram (I don't remember exactly), so it shouldn't be an LVS problem.
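A back-of-the-envelope check using Horms' own figure of roughly 100 bytes per connection entry:

    600 connections        x ~100 bytes  =  ~60 kB
    3,000,000 connections  x ~100 bytes  =  ~300 MB

so 600 inactive connections are nowhere near a memory limit on a 256 MB director; the timeouts must be coming from somewhere else.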