3. LVS: Install, Configure, Setup

3.1. Installing from Source Code

Installing from source code is now described in the LVS-mini-HOWTO. Two methods of setup are described:

  • Setup from the command line. This is fine for understanding what's going on, and if you only want a single type of setup. For LVSs which you're reconfiguring a lot, it's tedious and mistake-prone. If it doesn't work, you will spend some time figuring out why.
  • Setup from a configure script which sets up an LVS with a single director. This script is fine for initial setups: it's mistake-proof (it will give you enough information about failures to figure out what might be wrong) and I used it for all my testing of LVS. Since it's not easily extended to handle director failover, and other configuration tools handle this now, the configure script is not being developed anymore. For production, where you need failover directors, you should use other setup tools or save your hand-built setup as a script (e.g. with ipvsadm-sav; see the sketch below).
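
For example, on a single director you can dump the running virtual server table and reload it later. A minimal sketch (the file name is just an example; some distributions ship ipvsadm-save/ipvsadm-restore wrappers around the same ipvsadm options):

  # dump the current virtual service table (numeric addresses) to a file
  ipvsadm -Sn > /etc/lvs/ipvsadm.rules

  # later (e.g. at boot), clear the table and reload the saved rules
  ipvsadm -C
  ipvsadm -R < /etc/lvs/ipvsadm.rules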

3.2. Ultra Monkey

Ultra Monkey is a packaged set of binaries for LVS, including Linux-HA for director failover and ldirectord for realserver failover. It's written by Horms, one of the LVS developers. Ultra Monkey was used on many of the server setups sold by VA Linux and presumably made lots of money for them. Ultra Monkey has been around since 2000 and is mature and stable. Questions about Ultra Monkey are answered on the LVS mailing list. Ultra Monkey is mentioned in many places in the LVS-HOWTO.

3.3. Keepalived

Keepalived is written by Alexandre Cassen, and is based on vrrpd for director failover. Health checking for realservers is included. It has a lengthy but logical conf file and sets up an LVS for you. Alexandre released code for this in late 2001. There is a keepalived mailing list and Alexandre also monitors the LVS mailing list (May 2004, most of the postings have moved to the keepalived mailing list). The LVS-HOWTO has some information about Keepalived.
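
To give a flavour of the conf file, here is a minimal, hypothetical keepalived.conf fragment for one virtual service with two realservers (the addresses, weights and timeouts are made up; the director failover side is configured in separate vrrp_instance blocks):

  virtual_server 192.168.1.100 80 {
      delay_loop 6              # health check interval in seconds
      lb_algo rr                # round robin scheduling
      lb_kind DR                # LVS-DR forwarding
      protocol TCP

      real_server 192.168.1.11 80 {
          weight 1
          TCP_CHECK {
              connect_timeout 3
          }
      }
      real_server 192.168.1.12 80 {
          weight 1
          TCP_CHECK {
              connect_timeout 3
          }
      }
  }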

3.4. Alternate hardware: Soekris (and embedded hardware)

Clint Byrum cbyrum (at) spamaps (dot) org 27 Sep 2004

I'd like to set up a two node Heartbeat/LVS load balancer using Soekris Net4801 machines. These have a 266MHz Geode CPU, 3 Ethernet ports, and 128MB of RAM. The OS (probably LEAF) would live on a CF disk. If these are overkill, I'd also consider a Net4501, which has a 133MHz CPU, 64MB RAM, and 3 Ethernet ports.

I'd need to balance about 300 HTTP requests per second, totaling about 150kB/sec, between two servers. I'm doing this now with the servers themselves (big dual P4 3.02GHz servers with lots and lots of RAM). This is proving problematic, as failover and ARP hiding are just a major pain. I'd rather have a dedicated LVS setup.

1) anybody else doing this?

2) IIRC, using the DR method, CPU usage is not a real problem because reply traffic doesn't go through the LVS boxes, but there is some RAM overhead per connection. How much traffic do you guys think these should be able to handle?

Ratz 28 Sep 2004

The Net4801 machines are horribly slow, but enough for your purpose. The limiting factor on those boxes is almost always the cache size. I've waded through too many processor sheets of those Geode derivatives to give you specific details on your processor, but I would be surprised if it had more than 16KB I- and D-cache each.

16k unified cache. :-/

Make sure that your I/O rate is as low as possible, or the first thing to blow will be your CF disk. I've worked with hundreds of those little boxes in all shapes, sizes and configurations. The biggest common-mode failure was CF disks dying from temperature problems and I/O pressure (MTTF was 23 days); the only other problems showed up as really bad NICs locking up half of the time.

I haven't ever had an actual CF card blow on me. LEAF is made to live on read-only media, so it's not like it will be written to a lot.

Sorry, "blow" is an exaggeration; I mean they simply fail, because the cells only have a limited write capacity.

RO doesn't mean that there's no I/O going to your disk, as you correctly noted. If you plan on using them 24/7, I suggest you monitor the block I/O on your RO partitions using the values from /proc/partitions or the wonderful iostat tool. Then extrapolate from about 4 hours' worth of samples, check your CF vendor's specification for how many writes the card can endure, and see how long you can expect the thing to run.
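
A rough sketch of that extrapolation (the device name hda and the 4-hour window are only examples):

  # snapshot the raw write counters before and after the sampling window
  # (2.4: /proc/partitions with I/O statistics, 2.6: /proc/diskstats)
  grep hda /proc/diskstats
  # or let iostat do the sampling: one disk report per minute for 4 hours
  iostat -d 60 240 > cf-io.log
  # then scale the observed writes up to writes per day and compare against
  # the write endurance figure in your CF vendor's datasheet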

I have to add that thermal issues also added to our high failure rates. We wanted to ship those little nifty boxes to every branch of a big customer to build a big VPN network. Unfortunately the customer is in the automobile industry, which means those boxes were put in the strangest places imaginable in garages, sometimes causing major heat congestion. Also, as is usual in this sector of industry, people are used to reliable hardware, so they don't care if at the end of a working day they simply shut down the power of the whole garage. Needless to say, this adds to the reduced lifetime of a CF.

I then did a reliability analysis using the MGL model (multiple Greek letter, derived from the beta-factor model) to calculate the average risk in terms of failure*consequence, and we had to refrain from using those little nifty things. The cost of repair (detection of failure -> replacement of product) at a customer site would exceed the income our service provided through a mesh of those boxes.

If these are overkill, I'd also consider a Net4501, which has a 133MHz CPU, 64MB RAM, and 3 Ethernet ports.

I'd go with the former ones, just to be sure ;).

Forgive me for being frank, but it sounds like you wouldn't go with either of them.

I don't know your business case, so it's very difficult to give you a definite answer. I can only give you a (somewhat intimidating) experience report; someone else might just as well give you a much better one.

I'd need to balance about 300 HTTP requests per second, totaling about 150kB/sec, between two servers.

So one can assume a typical request to your website averages 512 bytes (150kB/sec divided by 300 requests/sec), which is rather high. But that's not really an issue for LVS-DR.

I didn't clarify that. The 150kB/sec is outgoing. This isn't for all of the website, just the static images/html/css.

I'm doing this now with the servers themselves (big dual P4 3.02GHz servers with lots and lots of RAM). This is proving problematic, as failover and ARP hiding are just a major pain. I'd rather have a dedicated LVS setup.

I'd have to agree with this.

1) anybody else doing this?

Maybe. Stupid questions: how often did you have to fail over, and how often did it work out of the box?

Maybe once every 2 or 3 months I'd need to do some maintenance and switch to the backup. Every time there was some problem with noarp not coming up or some weird routing issue with the IPs. Complexity bad. :)

So frankly speaking: your HA solution didn't work as expected ;).

2) IIRC, using the DR method, CPU usage is not a real problem because reply traffic doesn't go through the LVS boxes, but there is some RAM overhead per connection. How much traffic do you guys think these should be able to handle?

This is very difficult to say, since these boxes impose limits also through their inefficient PCI buses, their rather broken NICs and the dramatically reduced cache. Also it would be interesting to know if you're planning on using persistency in your setup.

Persistency is not a requirement. Note that most of the time a client opens a connection once, and keeps it up as long as they're browsing with keepalives.

Yes, provided most clients use HTTP/1.1. But then you don't need persistency at the application level anyway.

But to give you a number to start with, I would say those boxes should be able (given your constraints) to sustain 5Mbit/s of traffic at about 2000pps (~350 bytes/packet) and only consume 30MB of your precious RAM when running without persistency. This is if every packet of your 2000pps is a new client requesting a new connection to the LVS, with each connection entry staying in the table for an average of 1 minute.

As mentioned previously, your HW configuration is very hard to compare to actual benchmarks, so please take those numbers with a grain of salt.

That's not encouraging. I need something fairly cheap; otherwise I might as well go down the commercial load balancer route.

Well, I have given you numbers which are (at a second look) rather low estimates ;). Technically, your system should be able to deliver 25000pps (yes, 25k) at a 50Mbit/s rate. You would then, if every packet was a new client, consume about all the memory of your system :). So somewhere in between those two numbers I would place the performance of your machine.
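
As a back-of-the-envelope check on those memory figures (assuming the figure usually quoted for LVS of roughly 128 bytes per connection entry, and entries living for about a minute as in Ratz's scenario):

  # 2000 new connections/s, each entry held for ~60s, ~128 bytes/entry
  echo $(( 2000 * 60 * 128 / 1024 / 1024 ))    # prints 14, i.e. ~15 MB: the same ballpark as above
  # 25000 new connections/s under the same assumptions
  echo $(( 25000 * 60 * 128 / 1024 / 1024 ))   # prints 183, i.e. more than the 128 MB in a Net4801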

Bubba Parker sysadmin (at) citynetwireless (dot) net 27 Sep 2004

In my tests, the Soekris net4501, 4511, and 4521 were all able to route almost 20Mbps at wire speed. I would suspect the 4801 to be in excess of 50Mbps, but remember, your Soekris board has 3 NICs, and what they don't tell you is that they all share the same interrupt, so performance degradation is exponential with many packets per second.

Ratz 28 Sep 2004

For all Geode based boards I've received more technical documentation than I was ever prepared to dive into. Most of the time you get a very accurate depiction of your hardware, including south and north bridges, and there you can see that the interrupt lines are hardwired and require interrupt sharing.

However, this is not a problem, since there aren't a lot of devices on the bus anyway that would occupy it, and if you're really unhappy about the bus speed, you can use setpci to reduce latency for the NIC's IRQs.
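
Ratz is presumably talking about tuning the PCI latency timer here; a hypothetical example (the bus address 00:0a.0 is made up, find your NIC with lspci):

  lspci                                  # locate the NIC's bus address
  setpci -s 00:0a.0 latency_timer        # read the current PCI latency timer (hex)
  setpci -s 00:0a.0 latency_timer=60     # allow the NIC longer bus bursts (0x60 = 96 PCI clocks)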

Newer kernels have excellent handling for shared IRQs btw.

Did you measure exponential degradation? I know you get a pretty steep performance reduction once you push the pps too high, but I never saw exponential behaviour.

Peter Mueller 2004-09-27

What about not using these Soekris boxes and just using those two beefy servers? e.g., http://www.ultramonkey.org/2.0.1/topologies/ha-overview.html or http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-overview.html

Clint Byrum 27 Sep 2004

That's what I'm doing now. The setup works, but its complexity causes issues: bringing up IPs over here, moving them from eth0 to lo over there, running noarpctl on that box. It's all very hard to keep track of. It's much simpler to just have two boxes running LVS, and not worry about what's on the servers.

Simple things are generally easier to fix if they break. It took me quite a while to find a simple typo in a script on my current setup, because it was very non-obvious at what layer things were failing.
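
To illustrate the kind of per-realserver fiddling Clint is describing, here is a sketch of the usual LVS-DR steps on a 2.6 realserver (the VIP 192.168.1.100 is an example; on 2.4 kernels the hidden patch or noarpctl does the equivalent job):

  # put the VIP on the loopback so the realserver accepts packets for it ...
  ip addr add 192.168.1.100/32 dev lo
  # ... but make sure the realserver never answers ARP requests for the VIP
  sysctl -w net.ipv4.conf.all.arp_ignore=1
  sysctl -w net.ipv4.conf.all.arp_announce=2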

3.5. LVS on a CD: Malcolm Turnbull's ISO files

Malcolm Turnbull Malcolm (dot) Turnbull (at) crocus (dot) co (dot) uk 03 Jun 2003, has released a Bootable ISO image of his Loadbalancer.org appliance software. The link was at http://www.loadbalancer.org/modules.php?name=Downloads&d_op=viewdownload&cid=2 but is now dead (Dec 2003). Checking the website (Apr 2004) I find that the code is available as a 30 day demo (http://www.loadbalancer.org/download.html, link dead Feb 2005).

Here's the original blurb from Malcolm

The basic idea is to create an easy-to-use layer 4 switch appliance to compete with Coyote Point Equalizer / Cisco LocalDirector... All my source code is GPL, but the ISO distribution contains files that are non-GPL to protect the work and allow vendors to license the software. The ISO requires a license before you can legally use it in production.

Burn it to CD and then use it to boot a spare server with a Pentium/Celeron CPU + ATAPI CD + 64MB RAM + 1 or 2 NICs + 20GB HD.

root password is : loadbalancer
ip address is : 10.0.0.21/255.255.0.0
web based login is : loadbalancer
web based password is : loadbalancer

The default setup is DR, so just plug it straight into the same hub as your web servers and have a play. Download the manuals for configuration info...