Requirements from the environment

The high-availability setup in Jenkins Enterprise provides the means for multiple JVMs to coordinate and ensure that the Jenkins master is running somewhere, but it does so by relying on the availability of the storage that houses $JENKINS_HOME and the HTTP reverse proxy mechanism that hides a fail-over from users who are accessing Jenkins.

Besides NFS for storage and IP aliasing as a reverse proxy mechanism, Jenkins Enterprise can run in a wide range of environments. This section describes what Jenkins Enterprise requires from those environments and discusses further deployment examples.

Storage

All the member nodes of a Jenkins Enterprise HA cluster need to see a single coherent file system that can be read and written simultaneously. That is to say, for any two nodes A and B in the cluster, if node A creates a file in $JENKINS_HOME, node B needs to be able to see it within a reasonable amount of time. A "reasonable amount of time" here means the time window during which you are willing to lose data in case of a failure.
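The visibility requirement above can be checked by hand with a small script. The following is a minimal sketch, not part of Jenkins Enterprise: the function names and the marker file name are hypothetical. Run write_marker on node A, pass the returned name to wait_for_marker on node B, and compare the measured delay against the data-loss window you are willing to accept.

```python
import os
import time
import uuid


def write_marker(jenkins_home):
    """Run on node A: create a uniquely named marker file and return its name."""
    # The file name prefix is an arbitrary choice made for this sketch.
    name = "ha-visibility-check-" + uuid.uuid4().hex
    path = os.path.join(jenkins_home, name)
    with open(path, "w") as f:
        f.write(str(time.time()))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to storage
    return name


def wait_for_marker(jenkins_home, name, timeout=30.0, poll_interval=0.5):
    """Run on node B: return the seconds waited until the marker is visible,
    or None if it did not appear within the timeout."""
    path = os.path.join(jenkins_home, name)
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Each function takes the local path to $JENKINS_HOME as an argument, so the script works even when the nodes mount the directory in different places. Remove the marker file once the check is done.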

So long as the same directory is visible on every member node, it need not be mounted on the same path. For example, node A can have $JENKINS_HOME at /net/storage/jenkins while node B has $JENKINS_HOME at /mnt/jenkins.

$JENKINS_HOME is read intensively during startup. If bandwidth to your storage is limited, startup performance is where you will see the most impact. High latency causes a similar issue, but this can be mitigated somewhat by raising the boot-up concurrency with the system property -Djenkins.InitReactorRunner.concurrency=8 (available in 1.458 and later).
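To get a rough feel for how quickly a node can read from the shared storage, you can time a bulk read of files under $JENKINS_HOME. The sketch below is only an approximation of the startup access pattern, and time_tree_read and its parameters are hypothetical names chosen for this example. Results from a second run are likely to be skewed by operating system caching.

```python
import os
import time


def time_tree_read(root, max_files=1000, max_bytes_per_file=1 << 20):
    """Read up to max_files files under root.

    Returns a tuple (files_read, bytes_read, elapsed_seconds).
    At most max_bytes_per_file bytes are read from each file so that
    large build artifacts do not dominate the measurement.
    """
    files = 0
    total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if files >= max_files:
                return files, total, time.monotonic() - start
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    total += len(f.read(max_bytes_per_file))
            except OSError:
                continue  # skip files that cannot be read
            files += 1
    return files, total, time.monotonic() - start
```

Dividing bytes_read by elapsed_seconds gives an approximate throughput, and dividing elapsed_seconds by files_read gives a rough per-file cost, which tends to be dominated by latency when the files are small.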

Builds that run on the master use $JENKINS_HOME as their workspace. Because builds are often I/O intensive, CloudBees recommends that you set the number of executors on the master to 0 so that no builds run there. If you do allow the master to perform builds, consider limiting them to builds that are not I/O-bound, or set the "workspace root directory" to a location on local storage rather than shared storage.

The following subsections cover deployment-specific considerations.

NFS

When mounting NFS, use the intr mount flag so that you can interrupt the process in case of a failure to access storage. Note that Linux kernels after 2.6.25 ignore this flag; on those kernels only SIGKILL can interrupt a pending NFS operation. For a truly highly available Jenkins, the NFS storage itself needs to be made highly available. There are many resources on the web describing how to do this. See the Linux HA-NFS Wiki as a starting point.

DRBD

Distributed Replicated Block Device (DRBD) can be used to host $JENKINS_HOME. It provides highly available storage without the need for a separate storage node. However, because every node must be able to read and write the file system simultaneously, DRBD needs to run in dual-primary mode, which restricts the choice of file system to a shared cluster file system such as GFS or OCFS2.

HTTP reverse proxy

When a fail-over happens and the Jenkins master role moves from one node to another, the location of the service moves with it, so an additional mechanism is needed to shield users from this change. We call this mechanism the "HTTP reverse proxy".

Jenkins Enterprise instances acting as stand-by nodes respond to all inbound HTTP requests with the 503 "service unavailable" HTTP status code. The frontend can use this as a signal to determine the node to forward the requests to.
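To illustrate how a frontend might use the 503 signal, the sketch below probes a list of node URLs and returns the first one that answers with anything other than 503. It is a minimal illustration using only the Python standard library, not how any particular proxy implements its health checks; the function names and the example URLs are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # urllib reports 4xx/5xx responses as exceptions; the code is still useful.
        return e.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


def find_active_node(node_urls, timeout=5.0):
    """Return the first URL that does not answer 503, or None if there is none.

    A 503 marks a stand-by node; an unreachable node is skipped as well.
    """
    for url in node_urls:
        status = http_status(url, timeout)
        if status is not None and status != 503:
            return url
    return None
```

For example, find_active_node(["http://node-a:8080/", "http://node-b:8080/"]) would return the URL of whichever node currently holds the master role, assuming those hypothetical host names and ports match your cluster.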

The rest of this section discusses various reverse proxy deployment options.

IP aliasing

IP aliasing, which we saw in the tutorial, is a cheap way to emulate the effect of a highly available HTTP reverse proxy, but it requires root access to the system, and your machines must be on the same subnet. Search the web for "Linux IP aliasing" for more details.

haproxy

haproxy is a load balancer that can be used as a reverse proxy. For a truly highly available setup, haproxy itself needs to be made highly available. See the haproxy HA setup tutorial for how to do this.

Network

Normally the nodes in an HA cluster discover one another automatically by means of files in the $JENKINS_HOME/jgroups/ directory. This works so long as each node can “see” the others’ advertised IP addresses and make TCP connections to them. If your network makes this impossible, see Long-running builds.
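If discovery does not seem to work, a quick way to test the TCP requirement is to try connecting from one node to the address and port each peer advertises. The following is a minimal sketch with hypothetical function names. This section does not state which port the nodes use, so the host and port values are left as inputs that you need to supply for your own cluster.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_peers(peers, timeout=3.0):
    """Given an iterable of (host, port) pairs, return those that cannot be reached."""
    return [(host, port) for host, port in peers
            if not can_connect(host, port, timeout)]
```

Run the check from every node toward every other node, because firewalls and routing rules are not necessarily symmetric.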