Sanity checking the environment while acting as primary

You can place an executable script at $JENKINS_HOME/sanity-check.sh. It runs before a node assumes the primary role, and again whenever the cluster membership changes. Its purpose is to let you confirm that the node really should act as the primary. If the script exits with 0, the node boots up as the primary node; if it exits with a non-zero status, the node does not act as the primary node.

When two nodes that form a cluster lose contact, each node assumes that the other has died and takes over responsibility as the primary node. This is called the "split brain" problem, and it is harmful because you end up with two Jenkins masters acting independently. A similar problem can occur if one node in the cluster is under severe load. The sanity check script gives you an opportunity to apply heuristics that reduce the likelihood of this problem.

For example, you can check that $JENKINS_HOME is available, that you can ping the router, or that the system load is reasonably low. If you are using the Jenkins Enterprise HA monitor tool to control resources, the sanity check script might run before the HA monitor tool has finished running its promotion script. If some of your sanity checks require the promotion to be complete, have the script retry those checks a few times.
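
The checks described above could be combined roughly as follows. This is only a sketch under several assumptions: the variables MAX_LOAD, ROUTER_IP, RETRIES and RETRY_DELAY and all function names are made up for illustration and are not product settings, the load check reads /proc/loadavg and therefore assumes Linux, and treating the $JENKINS_HOME check as the one that depends on promotion is a guess that may not match your setup.

```shell
#!/bin/bash
# Hypothetical sketch of $JENKINS_HOME/sanity-check.sh.
# Exit status 0 lets the node act as primary; non-zero prevents it.

# Illustrative tunables (assumptions, not product settings).
MAX_LOAD="${MAX_LOAD:-8}"         # 1-minute load average must stay below this
ROUTER_IP="${ROUTER_IP:-}"        # router to ping; check is skipped when empty
RETRIES="${RETRIES:-3}"           # attempts for promotion-dependent checks
RETRY_DELAY="${RETRY_DELAY:-2}"   # seconds to wait between attempts

# $JENKINS_HOME must be set, be a directory, and be writable.
check_home() {
    [ -n "${JENKINS_HOME:-}" ] && [ -d "$JENKINS_HOME" ] && [ -w "$JENKINS_HOME" ]
}

# Send one ping to the router, if one is configured.
check_router() {
    [ -z "$ROUTER_IP" ] && return 0
    ping -c 1 "$ROUTER_IP" >/dev/null 2>&1
}

# Compare the integer part of the 1-minute load average (Linux only).
check_load() {
    local load
    read -r load _ < /proc/loadavg || return 1
    [ "${load%%.*}" -lt "$MAX_LOAD" ]
}

# Run a check up to $RETRIES times, sleeping between failed attempts.
retry() {
    local attempt=1
    while ! "$@"; do
        [ "$attempt" -ge "$RETRIES" ] && return 1
        attempt=$((attempt + 1))
        sleep "$RETRY_DELAY"
    done
}

run_sanity_checks() {
    retry check_home || { echo "JENKINS_HOME is not available" >&2; return 1; }
    check_router     || { echo "cannot reach router $ROUTER_IP" >&2; return 1; }
    check_load       || { echo "system load is too high" >&2; return 1; }
}

# The status of the last command becomes the script's exit status.
run_sanity_checks
```

Only the check wrapped in retry is repeated, so a node that is genuinely overloaded or cut off from the network is rejected without extra delay. Remember to make the script executable, for example with chmod +x.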