All workflows are resumable: if Jenkins needs to be restarted (or crashes, or the server reboots) while a flow is running, it should resume at the same point in its program after Jenkins starts back up. Similarly, if a flow is running a lengthy sh or bat step when a slave unexpectedly disconnects, no progress should be lost when connectivity is restored.
However, in some cases a flow build will have done a great deal of work and proceeded to a point where a transient error occurred: one which does not reflect the inputs to this build, such as source code changes. For example, after completing a lengthy build and test of a software component, final deployment to a server might fail for a silly reason, such as a DNS error or low disk space. After correcting the problem you might prefer to restart just the last portion of the flow, without needing to redo everything that came before.
The Enterprise-only checkpoint step makes this possible. Simply place a checkpoint at a safe point in your script, after performing some work and before doing something that might fail randomly:
    sh './build-and-test'
    checkpoint 'Completed tests'
    sh './deploy'
Whenever build-and-test completes normally, this checkpoint will be recorded as part of the flow build, along with any program state at that point, such as local variables. If deploy in this build fails (or just behaved differently than you wanted), you can later go back and restart from this checkpoint in this build. (You can use the Checkpoints link in the sidebar of the original build, or the Retry icon in the stage view, mentioned below.) A new flow build (with a fresh number) will be started which skips over all the steps preceding checkpoint and just runs the remainder of the flow.
Restarted flow builds preserve all program state just as it was left, such as the values of variables. But by the time you restart from a checkpoint, your original workspace may have been overwritten with different files from subsequent builds. So if your post-checkpoint steps rely on local files, not just the command you run, you will need to consider how to get those files back to their original condition.
The safest practice is to keep the checkpoint step outside of any node block, so that it is not associated with either a slave or a workspace. Prior to the checkpoint, use the archive step to save any important files, such as build products. If and when this build is restarted from the checkpoint, all of its artifacts will be copied into the new build first. Thus, you can use the unarchive step to restore some or all of the archived files into your new workspace. flow.groovy gives an example of this technique.
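As a rough illustration of the archive-then-unarchive pattern described above, a flow script might look something like the following. This is only a sketch: the file name target/app.war is hypothetical, and the mapping shown for unarchive assumes you want the artifact restored under a different name in the new workspace.

```groovy
// First node block: do the expensive work and save the results.
node {
    sh './build-and-test'
    // Hypothetical build product; archive whatever the later steps need.
    archive 'target/app.war'
}

// The checkpoint sits outside any node block, so it is not tied
// to a particular slave or workspace.
checkpoint 'Completed tests'

// Second node block: may run in a fresh workspace after a restart.
node {
    // Restore the archived file (artifact path -> workspace path).
    unarchive mapping: ['target/app.war': 'app.war']
    sh './deploy'
}
```

In a build that runs straight through, the second node block unarchives from the current build; in a build restarted from the checkpoint, it unarchives from the artifacts that were copied over from the original build.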
Alternatively, you could use any other technique to recover the original files. For example, if prior to the checkpoint you uploaded an artifact to a repository manager and received an identifier or permalink of some kind, which you saved in a local variable, then after the checkpoint you can retrieve the artifact using the same identifier.
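A sketch of the repository-manager approach might look like this. The upload-artifact script, the artifact-url.txt file, and the use of curl for retrieval are all assumptions made for illustration; substitute whatever your repository manager actually provides.

```groovy
// Declared outside any block so it is part of the saved program state.
def artifactUrl

node {
    sh './build-and-test'
    // Hypothetical script that uploads the build product and prints
    // a permalink for it, which is captured in a file.
    sh './upload-artifact > artifact-url.txt'
    artifactUrl = readFile('artifact-url.txt').trim()
}

checkpoint 'Uploaded artifact'

node {
    // The variable survives a restart from the checkpoint, so the
    // artifact can be fetched again into a fresh workspace.
    sh "curl -fsSL -o app.war '${artifactUrl}'"
    sh './deploy'
}
```

Because the permalink is a plain string held in a flow variable, it is restored along with the rest of the program state, and nothing depends on the original workspace.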
You can have a checkpoint inside a node block, but this is unlikely to be useful since you cannot rely on the workspace being identical after the restart. You will still need to use one of the above methods to restore the original files. Also note that Jenkins will attempt to grab the same slave and workspace as the original build used, which could fail in the case of transient “cloud” slaves. By contrast, when the checkpoint is outside node, the post-restart node can specify a label which can match any available slave.