In Batch Domain Language, the overall architecture design was discussed, using the following diagram as a guide:
While the Job object may seem like a simple container for steps, there are many configuration options of which a developer must be aware. Furthermore, there are many considerations for how a Job will be run and how its meta-data will be stored during that run. This chapter will explain the various configuration options and runtime concerns of a Job.
There are multiple implementations of the Job interface; however, the namespace abstracts away the differences in configuration. It has only three required dependencies: a name, a JobRepository, and a list of Steps.
<job id="footballJob">
    <step id="playerload" parent="s1" next="gameLoad"/>
    <step id="gameLoad" parent="s2" next="playerSummarization"/>
    <step id="playerSummarization" parent="s3"/>
</job>
The namespace defaults to referencing a repository with an id of 'jobRepository', which is a sensible default. However, this can be overridden explicitly:
<job id="footballJob" job-repository="specialRepository">
<step id="playerload" parent="s1" next="gameLoad"/>
<step id="gameLoad" parent="s2" next="playerSummarization"/>
<step id="playerSummarization" parent="s3"/>
</job>
One key issue when executing a batch job concerns the behavior of a Job when it is restarted. The launching of a Job is considered to be a 'restart' if a JobExecution already exists for the particular JobInstance. Ideally, all jobs should be able to start up where they left off, but there are scenarios where this is not possible. It is entirely up to the developer to ensure that a new JobInstance is created in this scenario. However, Spring Batch does provide some help. If a Job should never be restarted, but should always be run as part of a new JobInstance, then the restartable property may be set to 'false':
<job id="footballJob" restartable="false">
...
</job>
To phrase it another way, setting restartable to false means "this Job does not support being started again". Restarting a Job that is not restartable will cause a JobRestartException to be thrown:
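// SimpleJob is the declared type because setRestartable
// is not part of the Job interface.
SimpleJob job = new SimpleJob();
job.setRestartable(false);

JobParameters jobParameters = new JobParameters();

// The first execution for this JobInstance is created without problems.
JobExecution firstExecution = jobRepository.createJobExecution(job, jobParameters);
jobRepository.saveOrUpdate(firstExecution);

try {
    // A second execution for the same JobInstance counts as a restart.
    jobRepository.createJobExecution(job, jobParameters);
    fail();
}
catch (JobRestartException e) {
    // expected
}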
This snippet of JUnit code shows that attempting to create a JobExecution the first time for a non-restartable Job will cause no issues. However, the second attempt will throw a JobRestartException.
During the course of the execution of a Job, it may be useful to be notified of various events in its lifecycle so that custom code may be executed. The SimpleJob allows for this by calling a JobExecutionListener at the appropriate time:
public interface JobExecutionListener {

    void beforeJob(JobExecution jobExecution);

    void afterJob(JobExecution jobExecution);

}
JobExecutionListeners can be added to a SimpleJob via the listeners element on the job:
<job id="footballJob">
<step id="playerload" parent="s1" next="gameLoad"/>
<step id="gameLoad" parent="s2" next="playerSummarization"/>
<step id="playerSummarization" parent="s3"/>
<listeners>
<listener class="org.springframework.batch.sample.SampleListener"/>
</listeners>
</job>
It should be noted that afterJob will be called regardless of the success or failure of the Job. If success or failure needs to be determined, it can be obtained from the JobExecution:
public void afterJob(JobExecution jobExecution) {
    if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
        // job success
    }
    else if (jobExecution.getStatus() == BatchStatus.FAILED) {
        // job failure
    }
}
The annotations corresponding to this interface are:
@BeforeJob
@AfterJob
If a group of Jobs share similar, but not identical, configurations, then it may be helpful to define a "parent" Job from which the concrete Jobs may inherit properties. Similar to class inheritance in Java, the "child" Job will combine its elements and attributes with the parent's.
In the following example, "baseJob" is an abstract Job definition that defines only a list of listeners. The Job "job1" is a concrete definition that inherits the list of listeners from "baseJob" and merges it with its own list of listeners to produce a Job with two listeners and one Step, "step1".
<job id="baseJob" abstract="true">
    <listeners>
        <listener class="com.ListenerOne"/>
    </listeners>
</job>

<job id="job1" parent="baseJob">
    <step id="step1" parent="standaloneStep"/>
    <listeners merge="true">
        <listener class="com.ListenerTwo"/>
    </listeners>
</job>
Please see the section on Inheriting from a Parent Step for more detailed information.
Unlike many traditional Spring applications, many of the components of a batch application are stateful; the file readers and writers are obvious examples. The recommended way to deal with this is to create a fresh ApplicationContext for each job execution. If the Job is launched from the command line with CommandLineJobRunner, this is trivial. For more complex launching scenarios where jobs are executed in parallel or serially from the same process, some extra steps have to be taken to ensure that the ApplicationContext is refreshed. This is preferable to using prototype scope for the stateful beans because then they would not receive lifecycle callbacks from the container at the end of use (e.g. through destroy-method in XML).
The strategy provided by Spring Batch to deal with this scenario is the JobFactory, and the samples provide an example of a specialized implementation that can load an ApplicationContext and close it properly when the job is finished. A relevant example is ClassPathXmlApplicationContextJobFactory and its use in the adhoc-job-launcher-context.xml and the quartz-job-launcher-context.xml, which can be found in the Samples project.
As described earlier, the JobRepository is used for basic CRUD operations of the various persisted domain objects within Spring Batch, such as JobExecution and StepExecution. It is required by many of the major framework features, such as the JobLauncher, Job, and Step. The batch namespace abstracts away many of the implementation details of the JobRepository implementations and their collaborators. However, there are still a few configuration options available:
<job-repository id="jobRepository"
    data-source="dataSource"
    transaction-manager="transactionManager"
    isolation-level-for-create="SERIALIZABLE"
    table-prefix="BATCH_" />
None of the configuration options listed above are required except the id. If they are not set, the defaults shown above will be used. They are shown above for awareness purposes.
If the namespace is used, transactional advice will be automatically created around the repository. This is to ensure that the batch meta data, including state that is necessary for restarts after a failure, is persisted correctly. The behavior of the framework is not well defined if the repository methods are not transactional. The isolation level in the create* method attributes is specified separately to ensure that when jobs are launched, if two processes are trying to launch the same job at the same time, only one will succeed. The default isolation level for that method is SERIALIZABLE, which is quite aggressive: READ_COMMITTED would work just as well; READ_UNCOMMITTED would be fine if two processes are not likely to collide in this way. However, since a call to the create* method is quite short, it is unlikely that SERIALIZABLE will cause problems, as long as the database platform supports it. However, this can be overridden:
<job-repository id="jobRepository"
    isolation-level-for-create="REPEATABLE_READ" />
If the namespace or factory beans aren't used then it is also essential to configure the transactional behavior of the repository using AOP:
<aop:config>
    <aop:advisor
        pointcut="execution(* org.springframework.batch.core..*Repository+.*(..))"
        advice-ref="txAdvice" />
</aop:config>

<tx:advice id="txAdvice" transaction-manager="transactionManager">
    <tx:attributes>
        <tx:method name="*" />
    </tx:attributes>
</tx:advice>
This fragment can be used as is, with almost no changes. Remember also to include the appropriate namespace declarations and to make sure spring-tx and spring-aop (or the whole of spring) are on the classpath.
Another modifiable property of the JobRepository is the table prefix of the meta-data tables. By default they are all prefaced with BATCH_. BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION are two examples. However, there are potential reasons to modify this prefix. If the schema name needs to be prepended to the table names, or if more than one set of meta data tables is needed within the same schema, then the table prefix will need to be changed:
<job-repository id="jobRepository"
table-prefix="SYSTEM.TEST_" />
Given the above changes, every query to the meta data tables will be prefixed with "SYSTEM.TEST_". BATCH_JOB_EXECUTION will be referred to as SYSTEM.TEST_JOB_EXECUTION.
Only the table prefix is configurable. The table and column names are not.
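The effect of the prefix can be pictured as a simple substitution into otherwise fixed SQL. The following is a minimal sketch in plain Java, not framework code: the class name, the %PREFIX% placeholder, and the query text are assumptions made for illustration only.

```java
/** Hypothetical helper that illustrates prefix substitution; not part of Spring Batch. */
final class TablePrefixSketch {

    // Assumed placeholder marking where the configurable prefix goes.
    static final String PLACEHOLDER = "%PREFIX%";

    /** Replaces the placeholder with the prefix; table and column names stay fixed. */
    static String applyPrefix(String sqlTemplate, String tablePrefix) {
        return sqlTemplate.replace(PLACEHOLDER, tablePrefix);
    }
}
```

With a prefix of "SYSTEM.TEST_", a template that refers to %PREFIX%JOB_EXECUTION resolves to SYSTEM.TEST_JOB_EXECUTION, while the rest of the statement is untouched.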
There are scenarios in which you may not want to persist your domain objects to the database. One reason may be speed; storing domain objects at each commit point takes extra time. Another reason may be that you just don't need to persist status for a particular job. For this reason, Spring Batch provides an in-memory Map version of the job repository:
<bean id="jobRepository"
    class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
    <property name="transactionManager" ref="transactionManager"/>
</bean>
Note that the in-memory repository is volatile and so does not allow restart between JVM instances. It also cannot guarantee that two job instances with the same parameters are not launched simultaneously, and it is not suitable for use in a multi-threaded Job or a locally partitioned Step. So use the database version of the repository wherever you need those features.
However, it does require a transaction manager to be defined, because there are rollback semantics within the repository, and because the business logic might still be transactional (e.g. RDBMS access). For testing purposes many people find the ResourcelessTransactionManager useful.
If you are using a database platform that is not in the list of supported platforms, you may be able to use one of the supported types, if the SQL variant is close enough. To do this you can use the raw JobRepositoryFactoryBean instead of the namespace shortcut and use it to set the database type to the closest match:
<bean id="jobRepository" class="org...JobRepositoryFactoryBean">
    <property name="databaseType" value="db2"/>
    <property name="dataSource" ref="dataSource"/>
</bean>
(The JobRepositoryFactoryBean tries to auto-detect the database type from the DataSource if it is not specified.) The major differences between platforms are mainly accounted for by the strategy for incrementing primary keys, so often it might be necessary to override the incrementerFactory as well (using one of the standard implementations from the Spring Framework).
If even that doesn't work, or you are not using an RDBMS, then the only option may be to implement the various Dao interfaces that the SimpleJobRepository depends on and wire one up manually in the normal Spring way.
The most basic implementation of the JobLauncher interface is the SimpleJobLauncher. Its only required dependency is a JobRepository, in order to obtain an execution:
<bean id="jobLauncher"
    class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
</bean>
Once a JobExecution is obtained, it is passed to the execute method of Job, ultimately returning the JobExecution to the caller:
The sequence is straightforward and works well when launched from a scheduler. However, issues arise when trying to launch from an HTTP request. In this scenario, the launching needs to be done asynchronously so that the SimpleJobLauncher returns immediately to its caller. This is because it is not good practice to keep an HTTP request open for the amount of time needed by long running processes such as batch. An example sequence is below:
The SimpleJobLauncher can easily be configured to allow for this scenario by configuring a TaskExecutor:
<bean id="jobLauncher"
    class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor">
        <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
    </property>
</bean>
Any implementation of the Spring TaskExecutor interface can be used to control how jobs are asynchronously executed.
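The difference between synchronous and asynchronous launching can be illustrated without the framework. The following is a minimal sketch using only java.util.concurrent; the class AsyncLaunchSketch and its methods are hypothetical stand-ins, not the SimpleJobLauncher or TaskExecutor APIs. It only shows that the launch call returns a handle while the work is still in progress.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an asynchronous launcher; not Spring Batch code. */
final class AsyncLaunchSketch {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Hands the job to another thread and returns without waiting for it. */
    Future<String> launch(Runnable job) {
        return executor.submit(job, "COMPLETED");
    }

    void shutdown() {
        executor.shutdown();
    }

    /** Returns true if launch() came back while the job was still blocked. */
    static boolean returnsBeforeJobFinishes() {
        CountDownLatch release = new CountDownLatch(1);
        AsyncLaunchSketch launcher = new AsyncLaunchSketch();
        try {
            // The "job" blocks until the caller releases it.
            Future<String> execution = launcher.launch(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean stillRunning = !execution.isDone();
            release.countDown();
            return stillRunning && "COMPLETED".equals(execution.get(5, TimeUnit.SECONDS));
        } catch (Exception e) {
            return false;
        } finally {
            launcher.shutdown();
        }
    }
}
```

A caller such as a web request handler would keep only the returned handle and check on it later, rather than waiting for the job to finish.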
At a minimum, launching a batch job requires two things: the Job to be launched and a JobLauncher. Both can be contained within the same context or different contexts. For example, if launching a job from the command line, a new JVM will be instantiated for each Job, and thus every job will have its own JobLauncher. However, if running from within a web container within the scope of an HttpRequest, there will usually be one JobLauncher, configured for asynchronous job launching, that multiple requests will invoke to launch their jobs.
For users that want to run their jobs from an enterprise scheduler, the command line is the primary interface. This is because most schedulers (with the exception of Quartz unless using the NativeJob) work directly with operating system processes, primarily kicked off with shell scripts. There are many ways to launch a Java process besides a shell script, such as Perl, Ruby, or even 'build tools' such as ant or maven. However, because most people are familiar with shell scripts, this example will focus on them.
Because the script launching the job must kick off a Java Virtual Machine, there needs to be a class with a main method to act as the primary entry point. Spring Batch provides an implementation that serves just this purpose: CommandLineJobRunner. It's important to note that this is just one way to bootstrap your application, but there are many ways to launch a Java process, and this class should in no way be viewed as definitive. The CommandLineJobRunner performs four tasks:
1. Load the appropriate ApplicationContext
2. Parse command line arguments into JobParameters
3. Locate the appropriate job based on arguments
4. Use the JobLauncher provided in the application context to launch the job.
All of these tasks are accomplished using only the arguments passed in. The following are required arguments:
Table 4.1. CommandLineJobRunner arguments
jobPath | The location of the XML file that will be used to create an ApplicationContext. This file should contain everything needed to run the complete Job. |
jobName | The name of the job to be run. |
These arguments must be passed in with the path first and the name second. All arguments after these are considered to be JobParameters and must be in the format of 'name=value':
bash$ java CommandLineJobRunner endOfDayJob.xml endOfDay 'schedule.date(date)=2008/01/01'
In most cases you would want to use a manifest to declare your main class in a jar, but for simplicity, the class was used directly. This example is using the same 'EndOfDay' example from Batch Domain Language. The first argument is 'endOfDayJob.xml', which is the Spring ApplicationContext containing the Job. The second argument, 'endOfDay', represents the job name. The final argument, 'schedule.date(date)=2008/01/01', will be converted into JobParameters. An example of the XML configuration is below:
<job id="endOfDay">
    <step id="step1" parent="simpleStep" />
</job>

<!-- Launcher details removed for clarity -->
<beans:bean id="jobLauncher"
    class="org.springframework.batch.core.launch.support.SimpleJobLauncher" />
This example is overly simplistic, since there are many more requirements to run a batch job in Spring Batch in general, but it serves to show the two main requirements of the CommandLineJobRunner: Job and JobLauncher.
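To make the 'name=value' argument format described above more concrete, the following is a minimal sketch of how one such argument could be split into a name, an optional type in parentheses, and a value. It is plain Java written for illustration and is not the converter used by the CommandLineJobRunner; the class JobArgumentSketch, its fields, and the choice of "string" as the type when none is given are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for one 'name(type)=value' argument; not Spring Batch code. */
final class JobArgumentSketch {

    // name, optional "(type)", then "=" and the value
    private static final Pattern FORMAT =
            Pattern.compile("([^=(]+)(?:\\(([^)]+)\\))?=(.*)");

    final String name;
    final String type;
    final String value;

    private JobArgumentSketch(String name, String type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    static JobArgumentSketch parse(String argument) {
        Matcher matcher = FORMAT.matcher(argument);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Expected name=value but got: " + argument);
        }
        // Assumption of this sketch: an argument without a type is a string.
        String type = matcher.group(2) == null ? "string" : matcher.group(2);
        return new JobArgumentSketch(matcher.group(1), type, matcher.group(3));
    }
}
```

Applied to the argument from the example, the sketch yields the name 'schedule.date', the type 'date', and the value '2008/01/01'. Converting the value into a typed object is left out.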
When launching a batch job from the command-line, an enterprise scheduler is often used. Most schedulers are fairly dumb and work only at the process level. This means that they only know about some operating system process, such as a shell script, that they're invoking. In this scenario, the only way to communicate back to the scheduler about the success or failure of a job is through return codes. A return code is a number that is returned to a scheduler by the process that indicates the result of the run. In the simplest case: 0 is success and 1 is failure. However, there may be more complex scenarios: if job A returns 4, kick off job B, and if it returns 5, kick off job C. This type of behavior is configured at the scheduler level, but it is important that a processing framework such as Spring Batch provide a way to return a numeric representation of the 'Exit Code' for a particular batch job. In Spring Batch this is encapsulated within an ExitStatus, which is covered in more detail in Chapter 5. For the purposes of discussing exit codes, the only important thing to know is that an ExitStatus has an exit code property that is set by the framework (or the developer) and is returned as part of the JobExecution returned from the JobLauncher. The CommandLineJobRunner converts this string value to a number using the ExitCodeMapper interface:
public interface ExitCodeMapper {

    public int intValue(String exitCode);

}
The essential contract of an ExitCodeMapper is that, given a string exit code, a number representation will be returned. The default implementation used by the job runner is the SimpleJvmExitCodeMapper that returns 0 for completion, 1 for generic errors, and 2 for any job runner errors such as not being able to find a Job in the provided context. If anything more complex than the 3 values above is needed, then a custom implementation of the ExitCodeMapper interface must be supplied. Because the CommandLineJobRunner is the class that creates an ApplicationContext, and thus cannot be 'wired together', any values that need to be overwritten must be autowired. This means that if an implementation of ExitCodeMapper is found within the BeanFactory, it will be injected into the runner after the context is created. All that needs to be done to provide your own ExitCodeMapper is to declare the implementation as a root level bean and ensure that it is part of the ApplicationContext that is loaded by the runner.
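As an illustration of the contract, the following is a minimal sketch of a custom mapper for the scheduler scenario described earlier, in which the codes 4 and 5 trigger follow-up jobs. So that it compiles on its own, it declares a local copy of the interface shown above instead of importing it. The class name SchedulerExitCodeMapper and the exit code strings RUN_JOB_B and RUN_JOB_C are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Local copy of the interface shown above, so the sketch is self-contained. */
interface ExitCodeMapper {
    int intValue(String exitCode);
}

/** Hypothetical mapper: translates string exit codes into scheduler return codes. */
final class SchedulerExitCodeMapper implements ExitCodeMapper {

    private final Map<String, Integer> codes = new HashMap<>();

    SchedulerExitCodeMapper() {
        codes.put("COMPLETED", 0);
        codes.put("FAILED", 1);
        // Assumed custom exit codes that a job in this scenario might set.
        codes.put("RUN_JOB_B", 4);
        codes.put("RUN_JOB_C", 5);
    }

    @Override
    public int intValue(String exitCode) {
        // Anything unrecognized is reported as a generic failure.
        return codes.getOrDefault(exitCode, 1);
    }
}
```

In a real application the class would implement the framework's ExitCodeMapper and be declared as a root level bean, as described above.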
Historically, offline processing such as batch jobs has been launched from the command-line, as described above. However, there are many cases where launching from an HttpRequest is a better option. Such use cases include reporting, ad-hoc job running, and web application support. Because a batch job by definition is long running, the most important concern is to ensure that the job is launched asynchronously:
The controller in this case is a Spring MVC controller. More
information on Spring MVC can be found here: http://static.springframework.org/spring/docs/2.5.x/reference/mvc.html.
The controller launches a Job using a JobLauncher that has been configured to launch asynchronously, which immediately returns a JobExecution. The Job will likely still be running; however, this nonblocking behaviour allows the controller to return immediately, which is required when handling an HttpRequest. An example is below:
@Controller
public class JobLauncherController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/jobLauncher.html")
    public void handle() throws Exception {
        jobLauncher.run(job, new JobParameters());
    }
}
So far, both the JobLauncher and JobRepository interfaces have been discussed. Together, they represent simple launching of a job, and basic CRUD operations of batch domain objects:
A JobLauncher uses the JobRepository to create new JobExecution objects and run them. Job and Step implementations later use the same JobRepository for basic updates of the same executions during the running of a Job.
The basic operations suffice for simple scenarios, but in a large batch
environment with hundreds of batch jobs and complex scheduling
requirements, more advanced access of the meta data is required:
The JobExplorer and JobOperator interfaces, which will be discussed below, add additional functionality for querying and controlling the meta data.
The most basic need before any advanced features is the ability to query the repository for existing executions. This functionality is provided by the JobExplorer interface:
public interface JobExplorer {

    List<JobInstance> getJobInstances(String jobName, int start, int count);

    JobExecution getJobExecution(Long executionId);

    StepExecution getStepExecution(Long jobExecutionId, Long stepExecutionId);

    JobInstance getJobInstance(Long instanceId);

    List<JobExecution> getJobExecutions(JobInstance jobInstance);

    Set<JobExecution> findRunningJobExecutions(String jobName);
}
As is evident from the method signatures above, JobExplorer is a read-only version of the JobRepository, and like the JobRepository, it can be easily configured via a factory bean:
<bean id="jobExplorer" class="org.spr...JobExplorerFactoryBean" p:dataSource-ref="dataSource" />
Earlier in this chapter, it was mentioned that the table prefix of the JobRepository can be modified to allow for different versions or schemas. Because the JobExplorer is working with the same tables, it too needs the ability to set a prefix:
<bean id="jobExplorer" class="org.spr...JobExplorerFactoryBean"
p:dataSource-ref="dataSource" p:tablePrefix="BATCH_" />
As previously discussed, the JobRepository provides CRUD operations on the meta-data, and the JobExplorer provides read-only operations on the meta-data. However, those operations are most useful when used together to perform common monitoring tasks such as stopping, restarting, or summarizing a Job, as is commonly done by batch operators. Spring Batch provides for these types of operations via the JobOperator interface:
public interface JobOperator {

    List<Long> getExecutions(long instanceId) throws NoSuchJobInstanceException;

    List<Long> getJobInstances(String jobName, int start, int count)
            throws NoSuchJobException;

    Set<Long> getRunningExecutions(String jobName) throws NoSuchJobException;

    String getParameters(long executionId) throws NoSuchJobExecutionException;

    Long start(String jobName, String parameters)
            throws NoSuchJobException, JobInstanceAlreadyExistsException;

    Long restart(long executionId)
            throws JobInstanceAlreadyCompleteException, NoSuchJobExecutionException,
                   NoSuchJobException, JobRestartException;

    Long startNextInstance(String jobName)
            throws NoSuchJobException, JobParametersNotFoundException, JobRestartException,
                   JobExecutionAlreadyRunningException, JobInstanceAlreadyCompleteException;

    boolean stop(long executionId)
            throws NoSuchJobExecutionException, JobExecutionNotRunningException;

    String getSummary(long executionId) throws NoSuchJobExecutionException;

    Map<Long, String> getStepExecutionSummaries(long executionId)
            throws NoSuchJobExecutionException;

    Set<String> getJobNames();

}
The above operations represent methods from many different interfaces, such as JobLauncher, JobRepository, JobExplorer, and JobRegistry. For this reason, the provided implementation of JobOperator, SimpleJobOperator, has many dependencies:
<bean id="jobOperator" class="org.spr...SimpleJobOperator">
    <property name="jobExplorer">
        <bean class="org.spr...JobExplorerFactoryBean">
            <property name="dataSource" ref="dataSource" />
        </bean>
    </property>
    <property name="jobRepository" ref="jobRepository" />
    <property name="jobRegistry" ref="jobRegistry" />
    <property name="jobLauncher" ref="jobLauncher" />
</bean>
Most of the methods on JobOperator are self-explanatory, and more detailed explanations can be found in the javadoc of the interface. However, the startNextInstance method is worth noting. This method will always start a new instance of a Job. This can be extremely useful if there are serious issues in a JobExecution and the Job needs to be started over again from the beginning. Unlike JobLauncher, though, which requires a new JobParameters object that will trigger a new JobInstance if the parameters are different from any previous set of parameters, the startNextInstance method will use the JobParametersIncrementer tied to the Job to force the Job to a new instance:
public interface JobParametersIncrementer {

    JobParameters getNext(JobParameters parameters);

}
The contract of JobParametersIncrementer is that, given a JobParameters object, it will return the 'next' JobParameters object by incrementing any necessary values it may contain. This strategy is useful because the framework has no way of knowing what changes to the JobParameters make it the 'next' instance. For example, if the only value in JobParameters is a date, and the next instance should be created, should that value be incremented by one day? Or one week (if the job is weekly for instance)? The same can be said for any numerical values that help to identify the Job, as shown below:
public class SampleIncrementer implements JobParametersIncrementer {

    public JobParameters getNext(JobParameters parameters) {
        if (parameters == null || parameters.isEmpty()) {
            return new JobParametersBuilder().addLong("run.id", 1L).toJobParameters();
        }
        long id = parameters.getLong("run.id", 1L) + 1;
        return new JobParametersBuilder().addLong("run.id", id).toJobParameters();
    }
}
In this example, the value with a key of 'run.id' is used to discriminate between JobInstances. If the JobParameters passed in is null or empty, it can be assumed that the Job has never been run before and thus its initial state can be returned. However, if not, the old value is obtained, incremented by one, and returned. An incrementer can be associated with a Job via the 'incrementer' attribute in the namespace:
<job id="footballJob" incrementer="sampleIncrementer">
...
</job>
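The date question raised earlier (one day or one week?) can be sketched in the same spirit. The following plain Java class is a hypothetical stand-in that works on a java.time.LocalDate directly rather than on JobParameters, so that it can be read and run without the framework. The class name and the firstRun argument are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.Period;

/** Hypothetical date-based 'next instance' rule; not a JobParametersIncrementer. */
final class ScheduleDateIncrementerSketch {

    // How far apart two consecutive instances are, e.g. one day or one week.
    private final Period interval;

    ScheduleDateIncrementerSketch(Period interval) {
        this.interval = interval;
    }

    /** Returns firstRun if there was no previous run, else previous plus the interval. */
    LocalDate getNext(LocalDate previous, LocalDate firstRun) {
        return previous == null ? firstRun : previous.plus(interval);
    }
}
```

The interval is supplied by the developer because only the developer knows whether the job is daily or weekly, which is the reason the framework delegates this decision to an incrementer.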
One of the most common use cases of JobOperator is gracefully stopping a Job:
Set<Long> executions = jobOperator.getRunningExecutions("sampleJob");
jobOperator.stop(executions.iterator().next());
The shutdown is not immediate, since there is no way to force immediate shutdown, especially if the execution is currently in developer code that the framework has no control over, such as a business service. However, as soon as control is returned back to the framework, it will set the status of the current StepExecution to BatchStatus.STOPPED, save it, then do the same for the JobExecution before finishing.
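The reason the stop cannot be immediate can be shown with a small cooperative-stop sketch. It is plain Java and does not reproduce the framework's implementation; the class CooperativeStopSketch and its methods are hypothetical. The stop request only sets a flag, and the loop that plays the part of the framework checks that flag between units of work.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of a graceful stop; not Spring Batch code. */
final class CooperativeStopSketch {

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    /** Requests a stop; work that is already in progress is not interrupted. */
    void stop() {
        stopRequested.set(true);
    }

    /**
     * Runs the business logic up to totalChunks times. The flag is checked only
     * between chunks, i.e. when control is back with this loop.
     */
    String run(int totalChunks, Runnable businessLogic) {
        for (int i = 0; i < totalChunks; i++) {
            if (stopRequested.get()) {
                return "STOPPED";
            }
            businessLogic.run();
        }
        return "COMPLETED";
    }
}
```

If stop() is called while the business logic is running, that call still finishes, and the run then ends with STOPPED before the next chunk begins.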