Introduction
It is frequently useful to process a set of documents as a single batch.
This example describes a batch process that searches for all files named system.xml in a given directory and all of its subdirectories.
For each file the process counts the number of elements and finally presents the results in an HTML table.
This pattern could be used for other batch processing work such as converting
a set of HTML documents into XHTML for a web site.
Description
The DPML script that implements the batch processing is shown below.
It contains embedded comments to provide suggestions on adapting it to suit your
specific needs.
The DPML script uses three distinct steps:
- Create a list of resources to batch process
- Process each resource in turn
- Create a report on the results of the batch processing
Create list of resources
In this example we use the fls service to create an XML document that lists all instances of files named system.xml below a specified directory.
As an alternative you could supply a list of resources as a parameter to the
batch process (and generate it externally).
For example, you could create a web robot that discovers the resources
of a remote web site and processes them as a batch.
Process each resource
In this example the script iterates over the resources found in the XML document provided by the fls service.
The script uses XQuery to count the elements in each system.xml file.
Any other service could be applied to each resource in the same way.
For example, we could run XHTMLTidy to batch convert HTML resources into valid XHTML resources.
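To make the per-resource output concrete, the fragment below sketches what var:result might hold after one pass through the loop, based on the XQuery in the script listing. The file path and the count of 42 are illustrative values, not output from a real run, and the exact form of the URI text depends on what the fls service returns.

```xml
<!-- Illustrative only: hypothetical path and count -->
<result>
  <uri>file:///home/pjr/dev/project/system.xml</uri>
  <count>42</count>
</result>
```

Each such fragment is appended beneath the results root, which is why the stylesheet at the end of the script can read result/uri and result/count directly.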
Report on results
In our example we accumulate the results of each process in a variable. The results could feed more extensive reporting, including details of any exceptions that occurred. To keep the example simple we have provided only basic exception processing.
Finally the results are styled and presented as an HTML table. You could write them to a file or use them as the start of another process.
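As one sketch of more extensive reporting, the stylesheet in the script could be extended with a summary row. The fragment below assumes the results document has the shape the script builds (a results root holding result elements, each with a uri and a count child) and would sit immediately after the closing xsl:for-each tag, inside the table. It uses only standard XSLT 1.0 functions; the row colour and label are arbitrary choices.

```xml
<!-- Hypothetical summary row: place after </xsl:for-each>, inside <table> -->
<tr bgcolor="#dddddd">
  <td>
    <!-- Number of result entries in the accumulated document -->
    <xsl:value-of select="count(result)" />
    <xsl:text> files</xsl:text>
  </td>
  <td>
    <!-- Total number of elements across all result entries -->
    <xsl:value-of select="sum(result/count)" />
  </td>
</tr>
```

Because the template's context node is /results, the relative paths result and result/count resolve against the accumulated document without further changes.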
Executing this process
To execute this DPML script, first create a new module.
- Use the new module wizard to create and install a new module. Choose the default settings, ensuring your module supports DPML.
Import the module into the Frontend fulcrum to make it available on localhost, port 8080.
Physically, the module will be located in [install]/modules/your_module_name/.
- The example process uses the fls accessor supplied by the ext_sys module and the xquery accessor supplied by ext_xquery.
Import these modules into your module by adding the following two imports to the mapping section of the module.xml definition located in the root directory of your module.
<mapping>
...Existing Imports...
<import>
<uri>urn:org:ten60:netkernel:ext:sys</uri>
</import>
<import>
<uri>urn:org:ten60:netkernel:ext:xquery</uri>
</import>
</mapping>
Next perform a cold restart to pick up the module changes.
- Copy the batch process listing below to a file named batch.idoc in the resources/ directory of your module.
Edit the fls instruction to point to a directory in your filesystem and, if you wish, change the regex filter to match different file names.
- Finally, start the process by requesting the following URI with a web browser: http://localhost:8080/batch.idoc
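The edit to the fls instruction described in the steps above could look like the sketch below. It reuses only the elements already present in the script; the root directory /var/www/site/ and the .html filter are placeholder values chosen for illustration, so substitute your own.

```xml
<instr>
  <type>fls</type>
  <operator>
    <fls>
      <!-- Placeholder root: point this at a directory on your filesystem -->
      <root>file:///var/www/site/</root>
      <!-- Placeholder filter: match any name ending in .html (dot escaped) -->
      <filter>.*\.html</filter>
      <recursive />
      <uri />
    </fls>
  </operator>
  <target>var:fls</target>
</instr>
```

Note that if the filter selects files that are not well-formed XML, the XQuery step that calls doc() would need to be replaced with a process suited to those files.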
DPML Script Code
<idoc>
<seq>
<comment>
******************************************
A Batch Processing Pattern.
This example finds all system.xml documents and
counts the number of elements they contain. You
can adapt it to suit your needs.
******************************************
</comment>
<comment>
***********
Use File LS accessor to list files.
o Modify the root for your filesystem
o Modify the filter regex to target other XML files
The result is a tree of matching resources each with a
uri element containing the URI of the resource. We'll
use this as the source for the batch process.
***********
</comment>
<instr>
<type>fls</type>
<operator>
<fls>
<root>file:///home/pjr/dev/</root>
<filter>.*system\.xml</filter>
<recursive />
<uri />
</fls>
</operator>
<target>var:fls</target>
</instr>
<comment>
*************
Prepare a results document
**************
</comment>
<instr>
<type>copy</type>
<operand>
<results />
</operand>
<target>var:results</target>
</instr>
<comment>
***********
Start batch processing loop
***********
</comment>
<while>
<comment>
***********
Loop condition - do processing sequence while there's
a file URI left to process
***********
</comment>
<cond>
<instr>
<type>xpatheval</type>
<operand>var:fls</operand>
<operator>
<xpath>/descendant::uri[1]</xpath>
</operator>
<target>this:cond</target>
</instr>
</cond>
<seq>
<comment>
***********
Copy the URI fragment to a variable
and log it to show progress
***********
</comment>
<instr>
<type>copy</type>
<operand>var:fls#xpointer(/descendant::uri[1])</operand>
<target>var:uri</target>
</instr>
<instr>
<type>log</type>
<operand>var:uri</operand>
</instr>
<comment>
***********
Main Process - We could do anything we liked here
including executing another DPML process or modifying
the target file in some way. Here we simply count
the elements in the file.
***********
</comment>
<instr>
<type>xquery</type>
<operator>
<xquery>
(:
*********
Declare the external URI variable and
extract the file URI to $file variable
*********
:)
declare variable $uri as node() external;
declare variable $file {$uri/uri/text()};
(:
*******
Return a fragment:
Quote back the URI fragment and add a
count element with the number of
elements contained in the target document
*******
:)
<result>
{$uri}
<count>
{count(doc($file)/descendant::*)}
</count>
</result>
</xquery>
</operator>
<uri>var:uri</uri>
<target>var:result</target>
</instr>
<comment>
***********
Append the xquery result to our cumulative
var:results document
***********
</comment>
<instr>
<type>stm</type>
<operand>var:results</operand>
<operator>
<stm:group xmlns:stm="http://1060.org/stm">
<stm:append xpath="/results">
<stm:param xpath="/result:sequence/result:element/result" />
</stm:append>
</stm:group>
</operator>
<param>var:result</param>
<target>var:results</target>
</instr>
<comment>
**********
Exception: Catch any processing exceptions...
**********
</comment>
<exception>
<comment>
**********
Since this is a dumb example we'll simply log the exception,
you can add more extensive error handling for
your process if required...
**********
</comment>
<instr>
<type>log</type>
<operand>this:exception</operand>
</instr>
</exception>
<comment>
***********
Remove the first URI from the file listing before starting
next iteration of the loop. If this isn't done we'll have an
infinite loop!!!
***********
</comment>
<instr>
<type>stm</type>
<operand>var:fls</operand>
<operator>
<stm:group xmlns:stm="http://1060.org/stm">
<stm:delete xpath="/descendant::uri[1]" />
</stm:group>
</operator>
<target>var:fls</target>
</instr>
</seq>
</while>
<comment>
***********
All done. Style the results for presentation.
***********
</comment>
<instr>
<type>xslt</type>
<operand>var:results</operand>
<operator>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" />
<xsl:template match="/results">
<html>
<body>
<h1>Batch Results</h1>
<table>
<tr bgcolor="#aaaaaa">
<td>file</td>
<td>elements</td>
</tr>
<xsl:for-each select="result">
<tr>
<td>
<xsl:value-of select="uri" />
</td>
<td>
<xsl:value-of select="count" />
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
</operator>
<target>this:response</target>
</instr>
</seq>
</idoc>
Deadlock Detector Exception
Searching a filesystem may take a long time and may exceed the time NetKernel
allows a process to work before raising the NetKernel Deadlock Detector exception.
If this happens, simply increase the deadlock detection period in your System Configuration.