Batch Processing with NetKernel

Introduction

It is frequently useful to process a set of documents as a single batch.

This example describes a batch process that searches for all files named system.xml in a given directory and all subdirectories. For each file the process counts the number of elements and finally presents the results in an HTML table.

This pattern could be used for other batch processing work, such as converting a set of HTML documents into XHTML for a web site.

Description

The DPML script that implements the batch processing is shown below, with embedded comments suggesting how to adapt it to your needs. The script performs three distinct steps:

  1. Create a list of resources to batch process
  2. Process each resource in turn
  3. Create a report on the results of the batch processing

Create list of resources

In this example we use the fls service to create an XML document that lists every file named system.xml below a specified directory. Alternatively, you could generate the list of resources externally and supply it to the batch process as a parameter. For example, you could create a web robot that discovers the resources of a remote web site and processes them as a batch.
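
As a rough illustration of the externally generated alternative, a hand-built list might look like the sketch below. This is not the actual output format of the fls service: the resources wrapper element and both file paths are hypothetical. The only thing taken from the script is that its loop selects /descendant::uri[1], so each resource needs its own uri element somewhere in the tree.

```xml
<!-- Hypothetical hand-built resource list (assumption: not real fls output).
     The batch loop only looks for uri elements, wherever they sit in the tree. -->
<resources>
  <uri>file:///home/pjr/dev/moduleA/system.xml</uri>
  <uri>file:///home/pjr/dev/moduleB/system.xml</uri>
</resources>
```

Because the loop deletes the first uri element after each iteration, a list of this shape would be consumed one entry at a time until none remain.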

Process each resource

In this example the script iterates over the resources listed in the XML document provided by the fls service and uses XQuery to count the elements in each system.xml file. Any service can be applied to each resource in the same way; for example, we could run XHTMLTidy to batch convert HTML resources into valid XHTML resources.

Report on results

In our example we accumulate the result of each process in a variable. The accumulated results could be used for more extensive reporting, including any exceptions that might have occurred. To keep our example simple we've only provided basic exception processing. Finally the results are styled and presented as an HTML table. You could instead write them to a file or use them as the start of another process.
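
To make the reporting step easier to follow, here is a sketch of what the accumulated results variable would plausibly hold after two iterations. The shape is inferred from the XQuery fragment (a result element wrapping the quoted uri and a count) and from the stylesheet, which reads result/uri and result/count. The file paths and counts are invented for illustration.

```xml
<!-- Illustrative contents of var:results after two files (paths and counts are made up) -->
<results>
  <result>
    <uri>file:///home/pjr/dev/moduleA/system.xml</uri>
    <count>42</count>
  </result>
  <result>
    <uri>file:///home/pjr/dev/moduleB/system.xml</uri>
    <count>17</count>
  </result>
</results>
```

Any downstream step (styling, writing to a file, or feeding another process) would work from a document of this form.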

Executing this process

To execute this DPML script, first create a new module.

  1. Use the new module wizard to create and install a new module - choose the default settings ensuring your module supports DPML. Import the module into the Frontend fulcrum to make it available on localhost and port 8080. Physically, the module will be located in [install]/modules/your_module_name/.
  2. The example process uses the fls accessor supplied from the ext_sys module and the xquery accessor supplied from ext_xquery. These modules must be imported into your module by adding the following two imports into the mapping section of your module.xml definition located in the root directory of your module.
    <mapping> ...Existing Imports...
      <import>
        <uri>urn:org:ten60:netkernel:ext:sys</uri>
      </import>
      <import>
        <uri>urn:org:ten60:netkernel:ext:xquery</uri>
      </import>
    </mapping>
    Next perform a cold restart to pick up the module changes.
  3. Finally copy the batch process listing below to a file named batch.idoc in the resources/ directory of your module. Edit the fls instruction to point to a directory in your filesystem and, if you wish, change the regex filter to match different file names.
  4. Start the process by requesting the following URI with a web browser: http://localhost:8080/batch.idoc

DPML Script Code

<idoc>
  <seq>
    <comment> ****************************************** A Batch Processing Pattern. This example finds all system.xml documents and counts the number of elements they contain. You can adapt it to suit your needs. ****************************************** </comment>
    <comment> *********** Use File LS accessor to list files.
      o Modify the root for your filesystem
      o Modify the filter regex to target other XML files
      The result is a tree of matching resources, each with a uri element containing the URI of the resource. We'll use this as the source for the batch process. *********** </comment>
    <instr>
      <type>fls</type>
      <operator>
        <fls>
          <root>file:///home/pjr/dev/</root>
          <filter>.*system\.xml</filter>
          <recursive />
          <uri />
        </fls>
      </operator>
      <target>var:fls</target>
    </instr>
    <comment> ************* Prepare a results document ************** </comment>
    <instr>
      <type>copy</type>
      <operand>
        <results />
      </operand>
      <target>var:results</target>
    </instr>
    <comment> *********** Start batch processing loop *********** </comment>
    <while>
      <comment> *********** Loop condition - do processing sequence while there's a file URI left to process *********** </comment>
      <cond>
        <instr>
          <type>xpatheval</type>
          <operand>var:fls</operand>
          <operator>
            <xpath>/descendant::uri[1]</xpath>
          </operator>
          <target>this:cond</target>
        </instr>
      </cond>
      <seq>
        <comment> *********** Copy the URI fragment to a variable and log it to show progress *********** </comment>
        <instr>
          <type>copy</type>
          <operand>var:fls#xpointer(/descendant::uri[1])</operand>
          <target>var:uri</target>
        </instr>
        <instr>
          <type>log</type>
          <operand>var:uri</operand>
        </instr>
        <comment> *********** Main Process - We could do anything we liked here including executing another DPML process or modifying the target file in some way. Here we simply count the elements in the file. *********** </comment>
        <instr>
          <type>xquery</type>
          <operator>
            <xquery>
              (: ********* Declare the external URI variable and extract the file URI to $file variable ********* :)
              declare variable $uri as node() external;
              declare variable $file {$uri/uri/text()};
              (: ******* Return a fragment: Quote back the URI fragment and add a count element with the number of elements contained in the target document ******* :)
              &lt;result&gt;
                {$uri}
                &lt;count&gt; {count(doc($file)/descendant::*)} &lt;/count&gt;
              &lt;/result&gt;
            </xquery>
          </operator>
          <uri>var:uri</uri>
          <target>var:result</target>
        </instr>
        <comment> *********** Append the xquery result to our cumulative var:results document *********** </comment>
        <instr>
          <type>stm</type>
          <operand>var:results</operand>
          <operator>
            <stm:group xmlns:stm="http://1060.org/stm">
              <stm:append xpath="/results">
                <stm:param xpath="/result:sequence/result:element/result" />
              </stm:append>
            </stm:group>
          </operator>
          <param>var:result</param>
          <target>var:results</target>
        </instr>
        <comment> ********** Exception: Catch any processing exceptions... ********** </comment>
        <exception>
          <comment> ********** Since this is a simple example we'll just log the exception; you can add more extensive error handling for your process if required... ********** </comment>
          <instr>
            <type>log</type>
            <operand>this:exception</operand>
          </instr>
        </exception>
        <comment> *********** Remove the first URI from the file listing before starting next iteration of the loop. If this isn't done we'll have an infinite loop!!! *********** </comment>
        <instr>
          <type>stm</type>
          <operand>var:fls</operand>
          <operator>
            <stm:group xmlns:stm="http://1060.org/stm">
              <stm:delete xpath="/descendant::uri[1]" />
            </stm:group>
          </operator>
          <target>var:fls</target>
        </instr>
      </seq>
    </while>
    <comment> *********** All done. Style the results for presentation. *********** </comment>
    <instr>
      <type>xslt</type>
      <operand>var:results</operand>
      <operator>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html" />
          <xsl:template match="/results">
            <html>
              <body>
                <h1>Batch Results</h1>
                <table>
                  <tr bgcolor="#aaaaaa">
                    <td>file</td>
                    <td>elements</td>
                  </tr>
                  <xsl:for-each select="result">
                    <tr>
                      <td>
                        <xsl:value-of select="uri" />
                      </td>
                      <td>
                        <xsl:value-of select="count" />
                      </td>
                    </tr>
                  </xsl:for-each>
                </table>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>
      </operator>
      <target>this:response</target>
    </instr>
  </seq>
</idoc>
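
As one possible direction for more extensive reporting, the stylesheet inside the final xslt instruction could be swapped for a summary such as the sketch below. It uses only standard XSLT 1.0 functions (count and sum) and assumes the results document has the result/count structure produced by the script above. It has not been run inside NetKernel, so treat it as a starting point rather than a tested replacement.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" />
  <xsl:template match="/results">
    <html>
      <body>
        <h1>Batch Summary</h1>
        <!-- Number of result elements appended during the loop -->
        <p>Files processed: <xsl:value-of select="count(result)" /></p>
        <!-- Sum of the per-file element counts -->
        <p>Total elements: <xsl:value-of select="sum(result/count)" /></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Files that raised an exception never reach the append step, so they would not be included in either figure.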

Deadlock Detector Exception

Searching a filesystem may take a long time and may exceed the time NetKernel allows a process to work before raising the NetKernel Deadlock Detector exception. If this happens, simply increase the deadlock detection period in your System Configuration.

© 2003-2007, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.