XML Pipelines: What, Why, Where, How?
Employing a pipeline approach to processing XML is increasingly regarded
as a robust and flexible design pattern for XML systems. This guide
presents an interactive discussion of the ways in which XML pipelines
may be developed on NetKernel. It offers a horizontal slice across a number of NetKernel
technologies, providing a general introduction to them within the context of XML pipelines.
- Part 1 shows the use of a simple 'recursive pull' pattern to create an XQuery pipeline.
- Part 2 shows how a declarative process scheduling language (DPML) allows pipelines to be decoupled and flexibly connected.
- Part 3 shows how NetKernel makes pipeline processing robust by providing in-pipeline exception handling and pipeline breakpoints.
- Part 4 shows heterogeneous pipelines that connect multiple pipelines sequenced using either declarative or procedural languages. It also shows how NetKernel allows a pipeline to transparently intermingle XML object models.
- Part 5 shows the forking and joining of asynchronous pipelines, and how this can improve pipeline throughput when processing XML obtained from high-latency sources.
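The fork/join idea behind Part 5 is not specific to NetKernel. As a generic illustration only (this is not NetKernel's API or how Part 5 implements it), the sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`: it forks one branch per source so that the waits overlap, then joins the branch outputs into a single XML document. The source names, the documents, the 0.2 s latency and the `RESULTS`/`ITEM` element names are all hypothetical stand-ins for slow remote XML.

```python
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical sources: stand-ins for slow remote XML documents.
SOURCES = {
    "a": "<DOC><ITEM>1</ITEM><ITEM>2</ITEM></DOC>",
    "b": "<DOC><ITEM>3</ITEM></DOC>",
    "c": "<DOC><ITEM>4</ITEM><ITEM>5</ITEM><ITEM>6</ITEM></DOC>",
}
LATENCY_SECONDS = 0.2  # assumed per-source delay


def fetch(name: str) -> ET.Element:
    # Simulate a high-latency source with a fixed sleep, then parse it.
    time.sleep(LATENCY_SECONDS)
    return ET.fromstring(SOURCES[name])


def branch(name: str) -> List[ET.Element]:
    # One forked branch: fetch a source and run a per-branch stage
    # (here, extracting its <ITEM> elements).
    return fetch(name).findall("ITEM")


def fork_join(names: List[str]) -> ET.Element:
    # Fork: run one branch per source on a thread pool so the waits overlap.
    # Join: collect every branch's output, in input order, into one document.
    joined = ET.Element("RESULTS")
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        for items in pool.map(branch, names):
            joined.extend(items)
    return joined


if __name__ == "__main__":
    start = time.perf_counter()
    result = fork_join(["a", "b", "c"])
    elapsed = time.perf_counter() - start
    # Six items are expected; elapsed should be close to one latency, not three.
    print(len(result.findall("ITEM")), f"{elapsed:.2f}s")
```

Threads suit this case because each branch spends most of its time waiting rather than computing. Joining with `pool.map` keeps the combined output in input order, so the result stays deterministic even though the branches finish in any order.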
Demo Application Specification
To demonstrate the design patterns, this guide uses the same demo application in each part.
The specification for the pipeline is as follows:
- Source lear.xml (the XML mark-up of Shakespeare's King Lear) and extract ACT 1
- Extract all <SPEECH> elements
- Extract all <SPEECH> elements where the <SPEAKER> is 'GLOUCESTER'
- Extract all <SPEECH> elements where a child <LINE> contains 'France'
The result of each stage of this pipeline will be:
- Act 1 of King Lear
- All speeches from Act 1
- All speeches by GLOUCESTER from Act 1
- All speeches by GLOUCESTER from Act 1 containing the word 'France'
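Outside NetKernel, the four stages above can be sketched as plain functions that each take one XML document and return another. The sketch below uses Python's standard `xml.etree.ElementTree` and is not how the NetKernel demos are written. It assumes lear.xml uses the PLAY/ACT/SCENE/SPEECH/SPEAKER/LINE layout of the widely circulated Shakespeare mark-up, and it substitutes a heavily abridged inline excerpt for the real file (the ACT II line is a placeholder). The `SPEECHES` wrapper element and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# Abridged stand-in for lear.xml (assumed PLAY/ACT/SCENE/SPEECH layout).
SAMPLE_LEAR = """<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>SCENE I.</TITLE>
      <SPEECH><SPEAKER>KENT</SPEAKER>
        <LINE>I thought the king had more affected the Duke of</LINE>
        <LINE>Albany than Cornwall.</LINE></SPEECH>
      <SPEECH><SPEAKER>GLOUCESTER</SPEAKER>
        <LINE>It did always seem so to us:</LINE></SPEECH>
    </SCENE>
    <SCENE><TITLE>SCENE II.</TITLE>
      <SPEECH><SPEAKER>GLOUCESTER</SPEAKER>
        <LINE>Kent banish'd thus! and France in choler parted!</LINE></SPEECH>
    </SCENE>
  </ACT>
  <ACT><TITLE>ACT II</TITLE>
    <SCENE><TITLE>SCENE I.</TITLE>
      <SPEECH><SPEAKER>GLOUCESTER</SPEAKER>
        <LINE>Placeholder line mentioning France.</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""

# Every stage has the same shape: one XML document in, one XML document out.
Stage = Callable[[ET.Element], ET.Element]


def extract_act1(play: ET.Element) -> ET.Element:
    # Stage 1: the first <ACT> child of <PLAY>, i.e. ACT I.
    act = play.find("ACT")
    if act is None:
        raise ValueError("no <ACT> element found")
    return act


def all_speeches(doc: ET.Element) -> ET.Element:
    # Stage 2: every <SPEECH> at any depth, gathered under a new root.
    result = ET.Element("SPEECHES")
    result.extend(list(doc.iter("SPEECH")))
    return result


def by_speaker(name: str) -> Stage:
    # Stage 3 factory: keep speeches whose <SPEAKER> text equals `name`.
    def stage(doc: ET.Element) -> ET.Element:
        result = ET.Element("SPEECHES")
        result.extend([s for s in doc.findall("SPEECH")
                       if s.findtext("SPEAKER") == name])
        return result
    return stage


def line_contains(word: str) -> Stage:
    # Stage 4 factory: keep speeches with a child <LINE> containing `word`.
    def stage(doc: ET.Element) -> ET.Element:
        result = ET.Element("SPEECHES")
        result.extend([s for s in doc.findall("SPEECH")
                       if any(word in (line.text or "") for line in s.findall("LINE"))])
        return result
    return stage


def pipeline(*stages: Stage) -> Stage:
    # Compose left to right: each stage's output document feeds the next stage.
    return lambda doc: reduce(lambda acc, stage: stage(acc), stages, doc)


lear_pipeline = pipeline(extract_act1, all_speeches,
                         by_speaker("GLOUCESTER"), line_contains("France"))

if __name__ == "__main__":
    for speech in lear_pipeline(ET.fromstring(SAMPLE_LEAR)).findall("SPEECH"):
        lines = " ".join((line.text or "") for line in speech.findall("LINE"))
        print(speech.findtext("SPEAKER"), "|", lines)
```

Because every stage shares the same document-in, document-out shape, stages can be dropped, reordered or replaced (for example, by an XSLT or XQuery step) without touching their neighbours. Running only the first three stages yields the intermediate results listed above.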
Experienced XML developers will quickly recognise that the final result could be obtained in numerous ways using a single
operation (XSLT, XQuery, etc.). The point is not to demonstrate the specifics of a given XML technology, but rather
to demonstrate the patterns that may be used for pipelining XML technologies, and to show that literally any combination of XML
processing stages can be flexibly configured into robust pipeline units.
Source Code
All of the source files for these pipeline examples are provided in the demos/xquery/
directory of the demo-xml-tech-x.x.x.jar in the <install>/modules/ directory.
You may unzip this file to a new directory and experiment with the code. To pick up the unzipped module, you must edit the <install>/etc/deployedModules.xml
file and change the entry for modules/demo-xml-tech-x.x.x.jar to point to your unzipped directory. After changing the deployedModules.xml file, you should perform a
cold restart.
Note: <install> means the path to your base installation directory of NetKernel.
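A .jar file is an ordinary zip archive, so the unzip step can also be scripted. The sketch below uses only Python's standard `zipfile` module; the `unzip_module` name and script name are hypothetical, the jar and destination paths are passed on the command line (substitute your installation's actual version for x.x.x), and it assumes demos/xquery/ sits at the root of the jar. It does not edit deployedModules.xml; that entry still has to be changed as described above.

```python
import sys
import zipfile
from pathlib import Path
from typing import List


def unzip_module(jar_path: str, dest_dir: str) -> List[str]:
    """Extract every entry of a module jar (a zip archive) into dest_dir.

    Returns the archive's entry names so callers can inspect them.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest)
        return jar.namelist()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python unzip_module.py <install>/modules/demo-xml-tech-x.x.x.jar <new-dir>
    entries = unzip_module(sys.argv[1], sys.argv[2])
    # List the pipeline demo sources (assumed to live under demos/xquery/).
    for name in entries:
        if name.startswith("demos/xquery/"):
            print(name)
```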