Short Description |
Ports |
Metadata |
XMLReader Attributes |
Details |
Examples |
Best Practices |
Compatibility |
See also |
XMLReader reads data from XML files using DOM technology. It can also read data from compressed files, an input port, and a dictionary.
Which XML Component? | |
---|---|
Generally, use XMLExtract. It is fast and has a GUI to map elements to records. It is based on SAX. XMLReader can use more complex XPath expressions than XMLExtract, e.g. it allows you to reference siblings. On the other hand, XMLReader is slower and needs more memory than XMLExtract. XMLReader is based on DOM. XMLReader supersedes the original XMLXPathReader. XMLXPathReader can use more complex XPath expressions than XMLExtract. XMLXPathReader uses DOM. |
Component | Data source | Input ports | Output ports | Each to all outputs | Different to different outputs [1] | Transformation | Transf. req. | Java | CTL | Auto-propagated metadata |
---|---|---|---|---|---|---|---|---|---|---|
XMLReader | XML file | 0-1 | 1-n | |||||||
[1] XMLReader, XMLExtract and XMLXPathReader send data to ports as defined in their Mapping or Mapping URL attribute. |
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | For port reading. See Reading from Input Port. | One field (byte, cbyte, string). | |
Output | 0 ... n-1 | For correct data records. Connect more than one output port if your mapping requires that. | Any | |
n | Error port | Restricted format. See Metadata. |
XMLReader does not propagate metadata.
XMLReader has metadata templates for the error port. There are two templates: XMLReader_TreeReader_ErrPortWithoutFile and XMLReader_TreeReader_ErrPortWithFile.
Table 53.15. Error Metadata for XMLReader
Field number | Field name | Data type | Description |
---|---|---|---|
0 | port | integer | number of the output port where errors occurred |
1 | recordNumber | integer | record number (per source and port) |
2 | fieldNumber | integer | field number |
3 | fieldName | string | field name |
4 | value | string | value which caused the error |
5 | message | string | error message |
6 | file | string | source name; this field is optional |
Input metadata has one field of data type byte, cbyte, or string.
The metadata on each of the output ports does not need to be the same. Each of these metadata can use Autofilling Functions.
If you intend to use the last output port for error logging, its metadata must have a fixed format. Field names can be arbitrary, but field types must be the same as in the template.
Attribute | Req | Description | Possible values | |||
---|---|---|---|---|---|---|
Basic | ||||||
File URL | yes | Specifies which data source(s) will be read (XML file, input port, dictionary). See Supported File URL Formats for Readers. | ||||
Charset | Encoding of records that are read. When reading from files, the charset is detected automatically (unless you specify it yourself). | ISO-8859-1 (default) | <other encodings> | ||||
Data policy | Determines what should be done when an error occurs. See Data Policy for more information. | Strict (default) | Controlled | Lenient | ||||
Mapping | [1] | Mapping the input XML structure to output ports. See Mapping Definition for more information. | ||||
Mapping URL | [1] | External text file containing the mapping definition. See Mapping Definition for more information. | ||||
Implicit mapping | If true, element values are mapped to the record fields of the same name. Example: An element (salary) is automatically mapped onto the field of the same name (salary). | false (default) | true | ||||
Advanced | ||||||
XML features | Sequence of individual true/false expressions related to XML features which should be validated. The expressions are separated from each other by semicolons. See XML Features for more information. | |||||
[1] One of these has to be specified. If both are specified, Mapping URL has higher priority. |
Mapping Definition |
Context Tag Attributes |
Mapping Tag Attributes |
Input Mapping Attributes |
Reading Multivalue Fields |
Mapping Input Fields |
Records and fields to be sent to the output ports are specified using XML elements and attributes.
Each Context element corresponds to one attached output port.
Each Mapping element defines a mapping to one field.
See the example below.
Example 53.8. Mapping in XMLReader
<Context xpath="/employees/employee" outPort="0">
  <Mapping nodeName="salary" cloverField="basic_salary"/>
  <Mapping xpath="name/firstname" cloverField="firstname"/>
  <Mapping xpath="name/surname" cloverField="surname"/>
  <Context xpath="child" outPort="1" parentKey="empID" generatedKey="parentID"/>
  <Context xpath="benefits" outPort="2" parentKey="empID;jobID" generatedKey="empID;jobID" sequenceField="seqKey" sequenceId="Sequence0">
    <Context xpath="financial" outPort="3" parentKey="seqKey" generatedKey="seqKey"/>
  </Context>
  <Context xpath="project" outPort="4" parentKey="empID;jobID" generatedKey="empID;jobID">
    <Context xpath="customer" outPort="5" parentKey="projName;projManager;inProjectID;Start" generatedKey="joinedKey"/>
  </Context>
</Context>
The nested structure of <Context> tags is similar to the nested structure of XML elements in the input XML files.
However, the Mapping attribute does not need to copy the whole XML structure; it can start at a specified level inside the XML file.
The Mapping definition is specified in the Mapping URL attribute or in the Mapping attribute.
Every Mapping definition consists of <Context> tags.
Each <Context> tag defines a mapping of a particular XML subtree to a record sent to the specified output port.
Each <Context> tag can surround a series of nested <Mapping> tags. These let you map XML elements or attributes to Clover fields.
Each of these <Context> and <Mapping> tags contains some Context Tag Attributes and Mapping Tag Attributes, respectively.
Empty Context Tag (Without a Child)
<Context xpath="xpathexpression" />
Non-Empty Context Tag (Parent with a Child)
<Context xpath="xpathexpression">
(nested Context and Mapping elements: only children, parents with one or more children, etc.)
</Context>
Empty Mapping Tag (Renaming Tag)
xpath is used:
<Mapping xpath="xpathexpression" />
nodeName is used:
<Mapping nodeName="elementname" />
xpath |
outPort |
parentKey |
generatedKey |
sequenceId |
sequenceField |
namespacePaths |
xpath
Required
The xpath expression can be any XPath query.
Example: xpath="/tagA/.../tagJ"
outPort
Optional
Number of the output port to which data is sent. If not defined, no data from this level of the mapping is sent out.
Example: outPort="2"
parentKey
Both parentKey and generatedKey must be specified.
Sequence of metadata fields on the next parent level, separated by semicolon, colon, or pipe. The number and data types of these fields must be the same as in the generatedKey attribute; otherwise, all values are concatenated to create a unique string value, in which case the key has only one field.
Example: parentKey="first_name;last_name"
Equal values of these attributes ensure that such records can be joined in the future.
generatedKey
Both parentKey and generatedKey must be specified.
Sequence of metadata fields on the specified level, separated by semicolon, colon, or pipe. The number and data types of these fields must be the same as in the parentKey attribute; otherwise, all values are concatenated to create a unique string value, in which case the key has only one field.
Example: generatedKey="f_name;l_name"
Equal values of these attributes ensure that such records can be joined in the future.
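The following sketch shows how parentKey and generatedKey might pair up in a mapping. It is modeled on Example 53.8; the id attribute, the name element, and the childName field are assumptions made for illustration and are not part of the original example.

```xml
<!-- Sketch only: @id, name and childName are hypothetical -->
<Context xpath="/employees/employee" outPort="0">
  <!-- empID is a field of the parent record (port 0) -->
  <Mapping xpath="@id" cloverField="empID"/>
  <!-- each child record (port 1) carries the parent's empID in its parentID field -->
  <Context xpath="child" outPort="1" parentKey="empID" generatedKey="parentID">
    <Mapping nodeName="name" cloverField="childName"/>
  </Context>
</Context>
```

With such a mapping, records from port 1 could later be joined with records from port 0 by comparing parentID with empID.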
sequenceId
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
ID of the sequence.
Example: sequenceId="Sequence0"
sequenceField
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
A metadata field on the specified level into which the sequence values are written. It can serve as parentKey for the next nested level.
Example: sequenceField="sequenceKey"
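As a sketch of how a sequence can link nested levels, the fragment below is adapted from Example 53.8. It assumes that a sequence with the ID Sequence0 is defined in the graph; the name and amount elements and the benefitName and amount fields are hypothetical.

```xml
<!-- Sketch only: name, amount and benefitName are hypothetical -->
<Context xpath="/employees/employee" outPort="0">
  <Context xpath="benefits" outPort="2" parentKey="empID;jobID" generatedKey="empID;jobID"
           sequenceField="seqKey" sequenceId="Sequence0">
    <!-- seqKey is filled from Sequence0 and identifies each benefits record -->
    <Mapping nodeName="name" cloverField="benefitName"/>
    <!-- the nested level uses seqKey as its parentKey -->
    <Context xpath="financial" outPort="3" parentKey="seqKey" generatedKey="seqKey">
      <Mapping nodeName="amount" cloverField="amount"/>
    </Context>
  </Context>
</Context>
```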
namespacePaths
Optional
Default namespaces that should be used for the xpath attribute specified in the <Context> tag.
Pattern: namespacePaths='prefix1="URI1";...;prefixN="URIN"'
Example: namespacePaths='n1="http://www.w3.org/TR/html4/";n2="http://ops.com/"'
Note | |
---|---|
Remember that if the input XML file contains a default
namespace, this |
xpath |
nodeName |
cloverField |
trim |
namespacePaths |
xpath
Either xpath or nodeName must be specified in the <Mapping> tag.
XPath query.
Example: xpath="tagA/.../salary"
nodeName
Either xpath or nodeName must be specified in the <Mapping> tag. Using nodeName is faster than using xpath.
XML node that should be mapped to the Clover field.
Example: nodeName="salary"
cloverField
Required
Clover field to which XML node should be mapped.
Name of the field in the corresponding level.
Example: cloverField="SALARY"
trim
Optional
Specifies whether leading and trailing white spaces should be removed. By default, both are removed.
Example: trim="false" (white spaces will not be removed)
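A minimal sketch of the trim attribute, assuming a hypothetical note element whose surrounding white space is significant (the /orders/order path and the field names are likewise assumptions):

```xml
<!-- Sketch only: element and field names are hypothetical -->
<Context xpath="/orders/order" outPort="0">
  <!-- default behavior: leading and trailing white spaces are removed -->
  <Mapping nodeName="surname" cloverField="surname"/>
  <!-- trim="false": white spaces are not removed from the value -->
  <Mapping nodeName="note" cloverField="note" trim="false"/>
</Context>
```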
namespacePaths
Optional
Default namespaces that should be used for the xpath attribute specified in the <Mapping> tag.
Pattern: namespacePaths='prefix1="URI1";...;prefixN="URIN"'
Example: namespacePaths='n1="http://www.w3.org/TR/html4/";n2="http://ops.com/"'
Note | |
---|---|
Remember that if the input XML file contains a default
namespace, this |
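The sketch below shows namespacePaths on both the <Context> and the <Mapping> tag, following the pattern given above. The table, tr, and td element names and the cell field are hypothetical. The prefix is declared on each tag because this page does not state whether a <Mapping> tag inherits the declarations of its parent <Context>.

```xml
<!-- Sketch only: table, tr, td and cell are hypothetical -->
<Context xpath="/n1:table/n1:tr" namespacePaths='n1="http://www.w3.org/TR/html4/"' outPort="0">
  <!-- the n1 prefix is declared again for the xpath of this Mapping -->
  <Mapping xpath="n1:td" namespacePaths='n1="http://www.w3.org/TR/html4/"' cloverField="cell"/>
</Context>
```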
cloverField
Required
Output Clover field to which the input field should be mapped.
Example: cloverField="SALARY"
inputField
Required
Input field to be used.
Example: inputField="SALARY"
Reading multivalue fields is supported; however, you can read only lists (see Multivalue Fields).
Note | |
---|---|
Reading maps is handled as reading pure |
Example 53.9. Reading lists with XMLReader
An example input file containing these elements (just a code snippet):
...
<attendees>John</attendees>
<attendees>Vicky</attendees>
<attendees>Brian</attendees>
...
can be read by the component with this mapping:
<Mapping xpath="attendees" cloverField="attendanceList"/>
where attendanceList is a field of your metadata. The metadata has to be assigned to the component's output edge. After you run the graph, the field is populated with XML data like this (this is what you will see in View data):
[John,Vicky,Brian]
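For context, a complete mapping around this line might look as follows. The meeting root element and the topic field are assumptions; attendanceList would be declared in the output metadata as a list of strings.

```xml
<!-- Sketch only: the meeting root element and the topic field are hypothetical -->
<Context xpath="/meeting" outPort="0">
  <Mapping xpath="topic" cloverField="topic"/>
  <!-- all matching attendees elements are collected into the list field -->
  <Mapping xpath="attendees" cloverField="attendanceList"/>
</Context>
```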
If you use input port reading in discrete or source mode, you can map particular input fields to output fields using the inputField attribute.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/rootPath" outPort="0">
  <Mapping cloverField="field2" inputField="field2"/>
</Context>
Reading an XML File |
Mapping Input Fields to Output |
Sending Nested Elements to Different Output Ports |
Reading XML with Namespace |
This example shows the basic usage of XMLReader.
You have a retail.xml file with data about your retail sales.
<?xml version="1.0" ?>
<orders>
  <order id="1">
    <firstname>John</firstname>
    <surname>Smith</surname>
    <emails>
      <email>[email protected]</email>
      <email>[email protected]</email>
    </emails>
    <item>
      <goodName>table</goodName>
      <items>1</items>
    </item>
  </order>
  <order id="2">
    <firstname>Ellen</firstname>
    <surname>Smith</surname>
    <emails>
      <email>[email protected]</email>
    </emails>
    <item>
      <goodName>chair</goodName>
      <items>3</items>
    </item>
    <item>
      <goodName>tablecloth</goodName>
      <items>2</items>
    </item>
  </order>
</orders>
Create a list containing order_id, customer firstname, surname and email(s).
Create metadata with 4 fields: order_id (integer), firstname (string), surname (string), email (string[]).
Set up attributes File URL, Implicit mapping and Mapping.
Attribute | Value |
---|---|
File URL | ${DATAIN_DIR}/retail.xml |
Mapping | See the xml below. |
Implicit mapping | true |
If you set Implicit mapping to true, the fields firstname and surname are populated with the values of the corresponding elements.
Content of the Mapping attribute:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/orders/order" outPort="0">
  <Mapping cloverField="order_id" xpath="@id"/>
  <Mapping cloverField="email" xpath="./emails/email"/>
</Context>
XMLReader sends the following two records to its first output port:
1 John Smith [[email protected], [email protected]]
2 Ellen Smith [[email protected]]
This example shows how to read input files while mapping some input fields to the output.
You are given a list of customers and paths to the files with their orders.
C001|./file001.xml
C002|./file002.xml
Each file can contain one or more products:
<?xml version="1.0" ?>
<products>
  <product>A</product>
  <product>B</product>
</products>
Create a list with customers and products:
C001|A
C001|B
C002|E
Use File URL, Charset, and Mapping attributes.
Attribute | Value |
---|---|
File URL | port:$0.filename:source |
Charset | UTF-8 |
Mapping | See the code below |
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/products/product" outPort="0">
  <Mapping cloverField="productID" xpath="."/>
  <Mapping cloverField="customerID" inputField="ID"/>
</Context>
This example shows how to read an input file with nested elements. The nested elements on different levels are sent to different output ports.
The input file countries-and-counties.xml contains a list of countries.
Each country has a name and contains several counties.
Each county has a name.
<?xml version="1.0"?>
<countries>
  <country>
    <name>England</name>
    <county>
      <name>Bristol</name>
    </county>
    <county>
      <name>Cumbria</name>
    </county>
    <county>
      <name>Devon</name>
    </county>
  </country>
  <country>
    <name>Scotland</name>
    <county>
      <name>Edinburgh</name>
    </county>
    <county>
      <name>Fife</name>
    </county>
  </country>
</countries>
Make a list of countries, and make a list of counties with corresponding countries.
Assign metadata country with field countryName to the edge on the first output port.
Assign metadata county with fields countryName and countyName to the edge on the second output port.
Use File URL, Charset, and Mapping attributes.
Attribute | Value |
---|---|
File URL | ${DATAIN_DIR}/countries-and-counties.xml |
Charset | UTF-8 |
Mapping | See the code below |
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/countries/country" outPort="0">
  <Mapping cloverField="countryName" xpath="name"/>
  <Context xpath="./county" outPort="1">
    <Mapping cloverField="countryName" xpath="../name" />
    <Mapping cloverField="countyName" xpath="name"/>
  </Context>
</Context>
The records sent to the first output port are:
England
Scotland
The records sent to the second output port are:
England | Bristol
England | Cumbria
England | Devon
Scotland | Edinburgh
Scotland | Fife
This example shows how to read XML that contains different namespaces.
A web page contains SVG graphics and links to other web pages.
The links (<a>) belong to two namespaces: xhtml and svg.
Get the URLs of the links that come from the SVG image.
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>
  <body>
    <svg width="1024" height="768" xmlns="http://www.w3.org/2000/svg" version="1.1">
      <a href="http://www.cloveretl.com">
        <circle cx="512" cy="384" r="80"/>
      </a>
    </svg>
    <p>
      <a href="http://www.example.com">www.example.com</a>
    </p>
  </body>
</html>
Use File URL, Charset, and Mapping attributes.
Attribute | Value |
---|---|
File URL | ${DATAIN_DIR}/page.xhtml |
Charset | UTF-8 |
Mapping | See the code below |
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/xhtml:html//svg:a" namespacePaths='xhtml="http://www.w3.org/1999/xhtml";svg="http://www.w3.org/2000/svg"' outPort="0">
  <Mapping cloverField="field1" xpath="@href"/>
</Context>
The output contains one URL:
http://www.cloveretl.com
To avoid typing lines like:
<Mapping xpath="salary" cloverField="salary"/>
switch on implicit mapping and use explicit mapping only to populate fields with data from distinct elements.
The <Context> element should be used only if you intend to send a record corresponding to the subtree to the output.
Use
<Context xpath="/elem1/elem11" outPort="0">
  <Mapping cloverField="field1" xpath="elem111"/>
</Context>
instead of
<Context xpath="/elem1">
  <Context xpath="elem11" outPort="0">
    <Mapping cloverField="field1" xpath="elem111"/>
  </Context>
</Context>
We recommend specifying Charset explicitly.
XMLReader is available since CloverETL 3.3.x.
As of Clover 3.3, reading multivalue fields is supported; however, you can read only lists (see Multivalue Fields).
Since Clover 4.1.0-M1, you can assign values of fields from the input port to fields on the output port.