ARP can be used either as a Jena subsystem or as a standalone RDF/XML parser. This document gives a quick guide to using ARP standalone.
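To load an RDF file, create an ARP instance, set its options and handlers, and call a load method, as in the sample code below.
Xerces is used for parsing the XML. The SAX events generated by Xerces are then analysed as RDF by ARP. It is possible to use a different source of SAX events.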
Errors may occur in either the XML or the RDF part.
    ARP arp = new ARP();

    // initialisation - uses ARPConfig interface only.
    arp.getOptions().setLaxErrorMode();

    arp.getHandlers().setErrorHandler(new ErrorHandler() {
        public void fatalError(SAXParseException e) {
            // TODO code
        }
        public void error(SAXParseException e) {
            // TODO code
        }
        public void warning(SAXParseException e) {
            // TODO code
        }
    });
    arp.getHandlers().setStatementHandler(new StatementHandler() {
        public void statement(AResource a, AResource b, ALiteral l) {
            // TODO code
        }
        public void statement(AResource a, AResource b, AResource l) {
            // TODO code
        }
    });

    // parsing.
    try {
        // Loading fixed input ...
        arp.load(new StringReader(
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
            + "<rdf:Description><rdf:value rdf:parseType='Literal'>"
            + "<b>hello</b></rdf:value>\n"
            + "</rdf:Description></rdf:RDF>"
        ));
    } catch (IOException ioe) {
        // something unexpected went wrong
    } catch (SAXParseException s) {
        // This error will have been reported
    } catch (SAXException ss) {
        // This error will not have been reported.
    }
ARP reports events, such as the statements and errors it finds in the input, to handlers. User code is needed to respond to any events of interest. It is written by implementing the relevant interfaces: StatementHandler, org.xml.sax.ErrorHandler, NamespaceHandler, and ExtendedHandler.
Handlers are set through the object returned by the getHandlers method of the ARP instance, which encapsulates all the handlers in use. To set a specific handler, call the appropriate set...Handler method on that object, e.g. setStatementHandler.
All the handlers can be copied from one ARP instance to another by using the setHandlersWith method:
    ARP from, to;
    // initialize from and to
    // ...
    to.setHandlersWith(from.getHandlers());
The error handler reports both XML and RDF errors, the former detected by Xerces. See ARPHandlers.setErrorHandler for details of how to distinguish between them.
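Because the error handler is a plain org.xml.sax.ErrorHandler, it can be written and tried out without ARP. The sketch below is one possible shape: it records each report instead of leaving the callbacks empty. The class name CollectingErrorHandler and the helper parseWith are hypothetical, and the demo drives the handler with the JDK's built-in SAX parser (so only XML errors appear) purely so that it runs without Jena on the classpath. With ARP, an instance would be passed to arp.getHandlers().setErrorHandler(...), as in the sample code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: records every report rather than printing or throwing. */
class CollectingErrorHandler implements ErrorHandler {
    final List<String> reports = new ArrayList<>();

    @Override public void warning(SAXParseException e)    { record("warning", e); }
    @Override public void error(SAXParseException e)      { record("error", e); }
    @Override public void fatalError(SAXParseException e) { record("fatal", e); }

    private void record(String level, SAXParseException e) {
        reports.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Demo only: parses xml with the JDK SAX parser; returns true if parsing completed. */
    static boolean parseWith(String xml, ErrorHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // The handler has already seen any SAXParseException before it is thrown.
            return false;
        }
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        boolean completed = parseWith("<a><b></a>", handler); // mismatched tags
        System.out.println("completed: " + completed);
        handler.reports.forEach(System.out::println);
    }
}
```

Collecting reports in a list keeps the handler free of side effects, so the caller can decide after the parse whether to log, fail, or ignore them.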
ARP can be configured to treat most error conditions as warnings or to ignore them, and to treat some non-error conditions as warnings or errors.
In addition, the behaviour in response to input that does not have an <rdf:RDF> root element is configurable: either to treat the whole file as RDF anyway, or to scan the file looking for embedded <rdf:RDF> elements.
As with the handlers, there is an options object that encapsulates these settings. It can be accessed using getOptions, and then individual settings can be made using the methods in ARPOptions.
It is also possible to copy all the option settings from one ARP instance to another:
    ARP from, to;
    // initialize from and to
    // ...
    to.setOptionsWith(from.getOptions());
The I/O how-to gives some more detail about the options settings, although it assumes the use of the Jena RDFReader interface.
It is possible to interrupt an ARP thread. See the I/O how-to for details.
It is possible to use ARP with other SAX input sources, e.g. from a non-Xerces parser, or from an in-memory XML source, such as a DOM tree.
Instead of an ARP instance, you create an instance of SAX2RDF using the newInstance method. This can be configured just like an ARP instance, following the initialization section of the sample code.
This is used like a SAX2Model instance as described elsewhere.
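As an illustration of the in-memory case, the following sketch replays a DOM tree as SAX events using only the JDK: an identity transform from a DOMSource into a SAXResult. The class DomToSax and its methods are hypothetical names, and the recording handler is a stand-in for the place where a SAX2RDF instance would go, assuming SAX2RDF can be used as an org.xml.sax.ContentHandler. The arguments to SAX2RDF.newInstance are not covered here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomToSax {
    /** Builds a namespace-aware DOM tree from an XML string. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // RDF/XML depends on namespace URIs
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Replays the DOM tree as SAX events into the handler via an identity transform. */
    static void replay(Document doc, ContentHandler handler) throws Exception {
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
    }

    /** Stand-in consumer: records the expanded name of each element it is sent. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add("{" + uri + "}" + local);
            }
        };
        try {
            replay(parse(xml), recorder);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
                + "<rdf:Description/></rdf:RDF>";
        elementNames(rdf).forEach(System.out::println);
    }
}
```

The same replay method works for any consumer of SAX events, so the DOM handling can be tested separately from the RDF processing.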
For very large files, ARP does not use any additional memory unless ExtendedHandler.discardNodesWithNodeID returns false or the AResource.setUserData method has been used. In these cases ARP needs to remember rdf:nodeID usage for the lifetime of the file.