Normally, both ARP and Jena are used to read files either from the local machine or from the Web. A different use case, addressed here, is when the XML source is available in-memory in some way. In these cases, ARP and Jena can be used as a SAX event handler, turning SAX events into triples, or a DOM tree can be parsed into a Jena Model.
To read an arbitrary SAX source as triples to be added into a Jena Model, it is not possible to use a Model.read() operation. Instead, you construct a SAX event handler of class SAX2Model, using the create method, install it as the handler on your SAX event source, and then stream the SAX events. You can exercise fine-grained control over the SAX events, for instance by inserting or deleting events, before passing them to the SAX2Model handler.
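As a rough illustration of deleting events before they reach the handler, the sketch below uses only the JDK's SAX classes. The names `DropElementFilter` and `FilterDemo` are hypothetical, and the anonymous recording handler is merely a stand-in for the SAX2Model handler, which is not used here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: drops every event inside elements with a given local name. */
class DropElementFilter extends XMLFilterImpl {
    private final String dropLocalName;
    private int depth = 0; // greater than 0 while inside a dropped element

    DropElementFilter(XMLReader parent, String dropLocalName) {
        super(parent);
        this.dropLocalName = dropLocalName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || local.equals(dropLocalName)) {
            depth++;          // swallow this start event
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;          // swallow the matching end event
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }
}

class FilterDemo {
    /** Parses xml and returns the local names that reach the downstream handler. */
    static List<String> elementsSeen(String xml, String dropLocalName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> seen = new ArrayList<>();
        // Stand-in for the real handler: it only records element names.
        DefaultHandler downstream = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add(local);
            }
        };

        DropElementFilter filter = new DropElementFilter(reader, dropLocalName);
        filter.setContentHandler(downstream);
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Elements b and c are deleted from the stream before the handler sees them.
        System.out.println(elementsSeen("<a><b><c/></b><d/></a>", "b"));
    }
}
```

With a real SAX2Model, the same filter would presumably sit between the SAX source and the handler, with the handler installed on the filter rather than on the underlying parser.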
This code uses the Xerces parser as a SAX event source, and adds the triples to a Model using default options.
    // Use your own SAX source.
    XMLReader saxParser = new SAXParser();

    // set up SAX input (base must be declared before it is used)
    String base = "http://example.org/";
    InputStream in = new FileInputStream("kb.rdf");
    InputSource ins = new InputSource(in);
    ins.setSystemId(base);

    Model m = ModelFactory.createDefaultModel();

    // create handler, linked to Model
    SAX2Model handler = SAX2Model.create(base, m);

    // install handler on SAX event stream
    SAX2RDF.installHandlers(saxParser, handler);

    try {
        try {
            saxParser.parse(ins);
        } finally {
            // MUST ensure handler is closed.
            handler.close();
        }
    } catch (SAXParseException e) {
        // Fatal parsing errors end here,
        // but they will already have been reported.
    }
If your SAX event source implements XMLReader, then the installHandlers static method can be used as shown in the sample. Otherwise, you have to install the handlers yourself. The installHandlers code is like this:
    static public void installHandlers(XMLReader rdr, XMLHandler sax2rdf)
            throws SAXException {
        rdr.setEntityResolver(sax2rdf);
        rdr.setDTDHandler(sax2rdf);
        rdr.setContentHandler(sax2rdf);
        rdr.setErrorHandler(sax2rdf);
        rdr.setFeature("http://xml.org/sax/features/namespaces", true);
        rdr.setFeature(
            "http://xml.org/sax/features/namespace-prefixes", true);
        rdr.setProperty(
            "http://xml.org/sax/properties/lexical-handler", sax2rdf);
    }
For some other SAX source, the exact code will differ, but the required operations are as above.
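To make the required operations concrete for a different source, here is a minimal sketch that applies the same calls to an XMLReader obtained through JAXP. It uses only JDK classes: `InstallSketch` and `Recorder` are hypothetical names, and the `DefaultHandler2` subclass stands in for the real handler simply because it covers the same SAX interfaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class InstallSketch {
    /** Hypothetical stand-in handler: records element names and comments. */
    static class Recorder extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            events.add("element:" + qName);
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            // Delivered only because the lexical-handler property is set below.
            events.add("comment:" + new String(ch, start, length).trim());
        }
    }

    /** Performs the same operations as installHandlers, on the stand-in handler. */
    static void install(XMLReader rdr, DefaultHandler2 handler) throws SAXException {
        rdr.setEntityResolver(handler);
        rdr.setDTDHandler(handler);
        rdr.setContentHandler(handler);
        rdr.setErrorHandler(handler);
        rdr.setFeature("http://xml.org/sax/features/namespaces", true);
        rdr.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        rdr.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
    }

    static List<String> parse(String xml) throws Exception {
        // A JAXP-provided reader rather than a directly constructed Xerces SAXParser.
        XMLReader rdr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        install(rdr, recorder);
        rdr.parse(new InputSource(new StringReader(xml)));
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<r xmlns:x='urn:x'><!-- note --><x:e/></r>"));
    }
}
```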
The SAX2Model handler supports the setErrorHandler method, from the Jena RDFReader interface. This is used in the same way as that method to control error reporting.
A specific fatal error, new in Jena 2.3, is ERR_INTERRUPTED, which indicates that the current Thread received an interrupt. This allows long jobs to be aborted on user request.
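The general pattern of aborting a parse on interrupt can be sketched with plain SAX, as below. This is not ARP's implementation, and it does not use ERR_INTERRUPTED; the class names and the exception message are assumptions made for illustration. The handler simply checks the thread's interrupt flag on each element and abandons the parse by throwing.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class InterruptSketch {
    /** Hypothetical handler that gives up as soon as the thread is interrupted. */
    static class InterruptAwareHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (Thread.currentThread().isInterrupted()) {
                throw new SAXException("parse interrupted");
            }
            // ... normal processing of the element would go here ...
        }
    }

    /**
     * Parses xml and returns true if the parse ended with a SAXException.
     * With well-formed input, that only happens when the thread was interrupted.
     */
    static boolean parseAborted(String xml, boolean interruptFirst) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new InterruptAwareHandler());
        if (interruptFirst) {
            // Simulates a user request arriving from elsewhere.
            Thread.currentThread().interrupt();
        }
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        } finally {
            Thread.interrupted(); // clear the flag so later work is unaffected
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAborted("<a><b/></a>", false));
        System.out.println(parseAborted("<a><b/></a>", true));
    }
}
```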
The SAX2Model handler supports the setProperty method, from the Jena RDFReader interface. This is used in nearly the same way, to give fine-grained control over ARP's behaviour, particularly over error reporting; see the I/O howto. SAX or Xerces properties cannot be set using this method.
If you are only treating some document subset as RDF/XML, then it is necessary to ensure that ARP knows the correct value for xml:lang, and desirable that it knows the correct mappings of namespace prefixes. There is a second version of the create method, which allows specification of the xml:lang value from the outer context. If this is inappropriate, it is possible, but hard work, to synthesize an appropriate SAX event.
For the namespace prefixes, it is possible to call the startPrefixMapping method on the handler, before passing the other SAX events, to declare each namespace, one by one. Failure to do this is permitted, but, for instance, a Jena Model will then not know the (advisory) namespace prefix bindings. These calls should be paired with endPrefixMapping events, but nothing untoward is likely if such code is omitted.
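The order of calls described above can be sketched as follows, using only JDK classes. `PrefixMappingSketch` and `Recorder` are hypothetical names, the recording handler stands in for the real one, and the fragment's own events are synthesized by hand here, whereas in practice they would come from your SAX source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class PrefixMappingSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Hypothetical stand-in handler: records prefix bindings and event order. */
    static class Recorder extends DefaultHandler {
        final Map<String, String> prefixes = new LinkedHashMap<>();
        final List<String> events = new ArrayList<>();

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            prefixes.put(prefix, uri);
            events.add("startPrefix:" + prefix);
        }

        @Override
        public void endPrefixMapping(String prefix) {
            events.add("endPrefix:" + prefix);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Declares the outer-context prefixes, then passes on the fragment's events. */
    static void replayFragment(ContentHandler handler, Map<String, String> outerPrefixes)
            throws SAXException {
        // Declare each namespace from the enclosing document, one by one.
        for (Map.Entry<String, String> e : outerPrefixes.entrySet()) {
            handler.startPrefixMapping(e.getKey(), e.getValue());
        }
        // The fragment's own events (synthesized here for illustration).
        handler.startElement(RDF_NS, "RDF", "rdf:RDF", new AttributesImpl());
        handler.endElement(RDF_NS, "RDF", "rdf:RDF");
        // Pair each declaration with an endPrefixMapping call.
        for (String prefix : outerPrefixes.keySet()) {
            handler.endPrefixMapping(prefix);
        }
    }

    static Recorder demo() throws SAXException {
        Map<String, String> outer = new LinkedHashMap<>();
        outer.put("rdf", RDF_NS);
        outer.put("dc", DC_NS);
        Recorder recorder = new Recorder();
        replayFragment(recorder, outer);
        return recorder;
    }

    public static void main(String[] args) throws SAXException {
        Recorder r = demo();
        System.out.println(r.events);
        System.out.println(r.prefixes);
    }
}
```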
As with ARP, it is possible to use this functionality without using other Jena features, in particular without using a Jena Model. Instead of using the class SAX2Model, you use its superclass SAX2RDF. The create method on this class does not provide any means of specifying what to do with the triples. Instead, the class implements the ARPConfig interface, which permits the setting of handlers and parser options, as described in the documentation for using ARP without Jena.
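Thus you need to create a SAX2RDF using its create method, set your own handlers and any parser options on it through the ARPConfig interface, install it as the handler on your SAX source, and then stream the SAX events, closing the handler afterwards as in the sample above.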
None of the approaches listed here work with Java 1.4.1_04. We suggest using Java 1.4.2_04 or greater for this functionality. This issue has no impact on any other Jena functionality.
The DOM2Model subclass of SAX2Model allows the parsing of a DOM using ARP. The procedure to follow is:
DOM2Model is also a subclass of SAX2RDF, so handlers etc. can be set on the DOM2Model as for SAX2RDF, without using a Jena Model. Passing a null model as the argument to the factory method indicates this usage.
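For comparison, the JDK on its own can replay an in-memory DOM tree as SAX events through an identity transform. The sketch below shows only that general idea; it does not use DOM2Model and makes no claim about how DOM2Model works internally. `DomToSaxSketch` is a hypothetical name, and the recording handler is a stand-in for whatever SAX handler should receive the events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomToSaxSketch {
    /** Builds a DOM from xml, then walks it as SAX events and returns the element names. */
    static List<String> elementNames(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> names = new ArrayList<>();
        // Stand-in handler: it only records the qualified name of each element.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        };

        // An identity transform from a DOMSource to a SAXResult fires SAX events
        // for each node of the tree, in document order.
        TransformerFactory.newInstance()
                .newTransformer()
                .transform(new DOMSource(dom), new SAXResult(handler));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c><d/></c></a>"));
    }
}
```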