Jena2 reification API proposal

version: 1.0
author: Chris Dollin, material from Brian McBride
date: March 31st 2003

1 introduction
1.1 status
1.2 context
2 presentation API
2.1 retrieval
2.2 creation
2.3 equality
2.4 isReified
2.5 fetching
2.6 listing
2.7 removal
2.8 input and output
3 performance

1 introduction

1.1 status

This document describes the reification API in Jena2, following discussions based on the 0.5a document. The essential decision made during those discussions is that reification triples are captured and dealt with by the Model transparently and appropriately.

1.2 context

The first Jena implementation made some attempt to optimise the representation of reification. In particular it tried to avoid so-called 'triple bloat', i.e. requiring four triples to represent the reification of a statement. The approach taken was to make Statement a subclass of Resource so that properties could be directly attached to statement objects.

There are a number of defects in the Jena 1 approach.

Not everyone in the team was bought in to the approach
The .equals() method for Statements was arguably wrong and also violated the Java contract for .equals()
The implied triples of a reification were not present so could not be searched for
There was confusion between the optimised representation and explicit representation of reification using triples
The optimisation did not round-trip through RDF/XML using the writers and ARP.

However, there are some supporters of the approach. They liked:

the avoidance of triple bloat
that the extra reification statements are not there to be found by queries or listStatements and do not affect the size() method.

Since Jena was first written, the RDFCore WG has clarified the meaning of a reified statement. Whilst Jena 1 took a reified statement to denote a statement, RDFCore has decided that a reified statement denotes an occurrence of a statement, otherwise called a stating. The Jena 1 .equals() method for Statements is thus inappropriate for comparing reified statements.

The goals of reification support in the Jena 2 implementation are:

1. to conform to the revised RDF specifications
2. to maintain the expectations of Jena 1 users, i.e. they should still be able to reify everything without worrying about triple bloat if they want to
3. as far as is consistent with 2, to not break existing code, or at least to make it easy to transition old code to Jena 2
4. to enable round-tripping through RDF/XML and other RDF representation languages
5. to enable a complete standards-compliant implementation, but not necessarily as the default

2 presentation API

Statement will no longer be a subclass of Resource. Thus a statement may not be used where a resource is expected. Instead, a new interface ReifiedStatement will be defined:

public interface ReifiedStatement extends Resource
    {
    public Statement getStatement();
    // could call it a day at that or could duplicate convenience
    // methods from Statement, eg getSubject(), getInt().
    ...
    }

The Statement interface will be extended with the following methods:

public interface Statement
    ...
    public ReifiedStatement createReifiedStatement();
    public ReifiedStatement createReifiedStatement(String URI);
/* */
    public boolean isReified();
    public ReifiedStatement getAnyReifiedStatement();
/* */
    public RSIterator listReifiedStatements();
/* */
    public void removeAllReifications();
    ...

RSIterator is a new iterator which returns ReifiedStatements. It is an extension of ResourceIterator.

The Model interface will be extended with the following methods:

public interface Model
    ...
    public ReifiedStatement createReifiedStatement(Statement stmt);
    public ReifiedStatement createReifiedStatement(String URI, Statement stmt);
/* */
    public boolean isReified(Statement st);
    public ReifiedStatement getAnyReifiedStatement(Statement stmt);
/* */
    public RSIterator listReifiedStatements();
    public RSIterator listReifiedStatements(Statement stmt);
/* */
    public void removeReifiedStatement(ReifiedStatement rs);
    public void removeAllReifications(Statement st);
    ...

The methods in Statement are defined to be the obvious calls of methods in Model. The interaction of those methods is expressed below. Reification operates over statements in the model which use the predicates rdf:subject, rdf:predicate, and rdf:object, or rdf:type with object rdf:Statement. The four such statements that share a subject and together reify one statement are called a reification quad.

Statements with those predicates are, by default, invisible. They do not appear in calls of listStatements, contains, or uses of the Query mechanism. Adding them to the model will not affect size(). Models that do not hide reification quads will also be available.
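The default hiding behaviour can be illustrated with a toy model. This is a minimal sketch, not Jena code: HidingModelSketch, its Triple class, and the use of plain strings for nodes are all hypothetical, and a triple is classified as a reification component purely by its predicate (and, for rdf:type, its object).

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical toy model, not the Jena implementation: reification
// components are kept apart from ordinary statements so that they are
// invisible to listStatements(), contains() and size().
class HidingModelSketch {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
    }

    private final Set<Triple> visible = new LinkedHashSet<>();
    private final Set<Triple> hidden = new LinkedHashSet<>();

    // True for the four kinds of triple that can belong to a reification quad.
    static boolean isReificationComponent(Triple t) {
        return t.p.equals("rdf:subject")
            || t.p.equals("rdf:predicate")
            || t.p.equals("rdf:object")
            || (t.p.equals("rdf:type") && t.o.equals("rdf:Statement"));
    }

    void add(Triple t) {
        if (isReificationComponent(t)) hidden.add(t); else visible.add(t);
    }

    Set<Triple> listStatements() { return new LinkedHashSet<>(visible); }
    boolean contains(Triple t) { return visible.contains(t); }
    int size() { return visible.size(); }

    // Only for inspecting the sketch; not part of the proposed API.
    int hiddenCount() { return hidden.size(); }
}
```

Note that an rdf:type triple whose object is not rdf:Statement stays visible in this sketch; only the four quad components are hidden.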

2.1 retrieval

The Model::as() mechanism will allow the retrieval of reified statements.

someResource.as( ReifiedStatement.class )

If someResource has an associated reification quad, then this will deliver an instance rs of ReifiedStatement such that rs.getStatement() will be the statement rs reifies. Otherwise a DoesNotReifyException will be thrown. (Use the predicate canAs() to test if the conversion is possible.)

It does not matter how the quad components have arrived in the model, whether they were asserted explicitly or added by the create mechanisms described below. If quad components are removed from the model, existing ReifiedStatement objects will continue to function, but conversions using as() will fail.
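The retrieval rules can be sketched with a small stand-alone class. Everything here is an assumption made for illustration: RetrievalSketch, asReified, and the string-based nodes are hypothetical, DoesNotReifyException is a local stand-in for the exception named above, and a later value for the same predicate simply replaces the earlier one, which sidesteps the question of conflicting components.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of as()/canAs() for reified statements; not Jena code.
class RetrievalSketch {
    // Local stand-in for the exception named in the text.
    static class DoesNotReifyException extends RuntimeException {
        DoesNotReifyException(String node) { super(node + " does not reify a statement"); }
    }

    // Immutable snapshot, so it keeps working after the quad is removed.
    static final class ReifiedStatement {
        final String node, subject, predicate, object;
        ReifiedStatement(String node, String subject, String predicate, String object) {
            this.node = node; this.subject = subject;
            this.predicate = predicate; this.object = object;
        }
        String getStatement() { return subject + " " + predicate + " " + object; }
    }

    // node -> (reification predicate -> object)
    private final Map<String, Map<String, String>> components = new HashMap<>();

    // Components may arrive one at a time, in any order.
    void add(String node, String predicate, String object) {
        components.computeIfAbsent(node, k -> new HashMap<>()).put(predicate, object);
    }

    void remove(String node, String predicate) {
        Map<String, String> c = components.get(node);
        if (c != null) c.remove(predicate);
    }

    // The conversion is possible only when the whole quad is present.
    boolean canAs(String node) {
        Map<String, String> c = components.get(node);
        return c != null
            && "rdf:Statement".equals(c.get("rdf:type"))
            && c.containsKey("rdf:subject")
            && c.containsKey("rdf:predicate")
            && c.containsKey("rdf:object");
    }

    ReifiedStatement asReified(String node) {
        if (!canAs(node)) throw new DoesNotReifyException(node);
        Map<String, String> c = components.get(node);
        return new ReifiedStatement(node,
            c.get("rdf:subject"), c.get("rdf:predicate"), c.get("rdf:object"));
    }
}
```

Because the returned object copies the statement, removing a quad component afterwards makes further conversions fail without disturbing objects already handed out.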

2.2 creation

createReifiedStatement(Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a blank node.

createReifiedStatement(String URI, Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a Resource with the URI given.
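The logical effect of the two create methods can be sketched as follows. This is an illustration only: CreationSketch is hypothetical, triples are plain strings, the blank node labels are local to the sketch, and the statement is passed as three strings rather than as a Statement. It shows the quad as four explicit triples, which is the logical view rather than a claim about how Jena stores them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of createReifiedStatement; not Jena code.
class CreationSketch {
    private final Set<String> triples = new LinkedHashSet<>();
    private int nextBlank = 0;

    // Reify under a fresh blank node.
    String createReifiedStatement(String s, String p, String o) {
        return createReifiedStatement("_:b" + nextBlank++, s, p, o);
    }

    // Reify under the given URI.
    String createReifiedStatement(String uri, String s, String p, String o) {
        triples.add(uri + " rdf:type rdf:Statement");
        triples.add(uri + " rdf:subject " + s);
        triples.add(uri + " rdf:predicate " + p);
        triples.add(uri + " rdf:object " + o);
        return uri;
    }

    // Copy of the quad components created so far.
    Set<String> triples() { return new LinkedHashSet<>(triples); }
}
```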

2.3 equality

Two reified statements are .equals() iff they reify the same statement and have .equals() resources. Thus it is possible for equal Statements to have unequal reifications.
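A minimal sketch of this equality rule, using hypothetical stand-in classes (EqualitySketch and its nested Statement and ReifiedStatement are not the Jena types, and the resource is reduced to a string label):

```java
import java.util.Objects;

// Hypothetical stand-ins illustrating the equality rule; not Jena code.
class EqualitySketch {
    static final class Statement {
        final String s, p, o;
        Statement(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Statement)) return false;
            Statement t = (Statement) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
    }

    static final class ReifiedStatement {
        final String node;          // the reifying resource (URI or blank node label)
        final Statement statement;  // the statement it reifies
        ReifiedStatement(String node, Statement statement) {
            this.node = node; this.statement = statement;
        }
        // Equal only if both the resource and the reified statement are equal.
        @Override public boolean equals(Object x) {
            if (!(x instanceof ReifiedStatement)) return false;
            ReifiedStatement r = (ReifiedStatement) x;
            return node.equals(r.node) && statement.equals(r.statement);
        }
        @Override public int hashCode() { return Objects.hash(node, statement); }
    }
}
```

Two reifications of equal statements under different resources compare unequal here, matching the reading of a reified statement as a stating rather than as the statement itself.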

2.4 isReified

isReified(Statement st) is true iff in the Model of this Statement there is a reification quad for this Statement. It does not matter if the quad was inserted piece-by-piece or all at once using a create method.

2.5 fetching

getAnyReifiedStatement(Statement st) delivers an existing ReifiedStatement object that reifies st, if there is one; otherwise it creates a new one. If there are multiple reifications for st, it is not specified which one will be returned.
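The fetch-or-create behaviour, and its interaction with isReified, can be sketched as below. FetchSketch is hypothetical: statements and reifying resources are plain strings, and the sketch happens to return the oldest reification, which is merely one permitted choice since the text leaves the selection unspecified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of isReified and getAnyReifiedStatement; not Jena code.
class FetchSketch {
    // statement -> resources that reify it
    private final Map<String, List<String>> reifications = new LinkedHashMap<>();
    private int nextBlank = 0;

    String createReifiedStatement(String stmt) {
        return createReifiedStatement("_:b" + nextBlank++, stmt);
    }

    String createReifiedStatement(String uri, String stmt) {
        reifications.computeIfAbsent(stmt, k -> new ArrayList<>()).add(uri);
        return uri;
    }

    boolean isReified(String stmt) {
        List<String> nodes = reifications.get(stmt);
        return nodes != null && !nodes.isEmpty();
    }

    // Returns an existing reification if there is one, otherwise creates one.
    String getAnyReifiedStatement(String stmt) {
        List<String> nodes = reifications.get(stmt);
        if (nodes != null && !nodes.isEmpty()) {
            return nodes.get(0); // arbitrary choice made by this sketch
        }
        return createReifiedStatement(stmt);
    }
}
```

One consequence worth noting is that getAnyReifiedStatement is not a pure query: calling it on an unreified statement leaves that statement reified.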

2.6 listing

listReifiedStatements() will return an RSIterator which will deliver all the reified statements in the model.

listReifiedStatements( Statement st ) will return an RSIterator which will deliver all the reified statements in the model that reify st.

2.7 removal

removeReifiedStatement(ReifiedStatement rs) will remove the reification rs from the model by removing the reification quad. Other reified statements with different resources will remain.

removeAllReifications(Statement st) will remove all the reifications in this model which reify st.
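Listing and removal can be sketched together. ListRemoveSketch is a hypothetical stand-in: reifying resources and statements are plain strings, and lists are returned in place of an RSIterator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the listing and removal methods; not Jena code.
class ListRemoveSketch {
    // reifying resource -> statement it reifies
    private final Map<String, String> statementOf = new LinkedHashMap<>();

    void createReifiedStatement(String uri, String stmt) {
        statementOf.put(uri, stmt);
    }

    // All reified statements in the model.
    List<String> listReifiedStatements() {
        return new ArrayList<>(statementOf.keySet());
    }

    // Only those that reify stmt.
    List<String> listReifiedStatements(String stmt) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : statementOf.entrySet()) {
            if (e.getValue().equals(stmt)) result.add(e.getKey());
        }
        return result;
    }

    // Removes one reification; others of the same statement remain.
    void removeReifiedStatement(String uri) {
        statementOf.remove(uri);
    }

    // Removes every reification of stmt.
    void removeAllReifications(String stmt) {
        statementOf.values().removeIf(stmt::equals);
    }
}
```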

2.8 input and output

The writers will have access to the complete set of Statements and will be able to write out the quad components.

The readers need no special machinery, but it would be more efficient for them to call createReifiedStatement when they detect a reification.

3 performance

Jena1's "statements as resources" approach avoided triple bloat by not storing the reification quads. How, then, do we avoid triple bloat in Jena2?

The underlying machinery is intended to capture the reification quad components and store them in a form optimised for reification. In particular, in the case where a statement is completely reified, it is expected to store only the implementation representation of the Statement.

createReifiedStatement is expected to bypass the construction and detection of the quad components, so that in the "usual case" they will never come into existence.

The details of this are described in a companion document.
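Independently of that companion document, the general idea can be illustrated with a toy store. This is only a sketch under stated assumptions, not the Jena2 design: CompactReifierSketch and all of its methods are hypothetical, nodes are plain strings, and components that arrive for an already complete reification are simply ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of compact reification storage; not the Jena2 design.
class CompactReifierSketch {
    // Complete reifications: one entry per node instead of four triples.
    private final Map<String, String[]> complete = new HashMap<>();
    // Incomplete reifications: components seen so far, keyed by predicate.
    private final Map<String, Map<String, String>> fragments = new HashMap<>();

    // Fast path: the four quad components are never materialised.
    void createReifiedStatement(String node, String s, String p, String o) {
        complete.put(node, new String[] { s, p, o });
    }

    // Slow path: a quad component arrives as an ordinary triple.
    void addComponent(String node, String predicate, String object) {
        // Assumption of this sketch: extra components for a complete node are ignored.
        if (complete.containsKey(node)) return;
        Map<String, String> f = fragments.computeIfAbsent(node, k -> new HashMap<>());
        f.put(predicate, object);
        if (isComplete(f)) {
            // Collapse the four fragments into the compact form.
            complete.put(node, new String[] {
                f.get("rdf:subject"), f.get("rdf:predicate"), f.get("rdf:object") });
            fragments.remove(node);
        }
    }

    private static boolean isComplete(Map<String, String> f) {
        return "rdf:Statement".equals(f.get("rdf:type"))
            && f.containsKey("rdf:subject")
            && f.containsKey("rdf:predicate")
            && f.containsKey("rdf:object");
    }

    // Subject, predicate and object of the reified statement, or null.
    String[] getStatement(String node) { return complete.get(node); }

    int completeCount() { return complete.size(); }

    // Number of component triples still held individually.
    int fragmentCount() {
        int n = 0;
        for (Map<String, String> f : fragments.values()) n += f.size();
        return n;
    }
}
```

In this sketch a reification built through createReifiedStatement costs a single map entry, and one assembled piece by piece costs up to three stored fragments until the fourth component arrives and the entry is collapsed.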