Jena2 Database Interface - Porting Notes

Jena2 was designed to be portable to other SQL database engines. The Jena2 database dependencies are isolated in two places: the driver and the SQL scripts. These notes provide a brief overview of what is required to port Jena2 to a new SQL database. However, the persistence architecture is fairly general and in the future we want to investigate support for other persistence engines such as Berkeley DB and a native Java store.

This document is known to be incomplete but is provided in the spirit of something is better than nothing. Porting to Jena2 is possible. DB2 was ported to a preliminary release of Jena2 without too much effort. If you are working on a port, please do not hesitate to ask for help by posting to the jena-devel list.

Porting Parameters

Below is a list of parameters you need to know about your database engine before starting a port to Jena2.

Object Column Type and Length: This is the database column type that will be used to store character data in statement tables. It must be support case-sensitive matching. Typically, this is a Varchar column. But, in some some systems, Varchar is case-insensitive (e.g., MySQL) or there are variants of Varchar (e.g., Oracle). The maximum size of this column type should be determined in order to set the LONG_OBJECT_LENGTH_MAX value. LONG_OBJECT_LENGTH should be set to be equal or less than this maximum value depending on the degree of denormalization desired.

Index Column Type and Length: This is the database column type that will be used to store character data in the long object tables. Objects (literals and resources) whose length exceeds LONG_OBJECT_LENGTH are not stored in the statement table but are stored in long object tables (jena_long_lit, jena_long_uri). Each long value is decomposed into a head and a tail. The head is that part of the long object that can be indexed. The tail is the remaining (possibly large) portion that is stored as a blob. Also computed is a content-based hash over the tail. The head and hash enable exact matching.; This index column type is then the column type for the head of the long object. Typically, it is the same type as the object column type. The maximum length of this column type determines INDEX_KEY_LENGTH_MAX. INDEX_KEY_LENGTH should be set to be less than or equal this maximum value.

Maximum Table and Index Name Length: The maximum number of characters permitted in a table or index name is needed for defining TABLE_NAME_LENGTH_MAX.

Table and Index Name Case Sensitivity: Some database engines require table and index names to be in upper case. Set DB_NAMES_TO_UPPER appropriately for your database.

Sequences: Database engines have different ways of supporting automatic generation of unique identifiers. Some systems use sequences, other systems have special column types. It is assumed that the engine has some such mechanism. Jena will use it to generate identifiers for long objects, graphs, etc. In some cases, identifiers are allocated prior to inserting a row and in other cases the identifier is allocated during or after row insertion. PRE_ALLOCATE_ID will need to be set appropriately.

Transactions: It is assumed that the database engine supports transactions (bracketed by begin and end operations). But, some engines only support atomic operations or have variants that only support atomic operations (e.g., MySQL). Set IS_XACT_DB appropriately.

Quoting Character: Database engines differ in their choice of the quote character (e.g., single, double) to delimit character strings. Set QUOTE_CHAR to the quote character for your engine.

Strings Trimmed: Some database engines truncate trailing white space at the end of character strings. If the column type for your object or index columns types does this, set STRINGS_TRIMMED to true and define a string termination character (EOS) to prevent truncation of white space.

The driver code requires a few other miscellaneous parameters, e.g., database type, driver name, SQL file name. These should all be self-explanatory.

SQL Scripts

In the etc subdirectory of the Jena2 distribution there is a file of SQL scripts, one for each supported database engine. These are SQL commands for fixed, well known database operations, e.g., to create a statement table, to retrieve a long object. To add a new engine, simply copy and edit one of the existing script files. The edits should be consistent with the parameter values. You may need to define more or fewer parameters for certain operations such as table creation. It may help to look at a few different script files as examples.

Driver Code

Every database engine requires a Jena driver. Jena2 provides a generic driver (jena/db/DriverRDB.java) that provides default implementations for most of the driver methods. In the simplest case, the engine-specific driver only needs to override a few methods. As an example, the PostgreSQL driver is probably the simplest and the MySQL driver is almost as simple. The Oracle driver is more complicated because of the way Oracle handles large objects. The treatment of large objects is probably the most difficult part of porting.

Another customization that may be required for a new database engine is the dynamic generation of SQL code. For Fastpath queries and some types of reification queries, Jena2 generates SQL dynamically. If problems are encountered with these functions during development or testing, the generation code may need to be overridden in the engine-specific driver. The generation code is in DriverRDB.java and has method names of the form genSQLxxx.

Testing

Once the driver and SQL file have been created, the next step is to check that the ModelRDB tests pass. There are a number of JUnit tests in jena/db/test. These may be run from the script test-db.sh, after modification for driver details and database name.