Jena2 Database Interface - Porting Notes
Jena2 was designed to be portable to other SQL database engines. The Jena2
database dependencies are isolated in two places: the driver and the SQL
scripts. These notes provide a brief overview of what is required to port Jena2
to a new SQL database. However, the persistence architecture is fairly general
and in the future we want to investigate support for other persistence engines
such as Berkeley DB and a native Java store.
This document is known to be incomplete but is provided in the spirit of
something is better than nothing. Porting to Jena2 is possible. DB2 was ported
to a preliminary release of Jena2 without too much effort. If you are working on
a port, please do not hesitate to ask for help by posting to the jena-devel
list.
Porting Parameters
Below is a list of parameters you need to know about your database engine
before starting a port to Jena2.
- Object Column Type and Length
This is the database column type that will be used to store character data
in statement tables. It must be support case-sensitive matching. Typically,
this is a Varchar column. But, in some some systems, Varchar is
case-insensitive (e.g., MySQL) or there are variants of Varchar (e.g.,
Oracle). The maximum size of this column type should be determined in order to
set the LONG_OBJECT_LENGTH_MAX value. LONG_OBJECT_LENGTH should be set to be
equal or less than this maximum value depending on the degree of
denormalization desired.
Index Column Type and Length
This is the database column type that will be used to store character data
in the long object tables. Objects (literals and resources) whose length
exceeds LONG_OBJECT_LENGTH are not stored in the statement table but are
stored in long object tables (jena_long_lit, jena_long_uri). Each long value
is decomposed into a head and a tail. The head is that part of the long object
that can be indexed. The tail is the remaining (possibly large) portion that
is stored as a blob. Also computed is a content-based hash over the tail. The
head and hash enable exact matching.
This index column type is then the column type for the head of the long
object. Typically, it is the same type as the object column type. The maximum
length of this column type determines INDEX_KEY_LENGTH_MAX. INDEX_KEY_LENGTH
should be set to be less than or equal this maximum value.
Maximum Table and Index Name Length
The maximum number of characters permitted in a table or index name is
needed for defining TABLE_NAME_LENGTH_MAX.
Table and Index Name Case Sensitivity
Some database engines require table and index names to be in upper case.
Set DB_NAMES_TO_UPPER appropriately for your database.
Sequences
Database engines have different ways of supporting automatic generation of
unique identifiers. Some systems use sequences, other systems have special
column types. It is assumed that the engine has some such mechanism. Jena will
use it to generate identifiers for long objects, graphs, etc. In some cases,
identifiers are allocated prior to inserting a row and in other cases the
identifier is allocated during or after row insertion. PRE_ALLOCATE_ID will
need to be set appropriately.
Transactions
It is assumed that the database engine supports transactions (bracketed by
begin and end operations). But, some engines only support atomic operations or
have variants that only support atomic operations (e.g., MySQL). Set
IS_XACT_DB appropriately.
Quoting Character
Database engines differ in their choice of the quote character (e.g.,
single, double) to delimit character strings. Set QUOTE_CHAR to the quote
character for your engine.
- Strings Trimmed
Some database engines truncate trailing white space at the end of
character strings. If the column type for your object or index columns
types does this, set STRINGS_TRIMMED to true and define a string termination
character (EOS) to prevent truncation of white space.
The driver code requires a few other miscellaneous parameters, e.g., database
type, driver name, SQL file name. These should all be self-explanatory.
SQL Scripts
In the etc subdirectory of the Jena2 distribution there is a file of
SQL scripts, one for each supported database engine. These are SQL commands for
fixed, well known database operations, e.g., to create a statement table, to
retrieve a long object. To add a new engine, simply copy and edit one of the
existing script files. The edits should be consistent with the parameter values.
You may need to define more or fewer parameters for certain operations such as
table creation. It may help to look at a few different script files as examples.
Driver Code
Every database engine requires a Jena driver. Jena2 provides a generic driver
(jena/db/DriverRDB.java) that provides default implementations for most of the
driver methods. In the simplest case, the engine-specific driver only needs to
override a few methods. As an example, the PostgreSQL driver is probably the
simplest and the MySQL driver is almost as simple. The Oracle driver is more
complicated because of the way Oracle handles large objects. The treatment of
large objects is probably the most difficult part of porting.
Another customization that may be required for a new database engine is the
dynamic generation of SQL code. For Fastpath queries and some types of
reification queries, Jena2 generates SQL dynamically. If problems are
encountered with these functions during development or testing, the generation
code may need to be overridden in the engine-specific driver. The generation
code is in DriverRDB.java and has method names of the form genSQLxxx.
Testing
Once the driver and SQL file have been created, the next step is to check
that the ModelRDB tests pass. There are a number of JUnit tests in jena/db/test.
These may be run from the script test-db.sh
, after modification for
driver details and database name.