Jena2 Database Interface - Options for Initialization and Access
The following options are available for use with the persistence subsystem.
For each option Xyz, there are getXyz and setXyz methods in the associated
interface. Some options must be set when initializing (formatting) the database and others may
be set while accessing models.
Database Initialization Options
The following options may only be set before the database is
initialized. To set these options, invoke the associated set method on the database driver. There
is also a get method that may be called at any time to retrieve the option
value. If the database has already been formatted, i.e., if
IDBConnection.isFormatOK() returns
true, then these set methods will throw an
exception. These options are
persisted in the database. When (a model in) a previously formatted database
is opened, the option values stored in the database silently override any
user-specified values.
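The rules above can be modeled with a small, self-contained Java simulation: options may be set until the database is formatted, setting them afterwards fails, and persisted values win when a formatted database is reopened. This is not Jena code. The class InitOptionsSketch and its methods are hypothetical stand-ins for the driver's setXyz/getXyz methods and for IDBConnection.isFormatOK().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the initialization-option rules; this is NOT
// Jena's IRDBDriver, only a model of the behaviour described in the text.
class InitOptionsSketch {
    private final Map<String, Object> options = new HashMap<>();
    private boolean formatted = false; // plays the role of IDBConnection.isFormatOK()

    InitOptionsSketch() {
        options.put("DoCompressURI", false);   // documented default
        options.put("CompressURILength", 100); // documented default
    }

    // Reading an option is allowed at any time.
    Object get(String name) {
        return options.get(name);
    }

    // Setting an option is only allowed before the database is formatted.
    void set(String name, Object value) {
        if (formatted) {
            throw new IllegalStateException(name + " cannot be set after formatting");
        }
        options.put(name, value);
    }

    // Formatting persists the current values; the returned copy stands in
    // for the values written to the database.
    Map<String, Object> format() {
        formatted = true;
        return new HashMap<>(options);
    }

    // Opening a formatted database: persisted values silently replace any
    // values the user set beforehand.
    void open(Map<String, Object> persisted) {
        options.putAll(persisted);
        formatted = true;
    }
}
```

In this model, a user who sets DoCompressURI to false and then opens a store that was formatted with DoCompressURI true ends up with true, and no error is reported.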
IRDBDriver Option | Type | Default | Description
LongObjectLength | int | * | Maximum length of a literal or resource to be stored in a statement table.
LongObjectLengthMax | int | * | Maximum possible value for LongObjectLength.
IndexKeyLength | int | * | Maximum length of the key in a long object table.
IndexKeyLengthMax | int | * | Maximum possible value for IndexKeyLength.
IsTransactionDb | boolean | * | True if the database can support transactions.
DoCompressURI | boolean | false | If true, do prefix compression on long URIs.
CompressURILength | int | 100 | URIs longer than this length will be compressed (if DoCompressURI is true).
TableNamePrefix | String | jena_ | The common prefix for all Jena table names in the database.
* These options are database-dependent. See the database-specific howto (HSQLDB,
MySQL, Derby, Oracle, PostgreSQL, Microsoft SQL Server) for the default values.
- LongObjectLength
- This defines the maximum length of a value in a statement table where
the value may be either a literal or a resource URI. Values longer than this
length are stored in either the long literals or the long resources table.
Smaller values of LongObjectLength reduce database space consumption at the
cost of increased retrieval time. Each database engine has a maximum
permissible value for LongObjectLength which may be retrieved by calling getLongObjectLengthMax().
An attempt to set a larger length
will throw an exception.
- Note that LongObjectLength is only an upper bound on the length of the
original value, because of the database encoding used for values. For example,
if LongObjectLength is ten, a literal string of ten or even nine characters
would be stored as a long object, because the stored string value is encoded
together with type information, which makes it longer than the original.
- IndexKeyLength
- This defines the maximum length of an index key for long object values
(literals or resource URIs). Long objects are stored in three parts: a head,
a hash and a tail. The head is the prefix of the long value that can be
indexed. The hash is a content-based hash value of the remainder (the tail).
Exact matching is done by comparing the head and the hash value. In the
future, we plan to do prefix matching on the head for inequality and range
queries.
- Generally, there is no need to change IndexKeyLength. However,
smaller values could reduce database space consumption at the expense of
reducing the (future) effectiveness of inequality and range queries. Note
that IndexKeyLength is an upper bound due to the database encoding (see
comments in LongObjectLength). Each database engine has a maximum
permissible value for IndexKeyLength which may be retrieved by calling getIndexKeyLengthMax().
An attempt to set a larger length
will throw an exception.
- IsTransactionDb
- Some database engines support a non-transactional configuration in which
begin-end transactions are not supported but the individual database
operations are atomic. MySQL has both transactional and non-transactional
configurations. This option can be used to set the transaction mode. Since
it affects the physical database structure, it can only be set prior to
database initialization. Applications must be careful when using
non-transactional configurations because the database may be left in an
inconsistent state if an application is interrupted in the middle of a
database operation.
- DoCompressURI
- By default, resource URIs are stored fully expanded in the database. If
DoCompressURI is true, URIs will be compressed by storing a prefix of the
URI (typically a namespace) in a separate table. This can be used to reduce
database space consumption. Ideally, it should not significantly increase
retrieval time since it is expected that the number of prefixes will be
relatively small and it should be possible to cache them in main memory for
expansion.
- Note that there is an interaction between
DoCompressURI and LongObjectLength. The prefix is compressed before the
object length is checked. For example, if LongObjectLength is ten and
DoCompressURI is true, the URI
myNamespace.com/foo:123 would be stored as a compressed URI directly in
a statement table. However, if DoCompressURI is false, then that URI would
be stored in the long resources table and the statement table would have a
reference to it.
- CompressURILength
- If DoCompressURI is true, this specifies the minimum length of a URI that
should be compressed. Resource URIs shorter than this value are stored
fully expanded.
- TableNamePrefix
- Every database table created by Jena has a common prefix. This option
allows users to specify the prefix. It affects all Jena tables and indexes,
including the Jena system table. Consequently, with this option it is
possible to have multiple Jena persistent stores, each with different
formatting options (e.g., LongObjectLength, DoCompressURI, etc.) in a single
database instance, with each store having a distinct prefix.
- Note that this option differs from the previous
options in that it must be set on every connection that accesses the store.
Otherwise, the subsystem will assume the default prefix and will not be able
to locate the Jena system table which contains the configuration.
- The maximum
length of the prefix is database-dependent and an exception may be thrown if
the prefix is too long. Otherwise, it is the user's responsibility to ensure
that the prefix name conforms to the naming conventions for the underlying
database engine (e.g., it must not contain prohibited special characters). Also,
if the database requires upper-case (or lower-case) table names, the prefix will
be automatically (and silently) converted to that convention.
- This option has subtle semantics and should be used
with care. Always use the following code sequence to ensure that the prefix
is set correctly for the database connection.
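IDBConnection conn = new DBConnection(dbURL, dbUser, dbPassword, "MySQL"); // placeholder URL, credentials and engine type
conn.getDriver().setTableNamePrefix("myNewNamePrefix"); // set immediately, before creating or opening any model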
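The interplay between LongObjectLength, IndexKeyLength, DoCompressURI and CompressURILength described above can be illustrated with a self-contained Java sketch that decides whether a URI fits in a statement table and, if not, how it would be split into a head and a tail hash. The type tag "Uv::", the prefix table and the use of String.hashCode() as the tail hash are all assumptions made for illustration; Jena's actual encoding and hash function are not reproduced here, and ValueRoutingSketch is a hypothetical name.

```java
import java.util.Map;

// Illustrative sketch of how a URI could be routed between a statement table
// and a long resources table. The type tag, the prefix table and the tail hash
// are hypothetical; Jena's real on-disk encoding is not reproduced here.
class ValueRoutingSketch {
    // Hypothetical type information prepended to every stored URI.
    static final String URI_TAG = "Uv::";

    // Hypothetical prefix table (namespace -> short id) used when DoCompressURI is true.
    static final Map<String, String> PREFIXES = Map.of("http://example.org/ns#", "1");

    // Compression (if enabled and the URI is long enough) happens BEFORE the
    // length check; the type tag is then added.
    static String encodeUri(String uri, boolean doCompressURI, int compressURILength) {
        String body = uri;
        if (doCompressURI && uri.length() > compressURILength) {
            for (Map.Entry<String, String> e : PREFIXES.entrySet()) {
                if (uri.startsWith(e.getKey())) {
                    body = e.getValue() + ":" + uri.substring(e.getKey().length());
                    break;
                }
            }
        }
        return URI_TAG + body;
    }

    // The encoded value, not the original one, is compared with LongObjectLength.
    static boolean fitsInStatementTable(String encoded, int longObjectLength) {
        return encoded.length() <= longObjectLength;
    }

    // A long object is split into an indexable head of at most IndexKeyLength
    // characters and a tail that is represented by a content-based hash.
    static String head(String encoded, int indexKeyLength) {
        return encoded.substring(0, Math.min(indexKeyLength, encoded.length()));
    }

    static int tailHash(String encoded, int indexKeyLength) {
        // String.hashCode() is only a stand-in for the real hash function.
        return encoded.substring(Math.min(indexKeyLength, encoded.length())).hashCode();
    }

    // Exact matching compares the head and the tail hash.
    static boolean sameLongObject(String a, String b, int indexKeyLength) {
        return head(a, indexKeyLength).equals(head(b, indexKeyLength))
            && tailHash(a, indexKeyLength) == tailHash(b, indexKeyLength);
    }
}
```

With CompressURILength set to 20 and LongObjectLength set to 10, http://example.org/ns#name encodes to the 10-character value Uv::1:name and fits in a statement table. Without compression the encoded value is 30 characters long and would go to the long resources table. Under the default CompressURILength of 100, the same URI would not be compressed at all.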
Database Access Options
The following options may be set at any time and are not persistent. They
exist only for the duration of a database connection.
IRDBDriver Option | Type | Default | Description
StoreWithModel | String | null | If not null or empty, subsequent models will share tables with the named model.
CompressCacheSize | int | 50 | The size of the URI prefix cache if DoCompressURI is true.
- StoreWithModel
- By default, models are stored in separate database tables. This option
enables models to share tables. Once it is specified, all models subsequently
created on the current connection are stored in the same tables as
the specified model. A model name of "DEFAULT" references the default
(unnamed) model.
- If the specified model does not exist, an exception
is thrown when attempting to create a new model that references it. This is
also true of the default model, i.e., it is not automatically created. If
the specified model name is null or the empty string, then subsequently
created models are stored in separate tables.
- CompressCacheSize
- If URI compression is enabled (DoCompressURI is true), an in-memory LRU
cache of URI prefixes is maintained to reduce the need to access the
database to expand compressed URIs. The cache size can be adjusted at any
time after a connection to the database is established.
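The prefix cache controlled by CompressCacheSize behaves like an ordinary LRU map. A minimal Java sketch, assuming a lookup function that stands in for the database query, can be built on java.util.LinkedHashMap in access order. PrefixCacheSketch and its methods are hypothetical and not part of Jena.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache of URI prefixes, sized like CompressCacheSize.
// Not Jena's implementation; the loader function stands in for a database query.
class PrefixCacheSketch {
    private final Function<String, String> loadFromDb;
    private int maxEntries;
    private int dbLookups = 0;

    // accessOrder = true: iteration order runs from least to most recently used.
    private final LinkedHashMap<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used prefix
            }
        };

    PrefixCacheSketch(int maxEntries, Function<String, String> loadFromDb) {
        this.maxEntries = maxEntries;
        this.loadFromDb = loadFromDb;
    }

    // Expand a compressed prefix id, going to the "database" only on a miss.
    String expand(String prefixId) {
        String namespace = cache.get(prefixId); // a hit also refreshes recency
        if (namespace == null) {
            dbLookups++;
            namespace = loadFromDb.apply(prefixId);
            cache.put(prefixId, namespace);
        }
        return namespace;
    }

    // The size may be changed at any time; shrinking evicts the LRU entries first.
    void setMaxEntries(int newMax) {
        maxEntries = newMax;
        Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
        while (cache.size() > maxEntries && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int dbLookups() { return dbLookups; }
    int size() { return cache.size(); }
}
```

Access ordering is the design choice that matters here: a cache hit moves the prefix to the most-recently-used end, so frequently used namespaces stay resident even when the cache is small relative to the number of prefixes.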
Model Access Options
The following options affect the behavior of query processing. These
options are not persisted in the database. The options are set by calling the
associated set method on the database model (an instance of ModelRDB). There
is also a get method to retrieve the option value.
ModelRDB Option | Type | Default | Description
DoFastpath | boolean | true | If true, enable query Fastpath.
QueryOnlyAsserted | boolean | false | If true, query only asserted statement tables.
QueryOnlyReified | boolean | false | If true, query only reified statement tables.
QueryFullReified | boolean | false | If true, Fastpath ignores partially reified statements.
DoDuplicateCheck | boolean | true | If true, check whether a statement is already in the database before adding it.
- DoFastpath
- This option enables and disables Fastpath query processing. Generally,
it should be enabled but it may be useful to disable it for experiments or
debugging. For details on Fastpath processing and explanations of the
three query options in this table, see the Fastpath
notes.
- QueryOnlyAsserted
- When true, querying will only be done on asserted statement tables; the
reified tables are ignored. For applications that use only asserted
statements this may provide a performance improvement for certain types of
queries (specifically, those with unknown predicates), especially if the
database is remote from the application.
- QueryOnlyReified
- When true, querying will only be done on reified statement tables; the
asserted statement tables are ignored. For applications that use only
reified statements this may provide a performance improvement for certain
types of queries (specifically, those with unknown predicates), especially
if the database is remote from the application.
- QueryFullReified
- See the Fastpath
notes.
- DoDuplicateCheck
- When a statement is added to a persistent model, Jena first checks if
the statement already exists in the model. This prevents the occurrence of
duplicate rows in the statement tables. However, if a user knows that the
rows to be inserted do not already exist, DoDuplicateCheck may be disabled
to reduce overhead for adding statements. This can substantially reduce load times.
- Note that, once set, the value applies not just to
the specified model but to any model subsequently created in the database
during the user's session (the life of the database connection). The setting
for existing models is not affected.
- When duplicate checking is disabled, if an
application attempts to insert a duplicate statement in a model, the result
depends on the database engine and configuration. In general, the insert
will succeed, no indication will be provided to the application and the
database will contain duplicate statements. If this is undesirable, one
option is to create a unique index on the subject, predicate and object
columns of the statement table. This can easily be done by modifying the
templates for creating statement tables in the database-specific SQL
template files, e.g., see CreateStatementTable and
CreateReifStatementTable in the file 'etc/mysql.sql'.
If this is done then the database engine will generate an error when a
duplicate statement is added and an exception will be thrown to the
application.
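The trade-off described under DoDuplicateCheck can be sketched as a tiny in-memory simulation: with the check enabled every add pays for a lookup and duplicates are skipped; with it disabled duplicates are stored silently unless a unique (subject, predicate, object) index rejects them. StatementTableSketch is hypothetical and models only the behavior described in the text, not Jena's code or any particular database engine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for a statement table. It only models the
// DoDuplicateCheck behaviour described above, not Jena's implementation.
class StatementTableSketch {
    private final List<String> rows = new ArrayList<>(); // may contain duplicates
    private final boolean uniqueIndex;                   // simulates a unique (s, p, o) index
    private final Set<String> index = new HashSet<>();

    StatementTableSketch(boolean uniqueIndex) {
        this.uniqueIndex = uniqueIndex;
    }

    private static String key(String s, String p, String o) {
        return s + "\t" + p + "\t" + o;
    }

    void add(String s, String p, String o, boolean doDuplicateCheck) {
        String k = key(s, p, o);
        // With the check enabled, every add pays for a lookup first.
        if (doDuplicateCheck && rows.contains(k)) {
            return; // statement already present: nothing to do
        }
        // Without the check, only a unique index can stop a duplicate row.
        if (uniqueIndex && !index.add(k)) {
            throw new IllegalStateException("duplicate statement rejected by unique index");
        }
        rows.add(k);
    }

    int rowCount() {
        return rows.size();
    }
}
```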