Chapter 1.  Sleepycat Java Collections API Overview

Table of Contents

Using Data Bindings
Selecting Binding Formats
Selecting Data Bindings
Implementing Bindings
Using Bindings
Secondary Key Creators
Using Sleepycat Java Collections API
Using Transactions
Transaction Rollback
Access Method Restrictions
Using Stored Collections
Stored Collection and Access Methods
Stored Collections Versus Standard Java Collections
Other Stored Collection Characteristics
Why Java Collections for Berkeley DB Java Edition
Serialized Object Storage

The Sleepycat Java Collections API is a Java framework that extends the well-known Java Collections design pattern so that collections can be stored, updated and queried in a transactional manner. The Sleepycat Java Collections API is a layer on top of JE.

Together the Sleepycat Java Collections API and Berkeley DB Java Edition provide an embedded data management solution with all the benefits of full transactional storage and the simplicity of a well-known Java API. Java programmers who need fast, scalable, transactional data management for their projects can quickly adopt and deploy the Sleepycat Java Collections API with confidence.

This framework was first known as Greybird DB, written by Mark Hayes. Sleepycat Software has collaborated with Mark to permanently incorporate his excellent work into our distribution and support it as an ongoing part of Berkeley DB and Berkeley DB Java Edition. The source code repository that remains at SourceForge, at version 0.9.0, is considered the last version before incorporation. It will remain intact but will not be updated to reflect changes made as part of Berkeley DB or Berkeley DB Java Edition.

JE provides a Java API that can be roughly described as a map and cursor interface, where the keys and values are represented as byte arrays. The Sleepycat Java Collections API is a layer on top of JE. It adds significant new functionality in several ways.

Note that the Sleepycat Java Collections API does not support caching of programming language objects nor does it keep track of their stored status. This is in contrast to "persistent object" approaches such as those defined by ODMG and JDO (JSR 12). Such approaches have benefits but also require sophisticated object caching. For simplicity the Sleepycat Java Collections API treats data objects by value, not by reference, and does not perform object caching of any kind. Since the Sleepycat Java Collections API is a thin layer, its reliability and performance characteristics are roughly equivalent to those of Berkeley DB, and database tuning is accomplished in the same way as for any Berkeley DB database.

There are several important choices to make when developing an application using the Sleepycat Java Collections API.

  1. Choose the Format for Keys and Values

    For each database you may choose a binding format for the keys and values. For example, the tuple format is useful for keys because it has a deterministic sort order. The serial format is useful for values if you want to store arbitrary Java objects. In some cases a custom format may be appropriate. For details on choosing a binding format see Using Data Bindings.

  2. Choose the Binding for Keys and Values

    With the serial data format you do not have to create a binding for each Java class that is stored, since Java serialization is used. For other formats, however, a binding must be defined that translates between stored byte arrays and Java objects. For details see Using Data Bindings.

  3. Choose Secondary Indices and Foreign Key Indices

    Any database that has unique keys may have any number of secondary indices. A secondary index has keys that are derived from data values in the primary database. This allows objects in the primary database to be looked up and iterated by their index keys. A foreign key index is a special type of secondary index where the index keys are also the primary keys of another primary database. For each index you must define how the index keys are derived from the data values using a SecondaryKeyCreator. For details see the SecondaryDatabase, SecondaryConfig and SecondaryKeyCreator classes.

  4. Choose the Collection Interface for each Database

    The standard Java Collection interfaces are used for accessing databases and secondary indices. The Map and Set interfaces may be used for any type of database. The Iterator interface is used through the Set interfaces. For more information on the collection interfaces see Using Stored Collections.

Any number of bindings and collections may be created for the same database. This allows multiple views of the same stored data. For example, a data store may be viewed as a Map of keys to values, a Set of keys, or a Collection of values. String values, for example, may be used with the built-in binding to the String class, or with a custom binding to another class that represents the string values differently.

It is sometimes desirable to use a Java class that encapsulates both a data key and a data value. For example, a Part object might contain both the part number (key) and the part name (value). Using the Sleepycat Java Collections API this type of object is called an "entity". An entity binding is used to translate between the Java object and the stored data key and value. Entity bindings may be used with all Collection types.

Please be aware that the collection classes provided by the Sleepycat Java Collections API do not conform completely to the interface contracts defined in the java.util package. For example, all iterators must be explicitly closed and the size() method is not available. The differences between the Sleepycat Java Collections API collections and the standard Java collections are documented in Stored Collections Versus Standard Java Collections.

Using Data Bindings

Data bindings determine how keys and values are represented as stored data (byte arrays) in the database, and how stored data is converted to and from Java objects.

The selection of data bindings is, in general, independent of the selection of access methods and collection views. In other words, any binding can be used with any access method or collection.

Note

In this document, bindings are described in the context of their use for stored data in a database. However, bindings may also be used independently of a database to operate on an arbitrary byte array. This allows using bindings when data is to be written to a file or sent over a network, for example.
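
As a rough illustration of the idea that a binding is just a conversion between an object and a byte array, the following minimal sketch uses only the standard Java library. The ByteArrayBinding interface and the Utf8StringBinding class are hypothetical stand-ins invented for this example; they are not the library's EntryBinding interface, which has a different signature.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a simple binding; NOT the library's EntryBinding.
interface ByteArrayBinding<T> {
    byte[] objectToBytes(T object);
    T bytesToObject(byte[] data);
}

// Binds a String to its UTF-8 bytes and back.
final class Utf8StringBinding implements ByteArrayBinding<String> {
    @Override
    public byte[] objectToBytes(String object) {
        return object.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String bytesToObject(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

final class BindingRoundTripSketch {
    public static void main(String[] args) {
        ByteArrayBinding<String> binding = new Utf8StringBinding();
        // No database is involved: the bytes could equally be written
        // to a file or sent over a network.
        byte[] stored = binding.objectToBytes("part-42");
        String restored = binding.bytesToObject(stored);
        System.out.println(restored.equals("part-42"));
    }
}
```

Because the binding knows nothing about where the bytes end up, the same object could serve a stored collection, a file writer or a network protocol.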

Selecting Binding Formats

For the key and value of each stored collection, you may select one of the following types of bindings.

Binding Format        | Ordered      | Description
----------------------+--------------+------------
SerialBinding         | No           | The data is stored using a compact form of Java serialization, where the class descriptions are stored separately in a catalog database. Arbitrary Java objects are supported.
TupleBinding          | Yes          | The data is stored using a series of fixed length primitive values or zero terminated character arrays (strings). Class/type evolution is not supported.
Custom binding format | User-defined | The data storage format and ordering is determined by the custom binding implementation.

As shown in the table above, the tuple format supports ordering while the serial format does not. This means that tuples should be used instead of serial data for keys in an ordered database.
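
The difference in ordering can be sketched with the standard library alone. The example below is an approximation under stated assumptions: tupleStyle() imitates a zero terminated string encoding, serialized() uses plain Java serialization rather than the compact catalog-based form described above, and compareBytes() assumes that stored keys are ordered by unsigned byte-by-byte comparison. All three method names are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class KeyOrderSketch {
    // Simplified tuple-style encoding: UTF-8 bytes plus a zero terminator.
    static byte[] tupleStyle(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1]; // final byte stays 0
        System.arraycopy(utf8, 0, out, 0, utf8.length);
        return out;
    }

    // Plain Java serialization (an assumption; not the compact catalog form).
    static byte[] serialized(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unsigned byte-by-byte comparison, the key ordering assumed here.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Natural order of the objects: "aa" sorts before "b".
        System.out.println("aa".compareTo("b") < 0);
        // The tuple-style bytes keep that order.
        System.out.println(compareBytes(tupleStyle("aa"), tupleStyle("b")) < 0);
        // The serialized bytes do not: a length prefix precedes the characters.
        System.out.println(compareBytes(serialized("aa"), serialized("b")) < 0);
    }
}
```

In this sketch the serialized form of a string begins with its length, so the shorter string "b" sorts before "aa" even though the strings themselves compare the other way. This is the kind of mismatch that makes serial data a poor choice for ordered keys.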

The tuple binding uses less space and executes faster than the serial binding. But once a tuple is written to a database, the order of fields in the tuple may not be changed and fields may not be deleted. The only type evolution allowed is the addition of fields at the end of the tuple, and this must be explicitly supported by the custom binding implementation.
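
The following sketch shows one way a binding might tolerate a field appended at the end of a tuple. It uses DataOutputStream and DataInputStream as a stand-in for the tuple format, and the record layout (a part number followed by an optional weight) and all names are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class AppendedFieldSketch {
    // Value assumed for records written before the weight field existed.
    static final int DEFAULT_WEIGHT = 0;

    // Original layout: a single int field.
    static byte[] writeOldFormat(int partNumber) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(partNumber);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Evolved layout: the new field is appended, existing fields are unchanged.
    static byte[] writeNewFormat(int partNumber, int weight) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(partNumber);
            out.writeInt(weight);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader that accepts both layouts; returns {partNumber, weight}.
    static int[] read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int partNumber = in.readInt();
            // Only read the appended field if the record actually contains it.
            int weight = in.available() >= Integer.BYTES ? in.readInt() : DEFAULT_WEIGHT;
            return new int[] {partNumber, weight};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] oldRecord = read(writeOldFormat(7));
        int[] newRecord = read(writeNewFormat(7, 12));
        System.out.println(oldRecord[0] + " " + oldRecord[1]);
        System.out.println(newRecord[0] + " " + newRecord[1]);
    }
}
```

The reader must check for the extra bytes itself, which is what "explicitly supported by the binding implementation" amounts to in this sketch. Reordering or removing the first field would leave old records unreadable.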

The serial binding supports the full generality of Java serialization including type evolution. But serialized data can only be accessed by Java applications, its size is larger, and its bindings are slower to execute.

Selecting Data Bindings

There are two types of binding interfaces. Simple entry bindings implement the EntryBinding interface and can be used for key or value objects. Entity bindings implement the EntityBinding interface and are used for combined key and value objects called entities.

Simple entry bindings map between the key or value data stored by Berkeley DB and a key or value object. This is a simple one-to-one mapping.

Simple entry bindings are easy to implement and in some cases require no coding. For example, a SerialBinding can be used for keys or values without writing any additional code.

Entity bindings must divide an entity object into its key and value data, and then combine the key and value data to re-create the entity object. This is a two-to-one mapping.

Entity bindings are useful when a stored application object naturally has its primary key as a property, which is very common. For example, an Employee object would naturally have an EmployeeNumber property (its primary key) and an entity binding would then be needed. Of course, entity bindings are more complex to implement, especially if their key and data formats are different.

Note that even when an entity binding is used a key binding is also usually needed. For example, a key binding is used to create key objects that are passed to the Map.get() method. A key object is passed to this method even though it may return an entity that also contains the key.
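
The division of an entity into key and value data, and the separate need for a key binding, can be sketched as follows. The EntityCodecSketch interface, the PartEntity class and the in-memory HashMap standing in for a database are all hypothetical; the library's EntityBinding interface has a different signature.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Entity: one object carrying both the key (number) and the value (name).
final class PartEntity {
    final int number;
    final String name;

    PartEntity(int number, String name) {
        this.number = number;
        this.name = name;
    }
}

// Hypothetical stand-in for an entity binding; NOT the library's EntityBinding.
interface EntityCodecSketch<E> {
    byte[] entityToKey(E entity);
    byte[] entityToValue(E entity);
    E toEntity(byte[] key, byte[] value);
}

final class PartEntityCodec implements EntityCodecSketch<PartEntity> {
    // Plays the role of the key binding: usable when only the number is known.
    static byte[] numberToKey(int number) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(number).array();
    }

    @Override
    public byte[] entityToKey(PartEntity entity) {
        return numberToKey(entity.number);
    }

    @Override
    public byte[] entityToValue(PartEntity entity) {
        return entity.name.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public PartEntity toEntity(byte[] key, byte[] value) {
        return new PartEntity(ByteBuffer.wrap(key).getInt(),
                new String(value, StandardCharsets.UTF_8));
    }
}

final class EntityLookupSketch {
    public static void main(String[] args) {
        PartEntityCodec codec = new PartEntityCodec();
        // In-memory stand-in for a database of key bytes to value bytes.
        Map<ByteBuffer, byte[]> store = new HashMap<>();

        PartEntity bolt = new PartEntity(42, "bolt");
        store.put(ByteBuffer.wrap(codec.entityToKey(bolt)), codec.entityToValue(bolt));

        // Lookup starts from the key alone, yet yields the whole entity.
        byte[] key = PartEntityCodec.numberToKey(42);
        PartEntity found = codec.toEntity(key, store.get(ByteBuffer.wrap(key)));
        System.out.println(found.number + " " + found.name);
    }
}
```

The lookup in main() mirrors the Map.get() case described above: the caller supplies only a key object, and the entity that comes back contains that key again alongside the value.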

Implementing Bindings

There are two ways to implement bindings. The first way is to create a binding class that implements one of the two binding interfaces, EntryBinding or EntityBinding. For tuple bindings and serial bindings there are a number of abstract classes that make this easier. For example, you can extend TupleBinding to implement a simple binding for a tuple key or value. Abstract classes are also provided for entity bindings and are named after the format names of the key and value. For example, you can extend TupleSerialBinding to implement an entity binding with a tuple key and serial value.

Another way to implement bindings is with marshalling interfaces. These are interfaces which perform the binding operations and are implemented by the key, value or entity classes themselves. With marshalling you use a binding which calls the marshalling interface and you implement the marshalling interface for each key, value or entity class. For example, you can use TupleMarshalledBinding along with key or value classes that implement the MarshalledTupleEntry interface.
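
The marshalling approach can be pictured with the sketch below, in which the stored class reads and writes its own fields and one generic binding delegates to it. The SelfMarshalling interface, the MarshallingBindingSketch class and the use of DataOutput, DataInput and a Supplier factory are assumptions made for illustration; they only approximate the roles of MarshalledTupleEntry and TupleMarshalledBinding and do not reproduce their signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical marshalling interface implemented by the stored class itself.
interface SelfMarshalling {
    void marshal(DataOutput out) throws IOException;
    void unmarshal(DataInput in) throws IOException;
}

// One generic binding that works for any class implementing SelfMarshalling.
final class MarshallingBindingSketch<T extends SelfMarshalling> {
    private final Supplier<T> factory;

    MarshallingBindingSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    byte[] objectToBytes(T object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            object.marshal(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    T bytesToObject(byte[] data) {
        try {
            T object = factory.get(); // create an empty instance, then fill it
            object.unmarshal(new DataInputStream(new ByteArrayInputStream(data)));
            return object;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Example value class that knows how to marshal its own fields.
final class SupplierValueSketch implements SelfMarshalling {
    int id;
    String city;

    SupplierValueSketch() {
    }

    SupplierValueSketch(int id, String city) {
        this.id = id;
        this.city = city;
    }

    @Override
    public void marshal(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(city);
    }

    @Override
    public void unmarshal(DataInput in) throws IOException {
        id = in.readInt();
        city = in.readUTF();
    }
}
```

With this arrangement no separate binding class is written per stored class. The trade-off in this sketch is that each stored class needs a no-argument constructor (or other factory) and carries knowledge of its own storage layout.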

Using Bindings

Bindings are specified whenever a stored collection is created. A key binding must be specified for map, key set and entry set views. A value binding or entity binding must be specified for map, value set and entry set views.

Any number of bindings may be created for the same stored data. This allows multiple views over the same data. For example, a tuple might be bound to an array of values or to a class with a property for each value.

It is important to be careful of bindings that only use a subset of the stored data. This can be useful to simplify a view or to hide information that should not be accessible. However, if you write records using these bindings you may create stored data that is invalid from the application's point of view. It is up to the application to guard against this by creating a read-only collection when such bindings are used.
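
The hazard can be illustrated with a small standard-library sketch. The record layout (a name followed by a salary) and every method name here are invented for the example; writeFromNameOnly() plays the part of a subset binding that is asked to write a record it only partly understands.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PartialViewSketch {
    // Full binding, write side: stores both fields.
    static byte[] writeFull(String name, int salary) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeInt(salary);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Subset binding, read side: exposes the name and hides the salary.
    static String readName(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Full binding, read side: used here only to inspect the damage.
    static int readSalary(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            in.readUTF(); // skip the name
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Subset binding, write side: it has no salary, so it can only invent one.
    static byte[] writeFromNameOnly(String name) {
        return writeFull(name, 0);
    }

    public static void main(String[] args) {
        byte[] original = writeFull("Ann", 5000);
        // Reading and re-writing through the subset view loses the hidden field.
        byte[] rewritten = writeFromNameOnly(readName(original));
        System.out.println(readSalary(original) + " -> " + readSalary(rewritten));
    }
}
```

Exposing such a view only through a read-only collection, as recommended above, removes the write path in which the hidden field is lost.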

Secondary Key Creators

Secondary key creators are needed whenever database indices are used. For each secondary index (SecondaryDatabase) a key creator is used to derive the index key data from the key and value data of the primary record. Key creators are objects whose classes implement the SecondaryKeyCreator interface.

Like bindings, key creators may be implemented using a separate key creator class or using a marshalling interface. Abstract key creator classes and marshalling interfaces are provided in the com.sleepycat.bind.tuple and com.sleepycat.bind.serial packages.

Unlike bindings, key creators fundamentally operate on key and value data, not necessarily on the objects derived from the data by bindings. In this sense key creators are a part of a database definition, and may be independent of the various bindings that may be used to view data in a database. However, key creators are not prohibited from using higher level objects produced by bindings, and doing so may be convenient for some applications. For example, marshalling interfaces, which are defined for objects produced by bindings, are a convenient way to define key creators.
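
As a rough picture of a key creator working directly on stored bytes, the sketch below derives a city index from primary records. The IndexKeyDeriver interface, the "name|city" value layout and the in-memory maps are all assumptions made for this example; the SecondaryKeyCreator interface itself has a different signature.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a key creator; NOT the library's SecondaryKeyCreator.
interface IndexKeyDeriver {
    // Returns the index key bytes, or null if the record should not be indexed.
    byte[] deriveIndexKey(byte[] primaryKey, byte[] value);
}

final class CityIndexSketch {
    // Assumed value layout: UTF-8 text of the form "name|city".
    static final IndexKeyDeriver BY_CITY = (primaryKey, value) -> {
        String text = new String(value, StandardCharsets.UTF_8);
        int separator = text.indexOf('|');
        return separator < 0
                ? null
                : text.substring(separator + 1).getBytes(StandardCharsets.UTF_8);
    };

    // Builds an in-memory index from city name to the matching primary keys.
    static Map<String, List<Integer>> buildIndex(Map<Integer, byte[]> primary,
                                                 IndexKeyDeriver deriver) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, byte[]> record : primary.entrySet()) {
            byte[] keyBytes = ByteBuffer.allocate(Integer.BYTES)
                    .putInt(record.getKey()).array();
            byte[] indexKey = deriver.deriveIndexKey(keyBytes, record.getValue());
            if (indexKey != null) {
                String city = new String(indexKey, StandardCharsets.UTF_8);
                index.computeIfAbsent(city, c -> new ArrayList<>()).add(record.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> primary = new TreeMap<>();
        primary.put(1, "Ann|Paris".getBytes(StandardCharsets.UTF_8));
        primary.put(2, "Bob|Rome".getBytes(StandardCharsets.UTF_8));
        primary.put(3, "Cy|Paris".getBytes(StandardCharsets.UTF_8));
        System.out.println(buildIndex(primary, BY_CITY));
    }
}
```

The deriver never sees a Java object produced by a binding, only the key and value bytes, which is why it can be regarded as part of the database definition. It could equally decode the bytes with a binding first if that is more convenient for the application.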