Berkeley DB Reference Guide: Using Stored Collections

Using Stored Collections

[Figure collection_fig_1: The implementation of stored collections and related transactional access methods.]

When a stored collection is created, it is based on either a DataStore or a DataIndex. When a data store is used, the primary key of the data store is used as the collection key. When a data index is used, the index key is used as the collection key. Indexed collections can be used for reading and removing elements, but not for adding or updating elements.

Stored Collection and Access Methods

The access method of the data store or index restricts the use of the stored collection in certain respects. Most of these restrictions concern the List interfaces; for the Map interfaces, nearly all access methods are fully supported, since the Berkeley DB model is map-like.

SortedSet and SortedMap interfaces may only be used if keys are ordered. This means ordered keys are required for creating a StoredSortedEntrySet, StoredSortedKeySet, StoredSortedMap, or StoredSortedValueSet.
All iterators for stored collections implement the ListIterator interface as well as the Iterator interface. ListIterator.hasPrevious() and ListIterator.previous() work for all access methods. However, the behavior of the following ListIterator methods depends on the access method:
- ListIterator.nextIndex() and ListIterator.previousIndex() only work when record number keys are used, and throw UnsupportedOperationException otherwise.
- ListIterator.add() inserts before the current position and renumbers following keys if the RECNO-RENUMBER access method is used.
- For all access methods other than RECNO-RENUMBER:
  - ListIterator.add() throws UnsupportedOperationException if duplicates are not allowed.
  - ListIterator.add() inserts a duplicate before the current position if duplicates are unsorted.
  - ListIterator.add() inserts a duplicate in sorted order if duplicates are sorted.
- ListIterator.set() throws UnsupportedOperationException if duplicates are sorted, since updating with sorted duplicates would change the iterator position.
Map.Entry.setValue() throws UnsupportedOperationException if duplicates are sorted.
Only the access methods that use a record number key may be used with a List view.
To create a stored List that supports the List.add() method, only the RECNO-RENUMBER access method may be used.
For List access methods that do not support List.add() (RECNO, QUEUE, and BTREE-RECNUM):
- List.add() and ListIterator.add() always throw UnsupportedOperationException.
- List.remove() and ListIterator.remove() do not cause list indices to be renumbered. However, iterators will skip the removed values.
For these access methods, stored Lists are most useful as read-only collections where indices are not required to be sequential.
When the access method allows duplicate keys, the Collection interfaces are modified in several ways, as described in the next section.
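The difference between renumbering and non-renumbering record number keys, described above, can be illustrated with a small in-memory model. This is only a sketch using java.util classes, not Berkeley DB code: the class name RecordNumberModel, its methods, and the record numbers 1 to 3 are made up for illustration. A TreeMap stands in for a RECNO-style store and an ArrayList stands in for a RECNO-RENUMBER store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// In-memory model only (hypothetical names); this is not Berkeley DB code.
class RecordNumberModel {

    // Models a store that does not renumber: removing record 2 leaves a gap,
    // so the remaining keys are no longer sequential.
    static List<Integer> keysWithoutRenumbering() {
        TreeMap<Integer, String> records = new TreeMap<>();
        records.put(1, "a");
        records.put(2, "b");
        records.put(3, "c");
        records.remove(2);
        // Iteration simply skips the removed record.
        return new ArrayList<>(records.keySet());
    }

    // Models a renumbering store: removing the second record shifts the
    // following record down, so keys stay sequential.
    static List<Integer> keysWithRenumbering() {
        List<String> records = new ArrayList<>(Arrays.asList("a", "b", "c"));
        records.remove(1); // list index 1 is the second record
        List<Integer> keys = new ArrayList<>();
        for (int n = 1; n <= records.size(); n++) {
            keys.add(n);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysWithoutRenumbering()); // prints [1, 3]
        System.out.println(keysWithRenumbering());    // prints [1, 2]
    }
}
```

In the first case an index-based view has a hole at the removed position, which is why such lists are most useful as read-only collections.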

Differences between Stored Collections and Standard Java Collections

Stored collections differ from the standard Java collection interfaces in the following ways. Some of these differences are interface contract violations.

The Java collections interface does not support duplicate keys (multi-maps or multi-sets). When the access method allows duplicate keys, the collection interfaces are defined as follows.

Map.entrySet() may contain multiple Map.Entry objects with the same key.
Map.keySet() always contains unique keys; it does not contain duplicates.
Map.values() contains all values including the values associated with duplicate keys.
Map.put() appends a duplicate if the key already exists rather than replacing the existing value, and always returns null.
Map.remove() removes all duplicates for the specified key.
Map.get() returns the first duplicate for the specified key.
StoredMap.duplicates(Object) is an additional method for returning the values for a given key as a Collection.
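The duplicate-key rules listed above can be summarized with a simplified in-memory model. This sketch does not use the Berkeley DB classes; DuplicateKeyMapModel and its internals are hypothetical, and it only mirrors the behavior described in this section (for brevity, remove() returns nothing here).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// In-memory model only (hypothetical names); this is not Berkeley DB code.
class DuplicateKeyMapModel<K extends Comparable<K>, V> {
    private final TreeMap<K, List<V>> data = new TreeMap<>();

    // Appends a duplicate instead of replacing; always returns null.
    V put(K key, V value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        return null;
    }

    // Returns the first duplicate for the key, or null if the key is absent.
    V get(K key) {
        List<V> values = data.get(key);
        return values == null ? null : values.get(0);
    }

    // Removes all duplicates for the key.
    void remove(K key) {
        data.remove(key);
    }

    // Models StoredMap.duplicates(Object): every value stored under the key.
    Collection<V> duplicates(K key) {
        List<V> values = data.get(key);
        return values == null
                ? Collections.<V>emptyList()
                : Collections.unmodifiableList(values);
    }

    // Keys are unique even when values are duplicated.
    Set<K> keySet() {
        return data.keySet();
    }

    // All values, including those stored under duplicate keys.
    List<V> values() {
        List<V> all = new ArrayList<>();
        for (List<V> dups : data.values()) {
            all.addAll(dups);
        }
        return all;
    }
}
```

With this model, two put() calls for the same key leave one entry in keySet() and two in values(), and get() returns the value that was added first.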

Other differences are:

All iterators for stored collections must be explicitly closed with StoredIterator.close(). The static method StoredIterator.close(java.util.Iterator) allows calling close for all iterators without harm to iterators that are not from stored collections, and also avoids casting. If a stored iterator is not closed, unpredictable behavior, including process death, may result.
Collection.size() and Map.size() always throw UnsupportedOperationException. This is because the number of records in a database cannot be determined reliably or cheaply.
Because the size() method cannot be used, the bulk operation methods of standard Java collections cannot be passed stored collections as parameters, since the implementations rely on size(). However, the bulk operation methods of stored collections can be passed standard Java collections as parameters.
For example, storedCollection.addAll(standardCollection) is allowed, while standardCollection.addAll(storedCollection) is not.
This restriction applies to the standard collection constructors that take a Collection parameter (copy constructors), the Map.putAll() method, and the following Collection methods: addAll(), containsAll(), removeAll() and retainAll().
The ListIterator.nextIndex() method returns Integer.MAX_VALUE for stored lists when positioned at the end of the list, rather than returning the list size as specified by the ListIterator interface. Again, this is because the database size is not available.
Comparator objects cannot be used, and the SortedMap.comparator() and SortedSet.comparator() methods always return null. Comparators are not supported in the Java API because of the impact on performance: with a stored collection, keys and values would have to be converted from byte arrays to objects each time two data items are compared.
The natural ordering of a stored collection is data byte order, whether the data classes implement the Comparable interface or not. The Comparable interface is not supported for the same reason that the Comparator interface is not supported.
The Object.equals() method is not used to determine whether a key or value is contained in a collection, to locate a value by key, and so on. Instead, the byte array representations of the keys and values are used. However, the equals() method is called for each key and value when comparing two collections for equality. It is the responsibility of the application to make sure that the equals() method returns true if and only if the byte array representations of the two objects are equal. Normally this occurs naturally, since the byte array representation is derived from the object's fields.
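Two of the rules above, closing every iterator and avoiding bulk operations that depend on size(), combine into one copying pattern: iterate, add each element, and close the iterator in a finally block. The sketch below models this with java.util classes only. StoredCollectionModel, CursorIterator and SizelessCollection are hypothetical stand-ins, and the close(Iterator) helper imitates the role the text gives to StoredIterator.close(java.util.Iterator); none of this is the actual Berkeley DB implementation.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// In-memory model only (hypothetical names); this is not Berkeley DB code.
class StoredCollectionModel {

    // Stand-in for an iterator that holds a database cursor until closed.
    static final class CursorIterator<E> implements Iterator<E> {
        private final Iterator<E> delegate;
        boolean closed = false;

        CursorIterator(Iterator<E> delegate) {
            this.delegate = delegate;
        }

        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public E next() { return delegate.next(); }
        void close() { closed = true; }
    }

    // Closes cursor iterators and ignores all others, so callers need no cast.
    static void close(Iterator<?> it) {
        if (it instanceof CursorIterator) {
            ((CursorIterator<?>) it).close();
        }
    }

    // Stand-in for a stored collection: size() is not supported.
    static final class SizelessCollection<E> extends AbstractCollection<E> {
        private final List<E> records;
        CursorIterator<E> lastIterator; // kept only so the demo can inspect it

        SizelessCollection(List<E> records) {
            this.records = records;
        }

        @Override public Iterator<E> iterator() {
            lastIterator = new CursorIterator<>(records.iterator());
            return lastIterator;
        }

        @Override public int size() {
            throw new UnsupportedOperationException("size");
        }
    }

    // Copies element by element instead of calling copy.addAll(source),
    // and always closes the iterator.
    static <E> List<E> copyOf(Collection<E> source) {
        List<E> copy = new ArrayList<>();
        Iterator<E> it = source.iterator();
        try {
            while (it.hasNext()) {
                copy.add(it.next());
            }
        } finally {
            close(it);
        }
        return copy;
    }
}
```

Because the helper accepts any Iterator, the same copyOf() method works for ordinary collections as well, where the close call simply does nothing.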

Other Stored Collection Characteristics

The following characteristics of stored collections are extensions of the definitions in the java.util package. These differences do not violate the Java collections interface contract.

All stored collections, except for iterators, are thread safe (can be used by multiple threads concurrently) whenever the Berkeley DB Concurrent Data Store or Transactional Data Store environment is used. Locking is handled by the Berkeley DB environment. To access a collection from multiple threads, creating synchronized collections with the Collections class is not necessary, except when using the Data Store environment. Iterators, however, should always be used by a single thread only.
All stored collections may be made read-only by passing false for the writeAllowed parameter of their constructor. Creating immutable collections with the Collections class is not necessary.
A stored collection is partially read-only if an index is used. Specifically, values may not be added or updated. The following methods throw UnsupportedOperationException when an index is used: Map.put(), Collection.add(), List.set(), ListIterator.set() and Map.Entry.setValue(). However, removing values via an index is allowed.
SortedMap.entrySet() and SortedMap.keySet() return a SortedSet, not just a Set as specified in the Java collections interface. This allows using the SortedSet methods on the returned collection.
SortedMap.values() returns a SortedSet, not just a Collection, whenever the keys of the map can be derived from the values using an entity binding. Note that the sorted set returned is not really a set if duplicates are allowed, since it is technically a collection; however, the SortedSet methods (for example, subSet()) can still be used.
For SortedSet and SortedMap views, additional subSet() and subMap() methods are provided that allow control over whether keys are treated as inclusive or exclusive values in the key range.
Keys and values are stored by value, not by reference. This is because objects that are added to collections are converted to byte arrays (by bindings) and stored in the database. When they are retrieved from the collection they are read from the database and converted from byte arrays to objects. Therefore, the object reference added to a collection will not be the same as the reference later retrieved from the collection.
A runtime exception, RuntimeExceptionWrapper, is thrown whenever a database exception occurs that is not itself a runtime exception. The RuntimeExceptionWrapper.getCause() method can be called to get the underlying exception.
All iterators for stored collections implement the ListIterator interface as well as the Iterator interface. This is to allow use of the ListIterator.hasPrevious() and ListIterator.previous() methods, which work for all collections since Berkeley DB provides bidirectional cursors.
All stored collections have a StoredCollection.iterator(boolean) method that allows creating a read-only iterator for a writable collection. For the standard Collection.iterator() method, the iterator is read-only only when the collection is read-only. Read-only iterators are important when using the Berkeley DB Concurrent Data Store environment, since only one write cursor may be open at a time.
Iterator stability for stored collections is greater than the iterator stability defined by the Java collections interfaces. Stored iterator stability is the same as the cursor stability defined by Berkeley DB.
When an entity binding is used, updating (setting) a value is not allowed if the key in the entity is not equal to the original key. For example, calling Map.put() is not allowed when the key parameter is not equal to the key of the entity parameter. Map.put(), List.set(), ListIterator.set(), and Map.Entry.setValue() will throw IllegalArgumentException in this situation.
Adding and removing items from stored lists is not allowed for sublists. This is simply an unimplemented feature and may be changed in the future. Currently, for sublists, the following methods throw UnsupportedOperationException: List.add(), List.remove(), ListIterator.add() and ListIterator.remove().
Two extension methods allow adding a new record with an automatically assigned key: StoredList.append(java.lang.Object) and StoredMap.append(java.lang.Object). Record number assignment by the database itself is supported for QUEUE, RECNO and RECNO-RENUMBER databases. An application-defined PrimaryKeyAssigner may also be used.
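Two of the characteristics above, storage by value and wrapping of checked exceptions in a runtime exception, can be sketched without a database. The example below is an assumption-laden model: it uses standard Java serialization as a stand-in for a binding, a plain RuntimeException as a stand-in for RuntimeExceptionWrapper, and the class name ByValueModel is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// In-memory model only (hypothetical names); this is not Berkeley DB code.
class ByValueModel {

    // Converts the object to bytes ("store") and back to an object ("retrieve").
    // Checked exceptions are rethrown as an unchecked exception whose
    // getCause() returns the underlying exception.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

An object that goes through roundTrip() comes back equal to the original according to equals(), but it is a different reference, so changes made to the original after it was stored are not visible in the retrieved copy.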

Why Java Collections for Berkeley DB?

The Java collections interface was chosen as the best Java API for Berkeley DB given these requirements:

provide the Java developer with an API that is as familiar and easy to use as possible
provide access to all, or a large majority, of the features of the underlying Berkeley DB storage system
compared to the Berkeley DB API, provide a higher-level API that is oriented toward Java developers
for ease of use, support object-to-data bindings, per-thread transactions, and some traditional database features such as foreign keys
provide a thin layer that can be thoroughly tested and which does not significantly impact the reliability and performance of Berkeley DB

Admittedly there are several things about the Java Collections API that don't quite fit with Berkeley DB or with any transactional database, and therefore there are some new rules for applying the Java Collections API. In addition, the Java API data store, index and foreign key APIs are outside the scope of the Java Collections API and are therefore somewhat disjoint. However, these disadvantages are considered to be smaller than the disadvantages of the alternatives:

A new API not based on the Java Collections API could have been designed that maps well to Berkeley DB but is higher-level. However, this would require designing an entirely new model. The exceptions for using the Java Collections API are considered easier to learn than a whole new model. A new model would also require a long design stabilization period before being as complete and understandable as either the Java Collections API or the Berkeley DB API.
The ODMG API or another object persistence API could have been implemented on top of Berkeley DB. However, an object persistence implementation would add much code and require a long stabilization period. And while it may work well for applications that require object persistence, it would probably never perform well enough for many other applications.

In fact, both of these alternatives were started and then abandoned for the reasons given.