The JE Application

This section provides a brief overview of the major concepts and operations that comprise a JE application. It concludes with a summary of the decisions that you need to make when building a JE application.

Note that the core JE classes are all contained in the com.sleepycat.je package. In addition, this book describes some classes that are found in com.sleepycat.je.bind. The bind APIs are used for converting Java objects in and out of byte arrays.

Databases and Database Environments

To use a JE database, you must first open a JE database environment. Database environments require you to identify the directory on disk where the environment lives. This location must exist before you create the environment.
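Because the environment directory must already exist, an application typically prepares it before opening the environment. The following is a minimal sketch using only the standard Java library. EnvHome and ensureExists are hypothetical names chosen for illustration and are not part of JE.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper (not part of JE): makes sure the environment home exists.
final class EnvHome {
    private EnvHome() {}

    /**
     * Creates the directory and any missing parents, then returns it.
     * Does nothing if the directory is already present. Fails if the path
     * exists but is not a directory.
     */
    static File ensureExists(File home) {
        try {
            Files.createDirectories(home.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot prepare environment home " + home, e);
        }
        return home;
    }
}
```

The File returned here is the kind of value you would then hand to JE when you open the environment.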

You open a database environment by instantiating an Environment object. Your Environment instance is called an environment handle.

Once you have opened an environment, you can use it to open any number of databases. Each such database is encapsulated by a Database object. You are required to provide a string that uniquely identifies the database when you open it. As with environment handles, the Database instance is sometimes referred to as a database handle.

You use the environment handle to manage database environments and database opens through methods available on the Environment class. You use the database handle to manage individual databases through methods available on the Database class.

You use environment handles to close environments, and you use database handles to close databases.

Note that for both databases and environments, you can optionally allow JE to create them if they do not exist at open time.

Environments are described in greater detail in Database Environments. Databases are described in greater detail in Databases.

Database Records

Database records are represented as simple key/data pairs. Both record keys and record data must be instances of the DatabaseEntry class. DatabaseEntry only supports storage of Java byte arrays. For complex objects, Java serialization can be used to obtain a byte array representation of the object, but for performance reasons this is discouraged. To help you with byte array conversions, Sleepycat provides the bind APIs.

Database records and byte array conversion are described in Database Records.
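To make the byte array requirement concrete, the following sketch converts a String and a long to byte arrays and back using only the standard Java library. It is a hand-rolled illustration, not the bind APIs, and the ByteConversions class and its method names are hypothetical. The UTF-8 encoding and the 8-byte big-endian layout for long values are choices made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JE or the bind APIs).
final class ByteConversions {
    private ByteConversions() {}

    // Strings are encoded as UTF-8 (an assumption made for this sketch).
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // A long becomes 8 bytes in big-endian order, which is ByteBuffer's default.
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```

Arrays produced this way are what a DatabaseEntry would hold for a key or for data. Whatever encoding you choose must be applied consistently on both the write path and the read path.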

Putting and Getting Database Records

You store records in a Database by putting the record into the Database. You can put records by using a Database handle directly. JE automatically determines the record's proper placement in the database's internal B-Tree using whatever key and data comparison functions are available to it.

You can also retrieve, or get, records using the Database handle. Gets are performed by providing the key (and sometimes also the data) of the record that you want to retrieve.

You can also use cursors for database puts and gets. Cursors are essentially a mechanism by which you can iterate over the records in the database. Like databases and database environments, cursors must be opened and closed. Cursors are managed using the Cursor class.

Databases are described in Databases. Cursors are described in Using Cursors.
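The way a comparison function determines record placement, and the way a cursor then walks records in that order, can be modeled in plain Java. The sketch below is not JE code: it substitutes a java.util.TreeMap for the B-Tree, and it assumes an unsigned byte-by-byte lexicographic key comparison, which is understood to be the behavior when no custom comparator is supplied. OrderedStoreModel and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model (not JE): a sorted map standing in for the B-Tree.
final class OrderedStoreModel {
    // Compares keys byte by byte, treating each byte as unsigned.
    // If one key is a prefix of the other, the shorter key sorts first.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final TreeMap<byte[], byte[]> records = new TreeMap<>(UNSIGNED_BYTES);

    // Models a put: the comparator decides where the record lands.
    void put(String key, String data) {
        records.put(bytes(key), bytes(data));
    }

    // Models a get by key; returns null when no record uses the key.
    String get(String key) {
        byte[] data = records.get(bytes(key));
        return data == null ? null : text(data);
    }

    // Models a cursor walking every record in sorted key order.
    List<String> scanKeys() {
        List<String> keys = new ArrayList<>();
        for (byte[] key : records.keySet()) {
            keys.add(text(key));
        }
        return keys;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static String text(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

In this model, records inserted in any order come back from scanKeys() sorted by key, which is why the byte representation you choose for keys affects iteration order.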

Duplicate Data

At creation time, databases can be configured to allow duplicate data. Remember that JE database records consist of a key/data pair. Duplicate data, then, occurs when two or more records have identical keys, but different data. By default, a Database does not allow duplicate data.

If your Database contains duplicate data, then a simple database get based only on a key returns just the first record that uses that key. To access all duplicate records for that key, you must use a cursor.

Replacing and Deleting Entries

How you replace database records depends on whether duplicate data is allowed in the database.

If duplicate data is not allowed in the database, then simply calling Database.put() with the appropriate key will cause any existing record to be updated with the new data. Similarly, you can delete a record by providing the appropriate key to the Database.delete() method.

If duplicate data is allowed in the database, then you must position a cursor to the record that you want to update, and then perform the put operation using the cursor.

To delete records, you can use either Database.delete() or Cursor.delete(). If duplicate data is not allowed in your database, then these two methods behave identically. However, if duplicates are allowed in the database, then Database.delete() deletes every record that uses the provided key, while Cursor.delete() deletes just the record at which the cursor is currently positioned.
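The put, get, and delete semantics described above, with and without duplicates, can be modeled in plain Java. The sketch below is not JE code. DuplicatesModel is a hypothetical class that keeps, for each key, a sorted set of data values, so "first" here means lowest in sort order. Its deleteKey() method stands in for Database.delete() and deleteOne() stands in for Cursor.delete() on a positioned cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model (not JE) of record semantics with and without duplicates.
final class DuplicatesModel {
    private final boolean allowDuplicates;
    private final TreeMap<String, TreeSet<String>> records = new TreeMap<>();

    DuplicatesModel(boolean allowDuplicates) {
        this.allowDuplicates = allowDuplicates;
    }

    // Without duplicates a put replaces the existing data; with duplicates it adds a record.
    void put(String key, String data) {
        TreeSet<String> values = records.computeIfAbsent(key, k -> new TreeSet<>());
        if (!allowDuplicates) {
            values.clear();
        }
        values.add(data);
    }

    // Models a key-only get: returns just the first record for the key, or null.
    String get(String key) {
        TreeSet<String> values = records.get(key);
        return values == null ? null : values.first();
    }

    // Models a cursor visiting every duplicate record for the key.
    List<String> getAll(String key) {
        List<String> result = new ArrayList<>();
        TreeSet<String> values = records.get(key);
        if (values != null) {
            result.addAll(values);
        }
        return result;
    }

    // Models Database.delete(): removes every record that uses the key.
    void deleteKey(String key) {
        records.remove(key);
    }

    // Models Cursor.delete(): removes only the one record the cursor is positioned on.
    void deleteOne(String key, String data) {
        TreeSet<String> values = records.get(key);
        if (values != null && values.remove(data) && values.isEmpty()) {
            records.remove(key);
        }
    }
}
```

With duplicates enabled, three puts under one key leave three records. get() sees only one of them, getAll() sees all three, deleteOne() removes a single record, and deleteKey() removes whatever remains.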

Secondary Databases

Secondary Databases provide a mechanism by which you can automatically create and maintain secondary keys or indices. That is, you can access a database record using a key other than the one used to store the record in the first place.

When you are using secondary databases, the database that holds the data you are indexing is called the primary database.

You create a secondary database by opening it and associating it with an existing primary database. You must also provide a class that generates the secondary's keys (that is, the index) from primary records. Whenever a record in the primary database is added or changed, JE uses this class to determine what the secondary key should be.

When a primary record is created, modified, or deleted, JE automatically updates the secondary database(s) for you as is appropriate for the operation performed on the primary.

You manage secondary databases using the SecondaryDatabase class. You identify how to create keys for your secondary databases by implementing the SecondaryKeyCreator.createSecondaryKey() method.

Secondary databases are described in Secondary Databases.
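The relationship between a primary database, a key-generating class, and an automatically maintained index can be modeled in plain Java. The sketch below is not JE code. SecondaryIndexModel is a hypothetical class in which a java.util.function.Function plays the role of the key creator, and the index is updated on every primary put or delete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical model (not JE) of a primary store with one maintained index.
final class SecondaryIndexModel {
    private final Map<String, String> primary = new HashMap<>();
    // Secondary key -> primary keys of the records that produce that secondary key.
    private final TreeMap<String, TreeSet<String>> secondary = new TreeMap<>();
    // Derives the secondary key from the primary record's data.
    private final Function<String, String> keyCreator;

    SecondaryIndexModel(Function<String, String> keyCreator) {
        this.keyCreator = keyCreator;
    }

    // Adds or replaces a primary record and brings the index up to date.
    void put(String primaryKey, String data) {
        delete(primaryKey); // drop any stale index entry for a replaced record
        primary.put(primaryKey, data);
        secondary.computeIfAbsent(keyCreator.apply(data), k -> new TreeSet<>()).add(primaryKey);
    }

    // Deletes a primary record and removes its index entry.
    void delete(String primaryKey) {
        String old = primary.remove(primaryKey);
        if (old == null) {
            return;
        }
        String secKey = keyCreator.apply(old);
        TreeSet<String> keys = secondary.get(secKey);
        if (keys != null && keys.remove(primaryKey) && keys.isEmpty()) {
            secondary.remove(secKey);
        }
    }

    // Looks up primary record data by secondary key, in primary key order.
    List<String> findBySecondaryKey(String secKey) {
        List<String> result = new ArrayList<>();
        TreeSet<String> keys = secondary.get(secKey);
        if (keys != null) {
            for (String primaryKey : keys) {
                result.add(primary.get(primaryKey));
            }
        }
        return result;
    }
}
```

For example, if record data has the made-up form "name|city", a key creator such as d -> d.substring(d.indexOf('|') + 1) indexes records by city. The caller only ever writes to the primary, and the index follows along.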

Transactions

Transactions provide a high level of safety for your database operations by allowing you to manage one or more database operations as if they were a single unit of work. Transactions provide your database operations with recoverability, atomicity, and isolation.

Transactions provide recoverability by allowing JE to undo any transactionally protected operations that may have been in progress at the time of an application failure.

Transactions provide atomicity by allowing you to group many database operations into a single unit of work. Either all operations succeed or none of them do. This means that if one write operation fails for any reason, then all other writes contained within that transaction also fail. This ensures that the database is never partially updated as the result of an only partially successful chain of read/write operations.

Transactions provide isolation by ensuring that the transaction will never write to a record that is currently in use (for either read or write) by another transaction. Similarly, any record to which the transaction has written cannot be read outside of the transaction until the transaction ends. (The exception to this second rule is that you can configure your Database or Cursor to perform dirty reads, that is, to read records that a transaction has modified but not yet committed.)

Essentially, transactional isolation provides a transaction with the same unmodified view of the database that it would have received had the operations been performed in a single-threaded application.

Transactions may be long or short lived, they can encompass as many database operations as you want, and they can span databases so long as all participating databases reside in the same environment.

Using transactions imposes a performance penalty on the application because transactional operations generally require more disk I/O than non-transactional operations do. Therefore, while most applications will use transactions for database writes, their usage is optional. In particular, processes that perform read-only access to JE databases might not use transactions. Applications that use JE for an easily recreated cache might also choose to avoid transactions.

You manage transactions using the Transaction class. Transactions are described in Transactions.
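The all-or-nothing behavior of a transaction, and the rule that uncommitted writes are not visible outside it, can be modeled in plain Java. The sketch below is not JE code and does not reflect how JE implements transactions. TxnModel is a hypothetical class that simply buffers a transaction's writes and applies them together on commit. It omits locking, recovery, and dirty reads entirely.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not JE) of atomic commit and abort.
final class TxnModel {
    private final Map<String, String> committed = new HashMap<>();

    // One unit of work; its writes stay private until commit.
    final class Txn {
        private final Map<String, String> pending = new LinkedHashMap<>();
        private boolean finished;

        void put(String key, String value) {
            checkOpen();
            pending.put(key, value);
        }

        // Inside the transaction, its own uncommitted writes are visible.
        String get(String key) {
            return pending.containsKey(key) ? pending.get(key) : committed.get(key);
        }

        // Applies every buffered write together.
        void commit() {
            checkOpen();
            committed.putAll(pending);
            finished = true;
        }

        // Discards every buffered write; the store is left untouched.
        void abort() {
            checkOpen();
            pending.clear();
            finished = true;
        }

        private void checkOpen() {
            if (finished) {
                throw new IllegalStateException("transaction already ended");
            }
        }
    }

    Txn begin() {
        return new Txn();
    }

    // A read outside any transaction sees committed data only.
    String read(String key) {
        return committed.get(key);
    }
}
```

In this model, a reader calling read() never observes a partially applied group of writes: before commit() it sees none of them, and after commit() it sees all of them.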

JE Resources

JE has some internal resources that you may want to manage. The most important of these is the in-memory cache. You should carefully consider how large the JE cache needs to be. If you set the cache size too low, JE will perform potentially unnecessary disk I/O, which hurts performance. If you set it too high, you are potentially wasting RAM that could be put to better use.

Note that the size that you configure for the in-memory cache is a maximum size. At application startup, the cache starts out fairly small (only about 7% of the maximum allowed size for the cache). It then grows as is required by your application's database operations. Also, the cache is not pinned in memory; it can be paged out by your operating system's virtual memory system.

Beyond the cache, JE uses several background threads to keep the cache within its size limits, to clean the JE log files, and to flush database changes seen in the cache to the backing data files. For the majority of JE applications, the default behavior for the background threads should be acceptable and you will not need to manage their behavior. Note that background threads are started no more than once per process upon environment open.

For more information on sizing the cache and on the background threads, see Administering Berkeley DB Java Edition Applications.

Application Considerations

When building your JE application, be sure to think about the following things:

  • What data do you want to store? What is best used for the primary key? What is the best representation for primary record data? Think about the most efficient way to move your keys and data in and out of byte arrays. See Database Records for more information.

  • Does the nature of your data require duplicate record support? Remember that duplicate support can be configured only at database creation time. See Opening Databases for more information.

    If you are supporting duplicate records, you may also need to think about duplicates comparators (not just key comparators). See Using Comparators for more information.

  • What secondary indexes do you need? How can you compute your secondary indexes based on the data and keys stored in your primary database? Indexes are described in Secondary Databases.

  • What cache size do you need? See Sizing the Cache for information on how to size your cache.

  • Does your application require transactions? Most will. Transactions are described in Transactions.