The main advantage of Perst is its tight integration with programming languages. There is no gap between the database and the application data models - Perst directly stores language objects. So there is no need for packing/unpacking code, which has to be written for traditional relational databases. Also, unlike many other OODBMSs, Perst requires no special compiler or preprocessor. Yet it is able to provide a high level of transparency.
Any class derived from the Persistent base class is considered persistent-capable. Such an object is automatically made persistent and stored in the storage when it is referenced from some other persistent object and the store method of that object is invoked. So there is no need (though it is possible) to explicitly assign objects to the storage.
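This persistence-by-reachability behavior can be sketched as follows (the class names and database file name are hypothetical, and the API calls follow the Perst manual; details may differ between versions):

```java
import org.garret.perst.*;

class Leaf extends Persistent {
    String label;
}

class Root extends Persistent {
    Leaf child;
}

// ...
Storage db = StorageFactory.getInstance().createStorage();
db.open("test.dbs", 32*1024*1024);   // 32 MB page pool
Root root = new Root();
root.child = new Leaf();             // plain transient object so far
db.setRoot(root);                    // stores root; root.child becomes
                                     // persistent too, since it is referenced
db.commit();
db.close();
```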
The storage has one special root object. The root object is the only persistent object accessed in a special way (using the Storage.getRoot method). All other persistent objects are accessed in the normal way: either by traversing references from other persistent objects, or through object containers (Index, Link or Relation). Unlike many other OODBMSs, there can be only one root in the storage. If you need to have several named roots, you should create an Index object and use it as the root object.
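For example, a single Index used as the root can act as a registry of named roots (a sketch; the key names and the Employee class are hypothetical):

```java
import org.garret.perst.*;

Storage db = StorageFactory.getInstance().createStorage();
db.open("test.dbs", 32*1024*1024);

Index root = (Index)db.getRoot();
if (root == null) {                       // first run: create the root index
    root = db.createIndex(String.class, true);
    db.setRoot(root);
    // register a "named root" under a string key
    root.put(new Key("employees"),
             db.createFieldIndex(Employee.class, "name", true));
}
FieldIndex employees = (FieldIndex)root.get(new Key("employees"));
```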
Perst requires that each persistent-capable class be derived from the Persistent class. This makes it impossible to store 'foreign' classes in the storage. This is the 'price to pay' for Perst's ease of use and the absence of any specialized preprocessors or compilers. Also, the components of persistent-capable objects are restricted to the following types:
- scalar types: boolean, int, short, double, ...
- the java.lang.String type
- the java.util.Date type
- references to persistent objects: classes derived from Persistent, or any interface extending the IPersistent interface
- value classes: classes implementing the IValue interface, with the same restrictions on the types of components as for persistent-capable objects. If the perst.implicit.values property is true, then any class which is not derived from IPersistent will be treated as a value. Values are stored inside the persistent object containing them. A value class should have a default constructor (a constructor with an empty parameter list) or no constructors at all. The declared type of a field referencing a value should be the same as the actual type of the referenced object (so it is not possible to assign an object of a derived class to this field). Also, a value class should not have recursive references (direct or indirect) - e.g. a field of value class V cannot have type V.
- serialized transient objects: if the perst.serialize.transient.objects property is true, then Perst will serialize all objects of non-persistent and non-value classes (classes which are not derived from the IPersistent or IValue interfaces) using the standard Java serialization mechanism. The object closure will be packed into an array of bytes which is stored in the database. Such a class should implement the java.io.Serializable interface and should not contain references to persistent objects.
- Link components, created by the Storage.createLink method
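As an illustration of the value semantics described above, a minimal sketch (class names are hypothetical):

```java
import org.garret.perst.*;

// A value class: its fields are stored inside the containing
// persistent object rather than as a separate database object.
class Point implements IValue {
    int x;
    int y;
    Point() {}                    // default constructor required for loading
    Point(int x, int y) { this.x = x; this.y = y; }
}

class City extends Persistent {
    String name;
    Point  location;  // declared type equals the actual type; no subclasses allowed
}
```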
Unfortunately, it is not possible to detect whether an object has been changed without saving its old state and performing a field-by-field comparison with the new state. The overhead of such a solution (in terms of both space and CPU resources) would be very high. In Perst, it is the responsibility of the programmer to save objects in the storage. This can be done with the Persistent.store or Persistent.modify methods.
The Persistent.store method writes an object, as well as all objects which are not yet persistent but are referenced from this object, to the storage. So, if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on this object X. But if you later update one of the elements in this tree, you should invoke store() individually for each such element (X.store() will NOT automatically save the referenced objects in this case).
The Persistent.modify method marks an object as modified but doesn't immediately write it to the storage. All objects marked as modified will be stored to the storage during transaction commit (the store method will be invoked for each modified object). So using the modify method is preferable if an object is updated multiple times within a transaction: instead of being stored several times, it will be stored only once, at the end.
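The difference between the two methods can be sketched as follows (Account, accountIndex and db are hypothetical names; the calls follow the Perst manual):

```java
// db is an open Storage; accountIndex is a FieldIndex of Account objects.
Account account = (Account)accountIndex.get(new Key("john"));

account.balance += 100;
account.store();             // written to the storage immediately

// modify() is preferable when the object is updated many times
// within one transaction:
for (int i = 0; i < 1000; i++) {
    account.balance += 1;
    account.modify();        // only marked as dirty
}
db.commit();                 // the object is written once, here
```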
Perst doesn't support nested non-static classes. The reason is that a non-static inner class contains a final field pointing to the instance of the outer class. Because this field is final, it is not possible to assign a value to it during object loading in the same way as for all other fields of a persistent object.
The IPersistent
interface declares the recursiveLoading
method. The default implementation of this method in the Persistent
class always returns true
. In this case, Perst will recursively load any object referenced by
a target object when the target object is loaded. Thus, it causes implicit
loading of the cluster of all referenced objects from storage to the main
memory. This is similar to how a serialization mechanism works.
To avoid overflowing main memory through recursive loading of all objects from the storage, a programmer has to override the recursiveLoading method in some classes and return false from it. Objects referenced by such an object will not be implicitly loaded, and the programmer has to explicitly invoke the Persistent.load method to load them. So the recursiveLoading method can be used to control the loading of objects from storage into main memory.
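A sketch of this pattern (class names are hypothetical): the class disables implicit loading and loads its referenced object explicitly on first use.

```java
import org.garret.perst.*;

class Department extends Persistent {
    String   name;
    Employee head;                 // not loaded implicitly, see below

    public boolean recursiveLoading() {
        return false;              // disable implicit loading of referenced objects
    }

    Employee getHead() {
        head.load();               // explicit load before the first use
        return head;
    }
}
```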
Also, it is important to notice that containers (Link, Relation, Index
) always load member objects on demand (they do not perform
automatic loading of all objects in the containers). Since access to the
container members is always performed through methods of the container class,
the programmer will never notice it - container methods will always return
loaded object(s).
Perst uses a default constructor (a constructor without parameters) to create an object loaded from the storage. This means that a persistent-capable class should either declare a default constructor or declare no constructors at all. Alternatively, Perst can instantiate an object without calling its constructor by means of sun.reflect.ReflectionFactory. But this will work only with Sun's JDK. If a class needs to perform some initialization after loading, it can override the onLoad method, which is invoked after fetching all values of the non-transient fields of the persistent object from the storage.
So, summarizing all of the above, the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load referenced objects. At the same time, it is flexible in giving the programmer control over object loading. Only those classes (usually containers) which explicitly control the loading of their members (by overriding recursiveLoading to return false) need to be aware of the Persistent.load method.
Perst is able to handle compatible changes of a field's type automatically: for example, changing a field of int type to long or to float. But changing the type from int to short is not possible. More precisely, Perst is able to perform any conversion which is supported by the Java reflection mechanism (a field of type XXX can be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to a component of type YYY).
All other schema modifications (such as renaming fields or incompatible changes of a field's type) cannot be handled by Perst's automatic schema modification mechanism. In this case, you can use the Perst XML export/import mechanism. A database can be exported to XML format using the Storage.exportXML method and then recreated with the new class definitions. The saved data can then be imported using the Storage.importXML method.
Perst provides the Link interface to deal with relations of this kind. This interface allows you to add, delete and inspect members of the relation. Members of the relation can be accessed by index or extracted as an array of objects.
Relations can be of two kinds: embedded (where references to the related objects are stored in the relation owner object itself) and standalone (where the relation is a separate object which contains the references to the relation owner and the relation members). Both kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method, while a standalone relation is represented by a persistent object created by the Storage.createRelation method.
So, a relation between two classes A and B can be implemented in the following way:
Relation type | Object A | Object B |
---|---|---|
one-to-one | B bref; | A aref; |
one-to-many | Link bref; | A aref; |
many-to-one | B bref; | Link aref; |
many-to-many | Link bref; | Link aref; |
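The one-to-many row of this table can be sketched with an embedded relation (Detail and Order are hypothetical class names):

```java
import org.garret.perst.*;

class Detail extends Persistent {
    String name;
    Link   orders;                // "many" side: references to Order objects
}

class Order extends Persistent {
    Detail detail;                // "one" side: plain reference
    int    price;
}

// establishing the relation:
Detail detail = new Detail();
detail.orders = db.createLink();  // embedded relation
Order order = new Order();
order.detail = detail;
detail.orders.add(order);
detail.store();                   // stores detail and the new order
```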
The simplest way to locate an object by key is a hash table; the Hashtable or HashMap class can be used for this purpose. In databases, however, a more sophisticated search is usually required. I do not want to implement the complete SQL language in Perst, because it would immediately make the DBMS huge and slow. But in most cases an application performs only very simple queries using an exact match or a key range. This is done in Perst by the Index and FieldIndex interfaces. The first interface is used when the key is specified independently of its associated value. The FieldIndex interface allows you to index objects using one of the fields of the object (the key field).
Indices are created in Perst using the Storage.createIndex
or the Storage.createFieldIndex
methods. There can be several index implementations but, right now,
only one implementation based on B+Tree is provided (because B+Tree is the most
efficient structure for disk-based databases). Methods of the Index
and the FieldIndex
interfaces allow you to add, remove and locate objects by the key.
It is possible to perform a search either by specifying the exact key value or by specifying a range of key values (the high or low boundary, or both, can be skipped, or can be declared as being exclusive or inclusive).
There are several different ways of selecting objects using an index:
- IPersistent get(Key key) - get the object with the specified exact key value (for a unique index).
- IPersistent[] get(Key from, Key till) - get all objects whose keys belong to the specified range; either boundary can be skipped by passing null. Both boundaries can be inclusive or exclusive.
- Iterator iterator() - iterate through all objects in the index in ascending key order.
- Iterator iterator(Key from, Key till, int order) - iterate through the objects whose keys belong to the specified range; either boundary can be skipped by passing null. Both boundaries can be inclusive or exclusive. Objects can be traversed in ascending or descending order of their keys.
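These access paths can be sketched as follows (Order is a hypothetical class; the Key constructors and the DESCENT_ORDER constant follow the Perst manual and may differ between versions):

```java
import org.garret.perst.*;

FieldIndex orders = db.createFieldIndex(Order.class, "price", false);
orders.put(order);    // the key value is taken from the "price" field

// exact match (a non-unique index may return several objects):
IPersistent[] exact = orders.get(new Key(100), new Key(100));

// range search: 100 <= price < 200 (inclusive/exclusive boundaries)
IPersistent[] range = orders.get(new Key(100, true), new Key(200, false));

// traverse the whole index in descending order of the key:
Iterator i = orders.iterator(null, null, Index.DESCENT_ORDER);
```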
If you need a set of persistent objects you should use the Storage.createSet
method. Set is implemented using B+Tree where the object id (OID)
is used as a key.
Perst also supports spatial indices (org.garret.perst.SpatialIndex
) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection
). The spatial index is implemented using Guttman's R-Tree with
quadratic split algorithm. It allows for efficient search of R2 objects. Sorted
Collection provides almost the same methods as FieldIndex
but it uses
a user-defined comparator to compare collection members. A sorted collection is
implemented using a T-Tree and is especially efficient for main-memory
databases.
Interface | Description | Key type | Implementation | Created by |
---|---|---|---|---|
Index |
Index with an explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.createIndex(Class type, boolean unique) |
Index |
The same as above but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.createThickIndex(Class type) |
Index |
Random access index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessIndex(Class type, boolean unique) |
FieldIndex |
Index constructed for one of the object fields | scalar, string or reference | B+Tree | Storage.createFieldIndex(Class type, String fieldName, boolean unique) |
FieldIndex |
Random access field index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessFieldIndex(Class type, String fieldName, boolean unique) |
MultidimensionalIndex |
Multidimensional index, allowing objects to be selected using search conditions on several fields. This index provides better performance than traditional indices if no single field in the query has good selectivity. With traditional indices (like a B-Tree), it is necessary to join large result sets or perform a sequential search among a large number of objects, and a compound index allows objects to be selected only if all keys (or at least a key prefix) are specified. A multidimensional index performs the search taking all the specified restrictions into account simultaneously. But since it is implemented using a binary tree whose nodes contain references to the indexed objects, this index is efficient mostly for in-memory databases or in cases when all traversed objects fit in memory. | scalar, string or any other comparable type | KD-Tree | Storage.createMultidimensionalIndex(Class cls, String[] fieldNames) or Storage.createMultidimensionalIndex(MultidimensionalComparator comparator) |
BitIndex |
Bit index for searching objects by a bitmap of properties | persistent object | B+Tree | Storage.createBitIndex() |
IPersistentSet |
Set of persistent objects | persistent object | B+Tree | Storage.createSet() |
IPersistentSet |
Scalable set of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableSet() |
IPersistentList |
List of persistent objects providing random access | persistent object | B+Tree | Storage.createList() |
IPersistentList |
Scalable list of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableList() |
IPersistentMap |
Scalable map of persistent objects (can efficiently handle both small and large number of members) | persistent object | Sorted array or B+Tree | Storage.createMap() |
SpatialIndex |
Index for spatial objects with integer coordinates | Rectangle | R-Tree | Storage.createSpatialIndex() |
SpatialIndexR2 |
Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.createSpatialIndexR2() |
SortedCollection |
Index with a user-defined comparator | any | T-Tree | Storage.createSortedCollection(PersistentComparator comparator, boolean unique) |
An index search in Perst corresponds to a simple SQL query like:

select * from T where x=?;

Perst relations can be used to implement simple joins, like the following SQL query fetching all orders placed with a particular vendor:

select * from O Order, V Vendor where O.vendorID = V.id and V.name='ABC';

In Perst it is possible to first select the vendor using an index search and then traverse the corresponding relation to locate all the orders placed with this vendor.
But sometimes, it is necessary to implement more complex queries.
This is also possible in Perst, without the need to write queries in some
special (non-procedural) language. The Projection
class is
used to combine the results of several simple operations (such as an index
search). Let us start explanation of this class with an example. Consider that
we need to know the details of all the orders with names starting with 'D1'
shipped by vendors having names starting with 'V1' or 'V3'. The SQL statement
which performs this query is as follows:
select * from O Order, V Vendor, D Detail where D.name like 'D1%' and O.detailID = D.id and O.vendorID = V.id and (V.name like 'V1%' or V.name like 'V3%');

And now we shall see how this can be done in Perst. Assume that we have indices for details and vendors:

FieldIndex detailIndex = db.createFieldIndex(Detail.class, "name", true);
FieldIndex vendorIndex = db.createFieldIndex(Vendor.class, "name", true);

The set of requested orders can then be obtained in the following way:

// Projection from vendors to orders (using "orders" field of Link type)
Projection v2o = new Projection(Vendor.class, "orders");
// Projection from details to orders (using "orders" field of Link type)
Projection d2o = new Projection(Detail.class, "orders");
// Select vendors with name like 'V1%'
v2o.project(vendorIndex.getPrefix("V1"));
// Select vendors with name like 'V3%'
v2o.project(vendorIndex.getPrefix("V3"));
// Select details with name like 'D1%'
d2o.project(detailIndex.getPrefix("D1"));
// Join projections
v2o.join(d2o);
// Get array of requested orders
Order[] orders = (Order[])v2o.toArray(new Order[v2o.size()]);

As you can see, the
Projection
class is used for several purposes:
It is possible to derive your own class from the Projection
class
to implement more sophisticated projections than using a single projection
field.
If you need the result of a selection sorted in some particular order, then the most efficient way is to use a FieldIndex for it: when you select objects using an index, the selected objects are sorted by the search key. If you need the selection sorted by a field which is not the search key, you can use the Arrays.sort method. For example, if in the query described above we need to sort the orders by the price field, it can be done with the following statement:

Arrays.sort(orders, new Comparator() {
    public int compare(Object a, Object b) {
        return ((Order)a).price - ((Order)b).price;
    }
});
A transaction is finished by the commit, rollback or close methods. The committing of a transaction causes a synchronous write of the changed pages to the disk. Synchronous writes (where an application is blocked until the data is really flushed to the disk) are very expensive operations. The average positioning time for modern disks is about 10 msec, so they are usually not able to perform more than 100 writes to random locations per second. As a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.
Performance can be greatly increased if you minimize the number of commits (larger transactions). Perst uses a shadow mechanism for transaction implementation. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If the object is updated multiple times during the transaction, the shadow is created only once. Because it uses shadows, Perst does not need a transaction log file. Therefore, in Perst, long transactions will not cause a transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, Perst works as a DBMS without transaction support, adding almost no overhead for supporting transactions.
The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. Perst will preserve consistency of the database, but all changes made since the last commit will be lost.
Perst also supports per-thread transactions. Three types of
per-thread transactions are supported: exclusive, cooperative and serializable.
In an exclusive transaction, only one thread can update the database. In
cooperative mode, multiple transactions can work concurrently and the commit()
method will be invoked only when the transactions of all the threads are
terminated. Serializable transactions can also work concurrently. But unlike
cooperative transactions, the threads are isolated from each other. Each thread
has its own associated set of modified objects and committing the transaction
will cause saving of only these objects to the database. To synchronize access
to the objects in case of a serializable transaction, a programmer should use
lock methods of the IResource
interface. A shared lock should be set before read access to any
object, and an exclusive lock before write access. Locks will be
automatically released when the transaction is committed (so a programmer
should not explicitly invoke the unlock()
method). In
this case it is guaranteed that the transactions are serializable.
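A serializable per-thread transaction can be sketched as follows (account is a hypothetical object whose class extends PersistentResource, which implements IResource; the calls follow the Perst manual):

```java
// db is an open Storage; account's class extends PersistentResource.
db.beginThreadTransaction(Storage.SERIALIZABLE_TRANSACTION);
try {
    account.sharedLock();            // shared lock before reading
    long balance = account.balance;

    account.exclusiveLock();         // exclusive lock before writing
    account.balance = balance + 100;
    account.modify();                // store() is not allowed here
} finally {
    db.endThreadTransaction();       // commit; the locks are released automatically
}
```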
It is not possible to use the IPersistent.store()
method in serializable transactions (you should use IPersistent.modify()
instead). That is why it is also not possible to use the Index and the
FieldIndex containers since they are based on the B-Tree and a B-Tree directly
accesses database pages and uses the store()
method to
assign an OID to an inserted object. You should use the SortedCollection
based on T-Tree instead or an alternative B-Tree implementation
(see "perst.alternative.btree" property).
Starting from version 2.66, Perst provides multi-client
access to a database. This is implemented using OS file locks. Databases are
accessed in multiple-readers-single-writer mode. In order to enable
multi-client support, it is necessary to set the
"perst.multiclient.support" property. Then, all accesses to a
database should be performed within explicit transactions, i.e. transactions
should be started using the beginThreadTransaction method. If a transaction is
read-only, then the
Storage.READ_ONLY_TRANSACTION
mode should be used. Several read-only transactions can be executed
in parallel. But if a transaction may modify the contents of the database, then
the Storage.READ_WRITE_TRANSACTION
mode should be used and the transaction will have exclusive access
to the database.
Transaction access to the database is synchronized for the different
processes in one system as well as for the different threads within one
process. Transactions can be nested - you can invoke beginThreadTransaction
as many times as you need, assuming that the same number of endThreadTransaction
calls are invoked. So, you can wrap the code of each method
accessing the database with the
beginThreadTransaction
and the endThreadTransaction methods
. It is not possible to upgrade a transaction from shared (READ_ONLY_TRANSACTION
) to exclusive (READ_WRITE_TRANSACTION
). Hence, you should always use the READ_WRITE_TRANSACTION
if the transaction may modify the database.
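The multi-client access pattern described above can be sketched as follows (the file name is hypothetical; property and mode names follow the Perst manual):

```java
import org.garret.perst.*;

Storage db = StorageFactory.getInstance().createStorage();
db.setProperty("perst.multiclient.support", Boolean.TRUE);
db.open("shared.dbs", 32*1024*1024);

// an updating transaction gets exclusive access to the database:
db.beginThreadTransaction(Storage.READ_WRITE_TRANSACTION);
try {
    // ... update the database ...
} finally {
    db.endThreadTransaction();  // outermost call commits and releases the file lock
}

// read-only transactions can run in parallel with each other:
db.beginThreadTransaction(Storage.READ_ONLY_TRANSACTION);
try {
    // ... queries ...
} finally {
    db.endThreadTransaction();
}
```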
The outermost endThreadTransaction
will cause the transaction to commit and release the OS file lock. Unlike endThreadTransaction
, the rollbackThreadTransaction
doesn't consider the counter of nested transactions - it aborts all
nested transactions. If some process accessing the database crashes, the file
lock held by this process is automatically released. According to the
specification of the
java.nio.channels.FileChannel.lock
method, some operating systems may not support shared locks. In
this case the only choice is to always use the READ_WRITE_TRANSACTION mode
. Because the java.nio
package was introduced in JDK 1.4, Perst is not able to provide
multi-client access with earlier JDK releases.
Replication is page based - all dirty pages from the master node page pool are sent to the slave nodes. A slave node detects the moment of transaction commit (when the root page is sent to a slave node and the state index is changed in the database header) and sets an exclusive lock in this case. The read-only transactions at the slave nodes set a shared lock and work with the previous (consistent) state of the database. When a transaction commit is performed by the master node, a slave node uses this lock to transfer the database into the next consistent state. Starting from this moment, the read-only transactions at a slave node will work with the new, consistent state of the database.
Perst supports two replication models: static and dynamic. In the static model, all slave nodes have to be defined when an application is started. The master node tries to connect to all the specified slave nodes. After establishing all possible connections with the active slave nodes, it starts working, replicating the changes to these slave nodes. The content of all statically defined slave nodes is considered to be the same as that of the master node. Thus, when a database is opened, the master expects that all the connected slave nodes have the same state as its own state and will send to them only the pages notified by the application after it is opened. If you are going to add a new static slave node, you should first (before the database is opened) manually copy the database from the master node to this new slave node.
In the dynamic model, it is possible to add new replicated nodes at each moment in time. In this case, Perst transfers the complete database to the new node, synchronizing its state with the master node state. When the new replicated slave node is synchronized with the master node, it will act in the same way as a static, replicated node: the master will send to both of them all the updates performed by the application. The dynamic mode is especially useful for main-memory databases (with an infinite page pool and a NullFile stub). In this case, the size of the database is not expected to be very large and its transfer to a new replicated node should not take a lot of time.
Since replicated nodes are able to execute read-only
queries, the replication mechanism can also be used to improve performance by
distributing the load between several nodes. To be able to execute read-only
transactions at a slave node, an application should use per-thread transactions
wrapped by the
Storage.beginThreadTransaction(Storage.REPLICATION_SLAVE_TRANSACTION)
and the Storage.endThreadTransaction()
method invocations. It is possible to run many such transactions
concurrently. Only when the master node commits a transaction, a slave node has
to set an exclusive lock blocking all read-only transactions (certainly it has
to first wait for the completion of all currently executing read-only
transactions. So, these transactions should be short enough to avoid blocking
the replication mechanism for a long time). This exclusive lock is released
very fast - immediately after placing a new root page in the pool. Now
read-only transactions will see the new state of the database.
When an application running at the master node closes the storage, the master sends a CLOSE request to all the slave nodes causing them to close their connection with the master. In case of abnormal termination of one of the slave nodes, the master continues to work with the other slave nodes. If no more slave nodes are online, the master continues to work autonomously. The Perst replication mechanism doesn't enforce any particular strategy for the recovery of nodes. It is up to the application how to handle the situation of the master crashing and choosing a new master node.
In Perst, there is the org.garret.perst.Query class which allows you to execute JSQL queries. To execute a JSQL query it is necessary to provide an iterator (java.util.Iterator) over the objects, the class of the iterated objects, and a string representing the query condition. It is also possible to specify indices, resolvers and query parameters. The result of a query execution is another iterator which can be used to traverse through the selected objects (the objects matching the specified search criteria).
All the Perst collection classes contain the select method, which allows you to filter collection members using a JSQL query. This method is given the class of the objects stored in the collection (for some collections, such as FieldIndex, this is known and need not be specified) and the JSQL condition. As a result, select returns an iterator of the selected objects.
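A sketch of the select method (Order and orderIndex are hypothetical names; the method shape follows the description above):

```java
// orderIndex is a FieldIndex of Order objects; select() filters the
// collection members with a JSQL condition and returns an iterator.
Iterator i = orderIndex.select(Order.class,
                               "price > 100 and vendor.name like 'V%'");
while (i.hasNext()) {
    Order order = (Order)i.next();
    // ... process the selected order ...
}
```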
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction | conjunction or disjunction
conjunction ::= comparison | comparison and conjunction
comparison ::= operand = operand | operand != operand | operand <> operand | operand < operand | operand <= operand | operand > operand | operand >= operand | operand (not) like operand | operand (not) like operand escape string | operand (not) in operand | operand (not) in expressions-list | operand (not) between operand and operand | operand is (not) null
operand ::= addition
addition ::= multiplication | addition + multiplication | addition || multiplication | addition - multiplication
multiplication ::= power | multiplication * power | multiplication / power
power ::= term | term ^ power
term ::= identifier | number | string | true | false | null | current | first | last | ( expression ) | not comparison | - term | term [ expression ] | identifier . term | function term | count (*) | contains array-field (with expression) (group by identifier having expression) | exists identifier : term
function ::= abs | length | lower | upper | integer | real | string | sin | cos | tan | asin | acos | atan | log | exp | ceil | floor | sum | avg | min | max
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= field (asc | desc)
field ::= identifier { . identifier }
fields-list ::= field { , field }
user-function ::= identifier
Identifiers are case sensitive, begin with a..z, A..Z, '_' or '$' character, contain only a-z, A..Z, 0..9 '_' or '$' characters, and do not duplicate any SQL reserved words.
abs | acos | and | asc | asin |
atan | avg | between | by | contains |
cos | ceil | count | current | desc |
escape | exists | exp | false | floor |
group | having | in | integer | is |
length | like | log | lower | max |
min | not | null | or | real |
sin | string | sum | tan | true |
upper | with |
JSQL extends the ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value with its bits set by the bit-AND/bit-OR operation. Bit operations can be used for an efficient implementation of small sets. Also, the raise-to-power operation ^ is supported by JSQL for integer and floating-point types.
The number of elements in an array can be obtained with the length() function. Array elements can be accessed using the [] operator. If the index expression is out of the array range, then an exception will be raised.
The in operator can be used to check whether an array contains the value specified by the left operand. This operation can only be used for arrays of atomic types: with boolean, numeric, reference or string components.
Iteration through array elements is performed by the exists operator. The variable specified after the exists keyword can be used as an index into arrays in the expression preceded by the exists quantor. This index variable will iterate through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition

exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in the US, while the query

not exists i: (contract[i].company.location = 'US')

will select all details which are shipped only by companies outside the US.
Nested exists clauses are allowed. The use of nested exists quantors is equivalent to nested loops using the corresponding index variables. For example, the query

exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 among the elements of the matrix field, which has the type of an array of integer arrays. This construction is equivalent to the following two nested loops:

boolean result = false;
for (int column = 0; column < matrix.length; column++) {
    for (int row = 0; row < matrix[column].length; row++) {
        if (matrix[column][row] == 0) {
            result = true;
            break;
        }
    }
}

The order of using the indices is significant! The result of the following query

exists row: (exists column: (matrix[column][row] = 0))

will be completely different from the result of the previous query. The program may simply hang in the last case, due to an infinite loop for empty matrices.
While the exists quantor uses the IndexOutOfRangeError exception to detect the end of an iteration, contains with will just iterate through all the array elements and not throw any exception. Hence, the performance of contains with is better.
The like
construction can be used for matching strings with a pattern
containing special wildcard characters '%' and '_'. The '_' character matches
any single character, while the '%' character matches any number of characters
(including zero characters). The extended form of the
like
operator with the escape
part can be used to handle the '%' and '_' characters in the
pattern as normal characters if they are preceded by a special escape
character, specified after the
escape
keyword.
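As an illustration, the wildcard semantics described above can be modeled in plain Java. The class and method names below are invented for this sketch; it is not Perst's actual implementation of the like operator:

```java
// Illustrative sketch of JSQL 'like' semantics: '%' matches any run of
// characters (including none), '_' matches exactly one character, and a
// wildcard preceded by the escape character is matched literally.
// Not Perst's real implementation.
public class LikeMatcher {
    public static boolean match(String s, String pattern, char escape) {
        return match(s, 0, pattern, 0, escape);
    }

    private static boolean match(String s, int i, String p, int j, char escape) {
        while (j < p.length()) {
            char pc = p.charAt(j);
            if (pc == escape && j + 1 < p.length()) {
                // escaped character must match literally
                if (i >= s.length() || s.charAt(i) != p.charAt(j + 1)) return false;
                i += 1; j += 2;
            } else if (pc == '%') {
                // try every possible length of the '%' match
                for (int k = i; k <= s.length(); k++) {
                    if (match(s, k, p, j + 1, escape)) return true;
                }
                return false;
            } else if (pc == '_') {
                if (i >= s.length()) return false;
                i++; j++;
            } else {
                if (i >= s.length() || s.charAt(i) != pc) return false;
                i++; j++;
            }
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(match("blueberry", "blue%", '\\')); // true
        System.out.println(match("100%", "100\\%", '\\'));     // true: escaped '%'
        System.out.println(match("cat", "c_t", '\\'));         // true
    }
}
```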
It is possible to search for a substring within a string by
using the in
operator. The expression ('blue' in color)
will
be true for all records which contain 'blue' in their color
field.
Strings can be concatenated by the +
or ||
operators. The latter one was added only for compatibility with the ANSI SQL
standard. Because JSQL doesn't support implicit conversion to the string type in expressions,
the semantics of the +
operator can be redefined for strings without ambiguity.
company.address.city = 'Chicago'
will access records referenced by the
company
component of the
Contract
record and extract the city component of the
address
field of the referenced record from the Supplier
table.
References can be checked for null
by is null
or is not null
predicates. Also, references can be compared for equality with each other as
well as with the special null
keyword. When a null reference is dereferenced, an exception will
be raised by JSQL.
There is a special keyword current
, which can be
used to get a reference to the current record during a table search. Usually, the
current
keyword is used for the comparison of the current record identifier
with other references or for locating the current record identifier within an array
of references. For example, the following query will search the Contract
table for all active contracts (assuming that the field canceledContracts
has
Contract[]
type):
current not in supplier.canceledContracts
A programmer can define methods for structures, which can
be used in queries with the same syntax as normal structure components. Such
methods should have no arguments, except for a pointer to the object to which
they belong (the this
pointer in Java), and should return an atomic value (of boolean,
numeric, string or reference type). Also, the method should not change the object
instance (immutable method). So, user-defined methods can be used for creating virtual components - components which are not stored in a database, but instead are
calculated using the values of other components. For example, you can use the standard
Java
java.util.Date
class with such methods as getYear()
, getMonth()
.
Thus, it is possible to specify queries like "delivery.getYear = 99"
in an application where the delivery
record field
has the
java.util.Date
type.
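For instance, a virtual component might compute a person's age from a stored birth year. The class below is a hypothetical example (in a real Perst application it would extend the Persistent base class, which is omitted here so the sketch stays self-contained); the zero-argument, side-effect-free age method could then be used in a JSQL predicate such as "age > 18":

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical example of a "virtual component": 'age' is not stored in the
// database, it is computed from the stored 'birthYear' field. In a real Perst
// application this class would extend org.garret.perst.Persistent.
public class Person {
    String name;
    int birthYear;

    public Person(String name, int birthYear) {
        this.name = name;
        this.birthYear = birthYear;
    }

    // Zero-argument, immutable method returning an atomic (numeric) value:
    // this is the shape of method JSQL can call from a query predicate.
    public int age() {
        int currentYear = new GregorianCalendar().get(Calendar.YEAR);
        return currentYear - birthYear;
    }

    public static void main(String[] args) {
        Person p = new Person("John", 1980);
        System.out.println(p.age()); // depends on the current year
    }
}
```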
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
sin | real | real | sin (rad) |
cos | real | real | cos (rad) |
tan | real | real | tan (rad) |
asin | real | real | arcsin |
acos | real | real | arccos |
atan | real | real | arctan |
exp | real | real | exponent |
log | real | real | natural logarithm |
ceil | real | real | the smallest integer value that is not less than the argument |
floor | real | real | the largest integer value that is not greater than the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
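For reference, most of these built-in functions correspond directly to java.lang.Math and java.lang.String operations. The mapping below is an illustrative sketch of my own (the class name is invented, and the integer conversion is shown here as truncation, which may differ from JSQL's exact rounding behavior):

```java
// Java equivalents of some JSQL built-in functions (illustrative mapping,
// not Perst's internal implementation).
public class JsqlFunctions {
    // integer(real): shown here as a truncating conversion (assumption)
    public static long integer(double x) { return (long)x; }
    // real(integer)
    public static double real(long x)    { return (double)x; }
    public static String lower(String s) { return s.toLowerCase(); }
    public static String upper(String s) { return s.toUpperCase(); }
    // length(array)
    public static int length(Object[] a) { return a.length; }

    public static void main(String[] args) {
        System.out.println(Math.abs(-5));    // abs(integer) -> 5
        System.out.println(Math.ceil(1.2));  // ceil -> 2.0
        System.out.println(Math.floor(1.8)); // floor -> 1.0
        System.out.println(integer(3.9));    // -> 3
        System.out.println(lower("JSQL"));   // -> jsql
    }
}
```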
To be able to use indices, a programmer has to obtain a query
object and register the indices in it. JSQL supports the Perst Index
and FieldIndex
indices. The indices are located by JSQL by their field name. It is assumed
that class objects returned by the index iterator are the same as specified in the
select
query. If you are going to use the same query instance for
selecting data from different collections (with different collection member
classes), please check that there are no name conflicts for the key fields,
i.e. that an index corresponding to one collection will not be applied to another
collection.
Maintaining indices is the responsibility of the programmer. JSQL only uses the indices for searching elements by their keys.
Database
class. This class emulates a relational database on top of Perst. Certainly,
even with this class, Perst provides a very object-oriented way of dealing with
persistent data.
The Database
class allows you to create/drop tables, add/drop indices, create/update/delete
records. It automatically updates indices when a new record is added or
removed. Also, the Database
class makes it possible to prepare and execute queries.
The Database
class is used as a wrapper for the underlying Perst storage. The storage
passed to the constructor of the Database
class should be opened and either not initialized,
or initialized before by the same Database
constructor. It is not possible
to attach a database to the Perst Storage with an existing root object other than one set
by the Database
constructor.
A table in a database is associated with a Java class.
Almost all methods of the database class require passing of the Class
parameter or persistent object instance.
In the latter case, the class of this object is used. If Perst finds no table associated with the class, it tries to find a table
for the superclass and so on... Hence, it is possible to have a "polymorphic" table - a table containing records
(objects) of multiple classes derived from this one. Methods dealing with an object instance
have an overloaded version where the class can be specified explicitly. This
makes it possible to place an object in a table associated with some particular interface.
A database provides two ways of querying tables: using simple
and prepared queries. In the first case, Database.select
directly returns the result of query execution (iterator). In the second case,
it is possible to execute a prepared query multiple times, specifying different
values of parameters.
Please find more information on the Database
class methods in
Javadoc and look at the JSQL-SSD example -
Supplier-Shipment-Detail database
built using the Database
class.
Concerning the interface to a database system, there is the
object persistence aspect which should control transparent object loading and
storing to/from the storage. Without AspectJ, Perst uses a programmer
controlled, recursive object loading mechanism (when some object is requested,
all objects referenced from this object will also be fetched, unless they
redefine the recursiveLoading
method to
return false
). To store a modified object in a database, a programmer should
explicitly call the
store()
or the modify()
method. With AspectJ this is not needed: the persistence aspect will
automatically load referenced objects and mark them as modified when the values of
some of the object’s fields are changed. Also, there is no need to derive
all the persistent-capable classes from the Persistent
base class;
it can be easily done using the AspectJ declaration construction.
The AspectJ interface to Perst was developed by
Patrick Morris-Suzuki. It is located in
the package:
org.garret.perst.aspectj
. To build it, you can use
perst/src/compile_aspectj.bat
batch file (AspectJ should be properly installed beforehand). It will
build
perst/lib/perst_aspectj.jar
file. Each persistent-capable class must implement
org.garret.perst.aspectj.AutoPersist
, which is a marker interface with no methods. However, the user can
avoid this by writing a simple aspect to define the persistent-capable classes
like this:
public aspect PersistentClasses {
    declare parents: mypackage.* implements AutoPersist;
}
Then you can build your application like this:
ajc *.java -aspectpath /perst/lib/perst_aspectj.jar
Restrictions of the AspectJ API to Perst:
hashCode()
and equals()
methods will work correctly only after an object is assigned its OID.
In the perst/tst/aspectj
directory there is an example of an application using AspectJ
interface to Perst: Guess. It is the same example as "Guess an
animal" located in the perst/tst
directory (which does not use
AspectJ), but it doesn't contain explicit calls to the store()
method. Also, the original Guess example recursively loaded the whole game tree
when the root object was accessed; whereas the AspectJ version fetches object
instances only on demand. You can build the examples using the compile.bat
script and run them using the Guess.bat
script.
With JAssist you can use a normal Java compiler. The Perst
translator for JAssist automatically makes all classes matching specific name
patterns persistent capable. With this translator, it is not required for the application
classes to be derived from the Persistent
class and provide an empty default constructor.
To be able to use the JAssist interface, you should use the special JAssist class loader instead of the default Java class loader to load all classes which you want to make persistent capable. Classes with fully qualified names matching one of the specified patterns will be transformed during loading. Other classes are loaded unchanged.
I have slightly modified the JAssist code to be able to distinguish access to this object's fields
from access to the fields of foreign objects. This makes it possible to eliminate the extra overhead
of this-field access. If you are going to use the original JAssist library,
then you should comment out my code guarded by the !isSelfReader()
condition.
In this case Perst will not be able to handle access to foreign (non-this) fields.
You should use getter/setter methods instead.
Alternatively, you can replace this condition with true
,
allowing access to foreign fields but with significant degradation of
performance and increased code size, because in this case before ALL accesses
to fields of a persistent-capable object, a call to load()
method will be inserted.
The modified versions of the JAssist files are included in the Perst distribution in the
perst/src/org/garret/jassist/edit
directory.
The translator performs the following transformations: if a class is derived directly from
java.lang.Object
, then it replaces the superclass with org.garret.perst.Persistent
;
it inserts an invocation of the Persistent.load()
method at the beginning of each instance method;
it inserts an invocation of the Persistent.modify()
method before each write to a field;
and it adds a recursiveLoading
method returning false
.
The JAssist interface has only one restriction: you should not access the fields of any persistent object other than the this object in a constructor. Below is an example using the JAssist interface:
package com.mycompany.mypackage;
import org.garret.perst.jassist.PerstTranslator;
import javassist.*;

public class MyApp {
    public static void main(String[] args) throws Throwable {
        Translator trans = new PerstTranslator(new String[]{"com.mycompany.*"});
        ClassPool pool = ClassPool.getDefault(trans);
        Loader cl = new Loader(pool);
        cl.run("Main", args);
    }
}
In this example all the classes from
com.mycompany.mypackage
except MyApp will be loaded by the JAssist class loader and
automatically made persistent capable.
To build the JAssist interface, you should use the compile-aspectj.bat
file in the perst/src directory. The javassist.jar
file is
included in the Perst distribution and located in the perst/lib directory. You
can find examples using the JAssist interface in the perst/tst/jassist
directory.
Perst performs cyclic scanning of the bitmap pages. It
keeps the identifier of the current bitmap page and the current position within
the page. Every time an allocation request arrives, scanning of the bitmap
starts from the current position. When the last allocated bitmap page is
scanned, scanning continues from the beginning (from the first bitmap page)
until the current position is reached. When no free space is found after a full
cycle through all bitmap pages, a new block of memory is allocated. The size of
the extension is the larger of the size of the allocated object and the extension
quantum. The extension quantum is a parameter of the database, specified in the
constructor. A bitmap is extended to be able to map additional space. If the virtual
space is exhausted and no more bitmap pages can be allocated, then the OutOfMemory
exception
is raised.
Allocating memory using bitmaps provides a high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimization of the number of modified pages is significant when the commit operation is performed and all dirty pages should be flushed to the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so, the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation. Once a bitmap is extended, objects will be allocated sequentially until the extension quantum will be completely used up. Only after reaching the end of the bitmap, scanning restarts from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, Perst associates a descriptor with each page, which is used to remember the maximal size of the hole on the page. Calculation of the maximal hole size is performed in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M. Thus, M is stored in the page descriptor if the previous value of the descriptor is larger than M. For the next allocation of an object of size greater or equal to M, we will skip this bitmap page. The page descriptor is reset when some object is deallocated from this bitmap page.
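As an illustration of the scanning strategy and the maximal-hole descriptor described above, here is a toy allocator in plain Java. It is a deliberately simplified model (invented names, free runs confined to a single page), not Perst's actual allocator:

```java
// Toy allocator illustrating Perst's bitmap scanning strategy: scanning
// resumes from the current position, and each bitmap "page" keeps an upper
// bound on the size of its largest free hole, so pages that cannot possibly
// satisfy a request are skipped.
public class BitmapToy {
    static final int PAGE_BITS = 64;   // bits of bitmap per page
    final boolean[] used;              // one bit per allocation quantum
    final int[] maxHole;               // per-page upper bound of the largest hole
    int currentPage;                   // scanning resumes from this page

    public BitmapToy(int quanta) {
        used = new boolean[quanta];
        maxHole = new int[(quanta + PAGE_BITS - 1) / PAGE_BITS];
        java.util.Arrays.fill(maxHole, PAGE_BITS);
    }

    // Allocate n consecutive quanta; returns the start position or -1
    // (Perst would extend the storage in the latter case).
    public int allocate(int n) {
        for (int s = 0; s < maxHole.length; s++) {
            int page = (currentPage + s) % maxHole.length;
            if (maxHole[page] < n) continue;   // known to be too fragmented: skip
            int pos = findRun(page, n);
            if (pos >= 0) {
                for (int i = 0; i < n; i++) used[pos + i] = true;
                currentPage = page;            // remember the current position
                return pos;
            }
            maxHole[page] = n - 1;  // no hole of size n here: remember the bound
        }
        return -1;
    }

    // Find a run of n free bits inside one page, or -1.
    private int findRun(int page, int n) {
        int lo = page * PAGE_BITS, hi = Math.min(lo + PAGE_BITS, used.length);
        int run = 0;
        for (int i = lo; i < hi; i++) {
            run = used[i] ? 0 : run + 1;
            if (run == n) return i - n + 1;
        }
        return -1;
    }

    public void free(int start, int n) {
        for (int i = 0; i < n; i++) used[start + i] = false;
        maxHole[start / PAGE_BITS] = PAGE_BITS; // reset descriptor on deallocation
    }

    public static void main(String[] args) {
        BitmapToy b = new BitmapToy(128);   // 2 pages of 64 quanta
        System.out.println(b.allocate(10)); // sequential allocation: 0
        System.out.println(b.allocate(60)); // page 0 has no such hole: 64
        System.out.println(b.allocate(50)); // fits the remaining hole on page 0: 10
    }
}
```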
Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The Perst memory allocator checks the requested size and whether it is aligned on a page boundary. Then, the address of the allocated memory segment is also aligned on a page boundary. A search for a free hole will be faster in this case, because Perst increases the step size of the current position increment according to the value of the alignment.
To be able to deallocate memory used by an object, Perst needs to keep information about the object size somewhere. The Perst memory allocator deals with two types of objects - normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer to an L2-list linking all records in the table. So, the size of the table record object can be extracted from the record header. Page objects always occupy the whole database page and are allocated at the positions aligned on a page boundary. Page objects have no headers. Perst distinguishes page objects from normal objects by using a special marker in the object index.
When an object is modified the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created object copy. The shadow index still contains the handle which points to the original version of the object. All changes are done with the object copy, leaving the original object unchanged. Perst makes a mark in a special bitmap page of the object index which contains the modified object handle.
When a transaction is committed, Perst first checks if the size of the object index was increased during the current transaction. If so, it also reallocates the shadow copy of the object index. Then, Perst frees memory for all "old objects", i.e. objects which were cloned within a transaction. Memory cannot be deallocated before a commit, because we want to preserve the consistent state of the database by keeping a cloned object unchanged. If we deallocate memory immediately after cloning, a new object can be allocated in place of the cloned object and we lose consistency. As far as memory deallocation is performed in Perst by bitmaps using the same transaction mechanism as for normal database objects, deallocation of object space will require clearing some bits in the bitmap page, which also should be cloned before modification. Cloning the bitmap page will require new space for allocating the page copy, and we can reuse the space of deallocated objects. However, it is not acceptable due to the reason explained above - we will lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object’s space are also cloned (if not cloned before). So when a transaction is committed, we only clear bits in the bitmap pages and no more requests for allocation of memory can be generated at this moment.
After deallocation of old copies, Perst flushes all modified pages to the disk to synchronize content of the memory and the disk files. After that Perst changes the current object index indicator in the database header to switch the roles of the object indices. Now the object index, which was current becomes shadow, and the shadow index becomes current. Then, Perst again flushes the modified page (i.e. page with the database header) to the disk, transferring the database to a new, consistent state. After that Perst copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, contents of both indices are synchronized and Perst is ready to start a new transaction.
The bitmap of the modified object index pages is used to minimize the transaction commit time: only the modified pages of the object index, rather than the whole index, need to be copied. After the transaction is committed, this bitmap is cleared.
When a transaction is explicitly aborted by the Storage.rollback
method, the shadow object index is copied back to the current
index, eliminating all changes done by the aborted transaction. After the end
of copying, both indices are identical again and the database state corresponds
to the moment before the start of the current transaction.
The allocation of object handles is done by the free handles list. The header of the list is also shadowed and two instances of the list header are stored in the database header. Switching between them is done in the same way as switching of object indices. When there are no more free elements in the list, Perst allocates handles from the unused part of a new index. When there is no more space in the index, it is reallocated. An object index is the only entity in a database which is not cloned on modification. Instead of this, two copies of the object index are always used.
There are some predefined OID values in Perst. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages. The number of bitmap pages depends on the database’s maximum virtual space. For one terabyte virtual space, 8 kb page size and 64 byte allocation quantum, 32 K bitmap pages are required. So, 32 K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, when the database size is extended. Hence, the OID of the first user object will be 0x8002.
The recovery procedure is trivial in Perst. There are two
instances of object index, one of which is current and the other corresponds to
the consistent database state. When a database is opened, Perst checks the database
header to detect if the database was closed normally. If not (the dirty
flag
is set in the database header), then Perst performs a database recovery.
Recovery is very similar to the rollback of a transaction. The indicator of the
current index in a database object header is used to determine the index
corresponding to the consistent database state and the object handles from this
index are copied to another object index, eliminating all changes done by the uncommitted
transaction. As long as the only action performed by the recovery procedure is
copying of an object’s index (in reality only handles having different
values in the current and shadow indices are copied to reduce the number of
modified pages) and the size of the object index is small, a recovery can be performed
very fast. Fast recovery procedures reduce the "out-of-service" time
of an application.
The table below summarizes the special features of Perst:
perst.jar
build.
You can also build it yourself using the
compile.bat
script in the
src
directory. The only thing you need to work with the database is to
include this JAR in your class path. A template for a Perst application is as
follows:
import org.garret.perst.*;

public class YourPersistentClass extends Persistent {
    int x;
    String y;
    Link links;
    ...
    void doUpdate() {
        x = 1;
        y = "Hello World";
        links = getStorage().createLink();
        store(); // save changes in the database
    }
}

public class YourRoot extends Persistent {
    YourPersistentClass ref;
    Index myIndex;

    void foo() {
        ref.doUpdate(); // the referenced object does not have to be explicitly loaded
    }
}

public class YourApplication {
    static public void main(String[] args) {
        String databaseFilePath = args[0]; // path to the database file
        int pagePoolSize = Integer.parseInt(args[1], 10); // size of the page pool in bytes
        Storage storage = StorageFactory.getInstance().createStorage();
        storage.open(databaseFilePath, pagePoolSize);
        YourRoot root = (YourRoot)storage.getRoot();
        if (root == null) {
            // the storage was not initialized yet
            root = new YourRoot();
            root.myIndex = storage.createIndex(String.class, true); // unique index
            storage.setRoot(root);
        }
        root.foo(); // all objects referenced from root are implicitly loaded
        YourPersistentClass obj = (YourPersistentClass)root.myIndex.get(new Key("John")); // find the object with key "John"
        ... // do something else with the database
        storage.commit(); // commit the transaction
        ... // do something else with the database
        storage.close(); // commit the last transaction and close the database
    }
}
You can also look at these Perst examples:
Link
interface.
Database
class.
Version
and VersionHistory
classes and their performance.
Projection
class.
The tests are located in the tst
directory and can
be built using the compile.bat
script.
To build the JDK 1.5 version, go to the perst/src15
directory and run the
compile.bat
script. It will build perst15.jar
(a precompiled
version is included in the distribution). You can also try the TestIndex
example for JDK 1.5 located in the
perst/src15/tst
directory.
This example illustrates using the Perst generic containers and the new Java foreach
construction.
Perst Lite does not use the Java reflection mechanism. So,
to be able to store/fetch persistent objects to/from the database, the
programmer must provide special pack/unpack routines. Perst provides the ISerializable
interface, which defines the following methods:
/**
 * Interface which should be implemented by any object that can be serialized.
 * Implementations of the writeObject/readObject methods should
 * store/fetch the content of the object to/from the provided stream.
 * The order of storing/fetching fields should be the same in both methods.
 */
public interface ISerializable {
    /**
     * Store object fields in the stream.
     * @param out stream to which the object is written
     */
    void writeObject(IOutputStream out);

    /**
     * Fetch object fields from the stream.
     * @param in stream from which object data should be fetched
     */
    void readObject(IInputStream in);
}
So, each persistent-capable class should implement this
interface (actually it should be derived from the
org.garret.perst.Persistent
class which provides the basic implementation of this interface)
and implement the writeObject
/readObject
methods. The interfaces IOutputStream
and IInputStream
provide methods for storing and fetching respectively, values of different
types. Below are example definitions of pack/unpack methods:
public class Test extends Persistent {
    int i;
    float f;
    String s;
    Test o;

    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        out.writeInt(i);
        out.writeFloat(f);
        out.writeString(s);
        out.writeObject(o);
    }

    public void readObject(IInputStream in) {
        super.readObject(in);
        i = in.readInt();
        f = in.readFloat();
        s = in.readString();
        o = (Test)in.readObject();
    }
}
Also, because of restrictions of the JDK 1.1 reflection mechanism, a persistent-capable class should be:
Because Perst Lite does not use reflection, it is not able to provide the functionality required for dynamic access to object components. So, field indices, XML import/export and SubSQL query language are not supported in this version. However, because in Perst Lite, there is no overhead caused by the reflection mechanism, the speed of fetching/storing objects is about four times faster than in the full Perst version.
Note for Blackberry users:
random positioning in the database file relies on the
java.io.InputStream.skip
method, which is very naively implemented in some J2ME runtimes: it just reads data from the stream byte by byte until it
reaches the requested position. For example, on a Nokia
6110 (S60 3rd edition), performing 10000 index searches through 10000
records takes... more than 3 hours! And the database size in this example
is only about 600 kb, so it can easily be loaded into memory. That is why Perst Lite includes a special version of the page pool - InfinitePagePool - which preloads the whole database file into memory. An infinite page pool was also available in the mainstream Perst version, but the Perst Lite implementation is much more efficient and requires less memory. The main difference is that it loads the whole database into memory when the storage is opened. With this implementation, the search time of the test mentioned above is about 8 seconds. Compare that with 3 hours!
With JSR 75 it is more or less straightforward - you can create the database file on a
PC using perst11.jar, or with any emulator which stores files created through the JSR 75 API as normal files
in the PC file system. But JSR 75 is not available everywhere, and the format of RMS storage
is opaque and vendor specific. However, it is possible to import data into the RMS record storage at a low level
(as Perst pages) - this is much faster than just inserting data into the database (because
there is no need to construct indices, allocate space, pack objects, ...).
The implementation of org.garret.perst.impl.RandomAccessFile
on top of RMS includes a static importData
method which can perform such an import from an arbitrary input stream:
/**
 * Import data from the specified input stream to the record store.
 * @param storeName storage name prefix. If there is a limitation on the storage size, then several storages with
 * this prefix and suffixes 0, 1, 2, ... will be created.
 * @param in input stream. Data can be imported from a resource file (jar file), from a network connection or from another
 * data source
 * @return true if data was successfully imported, false if the storage is not empty or the read operation from the input stream
 * fails
 */
public static boolean importData(String storeName, InputStream in)
    throws IOException, RecordStoreException;
Using this method it is possible to prepare the database file on a PC (using an emulator or the perst11.jar library), include it in the midlet .jar file as a resource file and import this file into the RMS storage:
protected void startApp() {
    storage = StorageFactory.getInstance().createStorage();
    org.garret.perst.impl.RandomAccessFile.importData("testindex.dbs",
        getClass().getResourceAsStream("/testindex.dbs"));
    storage.open("testindex.dbs");
    ...
}
The importData
method imports the resource file only when the RMS storage is
empty, so there is no need for the programmer to perform an extra check of
whether the database is initialized or not. It is possible to use the importData method not only for importing resource files from the .jar file, but also for importing data from any other data source, for example from a network connection.
The SerGen utility can automatically generate the bodies of the writeObject/readObject
methods. The only thing you have to do is to add stubs for the writeObject/readObject
methods in the persistent-capable classes and specify the path to
them as arguments to the SerGen utility. It is necessary to specify stubs in the
list of persistent-capable classes so that the SerGen utility does not have to
load the whole project, resolve all class paths and source locations, and try
to guess the right place and indentation of where to insert these generated
methods. Rather, SerGen uses a simplified solution that only requires the
programmer to explicitly specify the classes and stubs for the writeObject/readObject
methods. It solves both problems: SerGen does not have to guess
which classes are persistent capable or where to generate the bodies of these
methods.The SerGen utility is invoked as follows:
java SerGen list-of-files
Here, list-of-files is a list of source files (*.java) or directories where these source files are located. SerGen recursively traverses these directories, selects files with the .java extension and tries to locate the
writeObject/readObject
methods in them. If the methods are located, then the file is patched: SerGen
first generates an XXX.java.tmp file, removes the original file, and renames XXX.java.tmp
to XXX.java
. The paths of all the patched files are reported on the console
output.
Lucene accesses its index through the abstract Directory
interface.
Default implementations of this interface include a file-system and an in-memory directory.
Perst provides its own implementation of the Lucene Directory
interface, allowing the index to be stored directly in a Perst database. The advantages of storing a Lucene index inside a Perst database are the following:
Operation | Lucene+file system | Lucene+Perst |
---|---|---|
Build index | 56 sec | 84 sec |
Index search | 114 sec | 118 sec |
To create Perst storage of a Lucene index you should use the PerstDirectory class which is located in the perst/lucene directory:
Storage db = StorageFactory.getInstance().createStorage();
db.open("file.dbs", pagePoolSize);
Directory directory = new PerstDirectory(db);
Please refer to the Lucene programming API documentation for how to use this directory (it can be used like any other Lucene directory implementation, for example the standard
FSDirectory
directory). For instance, to build or update the index you should create an index writer:
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), create);
Please look at the PerstIndexer.java and PerstSearcher.java examples in the same directory. These examples can work with an index stored in a Perst database, as well as in a standard file system (if the
-fs
option is specified).
To store references to Perst persistent objects inside a Lucene database, it is possible to add a special field to the document containing the document's OID (object identifier). The OID of the persistent object can be obtained using the Persistent.getOid()
method
and vice versa - the persistent object can be accessed by its OID using the Storage.getObjectByOID()
method.
But the most convenient way of integrating full text search inside Perst is to use the "Continuous" package. This package provides a transparent RDBMS-like interface, version control, optimistic locking and built-in full text search capabilities (based on the Lucene search engine). For more information please read the Continuous section in the Perst manual.
Another important aspect missing in Perst is full text search. Actually, in real life, strict search is rarely needed - it is hard to assume that a user knows the precise name of a book or author. Certainly, the implementation of your own full text search engine is a very complex task, maybe even more complex than the implementation of Perst itself. Fortunately there is a good, powerful, fast, well-known and free full text search engine for Java and .Net - Lucene. It would be nice if Perst and Lucene could work together.
Versioning is yet another useful feature required by many applications. Most information systems dealing with important data (such as financial information) have to store all versions of documents to be able to retrieve the state of the system at a particular moment of time. Also, such popular services as wikis are based on versioning. Perst provides some primitives for version control, but they are not integrated into the database. It is the responsibility of the programmer to maintain version histories and locate particular versions.
From my experience with Perst, I found out that the object-oriented
model based on transparent persistence used in Perst is hard to understand for
many people not familiar with object-oriented databases. That is why people
prefer to use the org.garret.perst.Database
wrapper class, which provides some kind of abstraction of the traditional
interface to a relational database system. Unfortunately, it is not possible to
hide all the details from the programmer, and some operations, such as
excluding/including objects in indices, still have to be performed explicitly
by the programmer.
The main idea behind the development of the Continuous package was to provide an extension of Perst which could address all the issues discussed above.
I hope that the approach provided by the Continuous package can actually be a good compromise in reaching all these goals. What makes me think so is that the ideas used to solve one of these issues also help in other tasks. For example, multi-versioning is not only useful by itself, but also allows you to achieve isolation of client transactions without setting a large number of locks (which would result in a reduced level of concurrency and an increased probability of deadlock). Each client transaction works with its own snapshot of the database as seen at the moment the transaction began, in isolation from other transactions. Only during the transaction commit is a check performed to verify that the changes made by the transaction do not conflict with changes committed by other transactions; so it is similar to optimistic locking. A simplified table-like data structure, similar to the one used in relational databases, makes it possible to hide the manipulation of indices and the full text search engine from the programmer.
The Lucene full text search engine can be used with Perst in standalone mode, when the Lucene indices are stored in a standard OS file system. This provides the highest level of performance, but since neither Lucene nor the file system guarantees the durability and consistency of file data in case of a power or system failure, such an incident can corrupt the full text search index and necessitate its rebuild (which can take a significant amount of time). Continuous provides its own implementation of the abstract Lucene directory, allowing Lucene data to be stored directly in a Perst database. Since Perst BLOBs are optimized for sequential access to BLOB data, they cannot efficiently perform the random access operations used by Lucene, so performance in this case will be worse than with the standard Lucene FSDirectory (which stores data in OS files). But the main bonus of the Perst Lucene directory provider is that the Lucene indices become transactional: they survive application and system faults, and any inconsistency between normal and full text search data becomes impossible.
To create a full text search index for Perst objects using the Continuous package, you only need to mark particular class fields with the FullTextSearchable annotation:
class Address implements IValue {
    @FullTextSearchable private String country;
    @FullTextSearchable private String city;
    @FullTextSearchable private String street;
    @Indexable private int zipCode;
    ...
}

class Company extends CVersion {
    @FullTextSearchable @Indexable(unique=true) private String name;
    @FullTextSearchable private Address location;
    ...
}

When objects of these classes are stored in the database (the transaction is committed), the Continuous package will automatically extract the values of the fields marked with @FullTextSearchable, parse out the individual words, build a document from this text and associate the document with the persistent object. You can search objects in the same way as with the standard Lucene API, and the search result will contain references to Perst objects.
The @Indexable annotation indicates that the field is included in a standard Perst B-Tree index (which is optionally unique).
Prerequisites of the Continuous package:
It is possible to build the package using either the compile.bat
script in the perst/continuous/src
directory or using build.xml
in perst/continuous
and the ant
tool. The directory perst/continuous/tst
contains two
examples using the Continuous package. The bank example can also be used as a test
for measuring the performance and scalability of Perst with the Continuous
package.
Javadoc documentation of the Continuous package can be found here.
The speed of accessing data on disk is orders of magnitude
slower than the speed of accessing data in main memory. That is why caching of
data is the key to increasing database performance. Perst uses a pool of pages to
optimize access to the disk. The page pool size can be specified in the Storage.open
method (by default it is 4 Mb). Usually, increasing the page pool size leads to
better performance. But note that the page pool has to fit in the memory
available to the JVM: the heap size can be increased using the -Xmx
parameter, but if you do not specify it, the maximal size of the page pool
should not be greater than 32 Mb (at least for Sun's JVM).
If you think that all your data should fit in the main
memory, you can use the
Storage.INFINITE_PAGE_POOL
constant in the Storage.open
method. In this case, the page
pool will be dynamically extended when a new page is requested. As a result,
all pages will be cached and present in memory. So, each page is read from the
disk only once. Also, in this case a strong object cache is used instead of a weak
object cache. It means that all the fetched objects are also pinned in memory
and an object is unpacked only once. It is important to notice that the amount
of used main memory will be greater than the database size: all objects will be
present in memory in packed (inside the page containing the object) and in
unpacked (referenced from the object cache) form.
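The difference between the default weak object cache and the strong cache used with Storage.INFINITE_PAGE_POOL can be illustrated with plain Java references; this sketch uses java.lang.ref.WeakReference directly and does not touch the Perst API.

```java
import java.lang.ref.WeakReference;

// A weakly cached object may be reclaimed by GC and would have to be fetched
// and unpacked from the database again; a strongly cached one stays pinned
// in memory for the lifetime of the storage.
public class CacheDemo {
    public static void main(String[] args) {
        Object pinned = new byte[1024];             // strong cache entry
        WeakReference<Object> weak =
            new WeakReference<>(new byte[1024]);    // weak cache entry
        System.gc();                                // request a collection
        System.out.println("strong entry kept: " + (pinned != null));
        // The weak entry MAY have been cleared; with no strong referent it
        // usually is after an explicit GC, but this is not guaranteed.
        System.out.println("weak entry cleared: " + (weak.get() == null));
    }
}
```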
In some applications (such as for mobile devices) persistence is not needed. But such Perst container classes
as Link
, Index
, FieldIndex
and SpatialIndex
can still be used by the application. In this case, you can use the NullFile
implementation of the
IFile
interface together with Storage.INFINITE_PAGE_POOL
to create a transient,
in-memory database. Data in this case will never be written to disk.
There are some constants defined in the StorageImpl
class which influence the initial and maximal database size. If you
want to change them, you will have to rebuild Perst. Below is a detailed
description of these parameters:
Parameter | Default value | Description |
---|---|---|
dbDefaultInitIndexSize | 1024 | Initial object index size. The object index is increased on demand. Reallocation of the index is an expensive operation and so, to minimize the number of such reallocations, the object index size is always doubled. Specifying a larger initial index size allows a reduction in the number of future reallocations and so, increases performance a little bit. But it also leads to a larger initial size of the database file. With the default value of this parameter, the initial database size is about 50 kb. |
dbDefaultExtensionQuantum | 4Mb | Database extension quantum. Memory is allocated by scanning a bitmap. If there is no hole large enough, then the database is extended by this value. Increasing the value of this parameter leads to less frequent rescanning of the allocation bitmap from the very beginning. This leads to faster allocation speed and better locality of reference for the created objects (because there is a greater chance that they will be allocated sequentially). On the other hand, it leads to less efficient memory usage. Reducing the value of this parameter forces the reuse of existing holes in the memory allocation bitmap. |
dbObjectCacheInitSize | 1319 | Size of object cache. Perst needs this cache to check if an object with a given OID is already present in memory. This cache uses weak references to allow the garbage collector to do its work. When some threshold of filling the cache is reached, the cache is reallocated by doubling its size. Once again, increasing this parameter can save some number of cache reallocations. |
Now, some hints on how to increase Perst performance and reduce the size of used main memory:
If your database performs a lot of updates of persistent
data, then the main limiting factor is the speed of writing changes to the disk,
especially the synchronous disk write performed by commit. If you do a
commit after each update, then the average speed will be about 10 updates per
second (this estimation is based on the assumption that the average disk access
time is about 10 msec and each transaction commit usually requires writing
about 10 pages at random places in the file). But it is possible to
dramatically increase update performance if you group several updates into one
transaction. Perst creates a shadow of an object when it is updated for the first
time inside a transaction. If an object is updated once in each of n transactions,
then n shadows will be created. If an object is updated n times inside one
transaction, then a shadow is created only once. This explains the advantage
of having one large transaction.
But if you update a large number of objects in one transaction and a shadow is created for each updated object, this leads to a significant increase in the database file size. If you update every object in the database inside one transaction, the database size will almost double! However, if you perform each update in a separate transaction, then the size of the database will be almost unchanged (because the space of allocated shadow objects is reused in this case). So the best approach is to perform a commit after every 100 or 1000 updates; it will reduce the overhead of each commit and minimize the database size.
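The batching advice above can be sketched with a trivial stand-in: the commit() method here is only a counter standing in for Storage.commit(), and shows that committing every 100 updates performs 100x fewer synchronous commits than committing after each one.

```java
// Grouping updates into transactions: one commit per BATCH updates instead
// of one commit per update cuts the number of expensive synchronous disk
// writes by a factor of BATCH.
public class CommitBatching {
    static int commits = 0;
    static void commit() { commits++; }      // stand-in for Storage.commit()

    public static void main(String[] args) {
        final int UPDATES = 10_000, BATCH = 100;
        for (int i = 1; i <= UPDATES; i++) {
            // ... update a persistent object here ...
            if (i % BATCH == 0) {
                commit();                    // one commit per 100 updates
            }
        }
        System.out.println(commits);         // prints: 100
    }
}
```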
If your persistent objects form a tree or graph where all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, GC will never be able to collect any loaded persistent object (because it is reachable from the root object). So, as your application accesses more and more objects from the storage, at some moment all of them will be loaded in main memory. This can cause memory exhaustion. To prevent this situation, you should avoid storing in variables references to container objects which contain references to a large number of members. This is especially true for the root object. In this case, GC is able to do its work and evict from memory objects which are not used at the moment (to which there are no references from local variables). But it is important to note that objects accessible through an index by key can be deallocated normally by the garbage collector, so no special care is needed in this case.
Values are always stored in the context of the persistent object referencing them. Only those fields of the value object which belong to the declared type of the field are stored: if you assign to a value field an object of a class derived from the field's declared class, the fields introduced by the derived class will not be saved.
Initialization of transient or resettable attributes should be done either in the default constructor of the class or in the onLoad()
method, which is called when an object is retrieved from the database.
Recursive loading can be disabled by overriding the recursiveLoading
method
in some classes. In this case, you should explicitly load all objects referenced from an object
with disabled recursive loading.
Actually, the latter situation is better than it may seem. It is important to notice that all Perst collection classes load their members on demand. It means that once you have a reference, for example, to a FieldIndex, only the header of this collection object is loaded and the members are loaded on demand. This is true for all other Perst collections: Link
, Relation
, Index
, SortedCollection
, IPersistentSet
. Since persistent objects in a database are usually accessible through collections, in most cases explicit loading of the persistent objects is not needed. But be careful: if in addition to placing all your objects in some index, you also link them, for example, in an L2-List, then fetching a single object will cause loading of all other objects from this collection!
Storage.setProperty("perst.serialize.transient.objects")
. Perst will correctly handle references to persistent objects from the
serialized transient objects. But it is important to notice that the identities
of the saved transient objects are not preserved. If a transient
object O is referenced from persistent objects P1 and P2, then, when P1
and P2 are stored, an instance of O will be serialized in both P1's and P2's
records. And when P1 and P2 are reloaded from the database, they
will refer to different instances O1 and O2. Also, please notice that it is
impossible to mark transient objects as modified. If some of the referenced
transient objects are updated, you should also update the persistent object
referencing (directly or indirectly) these objects.
The garbage collector will throw a StorageError(StorageError.INVALID_OID)
exception if a reference to a deleted object is found, and will return a
non-zero number of collected objects if there is garbage in the database.
beginThreadTransaction
and endThreadTransaction
is not enough for proper concurrent work with persistent objects. The following alternatives are possible:
EXCLUSIVE_TRANSACTION
mode. This approach is similar to 2), but Perst will provide
exclusion of threads itself.
SERIALIZABLE_TRANSACTION
mode + object level locking + alternative B-Tree implementation.
Please look at the tst/TestServer.java
example which illustrates how to use the per-thread transactions
and locking when a database is concurrently accessed from multiple threads.
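The object level locking mentioned for SERIALIZABLE_TRANSACTION mode follows the familiar shared/exclusive pattern; this runnable sketch uses java.util.concurrent's ReentrantReadWriteLock purely as a stand-in for Perst's per-object locks, not as the Perst API itself.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Shared lock for readers, exclusive lock for a writer: the same discipline
// that object level locking imposes on concurrently accessed persistent
// objects in serializable transactions.
public class ObjectLocking {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int[] balance = { 100 };                // stand-in persistent object

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();            // exclusive lock for update
            try { balance[0] += 50; }
            finally { lock.writeLock().unlock(); }
        });
        writer.start();
        writer.join();

        lock.readLock().lock();                 // shared lock for read
        try { System.out.println(balance[0]); } // prints: 150
        finally { lock.readLock().unlock(); }
    }
}
```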
If you insert keys in the index in any random order, it is most likely that loaded pages will be located at random offsets within the database file. Thus, access to the pages requires positioning of the disk head. The average disk positioning time for modern disks is about 10 msec. It means that a disk is able to perform about 100 disk accesses per second. If a database is so large that each operation with a B-Tree requires loading of all the pages in the path from the root to a leaf page (for a large number of objects the path will be at least 3 pages), then, the database will be able to perform about 30 inserts (or searches) per second. Inserting 1 million objects in this worst case scenario will take about 10 hours! Fortunately, real performance is significantly better (because of page cache hits, file system cache...).
There are two obvious ways of improving performance:
Arrays.sort
method. If the number of loaded objects is too large to fit in memory, I recommend you load a portion of objects which fits in memory, sort it and then insert it into the index. Then load the next portion of objects, sort it, insert it into the index, and so on.
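The sorted bulk load can be sketched as follows; a TreeMap stands in here for a Perst Index or FieldIndex, since the point is only the insertion order. Inserting keys in sorted order touches neighbouring B-Tree pages, so far fewer random page reads and disk seeks are needed than with random insertion order.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Sort each batch of keys with Arrays.sort before inserting it into the
// index, so that consecutive inserts land on neighbouring pages.
public class SortedBulkLoad {
    public static void main(String[] args) {
        String[] keys = { "zebra", "apple", "mango", "fig", "kiwi" };
        Arrays.sort(keys);                                // sort the batch first
        TreeMap<String, Integer> index = new TreeMap<>(); // stand-in for a Perst index
        for (int i = 0; i < keys.length; i++) {
            index.put(keys[i], i);                        // sequential, cache-friendly
        }
        System.out.println(index.firstKey() + " " + index.lastKey());
        // prints: apple zebra
    }
}
```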
Perst provides four approaches which should be used together to provide the best performance for multimedia applications:
Now, we shall see how to use all
these four approaches. BLOBs can be created using the Storage.createBlob
method. BLOBs allow incremental writing and retrieving of BLOB
content. To create a multifile, you should create a multifile description file describing its segments. Each line of this file should contain the path
to a physical file and the size of that segment (in kilobytes). The size of a segment
should be aligned on a database page boundary (it must contain an integer
number of pages). For the last segment, the size can be omitted. The file can
look something like this (tst/testalloc.mfd
):
"testalloc1.dbs" 1125899906842624 "testalloc2.dbs"To open a multifile it is necessary to pass the
Storage.open
method path to the
multifile description file prepended by the '@' character.
Please notice that space in the
multifile segments is allocated on demand, so it is possible to specify a very
large segment size without the risk of running out of space on the disk. In
this way, we can place BLOBs in separate files. For example, in the multifile
description testalloc.mfd shown above, the size of the first segment
is set to 2^60 bytes. The first segment will be used to store all the database
objects except BLOBs. Since 2^60 is a very large number, we should not be afraid
that at some point in time the space in the first segment will be exhausted
(space on the disk will be exhausted earlier).
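Since the multifile description gives segment sizes in kilobytes, the first segment size in testalloc.mfd can be checked against the 2^60-byte base address used for the BLOB segment:

```java
// Verify that 1125899906842624 Kb (the first segment size in testalloc.mfd)
// is exactly 2^60 bytes, i.e. the base address of the second segment.
public class MultifileSize {
    public static void main(String[] args) {
        long segmentKb = 1125899906842624L;    // size from testalloc.mfd
        long segmentBytes = segmentKb * 1024;  // convert kilobytes to bytes
        System.out.println(segmentBytes == (1L << 60));  // prints: true
    }
}
```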
The custom allocator is a persistent object implementing the
org.garret.perst.CustomAllocator
interface. Programmers can provide their own implementation of this
interface or create a bitmap allocator using the
Storage.createBitmapAllocator
method. This method takes four parameters: the size of the allocation
quantum, the base address of the allocator, the extension quantum, and the limit of
allocated space. The allocator's base address should be set equal to the base
address of the multifile segment ((1L << 60) in the case of the multifile
definition above). Then it is necessary to register the allocator. This should be
done using the Storage.registerCustomAllocator
method,
associating this allocator with a particular class. Space in the storage
for all instances of this class, or of classes derived from it,
will be allocated using this allocator. If we want to separate
BLOBs from other data, we should bind this allocator to the BLOB class.
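The idea behind a bitmap allocator can be shown with a toy implementation: space is divided into fixed-size quanta tracked by a bitmap, and allocation scans for a run of free quanta starting at a fixed base address. All names here are illustrative; this is a sketch of the concept, not the org.garret.perst.CustomAllocator interface.

```java
import java.util.BitSet;

// Toy bitmap allocator: each bit marks one quantum of space as used or free.
public class ToyBitmapAllocator {
    final long base;          // base address of the managed region
    final int quantum;        // allocation unit in bytes
    final BitSet used = new BitSet();

    ToyBitmapAllocator(long base, int quantum) {
        this.base = base;
        this.quantum = quantum;
    }

    // Allocate `size` bytes; returns the address of the first free run.
    long allocate(long size) {
        int need = (int) ((size + quantum - 1) / quantum); // round up to quanta
        int start = 0;
        while (true) {
            int free = used.nextClearBit(start);
            int end = free;
            while (end < free + need && !used.get(end)) end++;
            if (end == free + need) {               // found a large enough hole
                used.set(free, end);
                return base + (long) free * quantum;
            }
            start = end + 1;                        // keep scanning the bitmap
        }
    }

    public static void main(String[] args) {
        // Base address 1L << 60 mirrors the BLOB segment of the multifile.
        ToyBitmapAllocator a = new ToyBitmapAllocator(1L << 60, 512);
        long p1 = a.allocate(1000);   // takes 2 quanta at offset 0
        long p2 = a.allocate(100);    // takes 1 quantum right after p1
        System.out.println((p1 == (1L << 60)) + " " + (p2 - p1));
        // prints: true 1024
    }
}
```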
And now the last thing: we want to disable caching of BLOB pages. Perst has the "perst.page.pool.lru.limit" property which can be used to disable caching for all pages in the file which have an offset larger than the specified limit. The default value of this parameter is 2^60 (1L << 60). This means that all pages with an offset larger than 2^60 will not be cached in the page pool. Hence, in our example, since we allocate BLOBs in the segment starting at address 2^60, the caching of BLOB pages is disabled automatically (there is no need to explicitly set the "perst.page.pool.lru.limit" property).
The approaches described in this
section may seem to be too complicated. But actually, they are not so
difficult. Please have a look at the tst/TestAlloc.java
example which illustrates how it is possible to store files in a Perst
database. You will see that not much effort is needed to efficiently
store BLOBs.
First, create and initialize a database. Then you should use the org.garret.perst.CompressDatabase
utility to compress the database file. This is a command line
utility. The first mandatory parameter specifies the path to the database file
and the second optional parameter specifies the compression level. This utility
creates a compressed file in the same directory where the original file is
located, but with the ".dbz" extension. This utility splits the file into
128 kb blocks and compresses each block separately. You can open a compressed
file using a standard zip utility. Inside, you will find several entries:
000000000000, 000000131072, 000000262144, ...
Each entry corresponds to a 128 kb block of the
original file. If you extract and concatenate all of them, you will get the original
database file. When the compressed file is ready, you can open it using org.garret.perst.CompressedFile
.
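The entry naming scheme follows directly from the 128 kb block size: each zip entry is named after its block's offset in the original file, zero-padded to 12 digits. A quick check of the first three names:

```java
// Generate the names of the first three 128 kb (131072 byte) blocks of a
// compressed database file: offsets zero-padded to 12 digits.
public class BlockNames {
    public static void main(String[] args) {
        final int BLOCK = 128 * 1024;  // 131072 bytes per compressed block
        StringBuilder names = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            names.append(String.format("%012d ", (long) i * BLOCK));
        }
        System.out.println(names.toString().trim());
        // prints: 000000000000 000000131072 000000262144
    }
}
```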
Please see the TestBlob example.
First, you should fill the database, by running TestBlob
. Then, you
should compress the database using the CompressDatabase
utility. After this you can run TestBlob zip
which will extract data
from the compressed database. In this example, the original database file size is
315392 bytes and the size of the compressed database is 24436 bytes.