The main advantage of Perst is its tight integration with programming languages. There is no gap between the database and the application data models - Perst directly stores language objects. So there is no need for packing/unpacking code, which has to be written for traditional relational databases. Also, unlike many other OODBMSs, Perst requires no special compiler or preprocessor. Yet it is able to provide a high level of transparency.
Any class derived from the Persistent base class is considered persistent-capable. Such an object is automatically made persistent and stored in the storage when it is referenced from some other persistent object and the store method of that object is invoked. So there is no need (though it is possible) to explicitly assign objects to the storage.
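This persistence-by-reachability behavior can be sketched as follows (the class names and database file name are hypothetical, and the API calls follow the Perst manual; details may differ between versions):

```java
import org.garret.perst.*;

class Leaf extends Persistent {
    String label;
}

class Root extends Persistent {
    Leaf child;
}

// ...
Storage db = StorageFactory.getInstance().createStorage();
db.open("test.dbs", 32*1024*1024);   // 32 MB page pool
Root root = new Root();
root.child = new Leaf();             // plain transient object so far
db.setRoot(root);                    // stores root; root.child becomes
                                     // persistent too, since it is referenced
db.commit();
db.close();
```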
The storage has one special root object. The root object is the only persistent object accessed in a special way (using the Storage.getRoot method). All other persistent objects are accessed in the normal way: either by traversing references from other persistent objects, or through object containers (Index, Link or Relation). Unlike many other OODBMSs, there can be only one root in the storage. If you need to have several named roots, you should create an Index object and use it as the root object.
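For example, a single Index used as the root can act as a registry of named roots (a sketch; the key names and the Employee class are hypothetical):

```java
import org.garret.perst.*;

Storage db = StorageFactory.getInstance().createStorage();
db.open("test.dbs", 32*1024*1024);

Index root = (Index)db.getRoot();
if (root == null) {                       // first run: create the root index
    root = db.createIndex(String.class, true);
    db.setRoot(root);
    // register a "named root" under a string key
    root.put(new Key("employees"),
             db.createFieldIndex(Employee.class, "name", true));
}
FieldIndex employees = (FieldIndex)root.get(new Key("employees"));
```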
Perst requires that each persistent-capable class be derived from the Persistent class. This makes it impossible to store 'foreign' classes in the storage. This is the 'price to pay' for Perst's ease of use and the absence of any specialized preprocessors or compilers. Also, the components of persistent-capable objects are restricted to the following types:
- scalar types: boolean, int, short, double, ...
- the java.lang.String type
- the java.util.Date type
- references to persistent objects: classes derived from Persistent, or any interface extending the IPersistent interface
- value classes: classes implementing the IValue interface, with the same restrictions on the types of components as for persistent-capable objects. If the perst.implicit.values property is true, then any class which is not derived from IPersistent will be treated as a value. Values are stored inside the persistent object containing them. A value class should have a default constructor (a constructor with an empty parameter list) or no constructors at all. The declared type of a field referencing a value should be the same as the actual type of the referenced object (so it is not possible to assign an object of a derived class to this field). Also, a value class should not have recursive references (direct or indirect) - e.g. a field of value class V cannot have type V.
- serialized transient objects: if the perst.serialize.transient.objects property is true, then Perst will serialize all objects of non-persistent and non-value classes (classes which are not derived from the IPersistent or IValue interfaces) using the standard Java serialization mechanism. The object closure will be packed into an array of bytes which is stored in the database. Such a class should implement the java.io.Serializable interface and should not contain references to persistent objects.
- Link components, created by the Storage.createLink method
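As an illustration of the value semantics described above, a minimal sketch (class names are hypothetical):

```java
import org.garret.perst.*;

// A value class: its fields are stored inside the containing
// persistent object rather than as a separate database object.
class Point implements IValue {
    int x;
    int y;
    Point() {}                    // default constructor required for loading
    Point(int x, int y) { this.x = x; this.y = y; }
}

class City extends Persistent {
    String name;
    Point  location;  // declared type equals the actual type; no subclasses allowed
}
```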
Unfortunately, it is not possible to detect whether an object has been changed without saving its old state and performing a field-by-field comparison with the new state. The overhead of such a solution (in terms of both space and CPU resources) would be very high. In Perst, it is the responsibility of the programmer to save objects in the storage. This can be done with the Persistent.store or Persistent.modify methods.
The Persistent.store method writes an object, as well as all objects which are not yet persistent but are referenced from this object, to the storage. So, if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on this object X. But if you later update one of the elements in this tree, you should invoke store() individually for each such element (X.store() will NOT automatically save the referenced objects in this case).
The Persistent.modify method marks an object as modified but doesn't immediately write it to the storage. All objects marked as modified will be stored to the storage during transaction commit (the store method will be invoked for each modified object). So using the modify method is preferable if an object is updated multiple times within a transaction: instead of being stored several times, it will be stored only once, at the end.
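The difference between the two methods can be sketched as follows (Account, accountIndex and db are hypothetical names; the calls follow the Perst manual):

```java
// db is an open Storage; accountIndex is a FieldIndex of Account objects.
Account account = (Account)accountIndex.get(new Key("john"));

account.balance += 100;
account.store();             // written to the storage immediately

// modify() is preferable when the object is updated many times
// within one transaction:
for (int i = 0; i < 1000; i++) {
    account.balance += 1;
    account.modify();        // only marked as dirty
}
db.commit();                 // the object is written once, here
```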
Perst doesn't support nested non-static classes. The reason is that a non-static inner class contains a final field pointing to the instance of the outer class. Because this field is final, it is not possible to assign a value to it during object loading in the same way as for all other fields of a persistent object.
The IPersistent
interface declares the recursiveLoading
method. The default implementation of this method in the Persistent
class always returns true
. In this case, Perst will recursively load any object referenced by
a target object when the target object is loaded. Thus, it causes implicit
loading of the cluster of all referenced objects from storage to the main
memory. This is similar to how a serialization mechanism works.
To avoid overflowing main memory through recursive loading of all objects from the storage, a programmer has to override the recursiveLoading method in some classes and return false from it. Objects referenced by such an object will not be implicitly loaded, and the programmer has to explicitly invoke the Persistent.load method to load them. So the recursiveLoading method can be used to control the loading of objects from storage into main memory.
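A sketch of this pattern (class names are hypothetical): the class disables implicit loading and loads its referenced object explicitly on first use.

```java
import org.garret.perst.*;

class Department extends Persistent {
    String   name;
    Employee head;                 // not loaded implicitly, see below

    public boolean recursiveLoading() {
        return false;              // disable implicit loading of referenced objects
    }

    Employee getHead() {
        head.load();               // explicit load before the first use
        return head;
    }
}
```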
Also, it is important to notice that containers (Link, Relation, Index
) always load member objects on demand (they do not perform
automatic loading of all objects in the containers). Since access to the
container members is always performed through methods of the container class,
the programmer will never notice it - container methods will always return
loaded object(s).
Perst uses a default constructor (a constructor without parameters) to create an object loaded from the storage. This means that a persistent-capable class should either declare a default constructor or declare no constructors at all. Alternatively, Perst can instantiate an object without calling its constructor by means of sun.reflect.ReflectionFactory. But this will work only with Sun's JDK. If a class needs to perform some initialization after loading, it can override the onLoad method, which is invoked after fetching all values of the non-transient fields of the persistent object from the storage.
So, summarizing all of the above, the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load referenced objects. At the same time, it is flexible in giving the programmer control over object loading. Only those classes (usually containers) which explicitly control the loading of their members (by overriding recursiveLoading to return false) need to be aware of the Persistent.load method.
Perst is able to handle compatible changes of a field's type automatically: for example, changing a field of int type to long or to float. But changing the type from int to short is not possible. More precisely, Perst is able to perform any conversion which is supported by the Java reflection mechanism (a field of type XXX can be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to a component of type YYY).
All other schema modifications (such as renaming fields or incompatible changes of a field's type) cannot be handled by Perst's automatic schema modification mechanism. In this case, you can use the Perst XML export/import mechanism. A database can be exported to XML format using the Storage.exportXML method and then recreated with the new class definitions. The saved data can then be imported using the Storage.importXML method.
Perst provides the Link interface to deal with relations of this kind. This interface allows you to add, delete and inspect members of the relation. Members of the relation can be accessed by index or extracted as an array of objects.
Relations can be of two kinds: embedded (where references to the related objects are stored in the relation owner object itself) and standalone (where the relation is a separate object which contains the references to the relation owner and the relation members). Both kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method, while a standalone relation is represented by a persistent object created by the Storage.createRelation method.
So, a relation between two classes A and B can be implemented in the following way:
Relation type | Object A | Object B |
---|---|---|
one-to-one | B bref; | A aref; |
one-to-many | Link bref; | A aref; |
many-to-one | B bref; | Link aref; |
many-to-many | Link bref; | Link aref; |
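The one-to-many row of this table can be sketched with an embedded relation (Detail and Order are hypothetical class names):

```java
import org.garret.perst.*;

class Detail extends Persistent {
    String name;
    Link   orders;                // "many" side: references to Order objects
}

class Order extends Persistent {
    Detail detail;                // "one" side: plain reference
    int    price;
}

// establishing the relation:
Detail detail = new Detail();
detail.orders = db.createLink();  // embedded relation
Order order = new Order();
order.detail = detail;
detail.orders.add(order);
detail.store();                   // stores detail and the new order
```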
The simplest way to locate an object by key is a hash table; the Hashtable or HashMap class can be used for this purpose. In databases, however, a more sophisticated search is usually required. I do not want to implement the complete SQL language in Perst, because it would immediately make the DBMS huge and slow. But in most cases an application performs only very simple queries using an exact match or a key range. This is done in Perst by the Index and FieldIndex interfaces. The first interface is used when the key is specified independently of its associated value. The FieldIndex interface allows you to index objects using one of the fields of the object (the key field).
Indices are created in Perst using the Storage.createIndex
or the Storage.createFieldIndex
methods. There can be several index implementations but, right now,
only one implementation based on B+Tree is provided (because B+Tree is the most
efficient structure for disk-based databases). Methods of the Index
and the FieldIndex
interfaces allow you to add, remove and locate objects by the key.
It is possible to perform a search either by specifying the exact key value or by specifying a range of key values (the high or low boundary, or both, can be skipped, or can be declared as being exclusive or inclusive).
There are several different ways of selecting objects using an index:
- IPersistent get(Key key) - get the object with the specified exact key value (for a unique index).
- IPersistent[] get(Key from, Key till) - get all objects whose keys belong to the specified range; either boundary can be skipped by passing null. Both boundaries can be inclusive or exclusive.
- Iterator iterator() - iterate through all objects in the index in ascending key order.
- Iterator iterator(Key from, Key till, int order) - iterate through the objects whose keys belong to the specified range; either boundary can be skipped by passing null. Both boundaries can be inclusive or exclusive. Objects can be traversed in ascending or descending order of their keys.
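These access paths can be sketched as follows (Order is a hypothetical class; the Key constructors and the DESCENT_ORDER constant follow the Perst manual and may differ between versions):

```java
import org.garret.perst.*;

FieldIndex orders = db.createFieldIndex(Order.class, "price", false);
orders.put(order);    // the key value is taken from the "price" field

// exact match (a non-unique index may return several objects):
IPersistent[] exact = orders.get(new Key(100), new Key(100));

// range search: 100 <= price < 200 (inclusive/exclusive boundaries)
IPersistent[] range = orders.get(new Key(100, true), new Key(200, false));

// traverse the whole index in descending order of the key:
Iterator i = orders.iterator(null, null, Index.DESCENT_ORDER);
```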
If you need a set of persistent objects you should use the Storage.createSet
method. Set is implemented using B+Tree where the object id (OID)
is used as a key.
Perst also supports spatial indices (org.garret.perst.SpatialIndex
) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection
). The spatial index is implemented using Guttman's R-Tree with
quadratic split algorithm. It allows for efficient search of R2 objects. Sorted
Collection provides almost the same methods as FieldIndex
but it uses
a user-defined comparator to compare collection members. A sorted collection is
implemented using a T-Tree and is especially efficient for main-memory
databases.
Interface | Description | Key type | Implementation | Created by |
---|---|---|---|---|
Index |
Index with an explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.createIndex(Class type, boolean unique) |
Index |
The same as above but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.createThickIndex(Class type) |
Index |
Random access index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessIndex(Class type, boolean unique) |
FieldIndex |
Index constructed for one of the object fields | scalar, string or reference | B+Tree | Storage.createFieldIndex(Class type, String fieldName, boolean unique) |
FieldIndex |
Random access field index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessFieldIndex(Class type, String fieldName, boolean unique) |
MultidimensionalIndex |
Multidimensional index, allowing objects to be selected using search conditions on several fields. This index provides better performance than traditional indices if no single field in the query has good selectivity. With traditional indices (like a B-Tree), it is necessary to join large result sets or perform a sequential search among a large number of objects, and a compound index allows objects to be selected only if all keys (or at least a key prefix) are specified. A multidimensional index performs the search taking all the specified restrictions into account simultaneously. But since it is implemented using a binary tree whose nodes contain references to the indexed objects, this index is efficient mostly for in-memory databases or in cases when all traversed objects fit in memory. | scalar, string or any other comparable type | KD-Tree | Storage.createMultidimensionalIndex(Class cls, String[] fieldNames) or Storage.createMultidimensionalIndex(MultidimensionalComparator comparator) |
BitIndex |
Bit index for searching objects by a bitmap of properties | persistent object | B+Tree | Storage.createBitIndex() |
IPersistentSet |
Set of persistent objects | persistent object | B+Tree | Storage.createSet() |
IPersistentSet |
Scalable set of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableSet() |
IPersistentList |
List of persistent objects providing random access | persistent object | B+Tree | Storage.createList() |
IPersistentList |
Scalable list of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableList() |
IPersistentMap |
Scalable map of persistent objects (can efficiently handle both small and large number of members) | persistent object | Sorted array or B+Tree | Storage.createMap() |
SpatialIndex |
Index for spatial objects with integer coordinates | Rectangle | R-Tree | Storage.createSpatialIndex() |
SpatialIndexR2 |
Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.createSpatialIndexR2() |
SortedCollection |
Index with a user-defined comparator | any | T-Tree | Storage.createSortedCollection(PersistentComparator comparator, boolean unique) |
An index search in Perst corresponds to a simple SQL query like:

select * from T where x=?;

Perst relations can be used to implement simple joins, like the following SQL query fetching all orders placed with a particular vendor:

select * from O Order, V Vendor where O.vendorID = V.id and V.name='ABC';

In Perst it is possible to first select the vendor using an index search and then traverse the corresponding relation to locate all the orders placed with this vendor.
But sometimes, it is necessary to implement more complex queries.
This is also possible in Perst, without the need to write queries in some
special (non-procedural) language. The Projection
class is
used to combine the results of several simple operations (such as an index
search). Let us start explanation of this class with an example. Consider that
we need to know the details of all the orders with names starting with 'D1'
shipped by vendors having names starting with 'V1' or 'V3'. The SQL statement
which performs this query is as follows:
select * from O Order, V Vendor, D Detail where D.name like 'D1%' and O.detailID = D.id and O.vendorID = V.id and (V.name like 'V1%' or V.name like 'V3%');

And now we shall see how this can be done in Perst. Assume that we have indices for details and vendors:

FieldIndex detailIndex = db.createFieldIndex(Detail.class, "name", true);
FieldIndex vendorIndex = db.createFieldIndex(Vendor.class, "name", true);

The set of requested orders can then be obtained in the following way:

// Projection from vendors to orders (using "orders" field of Link type)
Projection v2o = new Projection(Vendor.class, "orders");
// Projection from details to orders (using "orders" field of Link type)
Projection d2o = new Projection(Detail.class, "orders");
// Select vendors with name like 'V1%'
v2o.project(vendorIndex.getPrefix("V1"));
// Select vendors with name like 'V3%'
v2o.project(vendorIndex.getPrefix("V3"));
// Select details with name like 'D1%'
d2o.project(detailIndex.getPrefix("D1"));
// Join projections
v2o.join(d2o);
// Get array of requested orders
Order[] orders = (Order[])v2o.toArray(new Order[v2o.size()]);

As you can see, the
Projection
class is used for several purposes:
It is possible to derive your own class from the Projection
class
to implement more sophisticated projections than using a single projection
field.
If you need the result of a selection sorted in some particular order, then the most efficient way is to use a FieldIndex for it: when you select objects using an index, the selected objects are sorted by the search key. If you need the selection sorted by a field which is not the search key, you can use the Arrays.sort method. For example, if in the query described above we need to sort the orders by the price field, it can be done with the following statement:

Arrays.sort(orders, new Comparator() {
    public int compare(Object a, Object b) {
        return ((Order)a).price - ((Order)b).price;
    }
});
A transaction is finished by the commit, rollback or close methods. The committing of a transaction causes a synchronous write of the changed pages to the disk. Synchronous writes (where an application is blocked until the data is really flushed to the disk) are very expensive operations. The average positioning time for modern disks is about 10 msec, so they are usually not able to perform more than 100 writes to random locations per second. As a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.
Performance can be greatly increased if you minimize the number of commits (larger transactions). Perst uses a shadow mechanism for transaction implementation. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If the object is updated multiple times during the transaction, the shadow is created only once. Because it uses shadows, Perst does not need a transaction log file. Therefore, in Perst, long transactions will not cause a transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, Perst works as a DBMS without transaction support, adding almost no overhead for supporting transactions.
The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. Perst will preserve consistency of the database, but all changes made since the last commit will be lost.
Perst also supports per-thread transactions. Three types of
per-thread transactions are supported: exclusive, cooperative and serializable.
In an exclusive transaction, only one thread can update the database. In
cooperative mode, multiple transactions can work concurrently and the commit()
method will be invoked only when the transactions of all the threads are
terminated. Serializable transactions can also work concurrently. But unlike
cooperative transactions, the threads are isolated from each other. Each thread
has its own associated set of modified objects and committing the transaction
will cause saving of only these objects to the database. To synchronize access
to the objects in case of a serializable transaction, a programmer should use
lock methods of the IResource
interface. A shared lock should be set before read access to any
object, and an exclusive lock before write access. Locks will be
automatically released when the transaction is committed (so a programmer
should not explicitly invoke the unlock()
method). In
this case it is guaranteed that the transactions are serializable.
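A serializable per-thread transaction can be sketched as follows (account is a hypothetical object whose class extends PersistentResource, which implements IResource; the calls follow the Perst manual):

```java
// db is an open Storage; account's class extends PersistentResource.
db.beginThreadTransaction(Storage.SERIALIZABLE_TRANSACTION);
try {
    account.sharedLock();            // shared lock before reading
    long balance = account.balance;

    account.exclusiveLock();         // exclusive lock before writing
    account.balance = balance + 100;
    account.modify();                // store() is not allowed here
} finally {
    db.endThreadTransaction();       // commit; the locks are released automatically
}
```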
It is not possible to use the IPersistent.store()
method in serializable transactions (you should use IPersistent.modify()
instead). That is why it is also not possible to use the Index and the
FieldIndex containers since they are based on the B-Tree and a B-Tree directly
accesses database pages and uses the store()
method to
assign an OID to an inserted object. You should use the SortedCollection
based on T-Tree instead or an alternative B-Tree implementation
(see "perst.alternative.btree" property).
Starting from version 2.66, Perst provides multi-client
access to a database. This is implemented using OS file locks. Databases are
accessed in multiple-readers-single-writer mode. In order to enable
multi-client support, it is necessary to set the
"perst.multiclient.support" property. Then, all accesses to a
database should be performed within explicit transactions, i.e. transactions
should be started using the beginThreadTransaction method. If a transaction is
read-only, then the
Storage.READ_ONLY_TRANSACTION
mode should be used. Several read-only transactions can be executed
in parallel. But if a transaction may modify the contents of the database, then
the Storage.READ_WRITE_TRANSACTION
mode should be used and the transaction will have exclusive access
to the database.
Transaction access to the database is synchronized for the different
processes in one system as well as for the different threads within one
process. Transactions can be nested - you can invoke beginThreadTransaction
as many times as you need, assuming that the same number of endThreadTransaction
calls are invoked. So, you can wrap the code of each method
accessing the database with the
beginThreadTransaction
and the endThreadTransaction methods
. It is not possible to upgrade a transaction from shared (READ_ONLY_TRANSACTION
) to exclusive (READ_WRITE_TRANSACTION
). Hence, you should always use the READ_WRITE_TRANSACTION
if the transaction may modify the database.
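The multi-client access pattern described above can be sketched as follows (the file name is hypothetical; property and mode names follow the Perst manual):

```java
import org.garret.perst.*;

Storage db = StorageFactory.getInstance().createStorage();
db.setProperty("perst.multiclient.support", Boolean.TRUE);
db.open("shared.dbs", 32*1024*1024);

// an updating transaction gets exclusive access to the database:
db.beginThreadTransaction(Storage.READ_WRITE_TRANSACTION);
try {
    // ... update the database ...
} finally {
    db.endThreadTransaction();  // outermost call commits and releases the file lock
}

// read-only transactions can run in parallel with each other:
db.beginThreadTransaction(Storage.READ_ONLY_TRANSACTION);
try {
    // ... queries ...
} finally {
    db.endThreadTransaction();
}
```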
The outermost endThreadTransaction
will cause the transaction to commit and release the OS file lock. Unlike endThreadTransaction
, the rollbackThreadTransaction
doesn't consider the counter of nested transactions - it aborts all
nested transactions. If some process accessing the database crashes, the file
lock held by this process is automatically released. According to the
specification of the
java.nio.channels.FileChannel.lock
method, some operating systems may not support shared locks. In
this case the only choice is to always use the READ_WRITE_TRANSACTION mode
. Because the java.nio
package was introduced in JDK 1.4, Perst is not able to provide
multi-client access with earlier JDK releases.
Replication is page based - all dirty pages from the master node page pool are sent to the slave nodes. A slave node detects the moment of transaction commit (when the root page is sent to a slave node and the state index is changed in the database header) and sets an exclusive lock in this case. The read-only transactions at the slave nodes set a shared lock and work with the previous (consistent) state of the database. When a transaction commit is performed by the master node, a slave node uses this lock to transfer the database into the next consistent state. Starting from this moment, the read-only transactions at a slave node will work with the new, consistent state of the database.
Perst supports two replication models: static and dynamic. In the static model, all slave nodes have to be defined when an application is started. The master node tries to connect to all the specified slave nodes. After establishing all possible connections with the active slave nodes, it starts working, replicating the changes to these slave nodes. The content of all statically defined slave nodes is considered to be the same as that of the master node. Thus, when a database is opened, the master expects that all the connected slave nodes have the same state as its own state and will send to them only the pages notified by the application after it is opened. If you are going to add a new static slave node, you should first (before the database is opened) manually copy the database from the master node to this new slave node.
In the dynamic model, it is possible to add new replicated nodes at each moment in time. In this case, Perst transfers the complete database to the new node, synchronizing its state with the master node state. When the new replicated slave node is synchronized with the master node, it will act in the same way as a static, replicated node: the master will send to both of them all the updates performed by the application. The dynamic mode is especially useful for main-memory databases (with an infinite page pool and a NullFile stub). In this case, the size of the database is not expected to be very large and its transfer to a new replicated node should not take a lot of time.
Since replicated nodes are able to execute read-only
queries, the replication mechanism can also be used to improve performance by
distributing the load between several nodes. To be able to execute read-only
transactions at a slave node, an application should use per-thread transactions
wrapped by the
Storage.beginThreadTransaction(Storage.REPLICATION_SLAVE_TRANSACTION)
and the Storage.endThreadTransaction()
method invocations. It is possible to run many such transactions
concurrently. Only when the master node commits a transaction, a slave node has
to set an exclusive lock blocking all read-only transactions (certainly it has
to first wait for the completion of all currently executing read-only
transactions. So, these transactions should be short enough to avoid blocking
the replication mechanism for a long time). This exclusive lock is released
very fast - immediately after placing a new root page in the pool. Now
read-only transactions will see the new state of the database.
When an application running at the master node closes the storage, the master sends a CLOSE request to all the slave nodes causing them to close their connection with the master. In case of abnormal termination of one of the slave nodes, the master continues to work with the other slave nodes. If no more slave nodes are online, the master continues to work autonomously. The Perst replication mechanism doesn't enforce any particular strategy for the recovery of nodes. It is up to the application how to handle the situation of the master crashing and choosing a new master node.
In Perst, there is the org.garret.perst.Query class which allows you to execute JSQL queries. To execute a JSQL query it is necessary to provide an iterator (java.util.Iterator) over the objects, the class of the iterated objects, and a string representing the query condition. It is also possible to specify indices, resolvers and query parameters. The result of a query execution is another iterator which can be used to traverse through the selected objects (the objects matching the specified search criteria).
All the Perst collection classes contain the select method, which allows you to filter collection members using a JSQL query. This method is given the class of the objects stored in the collection (for some collections, such as FieldIndex, this is known and need not be specified) and the JSQL condition. As a result, select returns an iterator of the selected objects.
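A sketch of the select method (Order and orderIndex are hypothetical names; the method shape follows the description above):

```java
// orderIndex is a FieldIndex of Order objects; select() filters the
// collection members with a JSQL condition and returns an iterator.
Iterator i = orderIndex.select(Order.class,
                               "price > 100 and vendor.name like 'V%'");
while (i.hasNext()) {
    Order order = (Order)i.next();
    // ... process the selected order ...
}
```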
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction | conjunction or disjunction
conjunction ::= comparison | comparison and conjunction
comparison ::= operand = operand | operand != operand | operand <> operand | operand < operand | operand <= operand | operand > operand | operand >= operand | operand (not) like operand | operand (not) like operand escape string | operand (not) in operand | operand (not) in expressions-list | operand (not) between operand and operand | operand is (not) null
operand ::= addition
addition ::= multiplication | addition + multiplication | addition || multiplication | addition - multiplication
multiplication ::= power | multiplication * power | multiplication / power
power ::= term | term ^ power
term ::= identifier | number | string | true | false | null | current | first | last | ( expression ) | not comparison | - term | term [ expression ] | identifier . term | function term | count (*) | contains array-field (with expression) (group by identifier having expression) | exists identifier : term
function ::= abs | length | lower | upper | integer | real | string | sin | cos | tan | asin | acos | atan | log | exp | ceil | floor | sum | avg | min | max
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= field (asc | desc)
field ::= identifier { . identifier }
fields-list ::= field { , field }
user-function ::= identifier
Identifiers are case sensitive, begin with a..z, A..Z, '_' or '$' character, contain only a-z, A..Z, 0..9 '_' or '$' characters, and do not duplicate any SQL reserved words.
abs | acos | and | asc | asin |
atan | avg | between | by | contains |
cos | ceil | count | current | desc |
escape | exists | exp | false | floor |
group | having | in | integer | is |
length | like | log | lower | max |
min | not | null | or | real |
sin | string | sum | tan | true |
upper | with |
JSQL extends the ANSI standard SQL operations by supporting bit manipulation operations. The operators and/or can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value with its bits set by the bit-AND/bit-OR operation. Bit operations can be used for an efficient implementation of small sets. Also, the raise-to-power operation ^ is supported by JSQL for integer and floating-point types.
The number of elements in an array can be obtained with the length() function. Array elements can be accessed using the [] operator. If the index expression is out of the array range, then an exception will be raised.
The in operator can be used to check whether an array contains the value specified by the left operand. This operation can only be used for arrays of atomic types: with boolean, numeric, reference or string components.
Iteration through array elements is performed by the exists operator. The variable specified after the exists keyword can be used as an index into arrays in the expression preceded by the exists quantor. This index variable will iterate through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition

exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in the US, while the query

not exists i: (contract[i].company.location = 'US')

will select all details which are shipped only by companies outside the US.
Nested exists clauses are allowed. The use of nested exists quantors is equivalent to nested loops using the corresponding index variables. For example, the query

exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 among the elements of the matrix field, which has the type of an array of integer arrays. This construction is equivalent to the following two nested loops:

boolean result = false;
for (int column = 0; column < matrix.length; column++) {
    for (int row = 0; row < matrix[column].length; row++) {
        if (matrix[column][row] == 0) {
            result = true;
            break;
        }
    }
}

The order of using the indices is significant! The result of the following query

exists row: (exists column: (matrix[column][row] = 0))

will be completely different from the result of the previous query. The program may simply hang in the last case, due to an infinite loop for empty matrices.
While the exists quantor uses the IndexOutOfRangeError exception to detect the end of an iteration, contains with will just iterate through all the array elements and not throw any exception. Hence, the performance of contains with is better.
The like
construction can be used for matching strings with a pattern
containing special wildcard characters '%' and '_'. The '_' character matches
any single character, while the '%' character matches any number of characters
(including zero characters). The extended form of the
like
operator with the escape
part can be used to handle the '%' and '_' characters in the
pattern as normal characters if they are preceded by a special escape
character, specified after the
escape
keyword.
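As an illustration, the wildcard semantics described above can be modeled in plain Java. The class and method names below are invented for this sketch; it is not Perst's actual implementation of the like operator:

```java
// Illustrative sketch of JSQL 'like' semantics: '%' matches any run of
// characters (including none), '_' matches exactly one character, and a
// wildcard preceded by the escape character is matched literally.
// Not Perst's real implementation.
public class LikeMatcher {
    public static boolean match(String s, String pattern, char escape) {
        return match(s, 0, pattern, 0, escape);
    }

    private static boolean match(String s, int i, String p, int j, char escape) {
        while (j < p.length()) {
            char pc = p.charAt(j);
            if (pc == escape && j + 1 < p.length()) {
                // escaped character must match literally
                if (i >= s.length() || s.charAt(i) != p.charAt(j + 1)) return false;
                i += 1; j += 2;
            } else if (pc == '%') {
                // try every possible length of the '%' match
                for (int k = i; k <= s.length(); k++) {
                    if (match(s, k, p, j + 1, escape)) return true;
                }
                return false;
            } else if (pc == '_') {
                if (i >= s.length()) return false;
                i++; j++;
            } else {
                if (i >= s.length() || s.charAt(i) != pc) return false;
                i++; j++;
            }
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(match("blueberry", "blue%", '\\')); // true
        System.out.println(match("100%", "100\\%", '\\'));     // true: escaped '%'
        System.out.println(match("cat", "c_t", '\\'));         // true
    }
}
```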
It is possible to search for a substring within a string by
using the in
operator. The expression ('blue' in color)
will
be true for all records which contain 'blue' in their color
field.
Strings can be concatenated by the +
or ||
operators. The latter one was added only for compatibility with the ANSI SQL
standard. Because JSQL doesn't support implicit conversion to the string type in expressions,
the semantics of the +
operator can be redefined for strings without ambiguity.
company.address.city = 'Chicago'
will access records referenced by the
company
component of the
Contract
record and extract the city component of the
address
field of the referenced record from the Supplier
table.
References can be checked for null
by is null
or is not null
predicates. Also, references can be compared for equality with each other as
well as with the special null
keyword. When a null reference is dereferenced, an exception will
be raised by JSQL.
There is a special keyword current
, which can be
used to get a reference to the current record during a table search. Usually, the
current
keyword is used for the comparison of the current record identifier
with other references or for locating the current record identifier within an array
of references. For example, the following query will search the Contract
table for all active contracts (assuming that the field canceledContracts
has
Contract[]
type):
current not in supplier.canceledContracts
A programmer can define methods for structures, which can
be used in queries with the same syntax as normal structure components. Such
methods should have no arguments, except for a pointer to the object to which
they belong (the this
pointer in Java), and should return an atomic value (of boolean,
numeric, string or reference type). Also, the method should not change the object
instance (immutable method). So, user-defined methods can be used for creating virtual components - components which are not stored in a database, but instead are
calculated using the values of other components. For example, you can use the standard
Java
java.util.Date
class with such methods as getYear()
, getMonth()
.
Thus, it is possible to specify queries like "delivery.getYear = 99"
in an application where the delivery
record field
has the
java.util.Date
type.
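For instance, a virtual component might compute a person's age from a stored birth year. The class below is a hypothetical example (in a real Perst application it would extend the Persistent base class, which is omitted here so the sketch stays self-contained); the zero-argument, side-effect-free age method could then be used in a JSQL predicate such as "age > 18":

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical example of a "virtual component": 'age' is not stored in the
// database, it is computed from the stored 'birthYear' field. In a real Perst
// application this class would extend org.garret.perst.Persistent.
public class Person {
    String name;
    int birthYear;

    public Person(String name, int birthYear) {
        this.name = name;
        this.birthYear = birthYear;
    }

    // Zero-argument, immutable method returning an atomic (numeric) value:
    // this is the shape of method JSQL can call from a query predicate.
    public int age() {
        int currentYear = new GregorianCalendar().get(Calendar.YEAR);
        return currentYear - birthYear;
    }

    public static void main(String[] args) {
        Person p = new Person("John", 1980);
        System.out.println(p.age()); // depends on the current year
    }
}
```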
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
sin | real | real | sin (rad) |
cos | real | real | cos (rad) |
tan | real | real | tan (rad) |
asin | real | real | arcsin |
acos | real | real | arccos |
atan | real | real | arctan |
exp | real | real | exponent |
log | real | real | natural logarithm |
ceil | real | real | the smallest integer value that is not less than the argument |
floor | real | real | the largest integer value that is not greater than the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
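For reference, most of these built-in functions correspond directly to java.lang.Math and java.lang.String operations. The mapping below is an illustrative sketch of my own (the class name is invented, and the integer conversion is shown here as truncation, which may differ from JSQL's exact rounding behavior):

```java
// Java equivalents of some JSQL built-in functions (illustrative mapping,
// not Perst's internal implementation).
public class JsqlFunctions {
    // integer(real): shown here as a truncating conversion (assumption)
    public static long integer(double x) { return (long)x; }
    // real(integer)
    public static double real(long x)    { return (double)x; }
    public static String lower(String s) { return s.toLowerCase(); }
    public static String upper(String s) { return s.toUpperCase(); }
    // length(array)
    public static int length(Object[] a) { return a.length; }

    public static void main(String[] args) {
        System.out.println(Math.abs(-5));    // abs(integer) -> 5
        System.out.println(Math.ceil(1.2));  // ceil -> 2.0
        System.out.println(Math.floor(1.8)); // floor -> 1.0
        System.out.println(integer(3.9));    // -> 3
        System.out.println(lower("JSQL"));   // -> jsql
    }
}
```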
To be able to use indices, a programmer has to obtain a query
object and register the indices in it. JSQL supports the Perst Index
and FieldIndex
indices. The indices are located by JSQL by their field name. It is assumed
that class objects returned by the index iterator are the same as specified in the
select
query. If you are going to use the same query instance for
selecting data from different collections (with different collection member
classes), please check that there are no name conflicts for the key fields,
i.e. that an index corresponding to one collection will not be applied to another
collection.
Maintaining indices is the responsibility of the programmer. JSQL only uses the indices for searching elements by their keys.
Database
class. This class emulates a relational database on top of Perst. Certainly,
even with this class, Perst provides a very object-oriented way of dealing with
persistent data.
The Database
class allows you to create/drop tables, add/drop indices, create/update/delete
records. It automatically updates indices when a new record is added or
removed. Also, the Database
class makes it possible to prepare and execute queries.
The Database
class is used as a wrapper for the underlying Perst storage. The storage
passed to the constructor of the Database
class should be opened and either not initialized,
or initialized before by the same Database
constructor. It is not possible
to attach a database to the Perst Storage with an existing root object other than one set
by the Database
constructor.
A table in a database is associated with a Java class.
Almost all methods of the database class require passing of the Class
parameter or persistent object instance.
In the latter case, the class of this object is used. If Perst finds no table associated with the class, it tries to find a table
for the superclass and so on... Hence, it is possible to have a "polymorphic" table - a table containing records
(objects) of multiple classes derived from this one. Methods dealing with an object instance
have an overloaded version where the class can be specified explicitly. This
makes it possible to place an object in a table associated with some particular interface.
A database provides two ways of querying tables: using simple
and prepared queries. In the first case, Database.select
directly returns the result of query execution (iterator). In the second case,
it is possible to execute a prepared query multiple times, specifying different
values of parameters.
Please find more information on the Database
class methods in
Javadoc and look at the JSQL-SSD example -
Supplier-Shipment-Detail database
built using the Database
class.
Concerning the interface to a database system, there is the
object persistence aspect which should control transparent object loading and
storing to/from the storage. Without AspectJ, Perst uses a programmer
controlled, recursive object loading mechanism (when some object is requested,
all objects referenced from this object will also be fetched, unless they
redefine the recursiveLoading
method to
return false
). To store a modified object in a database, a programmer should
explicitly call the
store()
or the modify()
method. With AspectJ this is not needed: the persistence aspect will
automatically load referenced objects and mark them as modified when the values of
some of the object’s fields are changed. Also, there is no need to derive
all the persistent-capable classes from the Persistent
base class;
it can be easily done using the AspectJ declaration construction.
The AspectJ interface to Perst was developed by
Patrick Morris-Suzuki. It is located in
the package:
org.garret.perst.aspectj
. To build it, you can use
perst/src/compile_aspectj.bat
batch file (AspectJ should be properly installed beforehand). It will
build
perst/lib/perst_aspectj.jar
file. Each persistent-capable class must implement
org.garret.perst.aspectj.AutoPersist
, which is a marker interface with no methods. However, the user can
avoid this by writing a simple aspect to define the persistent-capable classes
like this:
public aspect PersistentClasses {
    declare parents: mypackage.* implements AutoPersist;
}
Then you can build your application like this:
ajc *.java -aspectpath /perst/lib/perst_aspectj.jar
Restrictions of the AspectJ API to Perst:
hashCode()
and equals()
methods will work correctly only after an object is assigned its OID.
In the perst/tst/aspectj
directory there is an example of an application using AspectJ
interface to Perst: Guess. It is the same example as "Guess an
animal" located in the perst/tst
directory (which does not use
AspectJ), but it doesn't contain explicit calls to the store()
method. Also, the original Guess example recursively loaded the whole game tree
when the root object was accessed; whereas the AspectJ version fetches object
instances only on demand. You can build the examples using the compile.bat
script and run them using the Guess.bat
script.
With JAssist you can use a normal Java compiler. The Perst
translator for JAssist automatically makes all classes matching specific name
patterns persistent capable. With this translator, it is not required for the application
classes to be derived from the Persistent
class and provide an empty default constructor.
To be able to use the JAssist interface, you should use the special JAssist class loader instead of the default Java class loader to load all classes which you want to make persistent capable. Classes with fully qualified names matching one of the specified patterns will be transformed during loading. Other classes are loaded unchanged.
I have slightly modified the JAssist code to be able to distinguish access to this object's fields
from access to the fields of foreign objects. This makes it possible to eliminate the extra overhead
of this-field access. If you are going to use the original JAssist library,
then you should comment out my code guarded by the !isSelfReader()
condition.
In this case Perst will not be able to handle access to foreign (non-this) fields.
You should use getter/setter methods instead.
Alternatively, you can replace this condition with true
,
allowing access to foreign fields but with significant degradation of
performance and increased code size, because in this case before ALL accesses
to fields of a persistent-capable object, a call to load()
method will be inserted.
The modified versions of the JAssist files are included in the Perst distribution in the
perst/src/org/garret/jassist/edit
directory.
The translator performs the following transformations: if a class is derived directly from
java.lang.Object
, then it replaces the superclass with org.garret.perst.Persistent
;
it inserts an invocation of the Persistent.load()
method at the beginning of each instance method;
it inserts an invocation of the Persistent.modify()
method before each write to a field;
and it adds a recursiveLoading
method returning false
.
The JAssist interface has only one restriction: you should not access the fields of any persistent object other than the this object in a constructor. Below is an example using the JAssist interface:
package com.mycompany.mypackage;
import org.garret.perst.jassist.PerstTranslator;
import javassist.*;

public class MyApp {
    public static void main(String[] args) throws Throwable {
        Translator trans = new PerstTranslator(new String[]{"com.mycompany.*"});
        ClassPool pool = ClassPool.getDefault(trans);
        Loader cl = new Loader(pool);
        cl.run("Main", args);
    }
}
In this example all the classes from
com.mycompany.mypackage
except MyApp will be loaded by the JAssist class loader and
automatically made persistent capable.
To build the JAssist interface, you should use the compile-aspectj.bat
file in the perst/src directory. The javassist.jar
file is
included in the Perst distribution and located in the perst/lib directory. You
can find examples using the JAssist interface in the perst/tst/jassist
directory.
Perst performs cyclic scanning of the bitmap pages. It
keeps the identifier of the current bitmap page and the current position within
the page. Every time an allocation request arrives, scanning of the bitmap
starts from the current position. When the last allocated bitmap page is
scanned, scanning continues from the beginning (from the first bitmap page)
until the current position is reached. When no free space is found after a full
cycle through all bitmap pages, a new block of memory is allocated. The size of
the extension is the larger of the size of the allocated object and the extension
quantum. The extension quantum is a parameter of the database, specified in the
constructor. A bitmap is extended to be able to map additional space. If the virtual
space is exhausted and no more bitmap pages can be allocated, then the OutOfMemory
exception
is raised.
Allocating memory using bitmaps provides a high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimization of the number of modified pages is significant when the commit operation is performed and all dirty pages should be flushed to the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so, the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation. Once a bitmap is extended, objects will be allocated sequentially until the extension quantum will be completely used up. Only after reaching the end of the bitmap, scanning restarts from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, Perst associates a descriptor with each page, which is used to remember the maximal size of the hole on the page. Calculation of the maximal hole size is performed in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M. Thus, M is stored in the page descriptor if the previous value of the descriptor is larger than M. For the next allocation of an object of size greater or equal to M, we will skip this bitmap page. The page descriptor is reset when some object is deallocated from this bitmap page.
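As an illustration of the scanning strategy and the maximal-hole descriptor described above, here is a toy allocator in plain Java. It is a deliberately simplified model (invented names, free runs confined to a single page), not Perst's actual allocator:

```java
// Toy allocator illustrating Perst's bitmap scanning strategy: scanning
// resumes from the current position, and each bitmap "page" keeps an upper
// bound on the size of its largest free hole, so pages that cannot possibly
// satisfy a request are skipped.
public class BitmapToy {
    static final int PAGE_BITS = 64;   // bits of bitmap per page
    final boolean[] used;              // one bit per allocation quantum
    final int[] maxHole;               // per-page upper bound of the largest hole
    int currentPage;                   // scanning resumes from this page

    public BitmapToy(int quanta) {
        used = new boolean[quanta];
        maxHole = new int[(quanta + PAGE_BITS - 1) / PAGE_BITS];
        java.util.Arrays.fill(maxHole, PAGE_BITS);
    }

    // Allocate n consecutive quanta; returns the start position or -1
    // (Perst would extend the storage in the latter case).
    public int allocate(int n) {
        for (int s = 0; s < maxHole.length; s++) {
            int page = (currentPage + s) % maxHole.length;
            if (maxHole[page] < n) continue;   // known to be too fragmented: skip
            int pos = findRun(page, n);
            if (pos >= 0) {
                for (int i = 0; i < n; i++) used[pos + i] = true;
                currentPage = page;            // remember the current position
                return pos;
            }
            maxHole[page] = n - 1;  // no hole of size n here: remember the bound
        }
        return -1;
    }

    // Find a run of n free bits inside one page, or -1.
    private int findRun(int page, int n) {
        int lo = page * PAGE_BITS, hi = Math.min(lo + PAGE_BITS, used.length);
        int run = 0;
        for (int i = lo; i < hi; i++) {
            run = used[i] ? 0 : run + 1;
            if (run == n) return i - n + 1;
        }
        return -1;
    }

    public void free(int start, int n) {
        for (int i = 0; i < n; i++) used[start + i] = false;
        maxHole[start / PAGE_BITS] = PAGE_BITS; // reset descriptor on deallocation
    }

    public static void main(String[] args) {
        BitmapToy b = new BitmapToy(128);   // 2 pages of 64 quanta
        System.out.println(b.allocate(10)); // sequential allocation: 0
        System.out.println(b.allocate(60)); // page 0 has no such hole: 64
        System.out.println(b.allocate(50)); // fits the remaining hole on page 0: 10
    }
}
```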
Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The Perst memory allocator checks the requested size and whether it is aligned on a page boundary. Then, the address of the allocated memory segment is also aligned on a page boundary. A search for a free hole will be faster in this case, because Perst increases the step size of the current position increment according to the value of the alignment.
To be able to deallocate memory used by an object, Perst needs to keep information about the object size somewhere. The Perst memory allocator deals with two types of objects - normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer to an L2-list linking all records in the table. So, the size of the table record object can be extracted from the record header. Page objects always occupy the whole database page and are allocated at the positions aligned on a page boundary. Page objects have no headers. Perst distinguishes page objects from normal objects by using a special marker in the object index.
When an object is modified the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created object copy. The shadow index still contains the handle which points to the original version of the object. All changes are done with the object copy, leaving the original object unchanged. Perst makes a mark in a special bitmap page of the object index which contains the modified object handle.
When a transaction is committed, Perst first checks if the size of the object index was increased during the current transaction. If so, it also reallocates the shadow copy of the object index. Then, Perst frees memory for all "old objects", i.e. objects which were cloned within a transaction. Memory cannot be deallocated before a commit, because we want to preserve the consistent state of the database by keeping a cloned object unchanged. If we deallocate memory immediately after cloning, a new object can be allocated in place of the cloned object and we lose consistency. As far as memory deallocation is performed in Perst by bitmaps using the same transaction mechanism as for normal database objects, deallocation of object space will require clearing some bits in the bitmap page, which also should be cloned before modification. Cloning the bitmap page will require new space for allocating the page copy, and we can reuse the space of deallocated objects. However, it is not acceptable due to the reason explained above - we will lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object’s space are also cloned (if not cloned before). So when a transaction is committed, we only clear bits in the bitmap pages and no more requests for allocation of memory can be generated at this moment.
After deallocation of old copies, Perst flushes all modified pages to the disk to synchronize content of the memory and the disk files. After that Perst changes the current object index indicator in the database header to switch the roles of the object indices. Now the object index, which was current becomes shadow, and the shadow index becomes current. Then, Perst again flushes the modified page (i.e. page with the database header) to the disk, transferring the database to a new, consistent state. After that Perst copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, contents of both indices are synchronized and Perst is ready to start a new transaction.
The bitmap of the modified object index pages is used to minimize the transaction commit time: only the modified pages of the object index, rather than the whole index, need to be copied. After the transaction is committed, this bitmap is cleared.
When a transaction is explicitly aborted by the Storage.rollback
method, the shadow object index is copied back to the current
index, eliminating all changes done by the aborted transaction. After the end
of copying, both indices are identical again and the database state corresponds
to the moment before the start of the current transaction.
The allocation of object handles is done by the free handles list. The header of the list is also shadowed and two instances of the list header are stored in the database header. Switching between them is done in the same way as switching of object indices. When there are no more free elements in the list, Perst allocates handles from the unused part of a new index. When there is no more space in the index, it is reallocated. An object index is the only entity in a database which is not cloned on modification. Instead of this, two copies of the object index are always used.
There are some predefined OID values in Perst. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages. The number of bitmap pages depends on the database’s maximum virtual space. For one terabyte virtual space, 8 kb page size and 64 byte allocation quantum, 32 K bitmap pages are required. So, 32 K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, when the database size is extended. Hence, the OID of the first user object will be 0x8002.
The recovery procedure is trivial in Perst. There are two
instances of object index, one of which is current and the other corresponds to
the consistent database state. When a database is opened, Perst checks the database
header to detect if the database was closed normally. If not (the dirty
flag
is set in the database header), then Perst performs a database recovery.
Recovery is very similar to the rollback of a transaction. The indicator of the
current index in a database object header is used to determine the index
corresponding to the consistent database state and the object handles from this
index are copied to another object index, eliminating all changes done by the uncommitted
transaction. As long as the only action performed by the recovery procedure is
copying of an object’s index (in reality only handles having different
values in the current and shadow indices are copied to reduce the number of
modified pages) and the size of the object index is small, a recovery can be performed
very fast. Fast recovery procedures reduce the "out-of-service" time
of an application.
The table below summarizes the special features of Perst:
perst.jar
build.
You can also build it yourself using the
compile.bat
script in the
src
directory. The only thing you need to work with the database is to
include this JAR in your class path. A template for a Perst application is as
follows:
import org.garret.perst.*;

public class YourPersistentClass extends Persistent {
    int x;
    String y;
    Link links;
    ...
    void doUpdate() {
        x = 1;
        y = "Hello World";
        links = getStorage().createLink();
        store(); // save changes in the database
    }
}

public class YourRoot extends Persistent {
    YourPersistentClass ref;
    Index myIndex;

    void foo() {
        ref.doUpdate(); // the referenced object does not have to be explicitly loaded
    }
}

public class YourApplication {
    static public void main(String[] args) {
        String databaseFilePath = args[0]; // path to the database file
        int pagePoolSize = Integer.parseInt(args[1], 10); // size of the page pool in bytes
        Storage storage = StorageFactory.getInstance().createStorage();
        storage.open(databaseFilePath, pagePoolSize);
        YourRoot root = (YourRoot)storage.getRoot();
        if (root == null) {
            // the storage was not initialized yet
            root = new YourRoot();
            root.myIndex = storage.createIndex(String.class, true); // unique index
            storage.setRoot(root);
        }
        root.foo(); // all objects referenced from root are implicitly loaded
        YourPersistentClass obj = (YourPersistentClass)root.myIndex.get(new Key("John")); // find the object with key "John"
        ... // do something else with the database
        storage.commit(); // commit the transaction
        ... // do something else with the database
        storage.close(); // commit the last transaction and close the database
    }
}
You can also look at these Perst examples:
Link
interface.
Database
class.
Version
and VersionHistory
classes and their performance.
Projection
class.
The tests are located in the tst
directory and can
be built using the compile.bat
script.
To build the JDK 1.5 version, go to the perst/src15
directory and run the
compile.bat
script. It will build perst15.jar
(a precompiled
version is included in the distribution). You can also try the TestIndex
example for JDK 1.5 located in the
perst/src15/tst
directory.
This example illustrates using the Perst generic containers and the new Java foreach
construction.
Perst Lite does not use the Java reflection mechanism. So,
to be able to store/fetch persistent objects to/from the database, the
programmer must provide special pack/unpack routines. Perst provides the ISerializable
interface, which defines the following methods:
/**
 * Interface which should be implemented by any object that can be serialized.
 * Implementations of the writeObject/readObject methods should
 * store/fetch the content of the object to/from the provided stream.
 * The order of storing/fetching fields should be the same in both methods.
 */
public interface ISerializable {
    /**
     * Store object fields in the stream.
     * @param out stream to which the object is written
     */
    void writeObject(IOutputStream out);

    /**
     * Fetch object fields from the stream.
     * @param in stream from which object data should be fetched
     */
    void readObject(IInputStream in);
}
So, each persistent-capable class should implement this
interface (actually it should be derived from the
org.garret.perst.Persistent
class which provides the basic implementation of this interface)
and implement the writeObject
/readObject
methods. The interfaces IOutputStream
and IInputStream
provide methods for storing and fetching respectively, values of different
types. Below are example definitions of pack/unpack methods:
public class Test extends Persistent {
    int i;
    float f;
    String s;
    Test o;

    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        out.writeInt(i);
        out.writeFloat(f);
        out.writeString(s);
        out.writeObject(o);
    }

    public void readObject(IInputStream in) {
        super.readObject(in);
        i = in.readInt();
        f = in.readFloat();
        s = in.readString();
        o = (Test)in.readObject();
    }
}
Also, because of restrictions of the JDK 1.1 reflection mechanism, a persistent-capable class should be:
Because Perst Lite does not use reflection, it is not able to provide the functionality required for dynamic access to object components. So, field indices, XML import/export and SubSQL query language are not supported in this version. However, because in Perst Lite, there is no overhead caused by the reflection mechanism, the speed of fetching/storing objects is about four times faster than in the full Perst version.
Note for Blackberry users:
random positioning in the database file relies on the
java.io.InputStream.skip
method, which is very naively implemented in some J2ME runtimes: it just reads data from the stream byte by byte until it
reaches the requested position. For example, on a Nokia
6110 (S60 3rd edition), performing 10000 index searches through 10000
records takes... more than 3 hours! And the database size in this example
is only about 600 kb, so it can easily be loaded into memory. That is why Perst Lite includes a special version of the page pool - InfinitePagePool - which preloads the whole database file into memory. An infinite page pool was also available in the mainstream Perst version, but the Perst Lite implementation is much more efficient and requires less memory. The main difference is that it loads the whole database into memory when the storage is opened. With this implementation, the search time of the test mentioned above is about 8 seconds. Compare that with 3 hours!
With JSR 75 it is more or less straightforward - you can create the database file on a
PC using perst11.jar, or with any emulator which stores files created through the JSR 75 API as normal files
in the PC file system. But JSR 75 is not available everywhere, and the format of RMS storage
is opaque and vendor specific. However, it is possible to import data into the RMS record storage at a low level
(as Perst pages) - this is much faster than just inserting data into the database (because
there is no need to construct indices, allocate space, pack objects, ...).
The implementation of org.garret.perst.impl.RandomAccessFile
on top of RMS includes a static importData
method which can perform such an import from an arbitrary input stream:
/**
 * Import data from the specified input stream to the record store.
 * @param storeName storage name prefix. If there is a limitation on the storage size, then several storages with
 * this prefix and suffixes 0, 1, 2, ... will be created.
 * @param in input stream. Data can be imported from a resource file (jar file), from a network connection or from another
 * data source
 * @return true if data was successfully imported, false if the storage is not empty or the read operation from the input stream
 * fails
 */
public static boolean importData(String storeName, InputStream in)
    throws IOException, RecordStoreException;
Using this method it is possible to prepare the database file on a PC (using an emulator or the perst11.jar library), include it in the midlet .jar file as a resource file and import this file into the RMS storage:
protected void startApp() {
    storage = StorageFactory.getInstance().createStorage();
    org.garret.perst.impl.RandomAccessFile.importData("testindex.dbs",
        getClass().getResourceAsStream("/testindex.dbs"));
    storage.open("testindex.dbs");
    ...
}
The importData
method imports the resource file only when the RMS storage is
empty, so there is no need for the programmer to perform an extra check of
whether the database is initialized or not. It is possible to use the importData method not only for importing resource files from the .jar file, but also for importing data from any other data source, for example from a network connection.
The SerGen utility can automatically generate the bodies of the writeObject/readObject
methods. The only thing you have to do is to add stubs for the writeObject/readObject
methods in the persistent-capable classes and specify the path to
them as arguments to the SerGen utility. It is necessary to specify stubs in the
list of persistent-capable classes so that the SerGen utility does not have to
load the whole project, resolve all class paths and source locations, and try
to guess the right place and indentation of where to insert these generated
methods. Rather, SerGen uses a simplified solution that only requires the
programmer to explicitly specify the classes and stubs for the writeObject/readObject
methods. It solves both problems: SerGen does not have to guess
which classes are persistent capable or where to generate the bodies of these
methods.The SerGen utility is invoked as follows:
java SerGen list-of-files
Here, list-of-files is a list of source files (*.java) or directories where these source files are located. SerGen recursively traverses these directories, selects files with the .java extension and tries to locate the
writeObject/readObject
methods in them. If the methods are located, then the file is patched: SerGen
first generates an XXX.java.tmp file, removes the original file, and renames XXX.java.tmp
to XXX.java
. The paths of all the patched files are reported on the console
output.
Lucene accesses its index through the abstract Directory
interface.
Default implementations of this interface include a file-system and an in-memory directory.
Perst provides its own implementation of the Lucene Directory
interface, allowing the index to be stored directly in a Perst database. The advantages of storing a Lucene index inside a Perst database are the following:
Operation | Lucene+file system | Lucene+Perst |
---|---|---|
Build index | 56 sec | 84 sec |
Index search | 114 sec | 118 sec |
To create Perst storage of a Lucene index you should use the PerstDirectory class which is located in the perst/lucene directory:
Storage db = StorageFactory.getInstance().createStorage();
db.open("file.dbs", pagePoolSize);
Directory directory = new PerstDirectory(db);
Please refer to the Lucene programming API documentation for how to use this directory (it can be used like any other Lucene directory implementation, for example the standard
FSDirectory
directory). For instance, to build or update the index you should create an index writer:
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), create);
Please look at the PerstIndexer.java and PerstSearcher.java examples in the same directory. These examples can work with an index stored in a Perst database, as well as in a standard file system (if the
-fs
option is specified).
To store references to Perst persistent objects inside a Lucene database, it is possible to add a special field to the document containing the document's OID (object identifier). The OID of the persistent object can be obtained using the Persistent.getOid()
method
and vice versa - the persistent object can be accessed by its OID using the Storage.getObjectByOID()
method.
But the most convenient way of integrating full text search inside Perst is to use the "Continuous" package. This package provides a transparent RDBMS-like interface, version control, optimistic locking and built-in full text search capabilities (based on the Lucene search engine). For more information please read the Continuous section in the Perst manual.
Another important aspect missing in Perst is full text search. Actually, in real life, strict search is rarely needed - it is hard to assume that a user knows the precise name of a book or author. Certainly, the implementation of your own full text search engine is a very complex task, maybe even more complex than the implementation of Perst itself. Fortunately there is a good, powerful, fast, well-known and free full text search engine for Java and .Net - Lucene. It would be nice if Perst and Lucene could work together.
Versioning is yet another useful feature required by many applications. Most information systems dealing with important data (such as financial information) have to store all versions of documents to be able to retrieve the state of the system at a particular moment of time. Also, such popular services as wikis are based on versioning. Perst provides some primitives for version control, but they are not integrated into the database. It is the responsibility of the programmer to maintain version histories and locate particular versions.
From my experience with Perst, I found out that the object-oriented
model based on transparent persistence used in Perst is hard to understand for
many people not familiar with object-oriented databases. That is why people
prefer to use the org.garret.perst.Database
wrapper class, which provides some kind of abstraction of the traditional
interface to a relational database system. Unfortunately, it is not possible to
hide all the details from the programmer, and some operations, such as
excluding/including objects in indices, still have to be performed explicitly
by the programmer.
The main idea behind the development of the Continuous package was to provide an extension of Perst which could address all the issues discussed above.
I hope that the approach provided by the Continuous package can actually be a good compromise in reaching all these goals. What makes me think so is that the ideas used to solve one of these issues also help in other tasks. For example, multi-versioning is not only useful by itself, but also allows you to achieve isolation of client transactions without setting a large number of locks (which would result in a reduced level of concurrency and an increased probability of deadlock). Each client transaction works with its own snapshot of the database as seen at the moment the transaction began, in isolation from other transactions. Only during the transaction commit is a check performed to verify that the changes made by the transaction do not conflict with changes committed by other transactions; so it is similar to optimistic locking. A simplified table-like data structure, similar to the one used in relational databases, makes it possible to hide the manipulation of indices and the full text search engine from the programmer.
The Lucene full text search engine can be used with Perst in standalone mode, when the Lucene indices are stored in a standard OS file system. This provides the highest level of performance, but since neither Lucene nor the file system guarantees the durability and consistency of file data in case of a power or system failure, such an incident can corrupt the full text search index and necessitate its rebuild (which can take a significant amount of time). Continuous provides its own implementation of the abstract Lucene directory, allowing Lucene data to be stored directly in a Perst database. Since Perst BLOBs are optimized for sequential access to BLOB data, they cannot efficiently perform the random access operations used by Lucene, so performance in this case will be worse than with the standard Lucene FSDirectory (which stores data in OS files). But the main bonus of the Perst Lucene directory provider is that the Lucene indices become transactional: they survive application and system faults, and any inconsistency between normal and full text search data becomes impossible.
To create a full text search index for Perst objects using the Continuous package, you only need to mark particular class fields with the FullTextSearchable annotation:
class Address implements IValue {
    @FullTextSearchable private String country;
    @FullTextSearchable private String city;
    @FullTextSearchable private String street;
    @Indexable private int zipCode;
    ...
}

class Company extends CVersion {
    @FullTextSearchable @Indexable(unique=true) private String name;
    @FullTextSearchable private Address location;
    ...
}

When objects of these classes are stored in the database (the transaction is committed), the Continuous package will automatically extract the values of the fields marked with @FullTextSearchable, parse out the individual words, build a document from this text and associate the document with the persistent object. You can search objects in the same way as with the standard Lucene API, and the search result will contain references to Perst objects.
The @Indexable annotation indicates that the field is included in a standard Perst B-Tree index (which is optionally unique).
Prerequisites of the Continuous package:
It is possible to build the package using either the compile.bat
script in the perst/continuous/src
directory or using build.xml
in perst/continuous
and the ant
tool. The directory perst/continuous/tst
contains two
examples using the Continuous package. The bank example can also be used as a test
for measuring the performance and scalability of Perst with the Continuous
package.
Javadoc documentation of the Continuous package can be found here.
The speed of accessing data on disk is orders of magnitude
slower than the speed of accessing data in main memory. That is why caching of
data is the key to increasing database performance. Perst uses a pool of pages to
optimize access to the disk. The page pool size can be specified in the Storage.open
method (by default it is 4 Mb). Usually, increasing the page pool size leads to
better performance. But note that the page pool has to fit in the memory
available to the JVM: the heap size can be increased using the -Xmx
parameter, but if you do not specify it, the maximal size of the page pool
should not be greater than 32 Mb (at least for Sun's JVM).
If you think that all your data should fit in the main
memory, you can use the
Storage.INFINITE_PAGE_POOL
constant in the Storage.open
method. In this case, the page
pool will be dynamically extended when a new page is requested. As a result,
all pages will be cached and present in memory. So, each page is read from the
disk only once. Also, in this case a strong object cache is used instead of a weak
object cache. It means that all the fetched objects are also pinned in memory
and an object is unpacked only once. It is important to notice that the amount
of used main memory will be greater than the database size: all objects will be
present in memory in packed (inside the page containing the object) and in
unpacked (referenced from the object cache) form.
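The difference between the default weak object cache and the strong cache used with Storage.INFINITE_PAGE_POOL can be illustrated with plain Java references; this sketch uses java.lang.ref.WeakReference directly and does not touch the Perst API.

```java
import java.lang.ref.WeakReference;

// A weakly cached object may be reclaimed by GC and would have to be fetched
// and unpacked from the database again; a strongly cached one stays pinned
// in memory for the lifetime of the storage.
public class CacheDemo {
    public static void main(String[] args) {
        Object pinned = new byte[1024];             // strong cache entry
        WeakReference<Object> weak =
            new WeakReference<>(new byte[1024]);    // weak cache entry
        System.gc();                                // request a collection
        System.out.println("strong entry kept: " + (pinned != null));
        // The weak entry MAY have been cleared; with no strong referent it
        // usually is after an explicit GC, but this is not guaranteed.
        System.out.println("weak entry cleared: " + (weak.get() == null));
    }
}
```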
In some applications (such as for mobile devices) persistence is not needed. But such Perst container classes
as Link
, Index
, FieldIndex
and SpatialIndex
can still be used by the application. In this case, you can use the NullFile
implementation of the
IFile
interface together with Storage.INFINITE_PAGE_POOL
to create a transient,
in-memory database. Data in this case will never be written to disk.
There are some constants defined in the StorageImpl
class which influence the initial and maximal database size. If you
want to change them, you will have to rebuild Perst. Below is a detailed
description of these parameters:
Parameter | Default value | Description |
---|---|---|
dbDefaultInitIndexSize | 1024 | Initial object index size. The object index is increased on demand. Reallocation of the index is an expensive operation and so, to minimize the number of such reallocations, the object index size is always doubled. Specifying a larger initial index size allows a reduction in the number of future reallocations and so, increases performance a little bit. But it also leads to a larger initial size of the database file. With the default value of this parameter, the initial database size is about 50 kb. |
dbDefaultExtensionQuantum | 4Mb | Database extension quantum. Memory is allocated by scanning a bitmap. If there is no hole large enough, then the database is extended by this value. Increasing the value of this parameter leads to less frequent rescanning of the allocation bitmap from the very beginning. This leads to faster allocation speed and better locality of reference for the created objects (because there is a greater chance that they will be allocated sequentially). On the other hand, it leads to less efficient memory usage. Reducing the value of this parameter forces the reuse of existing holes in the memory allocation bitmap. |
dbObjectCacheInitSize | 1319 | Size of object cache. Perst needs this cache to check if an object with a given OID is already present in memory. This cache uses weak references to allow the garbage collector to do its work. When some threshold of filling the cache is reached, the cache is reallocated by doubling its size. Once again, increasing this parameter can save some number of cache reallocations. |
Now, some hints on how to increase Perst performance and reduce the size of used main memory:
If your database performs a lot of updates of persistent
data, then the main limiting factor is the speed of writing changes to the disk,
especially the synchronous disk write performed by commit. If you do a
commit after each update, then the average speed will be about 10 updates per
second (this estimation is based on the assumption that the average disk access
time is about 10 msec and each transaction commit usually requires writing
about 10 pages at random places in the file). But it is possible to
dramatically increase update performance if you group several updates into one
transaction. Perst creates a shadow of an object when it is updated for the first
time inside a transaction. If an object is updated once in each of n transactions,
then n shadows will be created. If an object is updated n times inside one
transaction, then a shadow is created only once. This explains the advantage
of having one large transaction.
But if you update a large number of objects in one transaction and a shadow is created for each updated object, this leads to a significant increase in the database file size. If you update every object in the database inside one transaction, the database size will almost double! However, if you perform each update in a separate transaction, then the size of the database will be almost unchanged (because the space of allocated shadow objects is reused in this case). So the best approach is to perform a commit after every 100 or 1000 updates; it will reduce the overhead of each commit and minimize the database size.
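The batching advice above can be sketched with a trivial stand-in: the commit() method here is only a counter standing in for Storage.commit(), and shows that committing every 100 updates performs 100x fewer synchronous commits than committing after each one.

```java
// Grouping updates into transactions: one commit per BATCH updates instead
// of one commit per update cuts the number of expensive synchronous disk
// writes by a factor of BATCH.
public class CommitBatching {
    static int commits = 0;
    static void commit() { commits++; }      // stand-in for Storage.commit()

    public static void main(String[] args) {
        final int UPDATES = 10_000, BATCH = 100;
        for (int i = 1; i <= UPDATES; i++) {
            // ... update a persistent object here ...
            if (i % BATCH == 0) {
                commit();                    // one commit per 100 updates
            }
        }
        System.out.println(commits);         // prints: 100
    }
}
```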
If your persistent objects form a tree or graph where all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, GC will never be able to collect any loaded persistent object (because it is reachable from the root object). So, as your application accesses more and more objects from the storage, at some moment all of them will be loaded in main memory. This can cause memory exhaustion. To prevent this situation, you should avoid storing in variables references to container objects which contain references to a large number of members. This is especially true for the root object. In this case, GC is able to do its work and evict from memory objects which are not used at the moment (to which there are no references from local variables). But it is important to note that objects accessible through an index by key can be deallocated normally by the garbage collector, so no special care is needed in this case.
Values are always stored in the context of the persistent object referencing them. Only those fields of the value object which belong to the declared type of the field are stored: if you assign to a value field an object of a class derived from the field's declared class, the fields introduced by the derived class will not be saved.
Initialization of transient or resettable attributes should be done either in the default constructor of the class or in the onLoad()
method, which is called when an object is retrieved from the database.
Recursive loading can be disabled by overriding the recursiveLoading
method
in some classes. In this case, you should explicitly load all objects referenced from an object
with disabled recursive loading.
Actually, the latter situation is better than it may seem. It is important to notice that all Perst collection classes load their members on demand. It means that once you have a reference, for example, to a FieldIndex, only the header of this collection object is loaded and the members are loaded on demand. This is true for all other Perst collections: Link
, Relation
, Index
, SortedCollection
, IPersistentSet
. Since persistent objects in a database are usually accessible through collections, in most cases explicit loading of the persistent objects is not needed. But be careful: if in addition to placing all your objects in some index, you also link them, for example, in an L2-List, then fetching a single object will cause loading of all other objects from this collection!
Storage.setProperty("perst.serialize.transient.objects")
. Perst will correctly handle references to persistent objects from the
serialized transient objects. But it is important to notice that the identities
of the saved transient objects are not preserved. If a transient
object O is referenced from persistent objects P1 and P2, then, when P1
and P2 are stored, an instance of O will be serialized in both P1's and P2's
records. And when P1 and P2 are reloaded from the database, they
will refer to different instances O1 and O2. Also, please notice that it is
impossible to mark transient objects as modified. If some of the referenced
transient objects are updated, you should also update the persistent object
referencing (directly or indirectly) these objects.
The garbage collector will throw a StorageError(StorageError.INVALID_OID)
exception if a reference to a deleted object is found, and will return a
non-zero number of collected objects if there is garbage in the database.
beginThreadTransaction
and endThreadTransaction
is not enough for proper concurrent work with persistent objects. The following alternatives are possible:
EXCLUSIVE_TRANSACTION
mode. This approach is similar to 2), but Perst will provide
exclusion of threads itself.
SERIALIZABLE_TRANSACTION
mode + object level locking + alternative B-Tree implementation.
Please look at the tst/TestServer.java
example which illustrates how to use the per-thread transactions
and locking when a database is concurrently accessed from multiple threads.
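The object level locking mentioned for SERIALIZABLE_TRANSACTION mode follows the familiar shared/exclusive pattern; this runnable sketch uses java.util.concurrent's ReentrantReadWriteLock purely as a stand-in for Perst's per-object locks, not as the Perst API itself.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Shared lock for readers, exclusive lock for a writer: the same discipline
// that object level locking imposes on concurrently accessed persistent
// objects in serializable transactions.
public class ObjectLocking {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int[] balance = { 100 };                // stand-in persistent object

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();            // exclusive lock for update
            try { balance[0] += 50; }
            finally { lock.writeLock().unlock(); }
        });
        writer.start();
        writer.join();

        lock.readLock().lock();                 // shared lock for read
        try { System.out.println(balance[0]); } // prints: 150
        finally { lock.readLock().unlock(); }
    }
}
```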
If you insert keys in the index in any random order, it is most likely that loaded pages will be located at random offsets within the database file. Thus, access to the pages requires positioning of the disk head. The average disk positioning time for modern disks is about 10 msec. It means that a disk is able to perform about 100 disk accesses per second. If a database is so large that each operation with a B-Tree requires loading of all the pages in the path from the root to a leaf page (for a large number of objects the path will be at least 3 pages), then, the database will be able to perform about 30 inserts (or searches) per second. Inserting 1 million objects in this worst case scenario will take about 10 hours! Fortunately, real performance is significantly better (because of page cache hits, file system cache...).
There are two obvious ways of improving performance:
Arrays.sort
method. If the number of loaded objects is too large to fit in memory, I recommend you load a portion of objects which fits in memory, sort it and then insert it into the index. Then load the next portion of objects, sort it, insert it into the index, and so on.
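The sorted bulk load can be sketched as follows; a TreeMap stands in here for a Perst Index or FieldIndex, since the point is only the insertion order. Inserting keys in sorted order touches neighbouring B-Tree pages, so far fewer random page reads and disk seeks are needed than with random insertion order.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Sort each batch of keys with Arrays.sort before inserting it into the
// index, so that consecutive inserts land on neighbouring pages.
public class SortedBulkLoad {
    public static void main(String[] args) {
        String[] keys = { "zebra", "apple", "mango", "fig", "kiwi" };
        Arrays.sort(keys);                                // sort the batch first
        TreeMap<String, Integer> index = new TreeMap<>(); // stand-in for a Perst index
        for (int i = 0; i < keys.length; i++) {
            index.put(keys[i], i);                        // sequential, cache-friendly
        }
        System.out.println(index.firstKey() + " " + index.lastKey());
        // prints: apple zebra
    }
}
```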
Perst provides four approaches which should be used together to provide the best performance for multimedia applications:
Now, we shall see how to use all
these four approaches. BLOBs can be created using the Storage.createBlob
method. BLOBs allow incremental writing and retrieving of BLOB
content. To create a multifile, you should create a multifile description file describing its segments. Each line of this file should contain the path
to a physical file and the size of that segment (in kilobytes). The size of a segment
should be aligned on a database page boundary (it must contain an integer
number of pages). For the last segment, the size can be omitted. The file can
look something like this (tst/testalloc.mfd
):
"testalloc1.dbs" 1125899906842624 "testalloc2.dbs"To open a multifile it is necessary to pass the
Storage.open
method path to the
multifile description file prepended by the '@' character.
Please notice that space in the
multifile segments is allocated on demand, so it is possible to specify a very
large segment size without the risk of running out of space on the disk. In
this way, we can place BLOBs in separate files. For example, in the multifile
description testalloc.mfd shown above, the size of the first segment
is set to 2^60 bytes. The first segment will be used to store all the database
objects except BLOBs. Since 2^60 is a very large number, we should not be afraid
that at some point in time the space in the first segment will be exhausted
(space on the disk will be exhausted earlier).
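Since the multifile description gives segment sizes in kilobytes, the first segment size in testalloc.mfd can be checked against the 2^60-byte base address used for the BLOB segment:

```java
// Verify that 1125899906842624 Kb (the first segment size in testalloc.mfd)
// is exactly 2^60 bytes, i.e. the base address of the second segment.
public class MultifileSize {
    public static void main(String[] args) {
        long segmentKb = 1125899906842624L;    // size from testalloc.mfd
        long segmentBytes = segmentKb * 1024;  // convert kilobytes to bytes
        System.out.println(segmentBytes == (1L << 60));  // prints: true
    }
}
```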
The custom allocator is a persistent object implementing the
org.garret.perst.CustomAllocator
interface. Programmers can provide their own implementation of this
interface or create a bitmap allocator using the
Storage.createBitmapAllocator
method. This method takes four parameters: the size of the allocation
quantum, the base address of the allocator, the extension quantum, and the limit of
allocated space. The allocator's base address should be set equal to the base
address of the multifile segment ((1L << 60) in the case of the multifile
definition above). Then it is necessary to register the allocator. This should be
done using the Storage.registerCustomAllocator
method,
associating this allocator with a particular class. Space in the storage
for all instances of this class, or of classes derived from it,
will be allocated using this allocator. If we want to separate
BLOBs from other data, we should bind this allocator to the BLOB class.
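The idea behind a bitmap allocator can be shown with a toy implementation: space is divided into fixed-size quanta tracked by a bitmap, and allocation scans for a run of free quanta starting at a fixed base address. All names here are illustrative; this is a sketch of the concept, not the org.garret.perst.CustomAllocator interface.

```java
import java.util.BitSet;

// Toy bitmap allocator: each bit marks one quantum of space as used or free.
public class ToyBitmapAllocator {
    final long base;          // base address of the managed region
    final int quantum;        // allocation unit in bytes
    final BitSet used = new BitSet();

    ToyBitmapAllocator(long base, int quantum) {
        this.base = base;
        this.quantum = quantum;
    }

    // Allocate `size` bytes; returns the address of the first free run.
    long allocate(long size) {
        int need = (int) ((size + quantum - 1) / quantum); // round up to quanta
        int start = 0;
        while (true) {
            int free = used.nextClearBit(start);
            int end = free;
            while (end < free + need && !used.get(end)) end++;
            if (end == free + need) {               // found a large enough hole
                used.set(free, end);
                return base + (long) free * quantum;
            }
            start = end + 1;                        // keep scanning the bitmap
        }
    }

    public static void main(String[] args) {
        // Base address 1L << 60 mirrors the BLOB segment of the multifile.
        ToyBitmapAllocator a = new ToyBitmapAllocator(1L << 60, 512);
        long p1 = a.allocate(1000);   // takes 2 quanta at offset 0
        long p2 = a.allocate(100);    // takes 1 quantum right after p1
        System.out.println((p1 == (1L << 60)) + " " + (p2 - p1));
        // prints: true 1024
    }
}
```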
And now the last thing: we want to disable caching of BLOB pages. Perst has the "perst.page.pool.lru.limit" property which can be used to disable caching for all pages in the file which have an offset larger than the specified limit. The default value of this parameter is 2^60 (1L << 60). This means that all pages with an offset larger than 2^60 will not be cached in the page pool. Hence, in our example, since we allocate BLOBs in the segment starting at address 2^60, the caching of BLOB pages is disabled automatically (there is no need to explicitly set the "perst.page.pool.lru.limit" property).
The approaches described in this
section may seem to be too complicated. But actually, they are not so
difficult. Please have a look at the tst/TestAlloc.java
example which illustrates how it is possible to store files in a Perst
database. You will see that not much effort is needed to efficiently
store BLOBs.
First, create and initialize a database. Then you should use the org.garret.perst.CompressDatabase
utility to compress the database file. This is a command line
utility. The first mandatory parameter specifies the path to the database file
and the second optional parameter specifies the compression level. This utility
creates a compressed file in the same directory where the original file is
located, but with the ".dbz" extension. This utility splits the file into
128 kb blocks and compresses each block separately. You can open a compressed
file using a standard zip utility. Inside, you will find several entries:
000000000000, 000000131072, 000000262144, ...
Each entry corresponds to a 128 kb block of the
original file. If you extract and concatenate all of them, you will get the original
database file. When the compressed file is ready, you can open it using org.garret.perst.CompressedFile
.
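The entry naming scheme follows directly from the 128 kb block size: each zip entry is named after its block's offset in the original file, zero-padded to 12 digits. A quick check of the first three names:

```java
// Generate the names of the first three 128 kb (131072 byte) blocks of a
// compressed database file: offsets zero-padded to 12 digits.
public class BlockNames {
    public static void main(String[] args) {
        final int BLOCK = 128 * 1024;  // 131072 bytes per compressed block
        StringBuilder names = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            names.append(String.format("%012d ", (long) i * BLOCK));
        }
        System.out.println(names.toString().trim());
        // prints: 000000000000 000000131072 000000262144
    }
}
```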
Please see the TestBlob example.
First, you should fill the database, by running TestBlob
. Then, you
should compress the database using the CompressDatabase
utility. After this you can run TestBlob zip
which will extract data
from the compressed database. In this example, the original database file size is
315392 bytes and the size of the compressed database is 24436 bytes.