Chapter 11. System Management and Deployment Issues

Fred Toussi

The HSQL Development Group

$Revision: 3630 $

Copyright 2002-2010 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQL Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

$Date: 2010-06-06 10:44:27 -0400 (Sun, 06 Jun 2010) $

Table of Contents

Mode of Operation and Tables
Mode of Operation
Tables
Large Objects
Deployment context
Readonly Databases
Memory and Disk Use
Table Memory Allocation
Result Set Memory Allocation
Temporary Memory Use During Operations
Data Cache Memory Allocation
Object Pool Memory Allocation
Lob Memory Usage
Disk Space
Managing Database Connections
Tweaking the Mode of Operation
Application Development and Testing
Embedded Databases in Desktop Applications
Embedded Databases in Server Applications
Embedding a Database Listener
Using HyperSQL Without Logging
Server Databases
Upgrading Databases
Upgrading From Older Versions
Manual Changes to the *.script File
Backward Compatibility Issues
Backing Up Database Catalogs
Making Online Backups
Making Offline Backups
Examining Backups
Restoring a Backup
Encrypted Databases
Creating and Accessing an Encrypted Database
Speed Considerations
Security Considerations
Monitoring Database Operations
Statement Level Monitoring
Internal Event Monitoring
Server Operation Monitoring
Statements

Mode of Operation and Tables

HyperSQL has many modes of operation and features that allow it to be used in very different scenarios. Levels of memory usage, speed and accessibility by different applications are influenced by how HyperSQL is deployed.

Mode of Operation

The decision to run HyperSQL as a separate server process or as an in-process database should be based on the following:

  • When HyperSQL is run as a server on a separate machine, it is isolated from hardware failures and crashes on the hosts running the application.

  • When HyperSQL is run as a server on the same machine, it is isolated from application crashes and memory leaks.

  • Server connections are slower than in-process connections due to the overhead of streaming the data for each JDBC call.

  • You can reduce client/server traffic using SQL Stored procedures to reduce the number of JDBC execute calls.

  • During development, it is better to use a Server with server.silent=false, which displays the statements sent to the server on the console window.

  • To improve speed of execution for statements that are executed repeatedly, reuse a parameterized PreparedStatement for the lifetime of the connections.

Tables

TEXT tables are designed for special applications where the data has to be in an interchangeable format, such as CSV (comma separated values). TEXT tables should not be used for routine storage of data.

MEMORY tables and CACHED tables are generally used for data storage. The difference between the two is as follows:

  • The data for all MEMORY tables is read from the *.script file when the database is started and stored in memory. In contrast, the data for CACHED tables is not read into memory until the table is accessed. Furthermore, only part of the data for each CACHED table is held in memory, allowing tables with more data than can be held in memory.

  • When the database is shut down in the normal way, all the data for MEMORY tables is written out to the disk. In comparison, the data in CACHED tables that has changed is written out during operation and at shutdown.

  • The size and capacity of the data cache for all the CACHED tables is configurable. This makes it possible to allow all the data in CACHED tables to be cached in memory. In this case, speed of access is good, but slightly slower than MEMORY tables.

  • For normal applications it is recommended that MEMORY tables are used for small amounts of data, leaving CACHED tables for large data sets. For special applications in which speed is paramount and a large amount of free memory is available, MEMORY tables can be used for large tables as well.

Large Objects

HyperSQL 2.0 supports dedicated storage and access to BLOB and CLOB objects. These objects can have huge sizes. BLOB or CLOB is specified as the type of a column of the table. Afterwards, rows can be inserted into the table using a PreparedStatement for efficient transfer of large LOB data to the database. In mem: catalogs, CLOB and BLOB data is stored in memory. In file: catalogs, this data is stored in a single separate file which has the extension *.lobs. The size of this file can grow to huge, terabyte figures.

LOB data should be stored in the database using a JDBC PreparedStatement object. The streaming methods send the LOB to the database in one operation as a binary or character stream. Inside the database, the disk space is allocated as needed and the data is saved as it is being received. LOB data should be retrieved from the database using a JDBC ResultSet method. When a streaming method is used to retrieve a LOB, it is retrieved in large chunks in a transparent manner. LOB data can also be stored by calling a JDBC method with a String or byte[] argument, but these methods limit the size of the LOB that can be stored or retrieved.

LOB data is not duplicated in the database when a LOB is copied from one table to another. The disk space is reused when a LOB is deleted and is not contained in any table.

By using a dedicated LOB store, HyperSQL achieves consistently high speeds (usually over 20MB / s) for both storage and retrieval of LOBs.

The LOB catalog is stored in the database as a memory table. Therefore the amount of JVM memory should be increased when more than tens of thousands of LOBs are stored in the database.

Deployment context

The files used for storing HyperSQL database data are all in the same directory. New files are always created and deleted by the database engine. Two simple principles must be observed:

  • The Java process running HyperSQL must have full privileges on the directory where the files are stored. This includes create and delete privileges.

  • The file system must have enough spare room both for the 'permanent' and 'temporary' files. The default maximum size of the *.log file is 50MB. The *.data file can grow up to 16GB (more if the default has been increased). The *.backup file can be up to the size of the *.data file. The *.lobs file can grow to several terabytes. The temporary files created at the time of a SHUTDOWN can be equal in size to the *.script file and the *.data file.

Readonly Databases

A file: catalog can be made readonly permanently, or it can be opened as readonly. To make the database readonly permanently, add the property/value pair readonly=true to the .properties file of the database.

It is also possible to open a normal database as readonly. For this, the property can be included in the URL of the first connection to the database.

There is another option which allows MEMORY tables to be writable, but without persisting the changes at SHUTDOWN. This option is activated with the property/value pair files_readonly=true, which can be added to the .properties file of the database, or included in the URL of the first connection to the database. This option is useful for running application tests which operate on a predefined dataset.
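As a rough illustration of passing these properties on the connection URL, the sketch below assembles URLs using the semicolon-separated name=value form that HyperSQL accepts after the database path. The class name, the helper method and the path /opt/db/testdb are hypothetical and are used only for illustration.

```java
// Sketch: append connection properties to a HyperSQL file: URL.
// The class, method and database path are illustrative, not part of HyperSQL.
final class ReadonlyUrls {

    /** Builds jdbc:hsqldb:file:<path> followed by ;name=value pairs. */
    public static String fileUrl(String path, String... nameValuePairs) {
        StringBuilder url = new StringBuilder("jdbc:hsqldb:file:").append(path);
        for (String pair : nameValuePairs) {
            url.append(';').append(pair);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Open a normal database as readonly for this run only.
        System.out.println(fileUrl("/opt/db/testdb", "readonly=true"));
        // Allow changes to MEMORY tables without persisting them at SHUTDOWN.
        System.out.println(fileUrl("/opt/db/testdb", "files_readonly=true"));
    }
}
```

Either string would then be passed to DriverManager.getConnection for the first connection to the database, since the property only takes effect when the database is opened.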

Memory and Disk Use

Memory used by the program can be thought of as two distinct pools: memory used for table data, which is not released unless the data is deleted, and memory that can be released or is released automatically. The second pool includes memory used for caching, building result sets and other internal operations, such as storing the information needed to roll back a transaction.

Most JVM implementations allocate up to a maximum amount of memory (usually 64 MB by default). This amount is generally not adequate when large memory tables are used, or when the average size of rows in cached tables is larger than a few hundred bytes. The maximum amount of allocated memory can be set on the Java command line that is used for running HyperSQL. For example, with Sun JVM, parameter -Xmx256m increases the amount to 256 MB.

Table Memory Allocation

The memory used for a MEMORY table is the sum of memory used by each row. Each MEMORY table row is a Java object that has 2 int or reference variables. It contains an array of objects for the fields in the row. Each field is an object such as Integer, Long, String, etc. In addition, each index on the table adds a node object to the row. Each node object has 6 int or reference variables. As a result, a table with just one column of type INTEGER will have four objects per row, with a total of 10 variables of 4 bytes each. Once the JVM's per-object overhead is added, this currently takes up about 80 bytes per row. Beyond this, each extra column in the table adds at least a few bytes to the size of each row.

Result Set Memory Allocation

By default, all the rows in the result set are built in memory, so very large result sets may not be possible to build. In server mode databases, by default, the result set memory is released from the server once the database server has returned the result set. In-process databases release the memory when the application program releases the java.sql.ResultSet object. Server modes require additional memory for returning result sets, as they convert the full result set into an array of bytes which is then transmitted to the client.

HyperSQL 2.0 supports disk-based result sets. The commands, SET SESSION RESULT MEMORY ROWS <integer> and SET DATABASE DEFAULT RESULT MEMORY ROWS <integer> specify a threshold for the number of rows. Results with row counts above the threshold are stored on disk. These settings also apply to temporary tables and subquery tables.

When the setFetchSize() method of the Statement interface is used to limit the number of rows fetched, the whole result is held by the engine and is returned to the JDBC ResultSet in blocks of rows of the specified fetch size. Disk-based result sets slow down the database operations and should be used only when absolutely necessary, perhaps with result sets that are larger than tens of thousands of rows.

Temporary Memory Use During Operations

When UPDATE and DELETE queries are performed on CACHED tables, the full set of rows that are affected, including those affected due to ON UPDATE actions, is held in memory for the duration of the operation. This means it may not be possible to perform deletes or updates involving very large numbers of rows of CACHED tables. Such operations should be performed in smaller sets.
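One way to perform such operations in smaller sets is to split the key range into fixed-size slices and run one statement per slice. The sketch below shows only the slicing arithmetic. It assumes a numeric key column, and the class name, method name and the table in the comment are hypothetical.

```java
// Sketch: split a key range into smaller sets for bulk DELETE or UPDATE.
// Class and method names are illustrative; the key column is assumed numeric.
final class ChunkedRanges {

    /** Returns inclusive {low, high} pairs that together cover minId..maxId. */
    public static long[][] split(long minId, long maxId, long chunkSize) {
        if (chunkSize <= 0 || maxId < minId) {
            return new long[0][];
        }
        int count = (int) ((maxId - minId) / chunkSize + 1);
        long[][] ranges = new long[count][];
        for (int i = 0; i < count; i++) {
            long low = minId + i * chunkSize;
            long high = Math.min(low + chunkSize - 1, maxId);
            ranges[i] = new long[] {low, high};
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 25_000, 10_000)) {
            // Each pair would drive one statement such as
            // DELETE FROM some_table WHERE id BETWEEN ? AND ?
            // (some_table and id are placeholders), followed by a COMMIT.
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Committing after each slice keeps the set of affected rows held in memory bounded by the slice size rather than by the size of the whole operation.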

When transaction support is enabled with SET AUTOCOMMIT FALSE, lists of all insert, delete or update operations are stored in memory so that they can be undone when ROLLBACK is issued. For CACHED tables, only the transaction information is held in memory, not the actual rows that have changed. Transactions that span thousands of modifications to data will take up a lot of memory until the next COMMIT or ROLLBACK clears the list. Each row modification uses less than 100 bytes until COMMIT.

When subqueries or views are used in SELECT and other statements, transient tables are created and populated by the engine. If the SET SESSION RESULT MEMORY ROWS <integer> statement has been used, these transient tables are stored on disk when they are larger than the threshold.

Data Cache Memory Allocation

With CACHED tables, the data is stored on disk and only up to a maximum number of rows are held in memory at any time. The default is up to 50,000 rows. The SET FILES CACHE ROWS command or the hsqldb.cache_rows connection property can be set to alter this amount. As any random subset of the rows in any of the CACHED tables can be held in the cache, the amount of memory needed by cached rows can reach the sum of the rows containing the largest field data. For example, if a table with 100,000 rows contains 40,000 rows with 1,000 bytes of data in each row and 60,000 rows with 100 bytes in each, the cache can grow to contain 50,000 of the smaller rows but, as explained below, only 10,000 of the large rows.

An additional property, hsqldb.cache_size, is used in conjunction with the hsqldb.cache_rows property. This puts a limit in bytes on the total size of rows that are cached. The default value is 10,000 KB. (This is the size of binary images of the rows and indexes. It translates to more actual memory, typically 2-4 times, used for the cache because the data is represented by Java objects.)

If memory is limited, the hsqldb.cache_rows or hsqldb.cache_size database properties can be reduced. In the example above, if the hsqldb.cache_size is reduced from 10,000 to 5,000, it will allow the number of cached rows to reach 50,000 small rows, but only 5,000 of the larger rows.
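The figures in this example follow from taking the smaller of the two limits. The sketch below reproduces that arithmetic. The class and method names are hypothetical, 1 KB is taken as 1,000 bytes to match the round numbers in the text, and the result counts binary row images only, not the 2-4 times larger Java object footprint.

```java
// Sketch of the cache-limit arithmetic used in the example above.
// Names are illustrative; 1 KB is taken as 1,000 bytes to match the text.
final class CacheEstimate {

    /** Number of rows of a given size that fit under both cache limits. */
    public static long maxCachedRows(long cacheRows, long cacheSizeKb, long rowBytes) {
        long rowsBySize = cacheSizeKb * 1_000 / rowBytes;
        return Math.min(cacheRows, rowsBySize);
    }

    public static void main(String[] args) {
        // Default limits: 50,000 rows and 10,000 KB.
        System.out.println(maxCachedRows(50_000, 10_000, 100));   // 50000 small rows
        System.out.println(maxCachedRows(50_000, 10_000, 1_000)); // 10000 large rows
        // hsqldb.cache_size reduced to 5,000 KB.
        System.out.println(maxCachedRows(50_000, 5_000, 100));    // 50000 small rows
        System.out.println(maxCachedRows(50_000, 5_000, 1_000));  // 5000 large rows
    }
}
```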

Data for CLOB and BLOB columns is not cached and does not affect the CACHED table memory cache.

The use of Java nio file access method also increases memory usage. Access with nio improves database update speed and is used by default for data files up to 256 MB. For minimal memory use, nio access should be disabled.

Object Pool Memory Allocation

HyperSQL uses a set of fast pools for immutable objects such as Integer, Long and short String objects that are stored in the database. In most circumstances, this reduces the memory footprint still further as fewer copies of the most frequently-used objects are kept in memory. The object pools are shared among all databases in the JVM. The size of each pool can be modified only by altering and recompiling the org.hsqldb.store.ValuePool class.

Lob Memory Usage

Access to lobs is always performed in chunks, so it is perfectly possible to store and access a CLOB or BLOB that is larger than the JVM memory allocation. Early versions of HyperSQL 2.0 use memory-based tables for the lob catalog (not the data). Therefore it is practical to store about 100,000 individual lobs in the database with the default JVM memory allocation. More lobs can be stored with larger JVM memory allocations. The realistic maximum number of lobs stored in the database is probably about a million. The actual total size of lobs is almost unlimited. We have tested with over 100 GB of lobs without any loss of performance.

Disk Space

With file: databases, the engine uses the disk for storage of data and any change. For safety, the engine backs up the data internally during operation. Spare space, at least equal to the size of the .data and .script files, is needed. The .lobs file is not backed up during operation.

Managing Database Connections

In all running modes (server or in-process) multiple connections to the database engine are supported. In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients.

Connection pooling software can be used to connect to the database but it is not generally necessary. Connection pools may be used for the following reasons.

  • To allow new queries to be performed while a time-consuming query is being performed in the background. In HyperSQL, blocking depends on the transaction control model, the isolation level, and the current activity by other sessions.

  • To limit the maximum number of simultaneous connections to the database for performance reasons. With HSQLDB this can be useful if your application is designed in a way that opens and closes connections for each small task. Also, the overall performance may be higher when fewer simultaneous connections are used. If you want to reduce the number of simultaneous sessions, you can use a connection pool with fewer pooled connections.

An application that is not both multi-threaded and transactional, such as an application for recording user login and logout actions, does not need more than one connection. The connection can stay open indefinitely and be reopened only when it is dropped due to network problems.

When the last connection to an in-process database is closed, the database still remains open. An explicit SHUTDOWN command, with or without an argument, is required to close the database. A connection property on the connection URL or in a properties object can be used to shut down the database when the last connection is closed.

When using a server database (and to some extent, an in-process database), care must be taken to avoid creating and dropping JDBC Connections too frequently. Failure to observe this will result in poor performance when the application is under heavy load.

A common error made by users in load-test simulations is to use a single client machine to open and close thousands of connections to a HyperSQL server instance. The connection attempts will fail after a few thousand because of OS restrictions on opening sockets and the delay that is built into the OS in closing them.

Tweaking the Mode of Operation

Different modes of operation and settings are used for different purposes. Some scenarios are discussed below:

Application Development and Testing

For application unit testing you can use an all-in-memory, in-process database.

If the tests are all run in one process, then the contents of a mem: database survive between tests. To release the contents you can use the SHUTDOWN command (an SQL command). You can even use multiple mem: databases in your tests and SHUTDOWN each one separately.

If the tests are in different processes and you want to keep the data between the tests, the best solution is to use a Server instance that has a mem: database. After the tests are done, you can SHUTDOWN this database, which will shut down the server.

The Server has an option that allows databases to be created as needed by making a connection (see the Listeners Chapter). This option is useful for testing, as your server is never shut down. Each time you connect to the mem: database that is served by the Server, the database is created if it does not exist (i.e. has been previously shut down).

If you do not want to run a Server instance, and you need persistence between tests in different processes, then you should use a file: database. You can use the shutdown=true connection property to ensure the database is persisted fully after the connections are closed. An alternative option is to use the hsqldb.write_delay=false connection property, but this is slightly slower than the other option.

It has been reported that some data access frameworks do not close all their connections to the database after the tests. In such situations, you need to use zero WRITE DELAY if you want the data to persist at the end of the tests.

You may actually want to use a file: database, or a server instance that serves a file: database in preference to a mem: database. As HyperSQL logs the DDL and DML statements in the .log file, this file can be used to check what is being sent to the database. Note that UPDATE statements are represented by a DELETE followed by an INSERT statement. Statements are written out when the connection commits. The write delay also has an effect on how soon the statements are written out.

Some types of tests start with a database that already contains the tables and data, and perform various operations on it during the tests. You can create and populate the initial database, then set the property files_readonly=true in the .properties file of the database. The tests can then modify the database, but these modifications are not persisted after the tests have completed.

Embedded Databases in Desktop Applications

In this usage, the amount of data change is often limited and there is often a requirement to persist the data immediately. You can use the property hsqldb.write_delay=false to force a disk sync after each commit. Before the application is closed, you should perform the SHUTDOWN command to ensure the database opens quickly the next time it is used.

Embedded Databases in Server Applications

This usage involves a server application, such as a web application, connecting to an embedded HyperSQL instance. In this usage, the database is often accessed heavily, therefore performance and latency are considerations. If the database is updated heavily, the default value of the WRITE DELAY property (1 sec) is often enough, as it is assumed the server or the application does not go down frequently. If it is necessary, you can reduce the WRITE DELAY to a small value (20 ms) without impacting the update speed. If you reduce WRITE DELAY to zero, performance drops to the speed of disk file sync operation.

Alternatively, a server application can use an all-in-mem database instance for fast access, while sending the data changes to a persistent, disk based instance either periodically or in real time.

Embedding a Database Listener

Since you won't be able to access in-process database instances from other processes, you will often want to run a Listener in your server applications with embedded databases. You can do this by starting up a Server or WebServer instance programmatically, but you could also use the class org.hsqldb.util.MainInvoker to start up your application and a Server or WebServer without any programming.

Example 11.1. MainInvoker Example

  java -cp path/to/your/app.jar:path/to/hsqldb.jar org.hsqldb.util.MainInvoker your.App "" org.hsqldb.server.Server

(Use ; instead of : to delimit classpath elements on Windows). Specify the same in-process JDBC URL to your app and in the server.properties file. You can then connect to the database from outside using a JDBC URL like jdbc:hsqldb:hsql://hostname.

This tactic can be used to run off-the-shelf server applications with an embedded HyperSQL Server, without doing any coding.

MainInvoker can be used to run any number of Java class main method invocations in a single JVM. See the API spec for MainInvoker for details on its usage.

Using HyperSQL Without Logging

All file: databases that are not readonly write changes to the .log file. There are scenarios where writing to the .log file can be turned off to improve performance, especially with larger databases. For these applications you can set the property hsqldb.log_data=false to disable the recovery log and speed up data change performance. The equivalent SQL command is SET FILES LOG FALSE.

With this setting, no data is logged, but all the changes to cached tables are written to the .data file. To persist all the data changes up to date, you can use the CHECKPOINT command. If you perform SHUTDOWN, the data is also persisted correctly. If you do not use CHECKPOINT or SHUTDOWN, all the changes are lost and the database reverts to its original state when it is next opened.

Your server applications can use a database as a temporary disk data cache which is not persisted past the lifetime of the application. For this usage, delete the database files when the application ends.

On some platforms, such as embedded devices which are reliable, this is also a useful option. Your application issues CHECKPOINT to save the changes made so far. This method of use reduces write operations on SSD devices. For this usage, the lock file should also be disabled with the connection property hsqldb.lock_file=false.

Server Databases

Running databases in a HyperSQL server is the best overall method of access. As the JVM process is separate from the application, this method is the most reliable as well as the most accessible method of running databases.

Upgrading Databases

Any database that is not produced with the release version of HyperSQL 2.0 must be upgraded to this version. Most catalogs created with 1.8.x can be upgraded simply by opening with HyperSQL 2. When this is not possible due to errors, the rest of the procedures below should be followed.

Once a database is upgraded to 2.0, it can no longer be used with previous versions of HyperSQL.

If your database has been created with version 1.7.x, first upgrade to version 1.8.1 and perform a SHUTDOWN COMPACT with this version. You can then open and upgrade the database with version 2.0.

Upgrading From Older Versions

To upgrade from version 1.8.x with the default TEXT format script files, simply open the database with 2.0. If the version 1.8.x files have database script format set to BINARY or COMPRESSED (ZIPPED) you must issue the SET SCRIPTFORMAT TEXT and SHUTDOWN SCRIPT commands with the old version, then open with the new version of the engine. In most cases the upgrade is successful and complete.

It is strongly recommended to execute SHUTDOWN COMPACT after an automatic upgrade from previous versions.

If your database has been created with version 1.7.2 or 1.7.3, first upgrade to version 1.8.1 and perform a SHUTDOWN COMPACT with this version. You can then upgrade the database to version 2.0.

To upgrade from older version database files (1.7.1 and older) that contain CACHED tables, use the SCRIPT procedure below. In all versions of HyperSQL, the SCRIPT 'filename' command (used as an SQL statement) allows you to save a full record of your database, including database object definitions and data, to a file of your choice. You can export a script file using the old version of the database engine and open the script as a database with 2.0.

Procedure 11.1. Upgrade Using the SCRIPT Procedure for Very Old Versions

  1. Open the original database in the old version of DatabaseManager

  2. Issue the SCRIPT command, for example SCRIPT 'newversion.script' to create a script file containing a copy of the database.

  3. SHUTDOWN this database.

  4. Copy the original *.properties file into newversion.properties in the same directory as newversion.script

  5. Try to open the new database newversion using DatabaseManager of version 1.8.1.

  6. If there is any inconsistency in the data, the script line number is reported on the console and the opening process is aborted. Edit and correct any problems in the newversion.script before attempting to open again. Use the guidelines in the next section (Manual Changes to the .script File). Use a programming editor that is capable of handling very large files and does not wrap long lines of text.

Manual Changes to the *.script File

In HyperSQL 2.0 the full range of ALTER TABLE commands is available to change the data structures and their names. However, if an old database cannot be opened due to data inconsistencies, or it uses index or column names that are not compatible with 2.0, manual editing of the *.script file can be performed.

  • Version 2.0 does not accept duplicate names for indexes that were allowed before 1.7.2.

  • Version 2.0 does not accept some table or column names that are SQL reserved keywords without double quoting.

  • Version 2.0 is more strict with check conditions and default values.

Other manual changes are also possible. Note that the *.script file must be the result of a SHUTDOWN SCRIPT and must contain the full data for the database. The following changes can be applied so long as they do not affect the integrity of existing data.

  • Names of tables, columns and indexes can be changed. These changes must be consistent regarding foreign key constraints.

  • CHECK

    A check constraint can always be removed.

  • NOT NULL

    A not-null constraint can always be removed.

  • PRIMARY KEY

    A primary key constraint can be removed. It cannot be removed if there is a foreign key referencing the column(s).

  • UNIQUE

    A UNIQUE constraint can be removed if there is no foreign key referencing the column(s).

  • FOREIGN KEY

    A FOREIGN KEY constraint can always be removed.

  • COLUMN TYPES

    Some changes to column types are possible. For example an INTEGER column can be changed to BIGINT.

After completing the changes and saving the modified .script file, you can open the database as normal.

Backward Compatibility Issues

HyperSQL 2.0 conforms to the SQL Standard better than previous versions and supports more features. For these reasons, there may be some compatibility issues when converting old databases, or using applications that were written for version 1.8.x or earlier. Some of the potential issues are listed here.

  • User names and passwords are case-sensitive. Check the .script file of a database for the correct case of user name and password and use this form in the connection properties or on the connection URL.

  • Check constraints must conform to the SQL Standard. A check constraint is rejected if it is not deterministic or retrospectively deterministic. When opening an old database, HyperSQL silently drops check constraints that no longer compile. See under check constraints for more detail about what is not allowed.

  • Type declarations in column definition and in cast expressions must have the necessary size parameters.

  • In connection with the above, an old database that did not have the enforce_strict_size property is now converted to version 2.0 with the engine supplying the missing size parameters. For example, a VARCHAR column declaration that has no size is given a 32K size. Check that these sizes are adequate for your use, and change the column definitions as necessary.

  • Column names in a GROUP BY clause were previously resolved to the column label. They are now resolved to column name first, and if the name does not match, to the column label.

  • If two or more tables in a join contain columns with the same name, the columns cannot be referenced in join and where conditions. Use table names before column names to qualify the references to such columns.

  • Table definitions containing GENERATED BY DEFAULT AS IDENTITY but with no PRIMARY KEY do not automatically create a primary key. Database .script files made with 1.8 are fine, as the PRIMARY KEY clause is always included. But your application program may assume an automatic primary key is created.

  • CREATE ALIAS is now obsolete. Use the new function definition syntax. The org.hsqldb.Library class no longer exists. You should use the SQL form of the old library functions. For example, use LOG(x) rather than the direct form, "org.hsqldb.Library.log"(x).

  • The names of some commands for changing database and session properties have changed. See the list of statements in this chapter.

Backing Up Database Catalogs

The database engine saves the files containing all the data in a file catalog when a shutdown takes place. It automatically recovers from an abnormal termination and preserves the data when the catalog is next opened. In an ideal operating environment, where there is no OS crash, disk failure, bugs in code, etc., there would be no need to back up a database regularly. In other words, the engine already performs the routine shutdown and recovery procedures internally, so backing up catalogs is an insurance policy against all sorts of misadventure that are not under the control of the database engine.

The data for each catalog consists of up to 5 files in the same directory with the endings such as *.properties, *.script, etc., as detailed in previous chapters.

HyperSQL 2.0 includes commands to back up the database files into a single .tar or .tar.gz file archive. The backup can be performed by a command given in a JDBC session if the target database catalog is running, or on the command line if the target catalog has been shut down.

Making Online Backups

To back up a running catalog, obtain a JDBC connection and issue a BACKUP DATABASE command in SQL. In its simplest form, the command below backs up the database as a single .tar.gz file to the given directory.

  BACKUP DATABASE TO <directory name> BLOCKING

See the next section under Statements for details about the command and its options. See the sections below about restoring a backup.
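As a rough sketch of issuing this command from an application, the helper below builds the statement text for a target directory. It assumes the path is given as a single-quoted string whose trailing slash marks it as a directory. The class name, the method name and the directory /var/backups/hsqldb are hypothetical.

```java
// Sketch: build the simplest online backup statement for a target directory.
// Class, method and directory names are illustrative only.
final class BackupCommand {

    /** Returns BACKUP DATABASE TO '<directory>/' BLOCKING. */
    public static String backupTo(String directory) {
        // A trailing slash is assumed to mark the path as a directory.
        String dir = directory.endsWith("/") ? directory : directory + "/";
        // Double any single quote so the path stays a valid SQL string literal.
        return "BACKUP DATABASE TO '" + dir.replace("'", "''") + "' BLOCKING";
    }

    public static void main(String[] args) {
        System.out.println(backupTo("/var/backups/hsqldb"));
    }
}
```

The resulting string would be executed with java.sql.Statement.execute on an open connection to the running catalog.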

Making Offline Backups

To back up an offline catalog, the catalog must be in a shut down state. You will run a Java command like this:

Example 11.2. Offline Backup Example

  java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --save  \
  tar/path.tar db/base/path

where tar/path.tar is a file path to the *.tar or *.tar.gz file to be created, and db/base/path is the file path to the catalog file base name (in the same fashion as in server.database.* settings and JDBC URLs with catalog type file:).

Examining Backups

You can list the contents of backup tar files with DbBackup on your operating system command line, or with any Pax-compliant tar or pax client (this includes GNU tar).

Example 11.3. Listing a Backup with DbBackup

  java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --list tar/path.tar

You can also give regular expressions at the end of the command line if you are only interested in some of the file entries in the backup. Note that these are real regular expressions, not shell globbing patterns, so you would use .+script to match entries ending in "script", not *script.
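The difference between a regular expression and a shell glob can be checked with the standard java.util.regex classes. The sketch below is not DbBackup code. The entry names are made up, and it assumes the whole entry name must match the expression, which the ".+script" example suggests but does not prove.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration, not part of HyperSQL.
class BackupEntryPatternSketch {

    // True when the whole entry name matches the regular expression.
    static boolean entryMatches(String regex, String entryName) {
        return Pattern.compile(regex).matcher(entryName).matches();
    }

    // True when the text can be compiled as a regular expression at all.
    static boolean isValidRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }
}
```

Here entryMatches(".+script", "mydb.script") is true, while the glob-style "*script" is rejected by isValidRegex because a leading * has nothing to repeat.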

You can examine the contents of the backup in their entirety by restoring the backup, as explained in the following section, to a temporary directory.

Restoring a Backup

You use DbBackup on your operating system command line to restore a catalog from a backup.

Example 11.4. Restoring a Backup with DbBackup

  java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --extract  \
      tar/path.tar db/dir

where tar/path.tar is a file path to the *.tar or *.tar.gz file to be read, and db/dir is the target directory to extract the catalog files into. Note that db/dir specifies a directory path, without the catalog file base name. The files will be created with the names stored in the tar file (and which you can see as described in the preceding section).

Encrypted Databases

HyperSQL supports encrypted databases. Encryption services use the Java Cryptography Extensions (JCE) and the ciphers installed with the JRE. HyperSQL itself does not contain any cryptography code.

Three elements are involved in specifying the encryption method and key. A cipher, together with its configuration is identified by a string which includes the name of the cipher and optional parameters. A provider is the fully qualified class name of the cipher provider. A key is represented as a hexadecimal string.

Creating and Accessing an Encrypted Database

First, a key must be created for the desired cipher and configuration. This is done by calling the function CRYPT_KEY(<cipher spec>, <provider>). If the default provider (the built-in JVM ciphers) is used, then NULL should be specified as the provider. The CRYPT_KEY function returns a hexadecimal key. The function call can be made in any HyperSQL database, so long as the provider class is on the classpath. This key can be used to create a new encrypted database. Each call to this function returns a different key, based on generated random values.

As an example, a call to CRYPT_KEY('Blowfish', null) returned the string, '604a6105889da65326bf35790a923932'. To create a new database, the URL below is used:

jdbc:hsqldb:file:<database path>;crypt_key=604a6105889da65326bf35790a923932;crypt_type=blowfish

The third property name is crypt_provider. This is specified only when the provider is not the default provider.

HyperSQL works with any symmetric cipher that may be available from the JVM.

The files that are encrypted include the .script, .data, .backup and .log files. The .lobs file is not encrypted by default. The property crypt_lobs=true must be specified to encrypt the .lobs file.
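The connection properties described above can be assembled into a URL as in the following sketch. The helper class, its method name and the database path used in the usage note are hypothetical. The property names crypt_key, crypt_type, crypt_provider and crypt_lobs are the ones given in this section.

```java
// Hypothetical helper, not part of HyperSQL.
class EncryptedUrlSketch {

    // Builds a file: URL for an encrypted catalog.
    // provider may be null when the default JVM provider is used.
    static String encryptedUrl(String databasePath, String hexKey, String cipher,
                               String provider, boolean encryptLobs) {
        // The key is represented as a hexadecimal string.
        if (!hexKey.matches("[0-9a-fA-F]+")) {
            throw new IllegalArgumentException("key must be a hexadecimal string");
        }
        StringBuilder url = new StringBuilder("jdbc:hsqldb:file:")
                .append(databasePath)
                .append(";crypt_key=").append(hexKey)
                .append(";crypt_type=").append(cipher);
        // crypt_provider is given only for a non-default provider.
        if (provider != null) {
            url.append(";crypt_provider=").append(provider);
        }
        // The .lobs file is encrypted only when crypt_lobs=true is set.
        if (encryptLobs) {
            url.append(";crypt_lobs=true");
        }
        return url.toString();
    }
}
```

For the Blowfish key shown earlier and a made-up path /data/mydb, the call encryptedUrl("/data/mydb", "604a6105889da65326bf35790a923932", "blowfish", null, false) yields a URL of the same shape as the example above.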

Speed Considerations

General operations on an encrypted database are performed in the same way as with any database. However, some operations are significantly slower than with the equivalent cleartext database. With MEMORY tables, there is no difference in the speed of SELECT statements, but data change statements are slower. With CACHED tables, all statements are slower.

Security Considerations

Security considerations for encrypted databases have been discussed at length in HSQLDB discussion groups. Development team members have commented that encryption is not a panacea for all security needs. The following issues should be taken into account:

  • Encrypted files are relatively safe in transport, but because databases contain many repeated values and words, especially known tokens such as CREATE, INSERT, etc., breaking the encryption of a database may be simpler than breaking that of an unknown file.

  • Only the files are encrypted, not the memory image. Poking into computer memory, while the database is open, will expose the contents of the database.

  • HyperSQL is open source. Someone who has the key can compile and use a modified version of the program that saves a full cleartext dump of an encrypted database.

Therefore encryption is generally effective only when the users who have access to the crypt key are trusted.

Monitoring Database Operations

Database operations can be monitored at different levels using internal HyperSQL capabilities or add-ons.

Statement Level Monitoring

Statement level monitoring allows you to gather statistics about executed statements. HyperSQL is supported by the monitoring tool JAMon (Java Application Monitor). JAMon is currently developed as the SourceForge project, jamonapi.

JAMon works at the JDBC level. It can monitor and gather statistics on different types of executed statements or other JDBC calls.

Early versions of JAMon were developed with HSQLDB and had to be integrated into HSQLDB at code level. The latest versions can be added on as a proxy in a much simpler fashion.

Internal Event Monitoring

HyperSQL can log important internal events of the engine. These events occur during the operation of the engine, and are not always coupled with the exact type of statement being executed. Normal events such as opening and closing of files, or errors such as OutOfMemory conditions are examples of logged events.

HyperSQL supports two methods of logging. One method is specific to the individual database and is managed internally by HyperSQL. The other method is specific to JVM and is managed by a logging framework.

The internally-generated, individual log for the database can be enabled with the SET DATABASE EVENT LOG LEVEL statement, described in the next section. This method of logging is very useful for desktop application deployment, as it provides an ongoing record of database operations.

HyperSQL also supports log4J and JDK logging. The same event information that is passed to the internal log is passed to external logging frameworks. These frameworks are configured outside HyperSQL. The log messages include the unique id of the database that generated the message, so it can be identified in a multi-database server context.

Server Operation Monitoring

A Server or WebServer instance can be started with the property server.silent=false. This causes all the connections and their executed statements to be printed to stdout as the statements are submitted to the server.

Statements

System level statements are listed in this section. Statements that begin with SET DATABASE or SET FILES are for properties that have an effect on the normal operation of HyperSQL. The effects of these statements are also discussed in different chapters.

SHUTDOWN

shutdown statement

<shutdown statement> ::= SHUTDOWN [IMMEDIATELY | COMPACT | SCRIPT]

Shuts down the database. If the optional qualifier is not used, a normal SHUTDOWN is performed. A normal SHUTDOWN ensures all data is saved correctly and the database opens without delay on next use.

SHUTDOWN IMMEDIATELY

Saves the *.log file and closes the database files. This is the quickest form of shutdown. This command should not be used as the routine method of closing the database, because when the database is accessed next time, it may take a long time to start.

SHUTDOWN COMPACT

This is similar to normal SHUTDOWN, but reduces the *.data file to its minimum size. It takes longer than normal SHUTDOWN.

SHUTDOWN SCRIPT

This is similar to SHUTDOWN COMPACT, but it does not rewrite the *.data and text table files. After SHUTDOWN SCRIPT, only the *.script and *.properties files remain. At the next startup, these files are processed and the *.data and *.backup files are created. This command in effect performs part of the job of SHUTDOWN COMPACT, leaving the other part to be performed automatically at the next startup.

This command produces a full script of the database which can be edited for special purposes prior to the next startup.

Only a user with the DBA role can execute this statement.

BACKUP DATABASE

backup database statement

<backup database statement> ::= BACKUP DATABASE TO <file path> {SCRIPT | [NOT] COMPRESSED} BLOCKING

Backs up the database to the specified <file path> for archiving purposes.

The <file path> can be in two forms. If the <file path> ends with a forward slash, it specifies a directory. In this case, an automatic name for the archive is generated that includes the date, time and the base name of the database. The database is backed up to this archive file in the specified directory. If the <file path> does not end with a forward slash, it specifies a user-defined file name for the backup archive. The archive is in tar format, or in gzip-compressed tar format if it is compressed.

The SCRIPT option is not currently supported. If SCRIPT is specified, the backup will consist of two files, a *.properties file and a *.script file, which contain all the data and settings of the database. These files are not compressed.

If COMPRESSED or NOT COMPRESSED is specified, the backup consists of the current snapshot of database files. During backup, a CHECKPOINT command is silently executed.

The qualifier, BLOCKING, means all database operations are suspended during backup.

The HyperSQL jar also contains a program that creates an archive of an offline database. It also contains a program to expand an archive into database files. These programs are documented in this chapter under Backing up Database Catalogs.

Only a user with the DBA role can execute this statement.

CHECKPOINT

checkpoint statement

<checkpoint statement> ::= CHECKPOINT [DEFRAG]

Closes the database files, rewrites the script file, deletes the log file and opens the database. If DEFRAG is specified, also shrinks the *.data file to its minimum size.

Only a user with the DBA role can execute this statement.

CRYPT_KEY

crypt_key function

<crypt_key function> ::= CRYPT_KEY ( <cipher spec>, <provider> )

The statement CALL CRYPT_KEY( <cipher spec>, <provider> ) returns a binary string representing a valid key for the given cipher and provider. The <provider> argument is specified as NULL for the default provider.

SCRIPT

script statement

<script statement> ::= SCRIPT [<file name>]

Returns a script containing SQL statements that define the database, its users, and its schema objects. If <file name> is not specified, the statements are returned in a ResultSet, with each row containing an SQL statement. No data statements are included in this form. The optional file name is a single-quoted string. If <file name> is specified, then the script is written to the named file. In this case, all the data in all tables of the database is included in the script as INSERT statements.
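The two forms of the statement might be used from JDBC roughly as follows. This is a sketch under assumptions: the class and method names are invented, and reading the statement text from the first column of each row is inferred from the description above rather than from a documented column layout.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HyperSQL.
class ScriptStatementSketch {

    // Builds SCRIPT without a file name, or SCRIPT '<file name>' with one.
    static String scriptSql(String fileName) {
        if (fileName == null) {
            return "SCRIPT";
        }
        // The file name is a single-quoted string; embedded quotes are doubled.
        return "SCRIPT '" + fileName.replace("'", "''") + "'";
    }

    // Reads the definition statements (no data) returned by plain SCRIPT.
    static List<String> readDefinitions(Connection connection) throws SQLException {
        List<String> statements = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(scriptSql(null))) {
            while (rows.next()) {
                // Assumption: the SQL text is in the first column of each row.
                statements.add(rows.getString(1));
            }
        }
        return statements;
    }
}
```

The form with a file name writes to disk on the database side and includes the table data, so it would typically be run with execute rather than read back through a result set.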

Only a user with the DBA role can execute this statement.

SET DATABASE COLLATION

set database collation statement

<set database collation statement> ::= SET DATABASE COLLATION <collation name>

Each database can have its own collation. This statement sets the collation to one of the collations supported by HyperSQL. Once this command has been issued, the database can be opened in any JVM and will retain its collation.

Only a user with the DBA role can execute this statement.

SET DATABASE DEFAULT TABLE TYPE

set database default table type statement

<set database default table type> ::= SET DATABASE DEFAULT TABLE TYPE { CACHED | MEMORY }

Sets the type of table created when the next CREATE TABLE statement is executed. The default is MEMORY.

Only a user with the DBA role can execute this statement.

SET DATABASE DEFAULT RESULT MEMORY ROWS

set database default result memory rows statement

<set database default result memory rows> ::= SET DATABASE DEFAULT RESULT MEMORY ROWS <unsigned integer literal>

Sets the maximum number of rows of each result set and other internal temporary table that is held in memory. This setting applies to all sessions. Individual sessions can change the value with the SET SESSION RESULT MEMORY ROWS command. The default is 0, meaning all result sets are held in memory.

Only a user with the DBA role can execute this statement.

SET DATABASE EVENT LOG LEVEL

set database event log level statement

<set database event log level> ::= SET DATABASE EVENT LOG LEVEL { 0 | 1 | 2 }

Sets the amount of information logged in the internal, database-specific event log. Level 0 means no log. Level 1 means only important (error) events. Level 2 means more events, including both important and less important (normal) events. For readonly and mem: databases, if the level is set above 0, the log messages are directed to stderr.

Only a user with the DBA role can execute this statement.

SET DATABASE GC

set database gc statement

<set database gc statement> ::= SET DATABASE GC <unsigned integer literal>

An optional property which forces calls to System.gc() after the specified number of row operations. The default value for this property is 0, which means no System.gc() calls. Usual values for this property start at 10000, depending on the system and the memory allocation. This property may be useful in some in-process deployments, especially with older JVM implementations.

Only a user with the DBA role can execute this statement.

SET DATABASE SQL SIZE

set database sql size statement

<set database sql size statement> ::= SET DATABASE SQL SIZE { TRUE | FALSE }

Enable or disable enforcement of column sizes for CHAR and VARCHAR columns. The default is TRUE, meaning table definition must contain VARCHAR(n) instead of VARCHAR.

Only a user with the DBA role can execute this statement.

SET DATABASE SQL NAMES

set database sql names statement

<set database sql names statement> ::= SET DATABASE SQL NAMES { TRUE | FALSE }

Enable or disable full enforcement of the rule that prevents SQL keywords being used for database object names such as columns and tables. The default is FALSE, meaning disable.

Only a user with the DBA role can execute this statement.

SET DATABASE SQL REFERENCES

set database sql references statement

<set database sql references statement> ::= SET DATABASE SQL REFERENCES { TRUE | FALSE }

This command can enable or disable full enforcement of the rule that prevents ambiguous column references in SQL statements (usually SELECT statements). A column reference is ambiguous when it is not qualified by a table name or table alias and can refer to more than one column in a JOIN list.

The property is FALSE by default. It is better to enable this check during development, to improve the quality and correctness of SQL statements.

Only a user with the DBA role can execute this statement.

SET DATABASE REFERENTIAL INTEGRITY

set database referential integrity statement

<set database referential integrity statement> ::= SET DATABASE REFERENTIAL INTEGRITY { TRUE | FALSE }

This command enables or disables the enforcement of referential integrity constraints (foreign key constraints), check constraints apart from NOT NULL and triggers. By default, referential integrity constraints are checked.

The only legitimate use of this statement is before importing large amounts of external data into tables that have existing FOREIGN KEY constraints. After import, the statement must be used again to enable constraint enforcement.

If you are not sure the data conforms to the constraints, run queries to verify all rows conform to the FOREIGN KEY constraints and take appropriate actions for the rows that do not conform.

A query example to return the rows in a foreign key table that have no parent is given below:

Example 11.5. Finding foreign key rows with no parents after a bulk import

  SELECT * FROM foreign_key_table LEFT OUTER JOIN primary_key_table 
    ON foreign_key_table.fk_col = primary_key_table.pk_col WHERE primary_key_table.pk_col IS NULL

Only a user with the DBA role can execute this statement.

SET DATABASE UNIQUE NAME

set database unique name

<set database unique name statement> ::= SET DATABASE UNIQUE NAME <identifier>

Each HyperSQL catalog (database) has an engine-generated internal name. This name is based on the time of creation of the database and is exactly 16 characters long. The name is used in log events sent to external logging frameworks. This name can be changed by an administrator. The new name must be exactly 16 characters long.

SET DATABASE TRANSACTION CONTROL

set database transaction control statement

<set database transaction control statement> ::= SET DATABASE TRANSACTION CONTROL { LOCKS | MVLOCKS | MVCC }

Sets the concurrency control system for the database. It can be issued only when all sessions have been committed or rolled back. This command and its modes are discussed in the Sessions and Transactions chapter.

Only a user with the DBA role can execute this statement.

SET FILES BACKUP INCREMENT

set files backup increment statement

<set files backup increment statement> ::= SET FILES BACKUP INCREMENT { TRUE | FALSE }

Older versions of HSQLDB perform a backup of the .data file before its contents are modified and the whole .data file is saved in a compressed form when a CHECKPOINT or SHUTDOWN is performed. This takes a long time when the size of the database exceeds 100 MB or so (on an average 2010 computer, you can expect a backup speed of 20MB / s or more).

The alternative is backup in increments, just before some part of the .data file is modified. In this mode, no backup is performed at CHECKPOINT or SHUTDOWN. This mode is preferred for large databases which are opened and closed frequently.

The default mode is TRUE. If the old method of backup is preferred, the mode can be set FALSE.

Only a user with the DBA role can execute this statement.

SET FILES CACHE ROWS

set files cache rows statement

<set files cache rows statement> ::= SET FILES CACHE ROWS <unsigned integer literal>

Sets the maximum number of rows (of CACHED tables) held in the memory cache.

Only a user with the DBA role can execute this statement.

SET FILES CACHE SIZE

set files cache size statement

<set files cache size statement> ::= SET FILES CACHE SIZE <unsigned integer literal>

Sets the maximum amount of data (of CACHED tables), in kilobytes, held in the memory cache.

Only a user with the DBA role can execute this statement.

SET FILES DEFRAG

set files defrag statement

<set files defrag statement> ::= SET FILES DEFRAG <unsigned integer literal>

Sets the threshold for performing a DEFRAG during a checkpoint. The <unsigned integer literal> is the percentage of abandoned space in the *.data file. When a CHECKPOINT is performed either as a result of the .log file reaching the limit set by SET FILES LOG SIZE m, or by the user issuing a CHECKPOINT command, the amount of space abandoned since the database was opened is checked and if it is larger than specified percentage, a CHECKPOINT DEFRAG is performed instead of a CHECKPOINT.

The default is 0, which indicates no DEFRAG. Useful values are between 10 and 50.
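The threshold rule can be modelled as a small decision function. The sketch below only mirrors the description in this section. The class name, method name and byte-based inputs are assumptions, and the engine's internal calculation may differ in detail.

```java
// Hypothetical model of the rule described above, not engine code.
class DefragThresholdSketch {

    // Returns the kind of checkpoint that would be performed.
    static String checkpointKind(long abandonedBytes, long dataFileBytes,
                                 int thresholdPercent) {
        // A threshold of 0 is the default and means no automatic DEFRAG.
        if (thresholdPercent <= 0 || dataFileBytes <= 0) {
            return "CHECKPOINT";
        }
        // Compare abandoned space, as a percentage of the *.data file,
        // with the threshold, using integer arithmetic only.
        boolean exceeded =
                abandonedBytes * 100 > (long) thresholdPercent * dataFileBytes;
        return exceeded ? "CHECKPOINT DEFRAG" : "CHECKPOINT";
    }
}
```

With a threshold of 20, a 100 MB file with 30 MB abandoned would be defragmented, while the same file with 10 MB abandoned would get a plain CHECKPOINT.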

Only a user with the DBA role can execute this statement.

SET FILES LOG

set files log statement

<set files log statement> ::= SET FILES LOG { TRUE | FALSE }

Sets logging of database operations on or off. Turning logging off is for special usage, such as temporary cache usage.

Only a user with the DBA role can execute this statement.

SET FILES LOG SIZE

set files log size statement

<set files log size statement> ::= SET FILES LOG SIZE <unsigned integer literal>

Sets the maximum size in MB of the *.log file to the specified value. The default maximum size is 50 MB. If the value is zero, no limit is used for the size of the file. When the size of the file reaches this value, a CHECKPOINT is performed and the *.log file is cleared to size 0.

Only a user with the DBA role can execute this statement.

SET FILES BACKUP INCREMENT

set files backup increment statement

<set files backup increment statement> ::= SET FILES BACKUP INCREMENT { TRUE | FALSE }

This specifies the method for internal backup operation. The default is true.

During updates, the contents of the .data file are modified. When this property is true, the modified contents are backed up gradually. This causes a marginal slowdown in operations, but allows fast checkpoint and shutdown with large .data files.

When the property is false, the .data file is backed up entirely at the time of checkpoint and shutdown. Up to version 1.8.0, HSQLDB supported only full backup. Version 1.8.1 supports incremental backup.

Only a user with the DBA role can execute this statement.

SET FILES NIO

set files nio

<set files nio statement> ::= SET FILES NIO { TRUE | FALSE }

Changes the access method of the .data file. The default is TRUE and uses the Java nio classes to access the file.

Only a user with the DBA role can execute this statement.

SET FILES WRITE DELAY

set files write delay statement

<set files write delay statement> ::= SET FILES WRITE DELAY {{ TRUE | FALSE } | <seconds value> | <milliseconds value> MILLIS}

Sets the WRITE DELAY property of the database. WRITE DELAY controls the frequency of file sync for the log file. When WRITE DELAY is set to FALSE or 0, the sync takes place immediately at each COMMIT. WRITE DELAY TRUE performs the sync once every 10 seconds (which is the default). A numeric value can be specified instead.

The purpose of this command is to control the amount of data loss in case of a total system crash. A delay of 1 second means at most the data written to disk during the last second before the crash is lost. All data written prior to this has been synced and should be recoverable.

A write delay of 0 impacts performance in high load situations, as the engine has to wait for the file system to catch up.

To avoid this, you can set write delay down to 10 milliseconds.

Each time the SET FILES WRITE DELAY statement is executed with any value, a sync is immediately performed.

Only a user with the DBA role can execute this statement.

SET FILES SCALE

set files scale

<set files scale statement> ::= SET FILES SCALE <scale value>

Changes the scale factor for the .data file. The default scale is 8 and allows 16GB of data storage capacity. The scale can be increased in order to increase the maximum data storage capacity. The scale values 8, 16, 32, 64 and 128 are allowed. Scale value 128 allows a maximum capacity of 256GB.
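The two capacity figures above fit a simple rule of 2GB per unit of scale. The sketch below assumes that the same linear rule holds for the intermediate scale values, which this section does not state. The class and method names are illustrative.

```java
// Hypothetical helper, not part of HyperSQL.
class DataFileScaleSketch {

    private static final int[] ALLOWED_SCALES = {8, 16, 32, 64, 128};

    // Maximum .data file capacity in GB for an allowed scale value.
    // Assumption: capacity grows linearly, at 2GB per unit of scale.
    static long maxCapacityGb(int scale) {
        for (int allowed : ALLOWED_SCALES) {
            if (allowed == scale) {
                return scale * 2L;
            }
        }
        throw new IllegalArgumentException("unsupported scale: " + scale);
    }

    // Smallest allowed scale whose capacity covers the required size in GB.
    static int smallestScaleFor(long requiredGb) {
        for (int scale : ALLOWED_SCALES) {
            if (maxCapacityGb(scale) >= requiredGb) {
                return scale;
            }
        }
        throw new IllegalArgumentException("exceeds the maximum capacity of 256GB");
    }
}
```

Under this assumption, a database expected to reach 100GB of CACHED table data would need a scale of 64, chosen before any data is stored.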

This command can be used only when there is no data in CACHED tables.

Only a user with the DBA role can execute this statement.

SET FILES LOB SCALE

set files lob scale

<set files lob scale statement> ::= SET FILES LOB SCALE <scale value>

Changes the scale factor for the .lobs file. The scale is interpreted in kilobytes. The default scale is 32 and allows 64TB of lob data storage capacity. The scale can be reduced in order to improve storage efficiency. If the lobs are a lot smaller than 32 kilobytes, reducing the scale will reduce wasted space. The scale values 1, 2, 4, 8, 16, 32 are allowed. For example if the average size of lobs is 4 kilobytes, the default scale of 32 will result in 28KB wasted space for each lob. Reducing the lob scale to 2 will result in average 1KB wasted space for each lob.
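The wasted-space figures can be reproduced with a little arithmetic, assuming each lob occupies a whole number of blocks of the scale size. The class and method names below are illustrative only.

```java
// Hypothetical helper, not part of HyperSQL.
class LobScaleSketch {

    // Space in KB left unused when a lob of the given size is stored in
    // whole blocks of scaleKb kilobytes each.
    static long wastedKb(long lobSizeKb, int scaleKb) {
        // Round the number of blocks up to cover the whole lob.
        long blocks = (lobSizeKb + scaleKb - 1) / scaleKb;
        return blocks * scaleKb - lobSizeKb;
    }
}
```

A 4KB lob wastes 28KB at the default scale of 32. At scale 2 it wastes nothing, and a 5KB lob wastes 1KB, in line with the average of about half a block per lob.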

This command can be used only when there is no lob in the database.

Only a user with the DBA role can execute this statement.
