Specification of QDBM Version 1

Copyright (C) 2000-2003 Mikio Hirabayashi
Last Update: Sat, 08 Feb 2003 20:41:35 +0900

Table of Contents

  1. Overview
  2. Features
  3. Installation
  4. Basic API
  5. Extended API
  6. Copying

(I'm not good at English. So please proofread this document and point out mistakes to me.)


Overview

QDBM is a library of routines for managing a database. The database is a simple data file containing records, each of which is a pair of a key and a value. Every key and value is a variable-length sequence of bytes. Both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. A key must be unique within a database, so it is impossible to store two or more records with the same key.

The following access methods are provided: storing a record with a key and a value, deleting a record by its key, and retrieving a record by its key. Moreover, traversal access to every key is provided, although the order is arbitrary. These access methods are similar to those of the DBM library (or its compatibles, NDBM and GDBM) defined in the UNIX standard. QDBM is an alternative to DBM that offers higher performance.


Features

QDBM was developed with reference to GDBM, with three goals: higher processing speed, smaller database files, and a simpler API. QDBM attains all three. Moreover, the following three restrictions of DBM are removed: a process can handle only one database, the size of a key and a value is bounded, and a database file is sparse.

QDBM uses a hash algorithm to retrieve records. If the bucket array has a sufficient number of elements, the time complexity of retrieval is `O(1)'. That is, the time required to retrieve a record is constant, regardless of the scale of the database. The same holds for storing and deleting. Collisions of hash values are managed by separate chaining. The data structure of each chain is a binary search tree. Even if the bucket array has unusually few elements, the time complexity of retrieval is `O(log n)'.
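
The following is a minimal in-memory sketch of the structure described above: a bucket array whose elements are the roots of binary search trees. It is not the code of QDBM. The names `TABLE', `NODE', `table_new', `table_put', `table_get' and `sketch_hash' are hypothetical, the hash function is a stand-in, and keys are C strings for brevity, whereas QDBM keys are byte sequences. Releasing memory is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One record in a chain; each chain is a binary search tree ordered by key. */
typedef struct node {
  char *key;
  char *val;
  struct node *left;
  struct node *right;
} NODE;

/* Hypothetical table: a bucket array whose elements are roots of chains. */
typedef struct {
  NODE **buckets;
  int bnum;
} TABLE;

/* Stand-in hash function (an assumption, not the one QDBM uses). */
static unsigned int sketch_hash(const char *key){
  unsigned int h = 5381;
  while(*key != '\0') h = h * 33 + (unsigned char)*key++;
  return h & 0x7fffffff;
}

/* Duplicate a string into a newly allocated region. */
static char *sketch_strdup(const char *s){
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  assert(p != NULL);
  memcpy(p, s, n);
  return p;
}

/* Create a table with `bnum' buckets, all of them empty. */
TABLE *table_new(int bnum){
  TABLE *t;
  assert(bnum > 0);
  t = malloc(sizeof(*t));
  assert(t != NULL);
  t->buckets = calloc((size_t)bnum, sizeof(NODE *));
  assert(t->buckets != NULL);
  t->bnum = bnum;
  return t;
}

/* Store a record, replacing the value if the key already exists. */
void table_put(TABLE *t, const char *key, const char *val){
  NODE **np = &t->buckets[sketch_hash(key) % (unsigned int)t->bnum];
  while(*np != NULL){
    int cmp = strcmp(key, (*np)->key);
    if(cmp == 0){
      free((*np)->val);
      (*np)->val = sketch_strdup(val);
      return;
    }
    np = cmp < 0 ? &(*np)->left : &(*np)->right;
  }
  *np = malloc(sizeof(NODE));
  assert(*np != NULL);
  (*np)->key = sketch_strdup(key);
  (*np)->val = sketch_strdup(val);
  (*np)->left = (*np)->right = NULL;
}

/* Retrieve the value for a key, or NULL if there is no such record. */
const char *table_get(const TABLE *t, const char *key){
  const NODE *n = t->buckets[sketch_hash(key) % (unsigned int)t->bnum];
  while(n != NULL){
    int cmp = strcmp(key, n->key);
    if(cmp == 0) return n->val;
    n = cmp < 0 ? n->left : n->right;
  }
  return NULL;
}
```

With a single bucket (`table_new(1)') every key falls into the same chain, so a lookup walks one tree instead of jumping straight to the record. With many buckets most chains hold zero or one record.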

QDBM improves retrieval by loading the whole bucket array into RAM. If the bucket array is in RAM, the region of a target record can be reached with about one pass of file operations. The bucket array saved in a file is not read into RAM with the `read' call but mapped directly into RAM with the `mmap' call. Therefore, the preparation time when connecting to a database is very short, and two or more processes can share the same memory map.

If the number of elements of the bucket array is half the number of records stored in a database, the collision rate of hash values is about 50%, although it depends on the characteristics of the input (40% if the numbers are the same, 20% if the bucket array has twice as many elements, 10% if four times, 3% if eight times). In such a case, a record can be retrieved with two or fewer passes of file operations. Taking this as a performance index, handling a database containing one million records requires a bucket array with half a million elements. The size of each element is 4 bytes. That is, if 2M bytes of RAM are available, a database containing one million records can be handled.
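
The arithmetic above can be written down as a small sketch. The function names `bucket_count' and `bucket_array_bytes' are hypothetical and are not part of QDBM. The only facts taken from the text are the 4 bytes per element and the ratio of bucket elements to records.

```c
#include <assert.h>

/* Bucket count for a given record count and a ratio expressed in percent
   (50 means the bucket array has half as many elements as records). */
long bucket_count(long rnum, long percent){
  assert(rnum >= 0 && percent > 0);
  return rnum * percent / 100;
}

/* Bytes of RAM needed for a bucket array, at 4 bytes per element. */
long bucket_array_bytes(long bnum){
  assert(bnum >= 0);
  return bnum * 4L;
}
```

For one million records at a ratio of 50%, `bucket_count' gives 500000 elements and `bucket_array_bytes' gives 2000000 bytes, which is the 2M bytes mentioned above.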

QDBM has two modes of connecting to a database. A `reader' can retrieve records but can neither store nor delete them. A `writer' can use all access methods. Exclusion control between processes is performed by file locking when connecting to a database. While a writer is connected to a database, neither readers nor writers can connect. While a reader is connected to a database, other readers can connect, but writers cannot. This mechanism guarantees the consistency of simultaneous connections to a database in a multitasking environment. However, it does not cover a multithreaded environment.

DBM has two modes of the storing operation, `insert' and `replace'. When a key matches an existing record, the insert mode keeps the existing value, while the replace mode replaces it with the specified value. In addition to these two modes, QDBM has a `concatenate' mode. In this mode, the specified value is concatenated at the end of the existing value and stored. This feature is useful when adding an element to a value treated as an array. Moreover, although DBM can fetch a value from a database only by reading the whole region of a record, QDBM can also fetch a part of a value. This feature is also useful when a value is treated as an array.
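
The three storing modes can be illustrated with a sketch that works on a single value held in memory. It is not the code of QDBM. The type names `STOREMODE' and `SLOT', the constants `MODE_INSERT', `MODE_REPLACE' and `MODE_CONCAT', and the function `slot_store' are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode names for this sketch (not the constants of depot.h). */
typedef enum { MODE_INSERT, MODE_REPLACE, MODE_CONCAT } STOREMODE;

/* Hypothetical in-memory value slot standing in for a stored record. */
typedef struct {
  char *buf;   /* bytes of the value, or NULL if no record exists */
  int siz;     /* size of the value */
} SLOT;

/* Apply a store operation to a slot. Returns 1 if the slot was changed,
   0 if the existing value was kept (insert mode with an existing record). */
int slot_store(SLOT *slot, const char *vbuf, int vsiz, STOREMODE mode){
  char *nbuf;
  int nsiz, keep;
  assert(slot != NULL && vbuf != NULL && vsiz >= 0);
  if(slot->buf != NULL && mode == MODE_INSERT) return 0;
  /* In concatenate mode the existing bytes are kept in front. */
  keep = (slot->buf != NULL && mode == MODE_CONCAT) ? slot->siz : 0;
  nsiz = keep + vsiz;
  nbuf = malloc((size_t)nsiz + 1);
  assert(nbuf != NULL);
  if(keep > 0) memcpy(nbuf, slot->buf, (size_t)keep);
  memcpy(nbuf + keep, vbuf, (size_t)vsiz);
  nbuf[nsiz] = '\0';
  free(slot->buf);
  slot->buf = nbuf;
  slot->siz = nsiz;
  return 1;
}
```

Starting from an empty slot, storing "ab" in insert mode creates the value, a second insert of "xy" leaves "ab" untouched, a concatenate of "cd" yields "abcd", and a replace with "z" yields "z".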

If data alignment is assigned to a database, each record is placed in the file followed by suitable padding bytes. When a value is overwritten with a larger one, or a specified value is concatenated to an existing value, the region of the existing record need not be moved to another position as long as the increase in size fits within the padding. Although moving the region of a record costs computation in proportion to the size of the region, the cost of updating stays constant relative to the size of stored records if the padding is sized in proportion to each record.

QDBM has two kinds of API, the basic API and the extended API. In the former, a database is treated as a file. In the latter, a database is treated as a directory containing one or more database files. If all the data of a database is stored in one file, the file size may exceed the limit of the file system. In the extended API, a database is divided into two or more files in a directory. Since the basic API and the extended API resemble each other, it is easy to port an application from one to the other.


Installation

To install QDBM, `gcc' and `make' are required.

After an archive file of QDBM is extracted, change the current working directory to the generated directory and perform the installation. First, execute the following command to set up the build environment.

./configure

Build programs.

make

Perform the self-diagnostic test.

make check

Install programs. This operation must be carried out by the root user.

make install

When the series of steps finishes, the header files `depot.h' and `curia.h' will be installed in `/usr/local/include', the libraries `libqdbm.a' and `libqdbm.so' will be installed in `/usr/local/lib', and the executable commands `dpm', `dptest', `dptsv', `crm' and `crtest' will be installed in `/usr/local/bin'.

To uninstall QDBM, execute the following command after `./configure'. This operation must be carried out by the root user.

make uninstall

Basic API

Depot is the basic API of QDBM. In order to use Depot, you should include `depot.h' in your source code. Usually, the following line appears near the beginning of a source file.

#include <depot.h>

A pointer to DEPOT is used as a database handle, much as some file I/O routines of `stdio.h' use a pointer to FILE. A database handle is opened with the function `dpopen' and closed with `dpclose'. You should not refer directly to any member of a handle. If a fatal error occurs in a database, any access method via the handle except `dpclose' will not work and will return an error status.

You can assign a file descriptor for debugging information to the external variable `dpdbgfd'. The initial value is -1. If the value is positive, some debugging information is written to the file descriptor.

extern int dpdbgfd;

The status of the last error is assigned to the external variable `dpecode'. The initial value is DP_ENOERR. Refer to `depot.h' for details of the type DPECODE.

extern DPECODE dpecode;

You can use the function `dperrmsg' in order to obtain a message string corresponding to an error code.

const char *dperrmsg(DPECODE ecode);

`ecode' specifies an error code. The return value is the message string for the error code. The region of the return value is not writable.

You can use the function `dpopen' in order to obtain a database handle. The size of the bucket array of a database is determined when the database is created, and cannot be changed except by optimizing the database. The suggested size of the bucket array is about 0.5 to 4 times the number of records to be stored. While connecting as a writer, an exclusive lock is applied to the database. While connecting as a reader, a shared lock is applied to the database. Control blocks until the lock is acquired.

DEPOT *dpopen(const char *name, DPOMODE omode, int bnum);

`name' specifies the name of a database file. `omode' specifies the connection mode: DP_OWRITER as a writer, DP_OREADER as a reader. If the mode is DP_OWRITER, the following may be added by bitwise or: DP_OCREAT, which means a new database is created if none exists, and DP_OTRUNC, which means a new database is created even if one exists. `bnum' specifies the number of elements of the bucket array. If it is not more than 0, the default value is used. The return value is a database handle, or NULL if the call is not successful.

Every connected handle should be closed with the function `dpclose'. Updates to a database are synchronized with the file when the connection is closed. If a writer opens a database and does not close it properly, the database will be broken.

int dpclose(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is true; otherwise it is false. Since the region of the closed handle is released, the handle can no longer be used.

You can use the function `dpput' in order to store a record.

int dpput(DEPOT *depot, const char *kbuf, int ksiz, const char *vbuf, int vsiz, DPDMODE dmode);

`depot' specifies a database handle connected as a writer. `kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. `vbuf' specifies a pointer to the bytes of a value. `vsiz' specifies the size of the region of the value. If `vsiz' is negative, the size is set to `strlen(vbuf)'. `dmode' specifies the behavior when the key matches an existing record, by one of the following values: DP_DOVER, which means the specified value overwrites the existing one, DP_DKEEP, which means the existing value is kept, and DP_DCAT, which means the specified value is concatenated at the end of the existing value and stored. If successful, the return value is true; otherwise it is false.

You can use the function `dpout' in order to delete a record.

int dpout(DEPOT *depot, const char *kbuf, int ksiz);

`depot' specifies a database handle connected as a writer. `kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. If successful, the return value is true; otherwise it is false. False is returned when no record corresponds to the specified key.

You can use the function `dpget' in order to retrieve a record. The region of the return value is allocated with the `malloc' call, so it should be released with the `free' call.

char *dpget(DEPOT *depot, const char *kbuf, int ksiz, int start, int max, int *sp);

`depot' specifies a database handle. `kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. `start' specifies the offset of the beginning of the region of the value to be read. `max' specifies the maximum number of bytes to be read. If `max' is negative, the length to read is unlimited. `sp' specifies a pointer to a variable to which the length of the region of the return value is assigned. If `sp' is NULL, it is not used. If successful, the return value is a pointer to the region of the value of the corresponding record; otherwise it is NULL. NULL is returned when no record corresponds to the specified key or the length of the value of the record is less than `start'. Because an additional nil code is appended at the end of the region of the return value, the return value can be treated as a character string.
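
The meaning of `start', `max' and `sp' can be illustrated with a sketch that slices a value already held in memory. It follows the description above but is not the code of `dpget', and it does not touch a database. The function name `value_slice' is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy part of a value the way the description of `dpget' reads:
   start at offset `start', take at most `max' bytes (all if negative),
   terminate with a nil code, and report the length through `sp'. */
char *value_slice(const char *vbuf, int vsiz, int start, int max, int *sp){
  char *rbuf;
  int len;
  assert(vbuf != NULL && vsiz >= 0 && start >= 0);
  /* A value shorter than the offset yields no region at all. */
  if(vsiz < start) return NULL;
  len = vsiz - start;
  if(max >= 0 && max < len) len = max;
  rbuf = malloc((size_t)len + 1);
  if(rbuf == NULL) return NULL;
  memcpy(rbuf, vbuf + start, (size_t)len);
  rbuf[len] = '\0';
  if(sp != NULL) *sp = len;
  return rbuf;
}
```

For the 6-byte value "abcdef", a `start' of 2 and a `max' of 3 give "cde" with a length of 3, and a `start' of 4 with a negative `max' gives "ef". As with `dpget', the caller releases the returned region with `free'.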

You can use the function `dpvsiz' in order to get the length of the value of a record. The function is faster than `dpget'.

int dpvsiz(DEPOT *depot, const char *kbuf, int ksiz);

`depot' specifies a database handle. `kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. If successful, the return value is the length of the value of the corresponding record; otherwise it is -1. -1 is returned when no record corresponds to the specified key.

You can use the function `dpiterinit' in order to initialize the iterator of a database handle for traversal access to every key in the database.

int dpiterinit(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is true; otherwise it is false.

You can use the function `dpiternext' in order to get the next key from the iterator. Although the order cannot be controlled, every record in a database can be reached. The region of the return value is allocated with the `malloc' call, so it should be released with the `free' call.

char *dpiternext(DEPOT *depot, int *sp);

`depot' specifies a database handle. `sp' specifies a pointer to a variable to which the length of the region of the return value is assigned. If `sp' is NULL, it is not used. If successful, the return value is a pointer to the region of the next key; otherwise it is NULL. NULL is returned when no record is left to be fetched from the iterator. Because an additional nil code is appended at the end of the region of the return value, the return value can be treated as a character string.

You can use the function `dpsetalign' in order to set the alignment of a database handle. If alignment is assigned to a database handle, overwriting an existing record becomes faster. The suggested basic size of alignment is the average length of the values of the records to be stored. The suggested unit size of alignment is about 4 to 16 times the basic size.

int dpsetalign(DEPOT *depot, int asiz, int aunit);

`depot' specifies a database handle connected as a writer. `asiz' specifies the basic size of alignment. `aunit' specifies the unit size of alignment. If successful, the return value is true; otherwise it is false. When a record is stored, the size of the region reserved for its value is a multiple of the alignment size. The alignment size is a multiple of the basic size, determined by the size of the specified value divided by the unit size. Since the alignment size is not saved in a database, you should specify the alignment each time you open the database.
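
The rule above does not give an exact formula, so the following sketch is only one possible reading of it and may differ from what QDBM actually computes. The function names `sketch_align_size', `sketch_reserved_size' and `fits_in_place' are hypothetical. Two points are assumptions of this sketch: the multiplier is the value size divided by the unit size plus one, and the reserved region is the smallest multiple of the alignment size that can hold the value.

```c
#include <assert.h>

/* Alignment size for a value: a multiple of the basic size `asiz' that
   grows with the value size divided by the unit size `aunit'.
   The "+ 1" is an assumption so that small values still get one unit. */
int sketch_align_size(int vsiz, int asiz, int aunit){
  assert(vsiz >= 0 && asiz > 0 && aunit > 0);
  return asiz * (vsiz / aunit + 1);
}

/* Size of the region reserved for a value: the smallest multiple of the
   alignment size that can hold `vsiz' bytes (at least one multiple). */
int sketch_reserved_size(int vsiz, int asiz, int aunit){
  int align = sketch_align_size(vsiz, asiz, aunit);
  int units = (vsiz + align - 1) / align;
  if(units < 1) units = 1;
  return units * align;
}

/* An update can stay in place if the new size fits in the reserved region. */
int fits_in_place(int reserved, int newsiz){
  return newsiz <= reserved;
}
```

Under these assumptions, with a basic size of 16 and a unit size of 64, a 20-byte value reserves 32 bytes, so it can grow to 32 bytes without its region being moved, while a 100-byte value reserves 128 bytes.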

You can use the function `dpsync' in order to synchronize updated contents with the file and the device. The function is useful when another process uses the connected database file.

int dpsync(DEPOT *depot);

`depot' specifies a database handle connected as a writer. If successful, the return value is true; otherwise it is false.

You can use the function `dpoptimize' in order to optimize a database. When records are repeatedly stored in overwrite mode or concatenate mode, the database file grows because unused regions accumulate. Such regions are removed by optimization.

int dpoptimize(DEPOT *depot, int bnum);

`depot' specifies a database handle connected as a writer. `bnum' specifies the number of elements of the bucket array. If it is not more than 0, the default value is used. A temporary file is created during optimization and then replaces the original file. Therefore, caution is required when the original file has two or more links, or when a symbolic link is used as the original file.

You can use the function `dpname' in order to get the name of a database handle. The region of the return value is allocated with the `malloc' call, so it should be released with the `free' call.

char *dpname(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is a pointer to the region of the name of the database handle; otherwise it is NULL.

You can use the function `dpfsiz' in order to get the size of a database file.

int dpfsiz(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is the size of the database file; otherwise it is -1.

You can use the function `dpbnum' in order to get the number of elements of the bucket array of a database.

int dpbnum(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is the number of elements of the bucket array of the database; otherwise it is -1.

You can use the function `dpbusenum' in order to get the number of used elements of the bucket array.

int dpbusenum(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is the number of used elements of the bucket array; otherwise it is -1.

You can use the function `dprnum' in order to get the number of records stored in a database.

int dprnum(DEPOT *depot);

`depot' specifies a database handle. If successful, the return value is the number of records stored in the database; otherwise it is -1.

The function `dpinnerhash' is the hash function used inside Depot. The function is helpful when you want to know which element of the bucket array a key is stored into.

int dpinnerhash(const char *kbuf, int ksiz);

`kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. The return value is a 31-bit hash value computed from the key.

The function `dpouterhash' is a hash function that is independent of the hash function used inside Depot. The function is helpful for a hash algorithm used in an application of Depot.

int dpouterhash(const char *kbuf, int ksiz);

`kbuf' specifies a pointer to the bytes of a key. `ksiz' specifies the size of the region of the key. If `ksiz' is negative, the size is set to `strlen(kbuf)'. The return value is a 31-bit hash value computed from the key.

The function `dpprimenum' is used in order to obtain a prime number not less than a given number. The function is helpful when an application determines the size of the bucket array of its own hash algorithm.

int dpprimenum(int num);

`num' specifies a positive number. The return value is a prime number not less than the specified number.
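
What "a prime number not less than a number" means can be shown with a sketch based on trial division. How `dpprimenum' itself finds the number is not described here, and whether it returns the smallest such prime is not stated; this sketch assumes the smallest. The function names `sketch_isprime' and `sketch_primenum' are hypothetical.

```c
#include <assert.h>

/* Return 1 if `num' is prime, by trial division up to its square root. */
static int sketch_isprime(int num){
  int i;
  if(num < 2) return 0;
  for(i = 2; i <= num / i; i++){
    if(num % i == 0) return 0;
  }
  return 1;
}

/* Smallest prime number not less than `num'. */
int sketch_primenum(int num){
  assert(num > 0);
  while(!sketch_isprime(num)) num++;
  return num;
}
```

For example, 10 gives 11, while 13 is already prime and is returned unchanged.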

To build a program using Depot, the program should be linked with the library file `libqdbm.a' or `libqdbm.so'. For example, the following command builds `hoge' from `hoge.c'.

gcc -I/usr/local/include -L/usr/local/lib -o hoge hoge.c -lqdbm

Extended API

Curia is the extended API of QDBM. Curia provides routines for managing two or more database files in a directory. Dividing a database into two or more files avoids the limit that some file systems place on the size of each file. Moreover, placing each database file on a separate device improves scalability.

In order to use Curia, you should include `depot.h' and `curia.h' in your source code. Usually, the following lines appear near the beginning of a source file.

#include <depot.h>
#include <curia.h>

The routines below are used in the same way as those of Depot.

CURIA *cropen(const char *name, CROMODE omode, int bnum, int dnum);
int crclose(CURIA *curia);
int crput(CURIA *curia, const char *kbuf, int ksiz, const char *vbuf, int vsiz, CRDMODE dmode);
int crout(CURIA *curia, const char *kbuf, int ksiz);
char *crget(CURIA *curia, const char *kbuf, int ksiz, int start, int max, int *sp);
int crvsiz(CURIA *curia, const char *kbuf, int ksiz);
int criterinit(CURIA *curia);
char *criternext(CURIA *curia, int *sp);
int crsetalign(CURIA *curia, int asiz, int aunit);
int crsync(CURIA *curia);
int croptimize(CURIA *curia, int bnum);
char *crname(CURIA *curia);
int crfsiz(CURIA *curia);
int crbnum(CURIA *curia);
int crbusenum(CURIA *curia);
int crrnum(CURIA *curia);

The routines below are for managing large objects. Usual records and large objects have separate namespaces.

int crputlob(CURIA *curia, const char *kbuf, int ksiz, const char *vbuf, int vsiz, CRDMODE dmode);
int croutlob(CURIA *curia, const char *kbuf, int ksiz);
char *crgetlob(CURIA *curia, const char *kbuf, int ksiz, int start, int max, int *sp);
int crvsizlob(CURIA *curia, const char *kbuf, int ksiz);
int crrnumlob(CURIA *curia);

Programs using the Curia API are built in exactly the same way as programs using the basic API.


Copying

This program was written by Mikio Hirabayashi and is distributed as free software. You can redistribute it and/or modify it under the terms of the GNU General Public License Version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You may contact the author by e-mail at <mikio@24h.co.jp>. Suggestions and bug reports are welcome.