Specification of QDBM Version 1
Copyright (C) 2000-2003 Mikio Hirabayashi
Last Update: Mon, 23 Jun 2003 08:13:31 +0900
Table of Contents
- Overview
- Features
- Installation
- Depot: Basic API
- Commands for Depot
- Curia: Extended API
- Commands for Curia
- Relic: NDBM-compatible API
- Commands for Relic
- Hovel: GDBM-compatible API
- Commands for Hovel
- Cabin: Utility API
- Commands for Cabin
- File Format
- Bugs
- FAQ
- Copying
QDBM is a library of routines for managing a database. The database is a simple data file containing records, each of which is a pair of a key and a value. Every key and value is a sequence of bytes of variable length. Both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. Each key must be unique within a database, so it is impossible to store two or more records with the same key.
The following access methods are provided for the database: storing a record with a key and a value, deleting a record by a key, and retrieving a record by a key. Moreover, traversal access to every key is provided, although the order is arbitrary. These access methods are similar to those of the DBM library (or its compatibles, NDBM and GDBM) defined in the UNIX standard. QDBM is an alternative to DBM that offers higher performance.
Effective Implementation of Hash Database
QDBM was developed with reference to GDBM, with the following three goals: higher processing speed, smaller database files, and a simpler API. They have been achieved. Moreover, the following three restrictions of traditional DBM are removed: a process can handle only one database, the size of a key and a value is bounded, and a database file is sparse.
QDBM uses a hash algorithm to retrieve records. If the bucket array has a sufficient number of elements, the time complexity of retrieval is `O(1)'. That is, the time required to retrieve a record is constant, regardless of the scale of the database. The same holds for storing and deleting. Collisions of hash values are managed by separate chaining. The data structure of each chain is a binary search tree. Even if the bucket array has unusually few elements, the time complexity of retrieval is `O(log n)'.
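The combination of a bucket array with binary search trees as chains can be sketched in memory as follows. This is only an illustration of the technique: the names `TABLE', `NODE', `table_put' and `table_get' and the hash function are hypothetical, and the sketch says nothing about how QDBM lays records out in its file. Freeing the table is omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* One record in a chain; each bucket holds the root of a binary search tree. */
typedef struct node {
  char *key;
  char *val;
  struct node *left;
  struct node *right;
} NODE;

typedef struct {
  NODE **buckets;   /* bucket array: one tree root per element */
  int bnum;         /* number of elements of the bucket array */
} TABLE;

/* Illustrative string hash (an assumption, not the function QDBM uses). */
static unsigned int sketch_hash(const char *key){
  unsigned int h = 5381;
  while(*key) h = h * 33 + (unsigned char)*key++;
  return h;
}

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *sketch_strdup(const char *s){
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if(p) memcpy(p, s, n);
  return p;
}

/* Create a table with `bnum' (positive) empty buckets. */
TABLE *table_new(int bnum){
  TABLE *t = malloc(sizeof(*t));
  if(!t) return NULL;
  t->buckets = calloc((size_t)bnum, sizeof(NODE *));
  if(!t->buckets){
    free(t);
    return NULL;
  }
  t->bnum = bnum;
  return t;
}

/* Store a record, overwriting the value if the key already exists. */
int table_put(TABLE *t, const char *key, const char *val){
  NODE **np = &t->buckets[sketch_hash(key) % (unsigned int)t->bnum];
  NODE *n;
  char *nv;
  int cmp;
  while(*np){   /* descend the tree of the selected bucket */
    cmp = strcmp(key, (*np)->key);
    if(cmp == 0){
      if(!(nv = sketch_strdup(val))) return 0;
      free((*np)->val);
      (*np)->val = nv;
      return 1;
    }
    np = cmp < 0 ? &(*np)->left : &(*np)->right;
  }
  if(!(n = calloc(1, sizeof(*n)))) return 0;
  n->key = sketch_strdup(key);
  n->val = sketch_strdup(val);
  if(!n->key || !n->val){
    free(n->key);
    free(n->val);
    free(n);
    return 0;
  }
  *np = n;   /* link the new leaf into the tree */
  return 1;
}

/* Retrieve the value of a key, or NULL if no record corresponds. */
const char *table_get(const TABLE *t, const char *key){
  const NODE *n = t->buckets[sketch_hash(key) % (unsigned int)t->bnum];
  int cmp;
  while(n){
    cmp = strcmp(key, n->key);
    if(cmp == 0) return n->val;
    n = cmp < 0 ? n->left : n->right;
  }
  return NULL;
}
```

With a bucket array of a single element, every record falls into the same chain and lookups degrade to a tree search, which corresponds to the `O(log n)' case described above as long as the tree stays reasonably balanced.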
QDBM improves retrieval speed by holding the whole bucket array in RAM. If the bucket array is in RAM, the region of a target record can be reached in about one pass of file operations. A bucket array saved in a file is not read into RAM with the `read' call but mapped directly into RAM with the `mmap' call. Therefore, the preparation time when connecting to a database is very short, and two or more processes can share the same memory map.
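The idea of mapping a bucket array instead of reading it can be sketched with plain POSIX calls. The functions `bucket_map' and `bucket_unmap' are hypothetical, and the sketch assumes a file that consists of nothing but the bucket array; the real file layout of QDBM is not shown here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a bucket array of `bnum' int elements stored in the file `path'.
   Assumption of this sketch: the file holds only the bucket array, so it is
   created if missing and resized to exactly that length.
   Returns NULL on failure; on success `*fdp' receives the descriptor. */
int *bucket_map(const char *path, int bnum, int *fdp){
  size_t size = (size_t)bnum * sizeof(int);
  void *map;
  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if(fd == -1) return NULL;
  if(ftruncate(fd, (off_t)size) == -1){   /* newly added bytes read as zero */
    close(fd);
    return NULL;
  }
  /* MAP_SHARED lets other processes mapping the same file see the updates */
  map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if(map == MAP_FAILED){
    close(fd);
    return NULL;
  }
  *fdp = fd;
  return (int *)map;
}

/* Release the mapping and close the file. Returns true on success. */
int bucket_unmap(int *buckets, int bnum, int fd){
  int ok = munmap(buckets, (size_t)bnum * sizeof(int)) == 0;
  if(close(fd) == -1) ok = 0;
  return ok;
}
```

No element is copied when the mapping is set up; pages are brought in by the operating system only when they are touched, which is why the preparation time stays short.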
If the number of elements of the bucket array is about half the number of records stored in a database, the probability of collision of hash values is about 56.7%, although it depends on the characteristics of the input (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight times). In such a case, a record can be retrieved in two or fewer passes of file operations. Taking this as a performance index, handling a database containing one million records requires a bucket array with half a million elements. The size of each element is 4 bytes. That is, if 2M bytes of RAM are available, a database containing one million records can be handled.
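The percentages above agree with a simple model in which hash values are uniformly distributed: with a ratio r of records to bucket elements, the expected fraction of records falling into an already occupied bucket is 1 - (1 - exp(-r)) / r. The following sketch checks this by simulation rather than by the formula. The functions `collision_ratio' and `bucket_array_bytes' are hypothetical, and a xorshift pseudo-random generator stands in for the hash function, so real keys may behave differently.

```c
#include <stdlib.h>

/* xorshift32 pseudo-random generator, standing in for a well-mixed hash. */
static unsigned int sketch_rand(unsigned int *state){
  unsigned int x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Throw `rnum' records into `bnum' buckets and return the fraction of
   records that land in a bucket which is already occupied.
   Returns -1.0 on invalid arguments or allocation failure. */
double collision_ratio(int rnum, int bnum){
  unsigned char *used;
  unsigned int state = 2463534242u;   /* fixed seed: the result is repeatable */
  int i, idx, collisions = 0;
  if(rnum < 1 || bnum < 1) return -1.0;
  if(!(used = calloc((size_t)bnum, 1))) return -1.0;
  for(i = 0; i < rnum; i++){
    idx = (int)(sketch_rand(&state) % (unsigned int)bnum);
    if(used[idx]) collisions++;
    else used[idx] = 1;
  }
  free(used);
  return (double)collisions / rnum;
}

/* Memory taken by a bucket array, with 4 bytes per element as stated above. */
long bucket_array_bytes(long bnum){
  return bnum * 4L;
}
```

For one million records and half a million bucket elements, `collision_ratio' should come out close to the 56.7% quoted above, and `bucket_array_bytes(500000)' gives the 2,000,000 bytes of the performance index.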
QDBM provides two modes to connect to a database: `reader' and `writer'. A reader can perform retrieving but neither storing nor deleting. A writer can perform all access methods. Exclusion control between processes is performed by file locking when connecting to a database. While a writer is connected to a database, neither readers nor writers can connect. While a reader is connected to a database, other readers can connect, but writers cannot. By this mechanism, data consistency is guaranteed under simultaneous connections in a multitasking environment.
Traditional DBM provides two modes of the storing operation, `insert' and `replace'. If a key overlaps an existing record, the insert mode keeps the existing value, while the replace mode replaces it with the specified value. In addition to the two modes, QDBM provides a `concatenate' mode. In this mode, the specified value is concatenated at the end of the existing value and stored. This feature is useful when adding an element to a value treated as an array. Moreover, although DBM can fetch a value only by reading the whole region of a record, QDBM can also fetch a part of the region of a value. This feature is also useful when a value is treated as an array.
Generally speaking, as updates continue, the available regions become fragmented and the size of a database grows rapidly. QDBM deals with this problem by coalescing dispensable regions and reusing them, and by providing optimization of a database. When a record is overwritten with a value larger than the existing one, the region must be moved to another position in the file. Because the time complexity of this operation depends on the size of the region of the record, extending values successively is inefficient. QDBM deals with this problem by alignment. If the increment fits in the padding, the region does not need to be moved.
Simple but Various APIs
QDBM provides very simple APIs. You can perform database I/O in the same way as usual file I/O with the `FILE' pointer defined in ANSI C. In the basic API of QDBM, a database is stored as one file. In the extended API, a database is stored as several files in one directory. Using the latter, you can handle a database larger than 2GB on 32-bit systems. Because the two APIs are very similar to each other, porting an application from one to the other is easy.
APIs compatible with NDBM and GDBM are also provided. Because there are many applications using NDBM or GDBM, porting them to QDBM is easy. In most cases, it only requires replacing the header inclusion (#include) and recompiling. However, QDBM cannot handle database files made by the original NDBM or GDBM.
In order to handle records in memory easily, the utility API is provided. It implements memory allocating functions, sorting functions, extensible datum, array list and hash map. Using them, you can handle records in C as easily as in such scripting languages as Perl or Ruby.
Along with APIs for C, QDBM provides APIs for C++, Java, Perl and Ruby. The APIs for C are of five kinds: the basic API, the extended API, the NDBM-compatible API, the GDBM-compatible API and the utility API. Command line interfaces corresponding to each API are also provided. They are useful for prototyping, testing, debugging and so on. The C++ API encapsulates the database handling functions of the basic API and the extended API of QDBM with the class mechanism of C++. The Java API has native methods calling the basic API and the extended API of QDBM through the Java Native Interface. The Perl API has methods calling the basic API and the extended API of QDBM through the XS language. The Ruby API has methods calling the basic API and the extended API of QDBM as modules of Ruby. The C++ API, the Java API and the Ruby API are thread-safe.
Wide Portability
QDBM is implemented in the syntax of ANSI C (C89), using only APIs defined in ANSI C or POSIX. Thus, QDBM works on most UNIX and compatible OSs. As for the C API, operation has been checked at least on Linux 2.2, Linux 2.4, FreeBSD 4.8, FreeBSD 5.0, SunOS 5.7, SunOS 5.8, SunOS 5.9, HP-UX 11.00, Cygwin 1.3.10 and MacOS X 10.2. Although a database file created by QDBM depends on the byte order of the processor, a utility to convert between byte orders is provided to deal with this.
Preparation
To install QDBM from a source package, GCC 2.8 or later and `make' are required.
After extracting the archive file of QDBM, change the current working directory to the generated directory and perform the installation.
Usual Steps
Run the configuration script.
./configure
Build programs.
make
Perform self-diagnostic test.
make check
Install programs. This operation must be carried out by the root user.
make install
Using Libtool
If the above steps do not work, try the following steps. They require GNU libtool 1.5 or later.
Run the configuration script.
./configure
Build programs.
make -f LTmakefile
Perform self-diagnostic test.
make -f LTmakefile check
Install programs. This operation must be carried out by the root user.
make -f LTmakefile install
Result
When the installation finishes, the header files `depot.h', `curia.h', `relic.h' and `hovel.h' are installed in `/usr/local/include'; the libraries `libqdbm.a', `libqdbm.so' and so on are installed in `/usr/local/lib'; and the executable commands `dpmgr', `dptest', `dptsv', `crmgr', `crtest', `rlmgr', `rltest', `hvmgr', `hvtest' and `cbtest' are installed in `/usr/local/bin'.
When you run a program linked dynamically to `libqdbm.so', the library search path should include `/usr/local/lib'. You can set the library search path with the environment variable `LD_LIBRARY_PATH'.
To uninstall QDBM, execute the following command after `./configure'. This operation must be carried out by the root user.
make uninstall
If an old version of QDBM is installed on your system, uninstall it before installing a new one.
The APIs for languages other than C are not installed by default. Refer to `plus/xspex.html' for how to install the C++ API, to `java/jspex.html' for the Java API, to `perl/pspex.html' for the Perl API, and to `ruby/rspex.html' for the Ruby API.
To install QDBM from a binary package such as RPM, refer to the manual of the package manager. For example, if you use RPM, execute a command like the following as the root user.
rpm -ivh qdbm-1.x.x-x.i386.rpm
For Windows
On Windows (Cygwin), you should follow the procedures below for installation.
Run the configuration script.
./configure
Build programs.
make win
Perform self-diagnostic test.
make check-win
Install programs. Likewise, run `make uninstall-win' to uninstall them.
make install-win
On Windows, the import library `libqdbm.dll.a' is created instead of the static library `libqdbm.a', and the dynamic linking library `qdbm.dll' is created instead of such shared libraries as `libqdbm.so'. `qdbm.dll' is installed into a system directory such as `C:\WINNT\SYSTEM32'.
For MacOS X
On MacOS X (Darwin), you should follow the procedures below for installation.
Run the configuration script.
./configure
Build programs.
make mac
Perform self-diagnostic test.
make check-mac
Install programs. Likewise, run `make uninstall-mac' to uninstall them.
make install-mac
On MacOS X, `libqdbm.dylib' and so on are created instead of `libqdbm.so' and so on. You can set the library search path with the environment variable `DYLD_LIBRARY_PATH'.
Depot is the basic API of QDBM. Almost all features for managing a database provided by QDBM are implemented by Depot. The other APIs are no more than wrappers of Depot. Depot is the fastest of all the APIs of QDBM.
In order to use Depot, you should include `depot.h' and `stdlib.h' in the source files. Usually, the following description will be near the beginning of a source file.
- #include <depot.h>
- #include <stdlib.h>
A pointer to `DEPOT' is used as a database handle, in the same way that the file I/O routines of `stdio.h' use a pointer to `FILE'. A database handle is opened with the function `dpopen' and closed with `dpclose'. You should not refer directly to any member of the handle. If a fatal error occurs in a database, any access method via the handle except `dpclose' will not work and will return an error status. Although a process is allowed to use multiple database handles at the same time, it should not use two or more handles of the same database file.
The external variable `dpversion' is the string containing the version information.
- extern const char *dpversion;
The external variable `dpdbgfd' is a file descriptor to output debugging information.
- extern int dpdbgfd;
- The initial value of this variable is -1. If the value is negative, debugging output is not performed.
The external variable `dpecode' holds the code of the last error that occurred. Refer to `depot.h' for details of the error codes.
- extern int dpecode;
- The initial value of this variable is `DP_NOERROR'.
The function `dperrmsg' is used in order to get a message string corresponding to an error code.
- const char *dperrmsg(int ecode);
- `ecode' specifies an error code. The return value is the message string of the error code. The region of the return value is not writable.
The function `dpopen' is used in order to get a database handle.
- DEPOT *dpopen(const char *name, int omode, int bnum);
- `name' specifies the name of a database file. `omode' specifies the connection mode: `DP_OWRITER' as a writer, `DP_OREADER' as a reader. If the mode is `DP_OWRITER', the following may be added by bitwise or: `DP_OCREAT', which means a new database is created if none exists, and `DP_OTRUNC', which means a new database is created regardless of whether one exists. `bnum' specifies the number of elements of the bucket array. If it is not more than 0, the default value is used. The size of the bucket array is determined when the database is created, and cannot be changed except by optimization of the database. The suggested size of the bucket array is about 0.5 to 4 times the number of all records to store. The return value is the database handle, or `NULL' if it is not successful. While connecting as a writer, an exclusive lock is applied to the database file. While connecting as a reader, a shared lock is applied to the database file. The thread blocks until the lock is obtained.
The function `dpclose' is used in order to close a database handle.
- int dpclose(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is true, else, it is false. Because the region of a closed handle is released, the handle can no longer be used. Updates to a database are assured to be written when the handle is closed. If a writer opens a database but does not close it appropriately, the database will be broken.
The function `dpput' is used in order to store a record.
- int dpput(DEPOT *depot, const char *kbuf, int ksiz, const char *vbuf, int vsiz, int dmode);
- `depot' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `dmode' specifies behavior when the key overlaps, by the following values: `DP_DOVER', which means the specified value overwrites the existing one, `DP_DKEEP', which means the existing value is kept, `DP_DCAT', which means the specified value is concatenated at the end of the existing value. If successful, the return value is true, else, it is false.
The function `dpout' is used in order to delete a record.
- int dpout(DEPOT *depot, const char *kbuf, int ksiz);
- `depot' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true, else, it is false. False is returned when no record corresponds to the specified key.
The function `dpget' is used in order to retrieve a record.
- char *dpget(DEPOT *depot, const char *kbuf, int ksiz, int start, int max, int *sp);
- `depot' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset address of the beginning of the region of the value to be read. `max' specifies the max size to be read. If it is negative, the size to read is unlimited. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record, else, it is `NULL'. `NULL' is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call when it is no longer in use.
The function `dpvsiz' is used in order to get the size of the value of a record.
- int dpvsiz(DEPOT *depot, const char *kbuf, int ksiz);
- `depot' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is the size of the value of the corresponding record, else, it is -1. Because this function does not read the entity of a record, it is faster than `dpget'.
The function `dpiterinit' is used in order to initialize the iterator of a database handle.
- int dpiterinit(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is true, else, it is false. The iterator is used in order to access the key of every record stored in a database.
The function `dpiternext' is used in order to get the next key of the iterator.
- char *dpiternext(DEPOT *depot, int *sp);
- `depot' specifies a database handle. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the next key, else, it is `NULL'. `NULL' is returned when no record is left to be fetched from the iterator. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call when it is no longer in use. It is possible to access every record by calling this function repeatedly. However, this is not assured if the database is updated during the iteration. Besides, the order of this traversal access method is arbitrary, so it is not assured that the order of traversal matches the order of storing.
The function `dpsetalign' is used in order to set alignment of a database handle.
- int dpsetalign(DEPOT *depot, int align);
- `depot' specifies a database handle connected as a writer. `align' specifies the size of alignment. If successful, the return value is true, else, it is false. If alignment is set for a database, the efficiency of overwriting values is improved. The suggested size of alignment is the average size of the values of the records to be stored. If alignment is positive, padding whose size is a multiple of the alignment is placed. If alignment is negative, with `vsiz' being the size of a value, the size of padding is calculated with `(vsiz / pow(2, abs(align) - 1))'. Because the alignment setting is not saved in a database, you should specify the alignment every time you open a database.
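The two padding rules can be written out as plain arithmetic. The function `padding_size' is hypothetical and is not part of Depot. The negative branch follows the formula given above in integer arithmetic; the positive branch is only one possible reading (the value is padded up to the next multiple of the alignment) and the actual calculation inside Depot may differ.

```c
#include <stdlib.h>

/* Padding reserved after a value of `vsiz' bytes (non-negative) under
   the alignment `align'. A sketch, not the code used inside Depot. */
int padding_size(int vsiz, int align){
  int shift, mod;
  if(align > 0){
    /* assumption: pad the value up to the next multiple of `align' */
    mod = vsiz % align;
    return mod == 0 ? 0 : align - mod;
  }
  if(align < 0){
    /* documented rule: vsiz / pow(2, abs(align) - 1), truncated to an integer */
    shift = abs(align) - 1;
    return shift < 31 ? vsiz >> shift : 0;
  }
  return 0;   /* no alignment set: no padding */
}
```

Under these rules, an alignment of -1 reserves as much padding as the value itself, -2 reserves half of it, and so on, so a value that grows by repeated concatenation is moved less and less often relative to its size.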
The function `dpsync' is used in order to synchronize updating contents with the file and the device.
- int dpsync(DEPOT *depot);
- `depot' specifies a database handle connected as a writer. If successful, the return value is true, else, it is false. This function is useful when another process uses the connected database file.
The function `dpoptimize' is used in order to optimize a database.
- int dpoptimize(DEPOT *depot, int bnum);
- `depot' specifies a database handle connected as a writer. `bnum' specifies the number of the elements of the bucket array. If it is not more than 0, the default value is specified. If successful, the return value is true, else, it is false. In an alternating succession of deleting and storing with overwrite or concatenate, dispensable regions accumulate. This function is useful to do away with them.
The function `dpname' is used in order to get the name of a database.
- char *dpname(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is the pointer to the region of the name of the database, else, it is `NULL'. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `dpfsiz' is used in order to get the size of a database file.
- int dpfsiz(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is the size of the database file, else, it is -1.
The function `dpbnum' is used in order to get the number of the elements of the bucket array.
- int dpbnum(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is the number of the elements of the bucket array, else, it is -1.
The function `dpbusenum' is used in order to get the number of the used elements of the bucket array.
- int dpbusenum(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is the number of the used elements of the bucket array, else, it is -1. This function is inefficient because it accesses all elements of the bucket array.
The function `dprnum' is used in order to get the number of the records stored in a database.
- int dprnum(DEPOT *depot);
- `depot' specifies a database handle. If successful, the return value is the number of the records stored in the database, else, it is -1.
The function `dpwritable' is used in order to check whether a database handle is a writer or not.
- int dpwritable(DEPOT *depot);
- `depot' specifies a database handle. The return value is true if the handle is a writer, false if not.
The function `dpfatalerror' is used in order to check whether a database has a fatal error or not.
- int dpfatalerror(DEPOT *depot);
- `depot' specifies a database handle. The return value is true if the database has a fatal error, false if not.
The function `dpinode' is used in order to get the inode number of a database file.
- int dpinode(DEPOT *depot);
- `depot' specifies a database handle. The return value is the inode number of the database file.
The function `dpfdesc' is used in order to get the file descriptor of a database file.
- int dpfdesc(DEPOT *depot);
- `depot' specifies a database handle. The return value is the file descriptor of the database file. Handling the file descriptor of a database file directly is not suggested.
The function `dpremove' is used in order to remove a database file.
- int dpremove(const char *name);
- `name' specifies the name of a database file. If successful, the return value is true, else, it is false.
The function `dpeconv' is used in order to convert a database file for another platform with different byte order.
- int dpeconv(const char *name, int big);
- `name' specifies the name of a database file. `big' specifies whether the result is for big endian or not. If successful, the return value is true, else, it is false. The content of each record is not converted; applications are responsible for it.
The function `dpinnerhash' is a hash function used inside Depot.
- int dpinnerhash(const char *kbuf, int ksiz);
- `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. The return value is the hash value of 31 bits length computed from the key. This function is useful when an application calculates the state of the inside bucket array.
The function `dpouterhash' is a hash function which is independent from the hash functions used inside Depot.
- int dpouterhash(const char *kbuf, int ksiz);
- `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. The return value is the hash value of 31 bits length computed from the key. This function is useful when an application uses its own hash algorithm outside Depot.
The function `dpprimenum' is used in order to get a prime number not less than a number.
- int dpprimenum(int num);
- `num' specifies a positive number. The return value is a prime number not less than the specified number. This function is useful when an application determines the size of a bucket array of its own hash algorithm.
The following example stores and retrieves a phone number, using the name as the key.
#include <depot.h>
#include <stdlib.h>
#include <stdio.h>
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
  DEPOT *depot;
  char *val;
  /* open the database */
  if(!(depot = dpopen(DBNAME, DP_OWRITER | DP_OCREAT, -1))){
    fprintf(stderr, "dpopen: %s\n", dperrmsg(dpecode));
    return 1;
  }
  /* store the record */
  if(!dpput(depot, NAME, -1, NUMBER, -1, DP_DOVER)){
    fprintf(stderr, "dpput: %s\n", dperrmsg(dpecode));
  }
  /* retrieve the record */
  if(!(val = dpget(depot, NAME, -1, 0, -1, NULL))){
    fprintf(stderr, "dpget: %s\n", dperrmsg(dpecode));
  } else {
    printf("Name: %s\n", NAME);
    printf("Number: %s\n", val);
    free(val);
  }
  /* close the database */
  if(!dpclose(depot)){
    fprintf(stderr, "dpclose: %s\n", dperrmsg(dpecode));
    return 1;
  }
  return 0;
}
The following example shows all records of the database.
#include <depot.h>
#include <stdlib.h>
#include <stdio.h>
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
  DEPOT *depot;
  char *key, *val;
  /* open the database */
  if(!(depot = dpopen(DBNAME, DP_OREADER, -1))){
    fprintf(stderr, "dpopen: %s\n", dperrmsg(dpecode));
    return 1;
  }
  /* initialize the iterator */
  if(!dpiterinit(depot)){
    fprintf(stderr, "dpiterinit: %s\n", dperrmsg(dpecode));
  }
  /* scan the iterator */
  while((key = dpiternext(depot, NULL)) != NULL){
    if(!(val = dpget(depot, key, -1, 0, -1, NULL))){
      fprintf(stderr, "dpget: %s\n", dperrmsg(dpecode));
      free(key);
      break;
    }
    printf("%s: %s\n", key, val);
    free(val);
    free(key);
  }
  /* close the database */
  if(!dpclose(depot)){
    fprintf(stderr, "dpclose: %s\n", dperrmsg(dpecode));
    return 1;
  }
  return 0;
}
To build a program using Depot, the program should be linked with the library file `libqdbm.a' or `libqdbm.so'. For example, the following command builds `sample' from `sample.c'.
gcc -I/usr/local/include -o sample sample.c -L/usr/local/lib -lqdbm
Although the functions of Depot are not reentrant, they do not use any static object internally. So, they can be used in a thread-safe manner if each call and each reference to the external variable `dpecode' is under exclusion control, on the assumption that `errno', `malloc' and so on are thread-safe.
Depot has the following command line interfaces.
The command `dpmgr' is a utility for debugging Depot and its applications. It features editing and checking of a database. It can be used for database applications with shell scripts. This command is used in the following format. `name' specifies a database name. `key' specifies the key of a record. `val' specifies the value of a record.
- dpmgr create [-v] [-bnum num] name
- Create a database file.
- dpmgr put [-v] [-kx|-ki] [-vx|-vi|-vf] [-keep|-cat] [-na] name key val
- Store a record with a key and a value.
- dpmgr out [-v] [-kx|-ki] name key
- Delete a record with a key.
- dpmgr get [-v] [-kx|-ki] [-start num] [-max num] [-ox] [-n] name key
- Retrieve a record with a key and output it to the standard output.
- dpmgr list [-v] [-ox] name
- List all keys delimited with line-feed to the standard output.
- dpmgr optimize [-v] [-bnum num] [-na] name
- Optimize a database.
- dpmgr inform [-v] name
- Output miscellaneous information to the standard output.
- dpmgr remove [-v] name
- Remove a database file.
- dpmgr econv [-v] [-be|-le] name
- Convert endian of a database file for the local system.
- dpmgr version
- Output version information of QDBM to the standard output.
The options have the following meanings.
- -v : output debug information.
- -bnum num : specifies the number of the elements of the bucket array.
- -kx : treat `key' as a binary expression of hexadecimal notation.
- -ki : treat `key' as an integer expression of decimal notation.
- -vx : treat `val' as a binary expression of hexadecimal notation.
- -vi : treat `val' as an integer expression of decimal notation.
- -vf : read the value from a file specified with `val'.
- -keep : specify the storing mode for `DP_DKEEP'.
- -cat : specify the storing mode for `DP_DCAT'.
- -na : do not set alignment.
- -start : specify the beginning offset of a value to fetch.
- -max : specify the max size of a value to fetch.
- -ox : treat the output as a binary expression of hexadecimal notation.
- -n : do not output the trailing newline.
- -be : convert the database file for big endian.
- -le : convert the database file for little endian.
This command returns 0 on success, and another value on failure.
The command `dptest' is a utility for facility tests and performance tests. You can check a database generated by the command or measure the execution time of the command. This command is used in the following format. `name' specifies a database name. `rnum' specifies the number of the records. `bnum' specifies the number of the elements of the bucket array.
- dptest write name rnum bnum
- Store records with keys of 8 bytes. They change as `00000001', `00000002'...
- dptest read name
- Retrieve all records of the database above.
- dptest rcat name rnum bnum pnum align
- Store records with partway duplicated keys using concatenate mode.
- dptest combo name
- Perform combination test of various operations.
- dptest wicked name rnum
- Perform updating operations selected at random.
This command returns 0 on success, and another value on failure.
The command `dptsv' features mutual conversion between a database of Depot and a TSV text. This command is used in the following format. `name' specifies a database name. The subcommand `import' reads TSV data from the standard input. If a key overlaps, the latter is adopted. `-bnum' specifies the number of the elements of the bucket array. The subcommand `export' writes TSV data to the standard output.
- dptsv import [-bnum num] name
- Create a database from TSV.
- dptsv export name
- Write TSV data of a database.
This command returns 0 on success, and another value on failure.
Curia is the extended API of QDBM. It provides routines for managing multiple database files in a directory. The limits that some file systems place on the size of each file are avoided by dividing a database into two or more files. If the database files are placed on multiple devices, scalability is improved.
Whereas Depot creates a database with a file name, Curia creates a database with a directory name. A database file named `depot' is placed in the specified directory. It keeps the attributes of the database but not the entities of the records. In addition, as many subdirectories as the number of divisions of the database are created, each named with 4 digits. A database file is placed in each subdirectory, and the entities of the records are stored in these files. For example, if the database directory is named `casket' and the number of divisions is 3, then `casket/depot', `casket/0001/depot', `casket/0002/depot' and `casket/0003/depot' are created. No error occurs even if a directory of the same name already exists when a database is created. So, if the subdirectories exist and devices are mounted on them, the database files are placed on those multiple devices. The database files can also be placed on multiple file servers using NFS and so on.
Curia can manage large objects. Whereas usual records are stored in the database files, each large object is stored in an individual file. Because the files of large objects are placed in different directories named with hash values, access remains reasonably fast, although it is slower than access to usual records. Large and infrequently accessed data should be set apart as large objects. Doing so improves the access speed of usual records. The directory hierarchies of large objects are placed in the directory named `lob' in each subdirectory of the database. Because the key spaces of usual records and large objects are separate, operations on one do not affect the other.
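The naming scheme of the `casket' example can be expressed as a small path builder. The function `curia_path' is hypothetical and only formats names according to the layout described above; it does not touch the file system and is not part of Curia. It uses `snprintf', which assumes a C99 or POSIX environment.

```c
#include <stdio.h>

/* Write into `buf' the path of one database file of a Curia database.
   `name' is the database directory and `dnum' the number of divisions.
   Index 0 selects the attribute file; 1 to `dnum' select the record files.
   Returns true on success, false if the index is out of range or the
   buffer is too small. */
int curia_path(char *buf, size_t size, const char *name, int index, int dnum){
  int len;
  if(index < 0 || index > dnum) return 0;
  if(index == 0){
    len = snprintf(buf, size, "%s/depot", name);
  } else {
    len = snprintf(buf, size, "%s/%04d/depot", name, index);   /* 4-digit name */
  }
  return len >= 0 && (size_t)len < size;
}
```

For the database `casket' with 3 divisions, indexes 0 to 3 give `casket/depot', `casket/0001/depot', `casket/0002/depot' and `casket/0003/depot'.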
In order to use Curia, you should include `depot.h', `curia.h' and `stdlib.h' in the source files. Usually, the following description will be near the beginning of a source file.
- #include <depot.h>
- #include <curia.h>
- #include <stdlib.h>
A pointer to `CURIA' is used as a database handle, much as the file I/O routines of `stdio.h' use a pointer to `FILE'. A database handle is opened with the function `cropen' and closed with `crclose'. You should not refer directly to any member of the handle. If a fatal error occurs in a database, any access method via the handle except `crclose' will not work and will return an error status. Although a process is allowed to use multiple database handles at the same time, it should not use more than one handle of the same database directory.
Curia also assigns the error code to the external variable `dpecode'. The function `dperrmsg' is used in order to get the message of the error code.
The function `cropen' is used in order to get a database handle.
- CURIA *cropen(const char *name, int omode, int bnum, int dnum);
- `name' specifies the name of a database directory. `omode' specifies the connection mode: `CR_OWRITER' as a writer, `CR_OREADER' as a reader. If the mode is `CR_OWRITER', the following may be added by bitwise or: `CR_OCREAT', which means a new database is created if none exists, `CR_OTRUNC', which means a new database is created regardless of whether one exists. `bnum' specifies the number of elements of each bucket array. If it is not more than 0, the default value is specified. The size of each bucket array is determined on creating, and can not be changed except by optimization of the database. The suggested size of each bucket array is about 0.5 to 4 times the number of all records to store. `dnum' specifies the number of divisions of the database. If it is not more than 0, the default value is specified. The number of divisions can not be changed from the initial value. The return value is the database handle or `NULL' if it is not successful. While connecting as a writer, an exclusive lock is applied to the database directory. While connecting as a reader, a shared lock is applied to the database directory. The thread blocks until the lock is acquired.
The function `crclose' is used in order to close a database handle.
- int crclose(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is true, else, it is false. Because the region of a closed handle is released, it becomes impossible to use the handle. Updates to a database are assured to be written when the handle is closed. If a writer opens a database but does not close it appropriately, the database will be broken.
The function `crput' is used in order to store a record.
- int crput(CURIA *curia, const char *kbuf, int ksiz, const char *vbuf, int vsiz, int dmode);
- `curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `dmode' specifies behavior when the key overlaps, by the following values: `CR_DOVER', which means the specified value overwrites the existing one, `CR_DKEEP', which means the existing value is kept, `CR_DCAT', which means the specified value is concatenated at the end of the existing value. If successful, the return value is true, else, it is false.
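The effect of the three values of `dmode' can be modeled without a database. The following is only a sketch on character strings. The constants `MODE_OVER', `MODE_KEEP' and `MODE_CAT' and the function `value_after_store' are hypothetical stand-ins for illustration and are not part of QDBM.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for CR_DOVER, CR_DKEEP and CR_DCAT. */
enum { MODE_OVER, MODE_KEEP, MODE_CAT };

/* Return a newly allocated copy of the value a record would hold after
   `newval' is stored under a key whose current value is `oldval'.
   `oldval' may be NULL when no record exists yet.
   The caller releases the result with free(). NULL means malloc failed. */
char *value_after_store(const char *oldval, const char *newval, int mode){
  size_t osiz, nsiz;
  char *res;
  if(!oldval || mode == MODE_OVER){
    osiz = 0;                    /* nothing of the old value survives */
    nsiz = strlen(newval);
  } else if(mode == MODE_KEEP){
    osiz = strlen(oldval);       /* the existing value is kept as it is */
    nsiz = 0;
  } else {
    osiz = strlen(oldval);       /* the new value follows the old one */
    nsiz = strlen(newval);
  }
  if(!(res = malloc(osiz + nsiz + 1))) return NULL;
  if(osiz > 0) memcpy(res, oldval, osiz);
  if(nsiz > 0) memcpy(res + osiz, newval, nsiz);
  res[osiz + nsiz] = '\0';
  return res;
}
```

The model covers only the resulting value. It does not model the return value of `crput' itself.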
The function `crout' is used in order to delete a record.
- int crout(CURIA *curia, const char *kbuf, int ksiz);
- `curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true, else, it is false. False is returned when no record corresponds to the specified key.
The function `crget' is used in order to retrieve a record.
- char *crget(CURIA *curia, const char *kbuf, int ksiz, int start, int max, int *sp);
- `curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset address of the beginning of the region of the value to be read. `max' specifies the max size to be read. If it is negative, the size to read is unlimited. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record, else, it is `NULL'. `NULL' is returned when no record corresponds to the specified key or the size of the value of the corresponding record is less than `start'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
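How `start' and `max' select a part of a value can be expressed as a small calculation. The following sketch computes the size of the region that a read would yield. The function name `region_size' is hypothetical, and treating a negative `start' as a failure is an assumption of this sketch.

```c
/* Number of bytes a read would return for a value of `vsiz' bytes,
   or -1 when `start' lies beyond the end of the value.
   A negative `max' means that the size to read is unlimited.
   Assumption: a negative `start' is rejected. */
int region_size(int vsiz, int start, int max){
  int rest;
  if(start < 0 || vsiz < start) return -1;
  rest = vsiz - start;                    /* bytes from `start' to the end */
  if(max >= 0 && max < rest) rest = max;  /* cut down to `max' if given */
  return rest;
}
```

For a value of 13 bytes, a `start' of 4 with a negative `max' gives 9 bytes, and the same `start' with a `max' of 4 gives 4 bytes.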
The function `crvsiz' is used in order to get the size of the value of a record.
- int crvsiz(CURIA *curia, const char *kbuf, int ksiz);
- `curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is the size of the value of the corresponding record, else, it is -1. Because this function does not read the entity of a record, it is faster than `crget'.
The function `criterinit' is used in order to initialize the iterator of a database handle.
- int criterinit(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is true, else, it is false. The iterator is used in order to access the key of every record stored in a database.
The function `criternext' is used in order to get the next key of the iterator.
- char *criternext(CURIA *curia, int *sp);
- `curia' specifies a database handle. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the next key, else, it is `NULL'. `NULL' is returned when no record remains to be fetched from the iterator. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. It is possible to access every record by calling this function repeatedly. However, the result is not assured if the database is updated during the iteration. Besides, the order of this traversal access method is arbitrary, so it is not assured that the order of storing matches the order of the traversal access.
The function `crsetalign' is used in order to set alignment of a database handle.
- int crsetalign(CURIA *curia, int align);
- `curia' specifies a database handle connected as a writer. `align' specifies the size of alignment. If successful, the return value is true, else, it is false. If alignment is set to a database, the efficiency of overwriting values is improved. The size of alignment is suggested to be the average size of the values of the records to be stored. If alignment is positive, padding whose size is a multiple of the alignment is placed. If alignment is negative, as `vsiz' is the size of a value, the size of padding is calculated with `(vsiz / pow(2, abs(align) - 1))'. Because the alignment setting is not saved in a database, you should specify alignment every time you open a database.
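The formula for negative alignment can be tried out in isolation. The following sketch evaluates `vsiz / pow(2, abs(align) - 1)' with integer arithmetic. The function name `padding_for_negative_align' is hypothetical, and the use of truncating integer division is an assumption of this sketch. The positive case is left out because the exact rule is not stated here.

```c
/* Padding size for a value of `vsiz' bytes under a negative alignment,
   following vsiz / 2^(|align| - 1).
   Assumption: the division truncates toward zero.
   Returns -1 if `align' is not negative or `vsiz' is negative. */
int padding_for_negative_align(int vsiz, int align){
  if(align >= 0 || vsiz < 0) return -1;
  if(align < -31) return 0;   /* the divisor would exceed any int value */
  return vsiz / (1 << (-align - 1));
}
```

With a value of 100 bytes, an alignment of -1 gives 100 bytes of padding, -2 gives 50 and -3 gives 25. So a smaller negative number reserves less room for a value to grow in place.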
The function `crsync' is used in order to synchronize updating contents with the files and the devices.
- int crsync(CURIA *curia);
- `curia' specifies a database handle connected as a writer. If successful, the return value is true, else, it is false. This function is useful when another process uses the connected database directory.
The function `croptimize' is used in order to optimize a database.
- int croptimize(CURIA *curia, int bnum);
- `curia' specifies a database handle connected as a writer. `bnum' specifies the number of the elements of each bucket array. If it is not more than 0, the default value is specified. If successful, the return value is true, else, it is false. In an alternating succession of deleting and storing with overwrite or concatenate, dispensable regions accumulate. This function is useful to do away with them.
The function `crname' is used in order to get the name of a database.
- char *crname(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the pointer to the region of the name of the database, else, it is `NULL'. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `crfsiz' is used in order to get the total size of database files.
- int crfsiz(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the total size of the database files, else, it is -1.
The function `crbnum' is used in order to get the total number of the elements of each bucket array.
- int crbnum(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the total number of the elements of each bucket array, else, it is -1.
The function `crbusenum' is used in order to get the total number of the used elements of each bucket array.
- int crbusenum(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the total number of the used elements of each bucket array, else, it is -1. This function is inefficient because it accesses all elements of each bucket array.
The function `crrnum' is used in order to get the number of the records stored in a database.
- int crrnum(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the number of the records stored in the database, else, it is -1.
The function `crwritable' is used in order to check whether a database handle is a writer or not.
- int crwritable(CURIA *curia);
- `curia' specifies a database handle. The return value is true if the handle is a writer, false if not.
The function `crfatalerror' is used in order to check whether a database has a fatal error or not.
- int crfatalerror(CURIA *curia);
- `curia' specifies a database handle. The return value is true if the database has a fatal error, false if not.
The function `crinode' is used in order to get the inode number of a database directory.
- int crinode(CURIA *curia);
- `curia' specifies a database handle. The return value is the inode number of the database directory.
The function `crremove' is used in order to remove a database directory.
- int crremove(const char *name);
- `name' specifies the name of a database directory. If successful, the return value is true, else, it is false.
The function `creconv' is used in order to convert a database directory for another platform with different byte order.
- int creconv(const char *name, int big);
- `name' specifies the name of a database directory. `big' specifies whether the result is for big endian or not. If successful, the return value is true, else, it is false. Content of each record is not converted. Applications are responsible for it.
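Because the content of records is left as it is, an application that stores binary integers in values has to convert them itself. The following sketch shows one possible way for 32-bit values. The function names `swap_u32' and `host_is_big_endian' are hypothetical, and the use of 32-bit integers in record content is only an example.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit unsigned integer. */
uint32_t swap_u32(uint32_t v){
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Return 1 if the running platform stores the most significant byte
   first, 0 otherwise. */
int host_is_big_endian(void){
  const uint32_t probe = 1;
  return *(const unsigned char *)&probe == 0;
}
```

An application could call `swap_u32' on each stored integer when a database is moved to a platform whose byte order differs from the one it was written on.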
The function `crputlob' is used in order to store a large object.
- int crputlob(CURIA *curia, const char *kbuf, int ksiz, const char *vbuf, int vsiz, int dmode);
- `curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `dmode' specifies behavior when the key overlaps, by the following values: `CR_DOVER', which means the specified value overwrites the existing one, `CR_DKEEP', which means the existing value is kept, `CR_DCAT', which means the specified value is concatenated at the end of the existing value. If successful, the return value is true, else, it is false.
The function `croutlob' is used in order to delete a large object.
- int croutlob(CURIA *curia, const char *kbuf, int ksiz);
- `curia' specifies a database handle connected as a writer. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true, else, it is false. False is returned when no large object corresponds to the specified key.
The function `crgetlob' is used in order to retrieve a large object.
- char *crgetlob(CURIA *curia, const char *kbuf, int ksiz, int start, int max, int *sp);
- `curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `start' specifies the offset address of the beginning of the region of the value to be read. `max' specifies the max size to be read. If it is negative, the size to read is unlimited. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding large object, else, it is `NULL'. `NULL' is returned when no large object corresponds to the specified key or the size of the value of the corresponding large object is less than `start'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `crvsizlob' is used in order to get the size of the value of a large object.
- int crvsizlob(CURIA *curia, const char *kbuf, int ksiz);
- `curia' specifies a database handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is the size of the value of the corresponding large object, else, it is -1. Because this function does not read the entity of a large object, it is faster than `crgetlob'.
The function `crrnumlob' is used in order to get the number of the large objects stored in a database.
- int crrnumlob(CURIA *curia);
- `curia' specifies a database handle. If successful, the return value is the number of the large objects stored in the database, else, it is -1.
The following example stores and retrieves a phone number, using the name as the key.
#include <depot.h>
#include <curia.h>
#include <stdlib.h>
#include <stdio.h>
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
CURIA *curia;
char *val;
/* open the database */
if(!(curia = cropen(DBNAME, CR_OWRITER | CR_OCREAT, -1, -1))){
fprintf(stderr, "cropen: %s\n", dperrmsg(dpecode));
return 1;
}
/* store the record */
if(!crput(curia, NAME, -1, NUMBER, -1, CR_DOVER)){
fprintf(stderr, "crput: %s\n", dperrmsg(dpecode));
}
/* retrieve the record */
if(!(val = crget(curia, NAME, -1, 0, -1, NULL))){
fprintf(stderr, "crget: %s\n", dperrmsg(dpecode));
} else {
printf("Name: %s\n", NAME);
printf("Number: %s\n", val);
free(val);
}
/* close the database */
if(!crclose(curia)){
fprintf(stderr, "crclose: %s\n", dperrmsg(dpecode));
return 1;
}
return 0;
}
The following example shows all records of the database.
#include <depot.h>
#include <curia.h>
#include <stdlib.h>
#include <stdio.h>
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
CURIA *curia;
char *key, *val;
/* open the database */
if(!(curia = cropen(DBNAME, CR_OREADER, -1, -1))){
fprintf(stderr, "cropen: %s\n", dperrmsg(dpecode));
return 1;
}
/* initialize the iterator */
if(!criterinit(curia)){
fprintf(stderr, "criterinit: %s\n", dperrmsg(dpecode));
}
/* scan the iterator */
while((key = criternext(curia, NULL)) != NULL){
if(!(val = crget(curia, key, -1, 0, -1, NULL))){
fprintf(stderr, "crget: %s\n", dperrmsg(dpecode));
free(key);
break;
}
printf("%s: %s\n", key, val);
free(val);
free(key);
}
/* close the database */
if(!crclose(curia)){
fprintf(stderr, "crclose: %s\n", dperrmsg(dpecode));
return 1;
}
return 0;
}
How to build programs using Curia is the same as the case of Depot.
Although the functions of Curia are not reentrant, they do not use any static object internally. So, they can be used in a thread-safe manner if every call and every reference to the external variable `dpecode' is under exclusion control, on the assumption that `errno', `malloc' and so on are thread-safe.
Curia has the following command line interfaces.
The command `crmgr' is a utility for debugging Curia and its applications. It features editing and checking of a database. It can be used for the database applications with shell scripts. This command is used in the following format. `name' specifies a database name. `key' specifies the key of a record. `val' specifies the value of a record.
- crmgr create [-v] [-bnum num] [-dnum num] name
- Create a database directory.
- crmgr put [-v] [-kx|-ki] [-vx|-vi|-vf] [-keep|-cat] [-lob] [-na] name key val
- Store a record with a key and a value.
- crmgr out [-v] [-kx|-ki] [-lob] name key
- Delete a record with a key.
- crmgr get [-v] [-kx|-ki] [-start num] [-max num] [-ox] [-lob] [-n] name key
- Retrieve a record with a key and output it to the standard output.
- crmgr list [-v] [-ox] name
- List all keys delimited with line-feed to the standard output.
- crmgr optimize [-v] [-bnum num] [-na] name
- Optimize a database.
- crmgr inform [-v] name
- Output miscellaneous information to the standard output.
- crmgr remove [-v] name
- Remove a database directory.
- crmgr econv [-v] [-be|-le] name
- Convert endian of a database directory for the local system.
- crmgr version
- Output version information of QDBM to the standard output.
Options feature the following.
- -v : output debug information.
- -bnum num : specifies the number of elements of each bucket array.
- -dnum num : specifies the number of division of the database.
- -kx : treat `key' as a binary expression of hexadecimal notation.
- -ki : treat `key' as an integer expression of decimal notation.
- -vx : treat `val' as a binary expression of hexadecimal notation.
- -vi : treat `val' as an integer expression of decimal notation.
- -vf : read the value from a file specified with `val'.
- -keep : specify the storing mode for `CR_DKEEP'.
- -cat : specify the storing mode for `CR_DCAT'.
- -na : do not set alignment.
- -start : specify the beginning offset of a value to fetch.
- -max : specify the max size of a value to fetch.
- -ox : treat the output as a binary expression of hexadecimal notation.
- -lob : handle large objects.
- -n : do not output the trailing newline.
- -be : convert the database directory for big endian.
- -le : convert the database directory for little endian.
This command returns 0 on success, another on failure.
The command `crtest' is a utility for facility test and performance test. Check a database generated by the command or measure the execution time of the command. This command is used in the following format. `name' specifies a database name. `rnum' specifies the number of records. `bnum' specifies the number of elements of a bucket array. `dnum' specifies the number of division of a database.
- crtest write [-lob] name rnum bnum dnum
- Store records with keys of 8 bytes. They change as `00000001', `00000002'...
- crtest read name
- Retrieve all records of the database above.
- crtest combo name
- Perform combination test of various operations.
Options feature the following.
- -lob : handle large objects.
This command returns 0 on success, another on failure.
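The keys written by the subcommand `write' follow a simple pattern, which can be reproduced as below. The function name `test_key' is hypothetical, and the assumption that numbering starts at 1 is taken from the pattern `00000001', `00000002' shown above.

```c
#include <stdio.h>

/* Format the 8-byte key of record number `i' into `buf', which must
   hold at least 9 bytes including the terminating zero code.
   Returns 1 on success, 0 if `i' does not fit in 8 decimal digits
   or the buffer is too small. */
int test_key(char *buf, size_t size, int i){
  if(i < 0 || i > 99999999 || size < 9) return 0;
  snprintf(buf, size, "%08d", i);   /* zero-padded decimal, 8 digits */
  return 1;
}
```

Keys built this way can be used to look up individual records of a database generated by the test command.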
Relic is the API which is compatible with NDBM. So, Relic wraps functions of Depot as API of NDBM. It is easy to port an application from NDBM to QDBM. In most cases, you should only replace the inclusion of `ndbm.h' with `relic.h' and replace the linking option `-lndbm' with `-lqdbm'.
The original NDBM treats a database as a pair of files. One, the `directory file', has a name with the suffix `.dir' and stores a bit map of keys. The other, the `data file', has a name with the suffix `.pag' and stores the entities of each record. Relic creates the directory file as a mere dummy file and creates the data file as a database. Relic has no restriction about the size of each record. Relic can not handle database files made by the original NDBM.
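The two file names derived from a database name can be sketched as follows. The function name `relic_file_names' is hypothetical and not part of Relic. The sketch only appends the two suffixes mentioned above.

```c
#include <stdio.h>

/* Build the names of the directory file and the data file that belong
   to the database `name'.
   Returns 1 if both names fit into their buffers, 0 otherwise. */
int relic_file_names(const char *name, char *dir, size_t dsiz,
                     char *pag, size_t psiz){
  int dlen = snprintf(dir, dsiz, "%s.dir", name);
  int plen = snprintf(pag, psiz, "%s.pag", name);
  return dlen >= 0 && (size_t)dlen < dsiz &&
         plen >= 0 && (size_t)plen < psiz;
}
```

For the database name `book', the result is `book.dir' and `book.pag'. Both files should be handled together when a database is copied or removed.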
In order to use Relic, you should include `relic.h', `stdlib.h', `sys/types.h', `sys/stat.h' and `fcntl.h' in the source files. Usually, the following description will be near the beginning of a source file.
- #include <relic.h>
- #include <stdlib.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <fcntl.h>
A pointer to `DBM' is used as a database handle. A database handle is opened with the function `dbm_open' and closed with `dbm_close'. You should not refer directly to any member of a handle.
Structures of `datum' type are used in order to give and receive data of keys and values with functions of Relic.
- typedef struct { void *dptr; size_t dsize; } datum;
- `dptr' specifies the pointer to the region of a key or a value. `dsize' specifies the size of the region.
The function `dbm_open' is used in order to get a database handle.
- DBM *dbm_open(char *name, int flags, int mode);
- `name' specifies the name of a database. The file names are concatenated with suffixes. `flags' is the same as the one of `open' call, although `O_WRONLY' is treated as `O_RDWR' and additional flags except for `O_CREAT' and `O_TRUNC' have no effect. `mode' specifies the mode of the database file as the one of `open' call does. The return value is the database handle or `NULL' if it is not successful.
The function `dbm_close' is used in order to close a database handle.
- void dbm_close(DBM *db);
- `db' specifies a database handle. Because the region of the closed handle is released, it becomes impossible to use the handle.
The function `dbm_store' is used in order to store a record.
- int dbm_store(DBM *db, datum key, datum content, int flags);
- `db' specifies a database handle. `key' specifies a structure of a key. `content' specifies a structure of a value. `flags' specifies behavior when the key overlaps, by the following values: `DBM_REPLACE', which means the specified value overwrites the existing one, `DBM_INSERT', which means the existing value is kept. The return value is 0 if it is successful, 1 if it gives up because of overlaps of the key, -1 if other error occurs.
The function `dbm_delete' is used in order to delete a record.
- int dbm_delete(DBM *db, datum key);
- `db' specifies a database handle. `key' specifies a structure of a key. The return value is 0 if it is successful, -1 if some errors occur.
The function `dbm_fetch' is used in order to retrieve a record.
- datum dbm_fetch(DBM *db, datum key);
- `db' specifies a database handle. `key' specifies a structure of a key. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the value. If no record corresponds or some errors occur, `dptr' is `NULL'. `dptr' points to the region related with the handle. The region is available until the next time of calling this function with the same handle.
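Because the region returned by `dbm_fetch' is only available until the next call with the same handle, an application that needs the value for longer has to copy it. The following is a minimal sketch of such a copy. The type `region' is a stand-in with the same members as `datum', so that the sketch does not depend on `relic.h', and the function name `copy_region' is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in with the same members as the `datum' type of Relic. */
typedef struct { void *dptr; size_t dsize; } region;

/* Return a newly allocated copy of the region with an additional zero
   code appended, so that it can be treated as a character string.
   Returns NULL if the region is empty of data (`dptr' is NULL) or if
   malloc fails. The caller releases the result with free(). */
char *copy_region(region r){
  char *buf;
  if(!r.dptr) return NULL;
  if(!(buf = malloc(r.dsize + 1))) return NULL;
  memcpy(buf, r.dptr, r.dsize);
  buf[r.dsize] = '\0';
  return buf;
}
```

The copy stays valid after later fetches and after the handle is closed.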
The function `dbm_firstkey' is used in order to get the first key of a database.
- datum dbm_firstkey(DBM *db);
- `db' specifies a database handle. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the first key. If no record corresponds or some errors occur, `dptr' is `NULL'. `dptr' points to the region related with the handle. The region is available until the next time of calling this function or the function `dbm_nextkey' with the same handle.
The function `dbm_nextkey' is used in order to get the next key of a database.
- datum dbm_nextkey(DBM *db);
- `db' specifies a database handle. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the next key. If no record corresponds or some errors occur, `dptr' is `NULL'. `dptr' points to the region related with the handle. The region is available until the next time of calling this function or the function `dbm_firstkey' with the same handle.
The function `dbm_error' is used in order to check whether a database has a fatal error or not.
- int dbm_error(DBM *db);
- `db' specifies a database handle. The return value is true if the database has a fatal error, false if not.
The function `dbm_clearerr' has no effect.
- int dbm_clearerr(DBM *db);
- `db' specifies a database handle. The return value is 0. The function is only for compatibility.
The function `dbm_rdonly' is used in order to check whether a handle is read-only or not.
- int dbm_rdonly(DBM *db);
- `db' specifies a database handle. The return value is true if the handle is read-only, or false if not read-only.
The function `dbm_dirfno' is used in order to get the file descriptor of a directory file.
- int dbm_dirfno(DBM *db);
- `db' specifies a database handle. The return value is the file descriptor of the directory file.
The function `dbm_pagfno' is used in order to get the file descriptor of a data file.
- int dbm_pagfno(DBM *db);
- `db' specifies a database handle. The return value is the file descriptor of the data file.
The following example stores and retrieves a phone number, using the name as the key.
#include <relic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#define NAME "mikio"
#define NUMBER "000-1234-5678"
#define DBNAME "book"
int main(int argc, char **argv){
DBM *db;
datum key, val;
int i;
/* open the database */
if(!(db = dbm_open(DBNAME, O_RDWR | O_CREAT, 00644))){
perror("dbm_open");
return 1;
}
/* prepare the record */
key.dptr = NAME;
key.dsize = strlen(NAME);
val.dptr = NUMBER;
val.dsize = strlen(NUMBER);
/* store the record */
if(dbm_store(db, key, val, DBM_REPLACE) != 0){
perror("dbm_store");
}
/* retrieve the record */
val = dbm_fetch(db, key);
if(val.dptr){
printf("Name: %s\n", NAME);
printf("Number: ");
for(i = 0; i < val.dsize; i++){
putchar(((char *)val.dptr)[i]);
}
putchar('\n');
} else {
perror("dbm_fetch");
}
/* close the database */
dbm_close(db);
return 0;
}
How to build programs using Relic is the same as the case of Depot. Note that an option to be given to a linker is not `-lndbm', but `-lqdbm'.
Each function of Relic is not reentrant, and not thread-safe.
Relic has the following command line interfaces.
The command `rlmgr' is a utility for debugging Relic and its applications. It features editing and checking of a database. It can be used for database applications with shell scripts. This command is used in the following format. `name' specifies a database name. `key' specifies the key of a record. `val' specifies the value of a record.
- rlmgr create name
- Create a database file.
- rlmgr store [-kx] [-vx|-vf] [-insert] name key val
- Store a record with a key and a value.
- rlmgr delete [-kx] name key
- Delete a record with a key.
- rlmgr fetch [-kx] [-ox] [-n] name key
- Retrieve a record with a key and output to the standard output.
- rlmgr list [-v] [-ox] name
- List all keys delimited with line-feed to the standard output.
Options feature the following.
- -kx : treat `key' as a binary expression of hexadecimal notation.
- -vx : treat `val' as a binary expression of hexadecimal notation.
- -vf : read the value from a file specified with `val'.
- -insert : specify the storing mode for `DBM_INSERT'.
- -ox : treat the output as a binary expression of hexadecimal notation.
- -n : do not output the trailing newline.
This command returns 0 on success, another on failure.
The command `rltest' is a utility for facility test and performance test. Check a database generated by the command or measure the execution time of the command. This command is used in the following format. `name' specifies a database name. `rnum' specifies the number of records.
- rltest write name rnum
- Store records with keys of 8 bytes. They change as `00000001', `00000002'...
- rltest read name rnum
- Retrieve records of the database above.
This command returns 0 on success, another on failure.
Hovel is the API which is compatible with GDBM. So, Hovel wraps functions of Depot and Curia as API of GDBM. It is easy to port an application from GDBM to QDBM. In most cases, you should only replace the inclusion of `gdbm.h' with `hovel.h' and replace the linking option `-lgdbm' with `-lqdbm'. Hovel can not handle database files made by the original GDBM.
In order to use Hovel, you should include `hovel.h', `stdlib.h', `sys/types.h' and `sys/stat.h' in the source files. Usually, the following description will be near the beginning of a source file.
- #include <hovel.h>
- #include <stdlib.h>
- #include <sys/types.h>
- #include <sys/stat.h>
An object of `GDBM_FILE' is used as a database handle. A database handle is opened with the function `gdbm_open' and closed with `gdbm_close'. You should not refer directly to any member of a handle. Hovel usually works as a wrapper of Depot and handles a database file. However, if you open the handle with the function `gdbm_open2', the handle can behave as a wrapper of Curia and treat a database directory.
Structures of `datum' type are used in order to give and receive data of keys and values with functions of Hovel.
- typedef struct { char *dptr; size_t dsize; } datum;
- `dptr' specifies the pointer to the region of a key or a value. `dsize' specifies the size of the region.
The external variable `gdbm_version' is the string containing the version information.
- extern char *gdbm_version;
The external variable `gdbm_errno' is assigned the code of the last error that occurred. Refer to `hovel.h' for details of the error codes.
- extern gdbm_error gdbm_errno;
The function `gdbm_strerror' is used in order to get a message string corresponding to an error code.
- char *gdbm_strerror(gdbm_error gdbmerrno);
- `gdbmerrno' specifies an error code. The return value is the message string of the error code. The region of the return value is not writable.
The function `gdbm_open' is used in order to get a database handle after the fashion of GDBM.
- GDBM_FILE gdbm_open(char *name, int block_size, int read_write, int mode, void (*fatal_func)(void));
- `name' specifies the name of a database. `block_size' is ignored. `read_write' specifies the connection mode: `GDBM_READER' as a reader, `GDBM_WRITER', `GDBM_WRCREAT' and `GDBM_NEWDB' as a writer. `GDBM_WRCREAT' makes a database file if it does not exist. `GDBM_NEWDB' makes a new database even if it exists. You can add the following to writer modes by bitwise or: `GDBM_SYNC', `GDBM_NOLOCK' and `GDBM_FAST'. `GDBM_SYNC' means a database is synchronized after every updating method. The other two are ignored. `mode' specifies mode of a database file as the one of `open' call does. `fatal_func' is ignored. The return value is the database handle or `NULL' if it is not successful.
The function `gdbm_open2' is used in order to get a database handle after the fashion of QDBM.
- GDBM_FILE gdbm_open2(char *name, int read_write, int mode, int bnum, int dnum, int align);
- `name' specifies the name of a database. `read_write' specifies the connection mode: `GDBM_READER' as a reader, `GDBM_WRITER', `GDBM_WRCREAT' and `GDBM_NEWDB' as a writer. `GDBM_WRCREAT' makes a database file or directory if it does not exist. `GDBM_NEWDB' makes a new database even if it exists. You can add the following to writer modes by bitwise or: `GDBM_SYNC', `GDBM_NOLOCK' and `GDBM_FAST'. `GDBM_SYNC' means that the database is synchronized after every updating operation. The other two are ignored. `mode' specifies the permission mode of the database file or the database directory, as the `mode' argument of the `open' or `mkdir' call does. `bnum' specifies the number of elements of each bucket array. If it is not more than 0, the default value is used. `dnum' specifies the number of divisions of the database. If it is not more than 0, the returned handle is created as a wrapper of Depot; otherwise it is created as a wrapper of Curia. `align' specifies the basic size of alignment. The return value is the database handle or `NULL' if it is not successful. If the database already exists, whether it is a database of Depot or of Curia is detected automatically.
The function `gdbm_close' is used in order to close a database handle.
- void gdbm_close(GDBM_FILE dbf);
- `dbf' specifies a database handle. Because the region of the closed handle is released, it becomes impossible to use the handle.
The function `gdbm_store' is used in order to store a record.
- int gdbm_store(GDBM_FILE dbf, datum key, datum content, int flag);
- `dbf' specifies a database handle connected as a writer. `key' specifies a structure of a key. `content' specifies a structure of a value. `flag' specifies behavior when the key overlaps, by the following values: `GDBM_REPLACE', which means the specified value overwrites the existing one, `GDBM_INSERT', which means the existing value is kept. The return value is 0 if it is successful, 1 if it gives up because of overlaps of the key, -1 if other error occurs.
The function `gdbm_delete' is used in order to delete a record.
- int gdbm_delete(GDBM_FILE dbf, datum key);
- `dbf' specifies a database handle connected as a writer. `key' specifies a structure of a key. The return value is 0 if it is successful, -1 if some errors occur.
The function `gdbm_fetch' is used in order to retrieve a record.
- datum gdbm_fetch(GDBM_FILE dbf, datum key);
- `dbf' specifies a database handle. `key' specifies a structure of a key. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the value. If no record corresponds or some errors occur, `dptr' is `NULL'. Because the region pointed to by `dptr' is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `gdbm_exists' is used in order to check whether a record exists or not.
- int gdbm_exists(GDBM_FILE dbf, datum key);
- `dbf' specifies a database handle. `key' specifies a structure of a key. The return value is true if a record corresponds and no error occurs; otherwise it is false.
The function `gdbm_firstkey' is used in order to get the first key of a database.
- datum gdbm_firstkey(GDBM_FILE dbf);
- `dbf' specifies a database handle. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the first key. If no record corresponds or some errors occur, `dptr' is `NULL'. Because the region pointed to by `dptr' is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `gdbm_nextkey' is used in order to get the next key of a database.
- datum gdbm_nextkey(GDBM_FILE dbf, datum key);
- `dbf' specifies a database handle. The return value is a structure of the result. If a record corresponds, the member `dptr' of the structure is the pointer to the region of the next key. If no record corresponds or some errors occur, `dptr' is `NULL'. Because the region pointed to by `dptr' is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `gdbm_sync' is used in order to synchronize updating contents with the file and the device.
- void gdbm_sync(GDBM_FILE dbf);
- `dbf' specifies a database handle connected as a writer.
The function `gdbm_reorganize' is used in order to reorganize a database.
- int gdbm_reorganize(GDBM_FILE dbf);
- `dbf' specifies a database handle connected as a writer. If successful, the return value is 0, else -1.
The function `gdbm_fdesc' is used in order to get the file descriptor of a database file.
- int gdbm_fdesc(GDBM_FILE dbf);
- `dbf' specifies a database handle connected as a writer. The return value is the file descriptor of the database file. If the database is a directory the return value is -1.
The function `gdbm_setopt' has no effect.
- int gdbm_setopt(GDBM_FILE dbf, int option, int *value, int size);
- `dbf' specifies a database handle. `option' is ignored. `size' is ignored. The return value is 0. The function is only for compatibility.
The following example stores and retrieves a phone number, using the name as the key.
#include <hovel.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <string.h>

#define NAME     "mikio"
#define NUMBER   "000-1234-5678"
#define DBNAME   "book"

int main(int argc, char **argv){
  GDBM_FILE dbf;
  datum key, val;
  size_t i;

  /* open the database as a writer, creating it if it does not exist */
  if(!(dbf = gdbm_open(DBNAME, 0, GDBM_WRCREAT, 00644, NULL))){
    fprintf(stderr, "%s\n", gdbm_strerror(gdbm_errno));
    return 1;
  }

  /* prepare the record; the sizes do not include the terminating zero */
  key.dptr = NAME;
  key.dsize = strlen(NAME);
  val.dptr = NUMBER;
  val.dsize = strlen(NUMBER);

  /* store the record, overwriting any existing value */
  if(gdbm_store(dbf, key, val, GDBM_REPLACE) != 0){
    fprintf(stderr, "%s\n", gdbm_strerror(gdbm_errno));
    gdbm_close(dbf);
    return 1;
  }

  /* retrieve the record */
  val = gdbm_fetch(dbf, key);
  if(val.dptr){
    printf("Name: %s\n", NAME);
    printf("Number: ");
    /* the fetched region is not treated as a string, so print it by size */
    for(i = 0; i < val.dsize; i++){
      putchar(val.dptr[i]);
    }
    putchar('\n');
    /* the region was allocated with malloc and must be released */
    free(val.dptr);
  } else {
    fprintf(stderr, "%s\n", gdbm_strerror(gdbm_errno));
  }

  /* close the database */
  gdbm_close(dbf);

  return 0;
}
Programs using Hovel are built in the same way as programs using Depot. Note that the option to be given to the linker is not `-lgdbm', but `-lqdbm'.
The functions of Hovel are neither reentrant nor thread-safe.
Hovel has the following command line interfaces.
The command `hvmgr' is a utility for debugging Hovel and its applications. It features editing and checking of a database. It can be used for database applications with shell scripts. This command is used in the following format. `name' specifies a database name. `key' specifies the key of a record. `val' specifies the value of a record.
- hvmgr create [-qdbm bnum dnum] name
- Create a database file.
- hvmgr store [-qdbm] [-kx] [-vx|-vf] [-insert] name key val
- Store a record with a key and a value.
- hvmgr delete [-qdbm] [-kx] name key
- Delete a record with a key.
- hvmgr fetch [-qdbm] [-kx] [-ox] [-n] name key
- Retrieve a record with a key and output to the standard output.
- hvmgr list [-qdbm] [-v] [-ox] name
- List all keys delimited with line-feed to the standard output.
- hvmgr optimize [-qdbm] name
- Optimize a database.
The options are as follows.
- -qdbm [bnum dnum] : use `gdbm_open2' to open the database. `bnum' specifies the number of elements of the bucket array. `dnum' specifies the number of divisions of the database.
- -kx : treat `key' as a binary expression of hexadecimal notation.
- -vx : treat `val' as a binary expression of hexadecimal notation.
- -vf : read the value from a file specified with `val'.
- -insert : specify the storing mode for `GDBM_INSERT'.
- -ox : treat the output as a binary expression of hexadecimal notation.
- -n : do not output the trailing newline.
This command returns 0 on success and another value on failure.
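To illustrate what "a binary expression of hexadecimal notation" can look like, the following sketch decodes a string of hexadecimal digit pairs into bytes. It assumes plain two-digit pairs without separators; the exact syntax accepted by `hvmgr' is not stated here, and the names `hex_value' and `hex_decode' are hypothetical. Under that assumption, the key `6d696b696f' would denote the five bytes of `mikio'.

```c
#include <stddef.h>

/* Convert one hexadecimal digit to its value, or -1 if it is not a digit. */
static int hex_value(char c){
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode pairs of hex digits from `hex' into `out' (capacity `max' bytes).
   Returns the number of bytes written, or -1 on malformed input or overflow. */
static int hex_decode(const char *hex, unsigned char *out, size_t max){
  size_t n = 0;
  while(hex[0] != '\0'){
    int hi = hex_value(hex[0]);
    int lo = (hex[1] != '\0') ? hex_value(hex[1]) : -1;
    if(hi < 0 || lo < 0 || n >= max) return -1;
    out[n++] = (unsigned char)(hi * 16 + lo);
    hex += 2;
  }
  return (int)n;
}
```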
The command `hvtest' is a utility for facility test and performance test. Check a database generated by the command or measure the execution time of the command. This command is used in the following format. `name' specifies a database name. `rnum' specifies the number of records.
- hvtest write [-qdbm] name rnum
- Store records with keys of 8 bytes. The keys are `00000001', `00000002', and so on.
- hvtest read [-qdbm] name rnum
- Retrieve records of the database above.
The options are as follows.
- -qdbm : use `gdbm_open2' and open the handle as Curia.
This command returns 0 on success and another value on failure.
Cabin is the utility API which provides memory allocating functions, sorting functions, extensible datum, array list and hash map for handling records on memory.
In order to use Cabin, you should include `cabin.h' and `stdlib.h' in the source files. Usually, the following description will be near the beginning of a source file.
- #include <cabin.h>
- #include <stdlib.h>
A pointer to `CBDATUM' is used as a handle of an extensible datum. A datum handle is opened with the function `cbdatumopen' and closed with `cbdatumclose'. A pointer to `CBLIST' is used as a handle of an array list. A list handle is opened with the function `cblistopen' and closed with `cblistclose'. A pointer to `CBMAP' is used as a handle of a hash map. A map handle is opened with the function `cbmapopen' and closed with `cbmapclose'. You should not refer directly to any member of these handles.
The external variable `cbfatalfunc' is the pointer to the callback function for handling a fatal error.
- extern void (*cbfatalfunc)(const char *);
- The argument specifies the error message. The initial value of this variable is `NULL'. If the value is `NULL', the default function is called when a fatal error occurs. A fatal error occurs when memory allocation fails.
The function `cbmalloc' is used in order to allocate a region on memory.
- void *cbmalloc(size_t size);
- `size' specifies the size of the region. The return value is the pointer to the allocated region. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `cbrealloc' is used in order to re-allocate a region on memory.
- void *cbrealloc(void *ptr, size_t size);
- `ptr' specifies the pointer to a region. `size' specifies the size of the region. The return value is the pointer to the re-allocated region. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
The function `cbmemdup' is used in order to duplicate a region on memory.
- char *cbmemdup(const char *ptr, int size);
- `ptr' specifies the pointer to a region. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use.
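The documented behavior can be sketched in plain C as follows. This is not Cabin's implementation: the name `memdup_sketch' is hypothetical, and on allocation failure the sketch simply returns `NULL', whereas Cabin presumably reports the failure through its fatal error handler.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the documented behavior of cbmemdup.
   A negative size means "use strlen(ptr)". The copy always carries one
   extra zero byte, so it can be treated as a character string. */
static char *memdup_sketch(const char *ptr, int size){
  char *buf;
  if(size < 0) size = (int)strlen(ptr);
  buf = malloc((size_t)size + 1);
  if(!buf) return NULL;
  memcpy(buf, ptr, (size_t)size);
  buf[size] = '\0';
  return buf;
}
```

As with `cbmemdup', the caller releases the returned region with `free'.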
The function `cbisort' is used in order to sort an array using insertion sort.
- void cbisort(void *base, int nmemb, int size, int(*compar)(const void *, const void *));
- `base' specifies the pointer to an array. `nmemb' specifies the number of elements of the array. `size' specifies the size of each element. `compar' specifies the pointer to the comparing function. Its two arguments are pointers to elements. The comparing function should return a positive value if the former is greater, a negative value if the latter is greater, and 0 if both are equal. Insertion sort is useful only if most elements have been sorted already.
The function `cbssort' is used in order to sort an array using shell sort.
- void cbssort(void *base, int nmemb, int size, int(*compar)(const void *, const void *));
- `base' specifies the pointer to an array. `nmemb' specifies the number of elements of the array. `size' specifies the size of each element. `compar' specifies the pointer to the comparing function. Its two arguments are pointers to elements. The comparing function should return a positive value if the former is greater, a negative value if the latter is greater, and 0 if both are equal. If most elements have been sorted, shell sort may be faster than heap sort or quick sort.
The function `cbhsort' is used in order to sort an array using heap sort.
- void cbhsort(void *base, int nmemb, int size, int(*compar)(const void *, const void *));
- `base' specifies the pointer to an array. `nmemb' specifies the number of elements of the array. `size' specifies the size of each element. `compar' specifies the pointer to the comparing function. Its two arguments are pointers to elements. The comparing function should return a positive value if the former is greater, a negative value if the latter is greater, and 0 if both are equal. Although heap sort is robust against bias of input, quick sort is faster in most cases.
The function `cbqsort' is used in order to sort an array using quick sort.
- void cbqsort(void *base, int nmemb, int size, int(*compar)(const void *, const void *));
- `base' specifies the pointer to an array. `nmemb' specifies the number of elements of the array. `size' specifies the size of each element. `compar' specifies the pointer to the comparing function. Its two arguments are pointers to elements. The comparing function should return a positive value if the former is greater, a negative value if the latter is greater, and 0 if both are equal. Although quick sort is sensitive to bias of input, it is the fastest of these sorting algorithms in most cases.
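The following sketch shows a comparing function in the style described above, together with a plain insertion sort that takes the same parameter list as `cbisort'. It is an illustration only and is not Cabin's code; the names `compare_ints' and `isort_sketch' are hypothetical. The sort returns the number of comparisons it made, which shows why insertion sort suits input that is already nearly sorted.

```c
#include <stdlib.h>
#include <string.h>

/* Comparing function in the documented style: positive if the former is
   greater, negative if the latter is greater, 0 if both are equal. */
static int compare_ints(const void *a, const void *b){
  int x = *(const int *)a;
  int y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Hypothetical insertion sort with the same parameter list as cbisort.
   Returns the number of calls to `compar', or -1 if memory runs out. */
static long isort_sketch(void *base, int nmemb, int size,
                         int (*compar)(const void *, const void *)){
  char *bp = base;
  char *tmp = malloc((size_t)size);
  long calls = 0;
  int i, j;
  if(!tmp) return -1;
  for(i = 1; i < nmemb; i++){
    /* take the next element out of the array */
    memcpy(tmp, bp + (size_t)i * size, (size_t)size);
    /* shift greater elements one slot to the right */
    for(j = i; j > 0; j--){
      calls++;
      if(compar(bp + (size_t)(j - 1) * size, tmp) <= 0) break;
      memcpy(bp + (size_t)j * size, bp + (size_t)(j - 1) * size, (size_t)size);
    }
    /* put the element into the gap that was opened */
    memcpy(bp + (size_t)j * size, tmp, (size_t)size);
  }
  free(tmp);
  return calls;
}
```

With five elements, this sketch needs 4 comparisons when the input is already sorted and 10 when it is in reverse order.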
The function `cbdatumopen' is used in order to get a datum handle.
- CBDATUM *cbdatumopen(const char *ptr, int size);
- `ptr' specifies the pointer to the region of the initial content. If it is `NULL', an empty datum is created. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'. The return value is a datum handle.
The function `cbdatumdup' is used in order to copy a datum.
- CBDATUM *cbdatumdup(const CBDATUM *datum);
- `datum' specifies a datum handle. The return value is a new datum handle.
The function `cbdatumclose' is used in order to free a datum handle.
- void cbdatumclose(CBDATUM *datum);
- `datum' specifies a datum handle. Because the region of a closed handle is released, it becomes impossible to use the handle.
The function `cbdatumcat' is used in order to concatenate a datum and a region.
- void cbdatumcat(CBDATUM *datum, const char *ptr, int size);
- `datum' specifies a datum handle. `ptr' specifies the pointer to the region to be appended. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'.
The function `cbdatumptr' is used in order to get the pointer of the region of a datum.
- const char *cbdatumptr(const CBDATUM *datum);
- `datum' specifies a datum handle. The return value is the pointer of the region of a datum. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string.
The function `cbdatumsize' is used in order to get the size of the region of a datum.
- int cbdatumsize(const CBDATUM *datum);
- `datum' specifies a datum handle. The return value is the size of the region of a datum.
The function `cbdatumsetsize' is used in order to change the size of the region of a datum.
- void cbdatumsetsize(CBDATUM *datum, int size);
- `datum' specifies a datum handle. `size' specifies the new size of the region. If the new size is bigger than the one of old, the surplus region is filled with zero codes.
The function `cblistopen' is used in order to get a list handle.
- CBLIST *cblistopen(void);
- The return value is a list handle.
The function `cblistdup' is used in order to copy a list.
- CBLIST *cblistdup(const CBLIST *list);
- `list' specifies a list handle. The return value is a new list handle.
The function `cbsplit' is used in order to make a list by splitting a serial datum.
- CBLIST *cbsplit(const char *ptr, int size, const char *delim);
- `ptr' specifies the pointer to the region of the source content. If it is `NULL', an empty datum is created. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'. `delim' specifies a string containing delimiting characters. If it is `NULL', zero code is used as a delimiter. The return value is a list handle.
The function `cblistclose' is used in order to close a list handle.
- void cblistclose(CBLIST *list);
- `list' specifies a list handle. Because the region of a closed handle is released, it becomes impossible to use the handle.
The function `cblistnum' is used in order to get the number of elements of a list.
- int cblistnum(const CBLIST *list);
- `list' specifies a list handle. The return value is the number of elements of the list.
The function `cblistval' is used in order to get the pointer to the region of an element.
- const char *cblistval(const CBLIST *list, int index, int *sp);
- `list' specifies a list handle. `index' specifies the index of an element. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. The return value is the pointer to the region of the element. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. If `index' is equal to or more than the number of elements, the return value is `NULL'.
The function `cblistpush' is used in order to add an element at the end of a list.
- void cblistpush(CBLIST *list, const char *ptr, int size);
- `list' specifies a list handle. `ptr' specifies the pointer to the region of an element. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'.
The function `cblistpop' is used in order to remove an element of the end of a list.
- char *cblistpop(CBLIST *list, int *sp);
- `list' specifies a list handle. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. The return value is the pointer to the region of the value. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. If the list is empty, the return value is `NULL'.
The function `cblistunshift' is used in order to add an element at the top of a list.
- void cblistunshift(CBLIST *list, const char *ptr, int size);
- `list' specifies a list handle. `ptr' specifies the pointer to the region of an element. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'.
The function `cblistshift' is used in order to remove an element of the top of a list.
- char *cblistshift(CBLIST *list, int *sp);
- `list' specifies a list handle. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. The return value is the pointer to the region of the value. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. If the list is empty, the return value is `NULL'.
The function `cblistinsert' is used in order to add an element at the specified location of a list.
- void cblistinsert(CBLIST *list, int index, const char *ptr, int size);
- `list' specifies a list handle. `index' specifies the index of an element. `ptr' specifies the pointer to the region of the element. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'.
The function `cblistremove' is used in order to remove an element at the specified location of a list.
- char *cblistremove(CBLIST *list, int index, int *sp);
- `list' specifies a list handle. `index' specifies the index of an element. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. The return value is the pointer to the region of the value. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. Because the region of the return value is allocated with the `malloc' call, it should be released with the `free' call if it is no longer in use. If `index' is equal to or more than the number of elements, no element is removed and the return value is `NULL'.
The function `cblistsort' is used in order to sort elements of a list in lexical order.
- void cblistsort(CBLIST *list);
- `list' specifies a list handle. Quick sort is used for sorting.
The function `cblistlsearch' is used in order to search a list for an element using linear search.
- int cblistlsearch(const CBLIST *list, const char *ptr, int size);
- `list' specifies a list handle. `ptr' specifies the pointer to the region of a key. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'. The return value is the index of a corresponding element or -1 if there is no corresponding element. If two or more elements correspond, the index of the first one is returned.
The function `cblistbsearch' is used in order to search a list for an element using binary search.
- int cblistbsearch(const CBLIST *list, const char *ptr, int size);
- `list' specifies a list handle. It should be sorted in lexical order. `ptr' specifies the pointer to the region of a key. `size' specifies the size of the region. If it is negative, the size is assigned with `strlen(ptr)'. The return value is the index of a corresponding element or -1 if there is no corresponding element. If two or more elements correspond, which one is returned is not defined.
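The requirement that the list be sorted follows from how binary search works: each comparison discards half of the remaining range. The sketch below shows the technique on a plain array of strings ordered with `strcmp'. It is not Cabin's implementation, which works on list handles and on regions with explicit sizes; the name `bsearch_strings' is hypothetical.

```c
#include <string.h>

/* Hypothetical binary search over an array of strings sorted with strcmp.
   Returns the index of a matching element, or -1 if there is none. */
static int bsearch_strings(const char *const *items, int num, const char *key){
  int low = 0;
  int high = num - 1;
  while(low <= high){
    int mid = low + (high - low) / 2;
    int cmp = strcmp(key, items[mid]);
    if(cmp == 0) return mid;
    /* discard the half that cannot contain the key */
    if(cmp < 0) high = mid - 1; else low = mid + 1;
  }
  return -1;
}
```

If the array is not sorted, the search can discard the half that holds the key and report -1 even though the key is present.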
The function `cbmapopen' is used in order to get a map handle.
- CBMAP *cbmapopen(int bnum);
- `bnum' specifies the number of elements of the bucket array. If it is not more than 0, the default value is specified. The return value is a map handle.
The function `cbmapdup' is used in order to copy a map.
- CBMAP *cbmapdup(CBMAP *map);
- `map' specifies a map handle. The return value is a new map handle. The iterator of the source map is initialized.
The function `cbmapclose' is used in order to close a map handle.
- void cbmapclose(CBMAP *map);
- `map' specifies a map handle. Because the region of a closed handle is released, it becomes impossible to use the handle.
The function `cbmapput' is used in order to store a record.
- int cbmapput(CBMAP *map, const char *kbuf, int ksiz, const char *vbuf, int vsiz, int over);
- `map' specifies a map handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `vbuf' specifies the pointer to the region of a value. `vsiz' specifies the size of the region of the value. If it is negative, the size is assigned with `strlen(vbuf)'. `over' specifies whether the value of an existing record with the same key is overwritten. If `over' is false and the key already exists, the return value is false; otherwise it is true.
The function `cbmapout' is used in order to delete a record.
- int cbmapout(CBMAP *map, const char *kbuf, int ksiz);
- `map' specifies a map handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. If successful, the return value is true. False is returned when no record corresponds to the specified key.
The function `cbmapget' is used in order to retrieve a record.
- const char *cbmapget(const CBMAP *map, const char *kbuf, int ksiz, int *sp);
- `map' specifies a map handle. `kbuf' specifies the pointer to the region of a key. `ksiz' specifies the size of the region of the key. If it is negative, the size is assigned with `strlen(kbuf)'. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the value of the corresponding record. `NULL' is returned when no record corresponds. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string.
The function `cbmapiterinit' is used in order to initialize the iterator of a map handle.
- void cbmapiterinit(CBMAP *map);
- `map' specifies a map handle. The iterator is used in order to access the key of every record stored in a map.
The function `cbmapiternext' is used in order to get the next key of the iterator.
- const char *cbmapiternext(CBMAP *map, int *sp);
- `map' specifies a map handle. `sp' specifies a pointer to the variable to which the size of the region of the return value is assigned. If it is `NULL', it is not used. If successful, the return value is the pointer to the region of the next key; otherwise it is `NULL'. `NULL' is returned when no record remains to be fetched from the iterator. Because an additional zero code is appended at the end of the region of the return value, the return value can be treated as a character string. The order of iteration is guaranteed to be the same as the order of storing.
The function `cbmaprnum' is used in order to get the number of the records stored in a map.
- int cbmaprnum(const CBMAP *map);
- `map' specifies a map handle. The return value is the number of the records stored in the map.
Programs using Cabin are built in the same way as programs using Depot.
Although the functions of Cabin are not reentrant, they do not use any static object internally. They can therefore be used safely from multiple threads if each call is under exclusion control, on the assumption that `errno', `malloc' and so on are thread-safe.
Cabin has the following command line interfaces.
The command `cbtest' is a utility for facility test and performance test. Measure the execution time of the command. This command is used in the following format. `rnum' specifies the number of records.
- cbtest sort [-d] rnum
- Perform test of sorting algorithms.
- cbtest list [-d] rnum
- Perform writing test of list.
- cbtest map [-d] rnum
- Perform writing test of map.
- cbtest wicked rnum
- Perform updating operations of list and map selected at random.
The options are as follows.
- -d : read and show data of the result.
This command returns 0 on success and another value on failure.
The contents of a database file managed by Depot can be divided roughly into the following three sections: the header section, the bucket section and the record section.
The header section is placed at the beginning of the file and its length is a constant 48 bytes. The following information is stored in the header section.
- magic number: from offset 0, contains "[DEPOT]\n\f" for big endian or "[depot]\n\f" for little endian.
- byte order: at offset 12, the value is true if the file is for big endian.
- file size: from offset 16, type of `int'
- number of elements of the bucket array: from offset 24, type of `int'
- number of records: from offset 32, type of `int'
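The header fields listed above can be read with a few lines of C. The following is a minimal sketch, assuming that `int' occupies 4 bytes (as the `belong' and `lelong' entries for the `file' command suggest) and reading the values as unsigned for simplicity. The type `depot_header' and the functions `read_u32' and `parse_depot_header' are hypothetical names, not part of QDBM. The byte order is taken from the magic number alone.

```c
#include <string.h>

/* Fields of the 48-byte header, assuming that `int' occupies 4 bytes. */
typedef struct {
  int big_endian;       /* true if the magic number is "[DEPOT]\n\f" */
  unsigned long fsiz;   /* file size, from offset 16 */
  unsigned long bnum;   /* number of bucket elements, from offset 24 */
  unsigned long rnum;   /* number of records, from offset 32 */
} depot_header;

/* Read a 4-byte unsigned integer stored in the given byte order. */
static unsigned long read_u32(const unsigned char *p, int big_endian){
  if(big_endian){
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
  }
  return ((unsigned long)p[3] << 24) | ((unsigned long)p[2] << 16) |
         ((unsigned long)p[1] << 8) | (unsigned long)p[0];
}

/* Hypothetical parser: fill `hdr' from the first 48 bytes of a file.
   Returns 0 on success, or -1 if the magic number is not recognized. */
static int parse_depot_header(const unsigned char *buf, depot_header *hdr){
  if(memcmp(buf, "[DEPOT]\n\f", 9) == 0){
    hdr->big_endian = 1;
  } else if(memcmp(buf, "[depot]\n\f", 9) == 0){
    hdr->big_endian = 0;
  } else {
    return -1;
  }
  hdr->fsiz = read_u32(buf + 16, hdr->big_endian);
  hdr->bnum = read_u32(buf + 24, hdr->big_endian);
  hdr->rnum = read_u32(buf + 32, hdr->big_endian);
  return 0;
}
```

Reading the integers byte by byte lets the sketch inspect a file of either byte order, regardless of the byte order of the machine it runs on.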
The bucket section is placed after the header section and its length is determined by the number of elements of the bucket array. Each element of the bucket array stores the offset of the root node of a separate chain.
The record section is placed after the bucket section and extends to the end of the file. Each element of the record section contains the following information.
- flag (for deleting): type of `int'
- second hash value: type of `int'
- size of the key: type of `int'
- size of the value: type of `int'
- size of the padding: type of `int'
- offset of the left child: type of `int'
- offset of the right child: type of `int'
- entity of the key: serial bytes with variable length
- entity of the value: serial bytes with variable length
- padding data: void serial bytes with variable length
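From the list above, the space occupied by one record is the seven leading `int' fields plus the key, the value and the padding. The following sketch computes that size, assuming that `int' occupies 4 bytes; the macro `RECORD_HEAD_INTS' and the function `record_total_size' are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Number of `int' fields at the head of each record, as listed above. */
#define RECORD_HEAD_INTS 7

/* Hypothetical helper: bytes occupied by one record, assuming 4-byte `int'.
   `ksiz', `vsiz' and `psiz' are the sizes of the key, value and padding. */
static size_t record_total_size(size_t ksiz, size_t vsiz, size_t psiz){
  return (size_t)RECORD_HEAD_INTS * 4 + ksiz + vsiz + psiz;
}
```

For example, a record with a 5-byte key, a 13-byte value and no padding would occupy 46 bytes under this assumption.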
Because the database file is not sparse, operations such as move, copy, unlink and transfer by FTP can be applied to the file. Because Depot reads and writes data without normalizing the byte order, a database file cannot be shared between environments with different byte orders.
When you distribute a database file of Depot via a network, the suggested MIME type is `application/x-qdbm' and the suggested suffix of the file name is `.qdb'. When you distribute a database directory of Curia, you may convert the directory tree to an archive of a type such as TAR.
For the command `file' to recognize database files, append the following expressions to the `magic' file.
0 string [DEPOT]\n\f database file of QDBM, big endian
>16 belong x \b, filesize: %d
>24 belong x \b, buckets: %d
>32 belong x \b, records: %d
0 string [depot]\n\f database file of QDBM, little endian
>16 lelong x \b, filesize: %d
>24 lelong x \b, buckets: %d
>32 lelong x \b, records: %d
The documents of QDBM should be proofread by native English speakers.
No bug has been found and left unfixed, such as a crash by segmentation fault, unexpected loss of data, or a memory leak.
Database files are vulnerable to I/O errors. QDBM has no feature for rollback or backup. Applications are responsible for handling such conditions as a full disk and termination by signals.
A database file cannot be located on a file system which does not support file locking. Certain implementations of NFS have this problem.
If you find any bug, report it to the author, with the information of the version of QDBM, the operating system and the compiler.
- Q. : After all, how is QDBM different from GDBM (NDBM, SDBM, Berkeley DB)?
- A. : Processing speed is higher, a database file is smaller, and the API is simpler. Most importantly, efficiency in time and space is very good when records are frequently overwritten, so scalability in practical use is high. Moreover, even when constructing a database so large that it stores more than one million records, processing speed does not degrade drastically and the file size does not grow excessively.
- Q. : Are there good sample codes for applications?
- A. : Read `dptsv.c', `dptest.c' and `dpmgr.c' in that order. The package of QDBM contains them.
- Q. : What are the performance figures?
- A. : The author measured the real time for storing and retrieving with the commands `dptest' and `time'. The number of elements of the bucket array was twice the number of records. On a machine with a 2.53GHz Pentium 4 and 1GB of 333MHz RAM, running Linux 2.4, it took 5.0 seconds to store 1,000,000 records and 4.5 seconds to retrieve all of them; it took 50.9 seconds to store 10,000,000 records and 44.9 seconds to retrieve all of them. On a machine with a 500MHz Pentium 3 and 192MB of 133MHz RAM, running Linux 2.4, it took 12.0 seconds to store 1,000,000 records and 11.1 seconds to retrieve all of them; it took 495.0 seconds to store 10,000,000 records and 414.8 seconds to retrieve all of them.
- Q. : How should I use alignment?
- A. : If your application repeatedly writes in overwrite or concatenate mode, alignment prevents rapid growth of the size of the database file. Because the best size of alignment differs for each application, you should determine it by experiment. For the time being, about 32 is suitable.
- Q. : How should I tune the system for performance?
- A. : Install more RAM on your machine than the size of a database. Then, enlarge the I/O buffer and cut down on flushing of dirty buffers. The filesystem is also important. On Linux, although EXT2 is usually fastest, EXT3 is faster in some cases. ReiserFS is okay. The other modes of EXT3 are very slow. For other filesystems, you should determine their performance by experiment.
- Q. : Can I build QDBM using CC instead of GCC?
- A. : Actually, yes. Try to build QDBM by `make unix' and install it by `make install-unix'.
- Q. : What does `QDBM' mean?
- A. : `QDBM' stands for `Quick DataBase Manager'. It means that processing speed is high, and that you can write applications quickly.
QDBM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.
QDBM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with QDBM (See the file `COPYING'); if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
QDBM was written by Mikio Hirabayashi. You can contact the author by e-mail to <mikio@users.sourceforge.net>. Any suggestion or bug report is welcome to the author.