Persistent Lookup Table

Not available in Community Designer

This lookup table is commercial and can only be used with the commercial license of CloverETL Designer.

This type of lookup table is designed for large numbers of data records. The records are stored in files; only a few records are cached in main memory. The files are in jdbm format (http://jdbm.sourceforge.net). When you specify a file name, two files are created: one with the db extension and one with the lg extension.

A persistent lookup table can work in two modes: with key duplicates and without key duplicates. If you switch between the modes, you should delete and refill the lookup table.

Without key duplicates

With the Allow key duplicates property unchecked, the persistent lookup table does not store multiple records with the same key value. You can choose whether to keep the first record or the last one by using the Replace checkbox.

This is the default option.

With key duplicates

With the Allow key duplicates property enabled, you can store multiple records with the same key in the table. The Replace property is not used. Key duplicates in persistent lookup tables are available since 4.3.0.

The persistent lookup table internally uses a B+Tree to store the records. Where a node is mentioned here, it is a node of the B+Tree.
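The two modes can be summarized with a small in-memory model. The sketch below is illustrative only: the class LookupModel and its parameter names are hypothetical and are not part of the CloverETL or jdbm API. It mirrors only the behavior described above, without any file storage.

```python
class LookupModel:
    """Toy model of the key-handling modes (hypothetical, not the real implementation)."""

    def __init__(self, allow_key_duplicates=False, replace=False):
        self.allow_key_duplicates = allow_key_duplicates
        self.replace = replace
        self._data = {}  # key -> list of records stored under that key

    def put(self, key, record):
        if self.allow_key_duplicates:
            # With key duplicates: every record is kept; Replace is not used.
            self._data.setdefault(key, []).append(record)
        elif self.replace or key not in self._data:
            # Without key duplicates: Replace keeps the last record,
            # otherwise the first record with the key is kept.
            self._data[key] = [record]

    def get(self, key):
        return list(self._data.get(key, []))


first_wins = LookupModel()
last_wins = LookupModel(replace=True)
duplicates = LookupModel(allow_key_duplicates=True, replace=True)
for table in (first_wins, last_wins, duplicates):
    table.put("k", "old")
    table.put("k", "new")
```

In this model, first_wins keeps only "old", last_wins keeps only "new", and duplicates keeps both records regardless of the replace setting.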

Creating Persistent Lookup Table

In the first step of the wizard, choose Persistent lookup.

In the second step of the wizard, set up the required properties: give the lookup table a Name, select the corresponding Metadata, specify the File where the data records of the lookup table will be stored, and specify the Key that should be used to look up data records in the table.

Advanced Properties

To overwrite old records with newer ones, check the Replace checkbox. With the checkbox checked, the latest record with a given key is stored. Otherwise, the first record with that key is stored.

You can disable transactions with Disable transactions. Disabling transactions increases graph performance; however, it can cause data loss if an operation on the table is interrupted.

Commit interval defines the number of records that are committed at once. When the limit or end of phase is reached, the records are committed to the lookup table.
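The commit behavior described above can be pictured as a simple buffer. This is a minimal sketch under stated assumptions: BatchingWriter, write, and end_of_phase are hypothetical names chosen for illustration, and the two lists stand in for uncommitted and committed storage.

```python
class BatchingWriter:
    """Toy model of a commit interval (hypothetical, not the CloverETL API)."""

    def __init__(self, commit_interval):
        if commit_interval < 1:
            raise ValueError("commit_interval must be at least 1")
        self.commit_interval = commit_interval
        self.pending = []      # records written but not yet committed
        self.committed = []    # records safely stored in the table
        self.commit_count = 0

    def write(self, record):
        self.pending.append(record)
        # Commit as soon as the configured number of records is reached.
        if len(self.pending) >= self.commit_interval:
            self.commit()

    def commit(self):
        if self.pending:
            self.committed.extend(self.pending)
            self.pending.clear()
            self.commit_count += 1

    def end_of_phase(self):
        # Remaining records are committed when the phase ends.
        self.commit()


writer = BatchingWriter(commit_interval=3)
for i in range(7):
    writer.write(i)
pending_before_end = len(writer.pending)
writer.end_of_phase()
```

With an interval of 3 and 7 records, the model commits twice while writing and once more at the end of the phase. A larger interval means fewer commits for the same number of records.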

By specifying Page size, you define the number of entries (records) per node of the internal B+Tree.
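To get a feeling for the effect of Page size, the following sketch estimates how many tree levels are needed for a given number of records. It is an idealized calculation that assumes every node is completely full and holds exactly page_size entries; real B+Tree nodes are usually only partially filled, so actual trees can be deeper. The function name estimated_levels is hypothetical.

```python
def estimated_levels(record_count, page_size):
    """Idealized number of B+Tree levels, assuming fully packed nodes."""
    if page_size < 2 or record_count < 1:
        raise ValueError("need page_size >= 2 and record_count >= 1")
    levels = 1
    nodes = -(-record_count // page_size)  # ceiling division: leaf nodes
    while nodes > 1:
        # Each level above groups up to page_size child nodes per node.
        nodes = -(-nodes // page_size)
        levels += 1
    return levels


small_pages = estimated_levels(1_000_000, 16)
large_pages = estimated_levels(1_000_000, 256)
```

Under these assumptions, one million records need 5 levels with 16 entries per node and 3 levels with 256 entries per node.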

Cache size specifies the maximum number of B+Tree nodes kept in the cache.
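The effect of a bounded node cache can be sketched as follows. The eviction policy shown here (least recently used) is an assumption made for illustration; the text above does not state which policy the implementation uses. NodeCache and load_node are hypothetical names.

```python
from collections import OrderedDict


class NodeCache:
    """Toy bounded cache of tree nodes with assumed LRU eviction."""

    def __init__(self, cache_size, load_node):
        self.cache_size = cache_size
        self.load_node = load_node   # callable that reads a node from disk
        self._nodes = OrderedDict()
        self.disk_reads = 0

    def get(self, node_id):
        if node_id in self._nodes:
            # Cache hit: mark the node as most recently used.
            self._nodes.move_to_end(node_id)
            return self._nodes[node_id]
        # Cache miss: read the node and evict the oldest one if over the limit.
        self.disk_reads += 1
        node = self.load_node(node_id)
        self._nodes[node_id] = node
        if len(self._nodes) > self.cache_size:
            self._nodes.popitem(last=False)
        return node


def fake_load(node_id):
    return f"node-{node_id}"


small = NodeCache(cache_size=2, load_node=fake_load)
large = NodeCache(cache_size=3, load_node=fake_load)
for node_id in (1, 2, 1, 3, 2):
    small.get(node_id)
    large.get(node_id)
```

For the same access sequence, the cache holding two nodes needs four disk reads and the cache holding three nodes needs three, which is why a larger cache tends to speed up reading.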

Allow key duplicates allows storing multiple records with the same key value.

Important

The Replace checkbox is ignored in lookup tables with key duplicates.

Finally, click OK and then Finish.

Figure 34.14. Persistent Lookup Table Wizard


Persistent Lookup Table Configuration Tweaks

The performance of a persistent lookup table is affected by the advanced parameters, which configure the internal B+Tree implementation and the size of the caches.

To speed up reading, increase cache size.

To speed up writing, increase commit interval.

Compatibility

Since 4.3.0, you can use Allow key duplicates to store multiple records with the same key value in the table.