This lookup table is commercial and can only be used with the commercial license of CloverETL Designer.
All data records stored in this lookup table are kept in memory, so sufficient memory must be available to hold them. If data records are loaded into the Aspell lookup table from a data file, the available memory should be at least approximately 7 times the size of the data file. However, this multiplier varies with the types of data records stored in the data file.
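The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch only: the function name is hypothetical, and the multiplier of 7 is the approximate figure quoted above, which varies with the record types.

```python
def estimated_memory_bytes(data_file_size_bytes: int, multiplier: float = 7.0) -> int:
    """Rough memory estimate for loading a data file into the lookup table.

    The default multiplier of 7 is only the approximate rule of thumb;
    the real ratio depends on the types of records in the file.
    """
    return int(data_file_size_bytes * multiplier)


# Example: a 100 MiB data file suggests roughly 700 MiB of available memory.
print(estimated_memory_bytes(100 * 1024**2) / 1024**2, "MiB")
```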
If you are working with data records that are similar but not fully identical, you should use this type of lookup table. For example, you can use Aspell lookup table for addresses.
Aspell lookup table allows you to have multiple records with the same key value.
In the Aspell lookup table wizard, you set up the required properties. You must give a Name to the lookup table, select the corresponding Metadata, and select the Lookup key field that should be used to look up data records from the table. The Lookup key field must be of string data type.
You can also specify the Data file URL where the data records of the lookup table will be stored and the charset of the data file (Data file charset). The default charset is UTF-8.
You can set the threshold that should be used by the lookup table (Spelling threshold). It must be higher than 0. The higher the threshold, the more tolerant the lookup table is of spelling errors. Its default value is 230. The threshold is the maximum edit_distance value allowed between the query and the results. Words with an edit_distance value higher than the specified limit are not included in the results.
You can also change the default costs of individual operations (Edit costs):
Case cost: Used when the case of one character is changed.
Transpose cost: Used when one character is transposed with another in the string.
Delete cost: Used when one character is deleted from the string.
Insert cost: Used when one character is inserted into the string.
Replace cost: Used when one character is replaced by another one.
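To illustrate how the five edit costs and the Spelling threshold interact, here is a minimal sketch of a weighted edit distance. It is not the algorithm the lookup table actually uses. The cost values, the `EditCosts` class, and the function name are assumptions chosen for illustration, and they are not the wizard defaults. Only the threshold of 230 is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditCosts:
    # Hypothetical values for illustration only; not the wizard defaults.
    case: int = 10
    transpose: int = 90
    delete: int = 95
    insert: int = 95
    replace: int = 100


def weighted_edit_distance(query: str, candidate: str,
                           costs: EditCosts = EditCosts()) -> int:
    """Cheapest total cost of turning `query` into `candidate`."""
    n, m = len(query), len(candidate)
    # dist[i][j] = cheapest way to turn query[:i] into candidate[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * costs.delete
    for j in range(1, m + 1):
        dist[0][j] = j * costs.insert
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = query[i - 1], candidate[j - 1]
            if a == b:
                substitution = 0
            elif a.lower() == b.lower():
                substitution = costs.case      # same letter, different case
            else:
                substitution = costs.replace
            best = min(
                dist[i - 1][j] + costs.delete,
                dist[i][j - 1] + costs.insert,
                dist[i - 1][j - 1] + substitution,
            )
            # Two adjacent characters swapped.
            if i > 1 and j > 1 and a == candidate[j - 2] and query[i - 2] == b:
                best = min(best, dist[i - 2][j - 2] + costs.transpose)
            dist[i][j] = best
    return dist[n][m]


SPELLING_THRESHOLD = 230  # default value quoted above

for word in ("Street", "stret", "steret", "road"):
    d = weighted_edit_distance(word, "street")
    print(word, d, "match" if d <= SPELLING_THRESHOLD else "rejected")
```

With these assumed costs, a single case change, insertion, or transposition stays well under the threshold, while an unrelated word accumulates several replacements and is rejected.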
You need to decide whether letters with diacritical marks are considered identical to those without these marks. To do that, set the value of the Remove diacritical marks attribute. If you want diacritical marks to be removed before computing the edit_distance value, set this attribute to true. This way, letters with diacritical marks are considered equal to their Latin equivalents. The default value is false, so by default letters with diacritical marks are considered different from those without.
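The effect of removing diacritical marks before comparison can be sketched with Python's standard `unicodedata` module. This is only an illustration of the idea, with a hypothetical function name. How the lookup table performs the removal internally is not described here.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Strip combining marks so that accented letters map to base letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# With removal enabled, both spellings compare as equal.
print(remove_diacritics("Příčná") == "Pricna")
print(remove_diacritics("Müller") == "Muller")
```

Letters that have no Unicode decomposition, such as "ł" or "ø", pass through this sketch unchanged.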
If you want best guesses to be included in the results, set Include best guesses to true. The default value is false. Best guesses are the words whose edit_distance value is higher than the Spelling threshold, for which there is no other better counterpart.
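One possible reading of the best-guess rule is sketched below: when no word falls within the Spelling threshold, the closest words are returned anyway. This interpretation, the function name, and the sample distances are all assumptions made for illustration. The actual selection logic of the lookup table may differ.

```python
def select_matches(distances: dict[str, int], threshold: int = 230,
                   include_best_guesses: bool = False) -> dict[str, int]:
    """Pick results from a mapping of candidate word -> edit_distance."""
    within = {w: d for w, d in distances.items() if d <= threshold}
    if within or not include_best_guesses or not distances:
        return within
    # Assumed behaviour: nothing is within the threshold, so fall back
    # to the candidate(s) with the smallest distance.
    best = min(distances.values())
    return {w: d for w, d in distances.items() if d == best}


far_candidates = {"Mane Road": 400, "Moon Road": 300}
print(select_matches(far_candidates))                             # nothing within 230
print(select_matches(far_candidates, include_best_guesses=True))  # closest word only
```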
At the end, you only need to click and then .

Figure 34.15. Aspell Lookup Table Wizard
Important

If you want to know the distance between the lookup table values and the edge values, you must add another field of numeric type to the lookup table metadata. Set this field to Autofilling. Select this field in the Edit distance field combo. When you use the Aspell lookup table in LookupJoin, you can map this lookup table field to the corresponding field on output port 0. This way, the values stored in the specified Edit distance field of the lookup table are sent to the specified output field.