CloverDataWriter

Not available in Community Designer

Short Description
Ports
Metadata
CloverDataWriter Attributes
Details
Examples
Compatibility
See also

Short Description

CloverDataWriter writes data to files in our internal binary Clover data format.

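| Component | Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CloverDataWriter | Clover binary file | 1 | 0-1 | no | no | no | no | no |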

Ports

| Port type | Number | Required | Description | Metadata |
| --- | --- | --- | --- | --- |
| Input | 0 | yes | For received data records | Any |
| Output | 0 | no | For port writing. See Writing to Output Port. | byte or cbyte |

Metadata

CloverDataWriter does not propagate metadata.

CloverDataWriter has no metadata template.

Input metadata can have any metadata type.

Output metadata of CloverDataWriter has one field. The field has datatype byte or cbyte.

CloverDataWriter Attributes

| Attribute | Req | Description | Possible values |
| --- | --- | --- | --- |
| Basic | | | |
| File URL | yes | Attribute specifying where received data will be written (Clover data file, dictionary). See Supported File URL Formats for Writers. | |
| Append | | By default, new records overwrite the older ones. If set to true, new records are appended to the older records stored in the output file(s). | false (default), true |
| Advanced | | | |
| Create directories | | By default, non-existing directories are not created. If set to true, they are created. | false (default), true |
| Compress level | | Sets the compression level (0 - no compression, 1 - fastest compression, 9 - best compression). | 1 (default), 0-9 |
| Number of skipped records | | Number of records to be skipped. See Selecting Output Records. | 0-N |
| Max number of records | | Maximum number of records to be written to the output file. See Selecting Output Records. | 0-N |
| Records per file | | Limits the number of records written to one file. | 0-N |
| Exclude fields | | Sequence of field names separated by semicolons that will not be written to the output. | any field(s), e.g. field1;field3 |
| Partition key | | Sequence of field names separated by semicolons defining how records are distributed into different output files. Records with the same Partition key are written to the same output file. Use the placeholder ($ or #) in the file name mask that matches the selected Partition file tag. See Partitioning Output into Different Output Files. | any field(s), e.g. field1;field3 |
| Partition lookup table | | ID of the lookup table used to select records that should be written to the output file(s). See Partitioning Output into Different Output Files for more information. | e.g. MyLookupTable001 |
| Partition file tag | | By default, output files are numbered. If set to Key file tag, output files are named according to the values of Partition key or Partition output fields. See Partitioning Output into Different Output Files for more information. | Number file tag (default), Key file tag |
| Partition output fields | | Fields of Partition lookup table whose values serve to name the output file(s). See Partitioning Output into Different Output Files for more information. | |
| Partition unassigned file name | | Name of the file into which unassigned records are written, if there are any. If not specified, data records whose key values are not contained in Partition lookup table are discarded. See Partitioning Output into Different Output Files for more information. | |
| Sorted input | | When partitioning into multiple output files is turned on, all output files are open at once. This can lead to an undesirable memory footprint when there are many output files (thousands). Moreover, Unix-based operating systems, for example, usually strictly limit the number of simultaneously open files per process (1024). If you run into one of these limitations, consider sorting the data according to the partition key using one of the standard sorting components and setting this attribute to true. The partitioning algorithm then does not need to keep all output files open; only the last one is open at a time. See Partitioning Output into Different Output Files for more information. | false (default), true |
| Create empty files | | If set to false, prevents the component from creating an empty output file when there are no input records. | true (default), false |
| Deprecated | | | |
| Save metadata | | This attribute is ignored since CloverETL version 4.0. | false (default), true |
| Save index | | This attribute is ignored since CloverETL version 4.0. | false (default), true |
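The reasoning behind the Sorted input attribute can be illustrated outside of CloverETL. The following Python sketch is not the component's implementation; it only shows why input sorted by the partition key allows a writer to keep a single output file open at a time. The function name write_partitions_sorted, the part_#.txt mask and the plain-text output are assumptions made for the illustration.

```python
import itertools
import os
import tempfile


def write_partitions_sorted(records, key, out_dir, mask="part_#.txt"):
    """Write records that are already sorted by `key`, one open file at a time.

    Hypothetical helper for illustration only; it writes plain text,
    not the Clover binary format.
    """
    written = []
    # groupby yields each run of equal key values exactly once when the
    # input is sorted, so every file is opened, filled and closed in turn.
    for value, group in itertools.groupby(records, key=lambda r: r[key]):
        path = os.path.join(out_dir, mask.replace("#", str(value)))
        with open(path, "w", encoding="utf-8") as handle:
            for record in group:
                handle.write(repr(record) + "\n")
        written.append(path)
    return written


records = [
    {"Mark": "B", "Name": "Ann"},
    {"Mark": "A", "Name": "Bob"},
    {"Mark": "B", "Name": "Cy"},
]
records.sort(key=lambda r: r["Mark"])  # corresponds to sorting before the writer

with tempfile.TemporaryDirectory() as tmp:
    paths = write_partitions_sorted(records, "Mark", tmp)
    names = [os.path.basename(p) for p in paths]

print(names)
```

With unsorted input, the same key could reappear later, so a writer would have to keep every file open (or reopen files), which is the situation the attribute description warns about.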

Details

CloverDataWriter internally uses compression by default. Additional zipping is redundant. See the Compress level attribute.
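Why additional zipping is redundant can be seen with any general-purpose compressor. The sketch below uses Python's zlib purely as a stand-in; the codec that CloverDataWriter actually uses is not stated here, and the generated sample records are made up for the illustration.

```python
import random
import zlib

# Build some hypothetical text records (id, number, fraction).
random.seed(0)
raw = "".join(
    f"{i},{random.randint(0, 10**6)},{random.random():.6f}\n" for i in range(5000)
).encode("ascii")

once = zlib.compress(raw, 1)    # first pass, fastest level
twice = zlib.compress(once, 9)  # second pass over already compressed bytes

# The second pass has almost nothing left to remove.
print(len(raw), len(once), len(twice))
```

The first pass shrinks the data noticeably, while the second pass leaves the size essentially unchanged and only costs extra CPU time.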

CloverDataWriter can write maps and lists.

This component writes data in the internal Clover format, which allows fast access to the data. CloverDataWriter is faster than FlatFileWriter.

Examples

Writing to Clover File
Appending to Existing File
Writing to non-existing Directories
Skipping Leading Records
Writing at most N records per file
Omitting uninteresting fields
Parting records into several files according to input field
Parting records into several files according to input field using lookup table

Writing to Clover File

Write records to a Clover file.

Solution

Set up the File URL attribute.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/my-clover-file.cdf |

If the file exists, the data in the file is overwritten.

Appending to Existing File

Append records of each graph run to an existing file my-clover-file.cdf.

Solution

Set up File URL and Append attributes.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/my-clover-file.cdf |
| Append | true |

Writing to non-existing Directories

Write data to file my-clover-file.cdf in the directory cdrw. The directory may not exist.

Solution

Use attributes File URL and Create directories.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/cdrw/my-clover-file.cdf |
| Create directories | true |

Skipping Leading Records

The first 10 records should be omitted. Write the rest of the records.

Solution

Use attributes File URL and Number of skipped records.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/my-clover-file.cdf |
| Number of skipped records | 10 |

Writing at most N records per file

Write at most 100 records.

Solution

Use attributes File URL and Max number of records.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/my-clover-file.cdf |
| Max number of records | 100 |
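The effect of Number of skipped records and Max number of records can be modeled as a simple slice over the incoming record stream. The sketch below is an illustration only and assumes that, when both attributes are set, skipping is applied first and the limit is applied to the remaining records; select_output_records is a hypothetical helper, not part of CloverETL.

```python
from itertools import islice


def select_output_records(records, skip=0, max_records=None):
    """Skip the first `skip` records, then pass on at most `max_records`.

    Assumption for this sketch: the skip is applied before the limit.
    """
    stop = None if max_records is None else skip + max_records
    return islice(records, skip, stop)


# 200 hypothetical input records numbered 1..200.
selected = list(select_output_records(range(1, 201), skip=10, max_records=100))
print(selected[0], selected[-1], len(selected))
```

With skip=10 and max_records=100, records 11 through 110 are selected.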

Omitting uninteresting fields

Metadata on the input edge of CloverDataWriter has fields ID, Firstname, Surname and Salary. Save a list containing Firstname and Surname to the Clover data file employees.cdf.

Solution

Use attributes File URL and Exclude fields.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/employees.cdf |
| Exclude fields | ID;Salary |
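Conceptually, Exclude fields removes the named fields from each record before it is written. A minimal Python sketch of that idea follows; the dictionary-based record and the exclude_fields helper are assumptions for the illustration and say nothing about how the component stores records internally.

```python
def exclude_fields(record, exclude_spec):
    """Return a copy of `record` without the fields listed in `exclude_spec`.

    `exclude_spec` uses the same semicolon-separated form as the attribute,
    e.g. "ID;Salary". Hypothetical helper for illustration only.
    """
    excluded = {name for name in exclude_spec.split(";") if name}
    return {name: value for name, value in record.items() if name not in excluded}


employee = {"ID": 7, "Firstname": "Jane", "Surname": "Doe", "Salary": 4200}
written = exclude_fields(employee, "ID;Salary")
print(written)
```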

Parting records into several files according to input field

A list of students contains the fields Firstname, Lastname and Mark. Categorize the records into several files according to the mark. The created files will be named students_A.cdf, ... students_F.cdf.

Solution

Use attributes File URL, Partition key and Partition file tag.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/students_#.cdf |
| Partition key | Mark |
| Partition file tag | Key file tag |

Note: Records of students without a mark are saved to the file students_.cdf.
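The naming scheme of this example can be sketched in a few lines of Python. This is only a model of the behavior described above (the # placeholder in the file name mask is replaced by the value of the partition key, and an empty key yields students_.cdf); partition_by_key and the sample student records are hypothetical.

```python
from collections import defaultdict


def partition_by_key(records, key_field, file_mask):
    """Group records by target file name derived from the partition key.

    The '#' in `file_mask` is replaced by the key value; a missing or
    empty key is treated as an empty string. Illustration only.
    """
    files = defaultdict(list)
    for record in records:
        key_value = record.get(key_field) or ""
        files[file_mask.replace("#", str(key_value))].append(record)
    return dict(files)


students = [
    {"Firstname": "Ann", "Lastname": "Lee", "Mark": "A"},
    {"Firstname": "Bob", "Lastname": "Ray", "Mark": "F"},
    {"Firstname": "Cy", "Lastname": "Orr", "Mark": "A"},
    {"Firstname": "Di", "Lastname": "Fox", "Mark": None},  # no mark yet
]
partitions = partition_by_key(students, "Mark", "students_#.cdf")
print(sorted(partitions))
```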

Parting records into several files according to input field using lookup table

The input data contains the number of active customers for particular countries. The countries belong to different regions. Categorize the records into files according to the region.

CZ|105
UK|651
US|827
...

The input metadata contains the fields CountryCode and Customers, but nothing in the record denotes the region directly. You have a list of country codes with the corresponding regions to be used for partitioning.

CZ|Europe
UK|Europe
US|America
...

Some country codes may not be present in the list. Store records with such country codes in a separate file, region_missing.cdf.

Solution

Use the attributes File URL, Partition key, Partition lookup table, Partition file tag, Partition output fields, Partition unassigned file name. You need a lookup table CountryCodeRegion too.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/region_#.cdf |
| Partition key | CountryCode |
| Partition lookup table | CountryCodeRegion |
| Partition file tag | Key file tag |
| Partition output fields | Continent |
| Partition unassigned file name | missing |

The files region_Europe.cdf, region_America.cdf, ... and region_missing.cdf will be created.
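The lookup-based variant can be modeled the same way. In the sketch below, a plain dictionary stands in for the CountryCodeRegion lookup table, and the unassigned name is substituted for # in the file name mask, which is an assumption made to match the file names listed above. The function partition_with_lookup and the record with code XX are hypothetical.

```python
def partition_with_lookup(records, key_field, lookup, file_mask, unassigned=None):
    """Route each record to a file named after the looked-up value.

    Records whose key is not in `lookup` go to the `unassigned` file,
    or are discarded when `unassigned` is None. Illustration only.
    """
    files = {}
    for record in records:
        tag = lookup.get(record[key_field], unassigned)
        if tag is None:
            continue  # no unassigned file configured: record is dropped
        files.setdefault(file_mask.replace("#", tag), []).append(record)
    return files


country_code_region = {"CZ": "Europe", "UK": "Europe", "US": "America"}
customers = [
    {"CountryCode": "CZ", "Customers": 105},
    {"CountryCode": "UK", "Customers": 651},
    {"CountryCode": "US", "Customers": 827},
    {"CountryCode": "XX", "Customers": 12},  # hypothetical code missing from the list
]

by_region = partition_with_lookup(
    customers, "CountryCode", country_code_region, "region_#.cdf", unassigned="missing"
)
print(sorted(by_region))
```

Leaving unassigned unset in this sketch drops the XX record, which mirrors the documented behavior when Partition unassigned file name is not specified.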

Compatibility

2.9

Since CloverETL 2.9, CloverDataWriter also writes a header with the version number to output files. For this reason, CloverDataReader expects files in Clover binary format to contain such a header. CloverDataReader 2.9 cannot read files written by older versions of CloverETL, nor can these older versions read data written by CloverDataWriter 2.9.

4.0

The internal structure of the zip archive has changed; graphs relying on the old structure will stop working. Graphs using a plain file URL without any internal entry specification are not affected.

zip:(${DATAIN_DIR}/customers.zip) - will work
zip:(${DATAIN_DIR}/customers.zip)#DATA/customers - won't work

As the Clover format can use compression internally, adding another layer of compression is redundant.

The values of the Save metadata and Save index attributes are not used since CloverETL 4.0.

4.4.0-M2

Since 4.4.0-M2, CloverDataWriter can write to the output port only into a byte or cbyte field.

See also

CloverDataReader
Common Properties of Components
Specific Attribute Types
Common Properties of Writers
Writers Comparison
Partitioning Output into Different Output Files