Short Description
Ports
Metadata
HadoopWriter Attributes
Details
Troubleshooting
See also
HadoopWriter writes data into Hadoop sequence files.
Component | Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata |
---|---|---|---|---|---|---|---|---|
HadoopWriter | Hadoop sequence file | 1 | 0 | no | no | no | no | no |
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | yes | For input data records | Any |
HadoopWriter does not propagate metadata.
HadoopWriter has no metadata template.
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | | | |
Hadoop connection | | Hadoop connection with Hadoop libraries containing a Hadoop sequence file writer implementation. If a Hadoop connection ID is specified in an hdfs:// URL in the File URL attribute, the value of this attribute is ignored. | Hadoop connection ID |
File URL | yes | URL of the output file on HDFS or on the local file system. URLs without a protocol (i.e. absolute or relative paths) or with the file: protocol refer to the local file system. If the output file should be located on HDFS, use a URL in the form hdfs://ConnectionID/path/to/file. | |
Key field | yes | Name of the input record field carrying the key for each written key-value pair. | |
Value field | yes | Name of the input record field carrying the value for each written key-value pair. | |
Advanced | | | |
Create empty files | | If set to false, prevents the component from creating an empty output file when there are no input records. | true (default) \| false |
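For illustration, assuming a Hadoop connection with the hypothetical ID HadoopConnection, File URL values could look like this (both paths are illustrative):

```
hdfs://HadoopConnection/user/clover/output.seq   (file on HDFS; connection ID taken from the URL)
/data/output/output.seq                          (file on the local file system)
```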
HadoopWriter writes data into a special Hadoop sequence file (org.apache.hadoop.io.SequenceFile). These files contain key-value pairs and are used as input/output file formats in MapReduce jobs. The component can write a single file as well as a partitioned file, which has to be located on HDFS or on the local file system.
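To illustrate the format, the following sketch writes a few key-value pairs with the plain Hadoop API; conceptually, this is what HadoopWriter does with the input fields selected by the Key field and Value field attributes. It is a minimal example, assuming Hadoop 2.x client libraries on the classpath; the output path and the Text key/value types are illustrative, not mandated by the component.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq"); // illustrative output location

        // Each append() writes one key-value pair, corresponding to one input
        // record mapped through the Key field and Value field attributes.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            writer.append(new Text("key1"), new Text("value1"));
            writer.append(new Text("key2"), new Text("value2"));
        }
    }
}
```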
The exact version of the file format created by the HadoopWriter component depends on the Hadoop libraries you supply in the Hadoop connection referenced from the File URL attribute. In general, sequence files created by one version of Hadoop may not be readable by a different version.
If you write to the local file system with a Hadoop connection that uses default settings, additional .crc files are created. That is because, by default, Hadoop interacts with the local file system using org.apache.hadoop.fs.LocalFileSystem, which creates a checksum file for each written file; when such files are read back, the checksum is verified. You can disable checksum creation and verification by adding the following key-value pair to the Hadoop Parameters of the Hadoop connection:
fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem
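If you access the local file system through the Hadoop API yourself, the equivalent of this connection setting can be sketched as follows. This is a minimal illustration, assuming Hadoop 2.x client libraries on the classpath; the file:///tmp path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RawLocalFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same setting as the Hadoop Parameters entry above: use the
        // checksum-free local file system, so no .crc companion files are
        // written and no checksums are verified on read.
        conf.set("fs.file.impl", "org.apache.hadoop.fs.RawLocalFileSystem");

        // The path is hypothetical; any file:// URI selects the local file system.
        FileSystem fs = FileSystem.get(new Path("file:///tmp").toUri(), conf);
        System.out.println(fs.getClass().getName());
    }
}
```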
For technical details about Hadoop sequence files, see the Apache Hadoop Wiki.
Currently, writing compressed data is not supported.
HadoopWriter cannot write lists and maps.
If you write data to a sequence file on the local file system, you can encounter one of the following error messages in the error log:
Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
or
Cannot run program "cygpath": CreateProcess error=2, The system cannot find the file specified
To solve this problem, disable checksum creation and verification by adding the fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem Hadoop parameter to the Hadoop connection configuration, as described in Details above. This issue is specific to non-POSIX operating systems (MS Windows).