Short Description
Ports
Metadata
HadoopWriter Attributes
Details
Troubleshooting
See also
HadoopWriter writes data into Hadoop sequence files.
Component | Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata |
---|---|---|---|---|---|---|---|---|
HadoopWriter | Hadoop sequence file | 1 | 0 | no | no | no | no | no |
Port type | Number | Required | Description | Metadata |
---|---|---|---|---|
Input | 0 | yes | For input data records | Any |
HadoopWriter does not propagate metadata.
HadoopWriter has no metadata template.
Attribute | Req | Description | Possible values |
---|---|---|---|
Basic | | | |
Hadoop connection | | Hadoop connection with Hadoop libraries containing a Hadoop sequence file writer implementation. If a Hadoop connection ID is specified in an hdfs:// URL in the File URL attribute, the value of this attribute is ignored. | Hadoop connection ID |
File URL | yes | URL of the output file on HDFS or on the local file system. URLs without a protocol (i.e. absolute or relative paths) or with the file: protocol refer to the local file system. If the output file should be located on HDFS, use a URL in the form hdfs://ConnectionID/path/to/file. | |
Key field | yes | Name of the input record field carrying the key for each written key-value pair. | |
Value field | yes | Name of the input record field carrying the value for each written key-value pair. | |
Advanced | | | |
Create empty files | | If set to false, prevents the component from creating an empty output file when there are no input records. | true (default) \| false |
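For illustration, assuming a Hadoop connection with the hypothetical ID HadoopConnection, File URL values could look like this (both paths are illustrative):

```
hdfs://HadoopConnection/user/clover/output.seq   (file on HDFS; connection ID taken from the URL)
/data/output/output.seq                          (file on the local file system)
```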
HadoopWriter writes data into a special Hadoop sequence file (org.apache.hadoop.io.SequenceFile). These files contain key-value pairs and are used as input/output file formats in MapReduce jobs. The component can write a single file as well as a partitioned file, which has to be located on HDFS or on the local file system.
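To illustrate the format, the following sketch writes a few key-value pairs with the plain Hadoop API; conceptually, this is what HadoopWriter does with the input fields selected by the Key field and Value field attributes. It is a minimal example, assuming Hadoop 2.x client libraries on the classpath; the output path and the Text key/value types are illustrative, not mandated by the component.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq"); // illustrative output location

        // Each append() writes one key-value pair, corresponding to one input
        // record mapped through the Key field and Value field attributes.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            writer.append(new Text("key1"), new Text("value1"));
            writer.append(new Text("key2"), new Text("value2"));
        }
    }
}
```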
The exact version of the file format created by the HadoopWriter component depends on the Hadoop libraries you supply in the Hadoop connection referenced from the File URL attribute. In general, sequence files created by one version of Hadoop may not be readable by a different version.
If you write to the local file system with a Hadoop connection that uses default settings, additional .crc files are created. That is because, by default, Hadoop interacts with the local file system using org.apache.hadoop.fs.LocalFileSystem, which creates a checksum file for each written file; when such files are read back, the checksum is verified. You can disable checksum creation and verification by adding the following key-value pair to the Hadoop Parameters of the Hadoop connection:
fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem
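If you access the local file system through the Hadoop API yourself, the equivalent of this connection setting can be sketched as follows. This is a minimal illustration, assuming Hadoop 2.x client libraries on the classpath; the file:///tmp path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RawLocalFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same setting as the Hadoop Parameters entry above: use the
        // checksum-free local file system, so no .crc companion files are
        // written and no checksums are verified on read.
        conf.set("fs.file.impl", "org.apache.hadoop.fs.RawLocalFileSystem");

        // The path is hypothetical; any file:// URI selects the local file system.
        FileSystem fs = FileSystem.get(new Path("file:///tmp").toUri(), conf);
        System.out.println(fs.getClass().getName());
    }
}
```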
For technical details about Hadoop sequence files, see the Apache Hadoop Wiki.
Currently, writing compressed data is not supported.
HadoopWriter cannot write lists and maps.
If you write data to a sequence file on the local file system, you can encounter one of the following error messages in the error log:
Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
or
Cannot run program "cygpath": CreateProcess error=2, The system cannot find the file specified
To solve this problem, disable checksum creation and verification by adding the fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem Hadoop parameter to the Hadoop connection configuration, as described in Details above. This issue is specific to non-POSIX operating systems (MS Windows).