Hadoop components require the Hadoop libraries to be accessible from CloverETL. The libraries are needed by HadoopReader, HadoopWriter, ExecuteMapReduce, HDFS, and Hive.
The Hadoop libraries are required to establish a Hadoop connection; see Hadoop connection.
There are two officially supported versions of Hadoop: Cloudera 4 (version 4.1.2) and Cloudera 5 (version 5.6.0).
Other versions close to these might work, but we cannot guarantee that.
The following libraries are needed to connect to Cloudera 4:
hadoop-common-2.0.0-cdh4.1.2.jar
hadoop-auth-2.0.0-cdh4.1.2.jar
guava-11.0.2.jar
avro-1.7.1.cloudera.2.jar
commons-cli-1.2.jar
commons-configuration-1.6.jar
commons-lang-2.5.jar
hadoop-hdfs-2.0.0-cdh4.1.2.jar
protobuf-java-2.4.0a.jar
aopalliance-1.0.jar
asm-3.2.jar
avro-1.7.1.cloudera.2.jar
commons-io-2.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-hs-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar
hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar
jackson-core-asl-1.8.8.jar
jackson-mapper-asl-1.8.8.jar
javax.inject-1.jar
jersey-core-1.8.jar
jersey-guice-1.8.jar
jersey-server-1.8.jar
log4j-1.2.17.jar
netty-3.2.4.Final.jar
paranamer-2.3.jar
protobuf-java-2.4.0a.jar
snappy-java-1.0.4.1.jar
hadoop-yarn-common-2.0.0-cdh4.1.2.jar
hadoop-yarn-api-2.0.0-cdh4.1.2.jar
hive-jdbc-0.8.1.jar
hadoop-core-0.20.205.jar
hive-exec-0.8.1.jar
hive-metastore-0.8.1.jar
hive-service-0.8.1.jar
libfb303-0.7.0.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
The following libraries are needed to connect to Cloudera 5:
hadoop-common-2.6.0-cdh5.6.0.jar
hadoop-auth-2.6.0-cdh5.6.0.jar
guava-15.0.jar
avro-1.7.6-cdh5.6.0.jar
htrace-core4-4.0.1-incubating.jar
servlet-api-3.0.jar
hadoop-hdfs-2.6.0-cdh5.6.0.jar
protobuf-java-2.5.0.jar
hadoop-annotations-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-app-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-common-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-core-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-hs-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-jobclient-2.6.0-cdh5.6.0.jar
hadoop-mapreduce-client-shuffle-2.6.0-cdh5.6.0.jar
jackson-core-asl-1.9.2.jar
jackson-mapper-asl-1.9.12.jar
hadoop-yarn-api-2.6.0-cdh5.6.0.jar
hadoop-yarn-client-2.6.0-cdh5.6.0.jar
hadoop-yarn-common-2.6.0-cdh5.6.0.jar
hive-jdbc-1.1.0-cdh5.6.0.jar
hive-exec-1.1.0-cdh5.6.0.jar
hive-metastore-1.1.0-cdh5.6.0.jar
hive-service-1.1.0-cdh5.6.0.jar
libfb303-0.9.2.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
The libraries can be found in your CDH installation or in the package downloaded from Cloudera.
In a CDH installation, the required libraries reside in the following directories:
/usr/lib/hadoop
/usr/lib/hadoop-hdfs
/usr/lib/hadoop-mapreduce
/usr/lib/hadoop-yarn
Third-party libraries are located in the lib subdirectories of the directories above.
The files can also be found in the package downloaded from Cloudera, in the following locations:
share/hadoop/common
share/hadoop/hdfs
share/hadoop/mapreduce2
share/hadoop/yarn
Also check the lib subdirectories of these locations.
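Collecting the files by hand is error-prone, so a small script can help check which of the required JARs are present. The following Python sketch searches a set of directories and their lib subdirectories for a list of JAR names and reports what is missing. It is only an illustration: the function name `locate_jars` is hypothetical and not part of CloverETL or CDH, and the `required` list is a short excerpt of the Cloudera 5 list that you would extend to the full list.

```python
from pathlib import Path


def locate_jars(required, roots):
    """Return (found, missing) for the required JAR file names.

    found maps each JAR name to the first path where it was seen;
    missing lists the names not present in any root or its lib subdirectory.
    """
    found = {}
    for root in map(Path, roots):
        # Look in the directory itself and in its lib subdirectory.
        for directory in (root, root / "lib"):
            if not directory.is_dir():
                continue
            for jar in directory.glob("*.jar"):
                if jar.name in required and jar.name not in found:
                    found[jar.name] = jar
    missing = [name for name in required if name not in found]
    return found, missing


if __name__ == "__main__":
    # Directories of a CDH installation, as listed in this section.
    roots = [
        "/usr/lib/hadoop",
        "/usr/lib/hadoop-hdfs",
        "/usr/lib/hadoop-mapreduce",
        "/usr/lib/hadoop-yarn",
    ]
    # Excerpt of the Cloudera 5 list (assumption: extend with the full list).
    required = [
        "hadoop-common-2.6.0-cdh5.6.0.jar",
        "hadoop-hdfs-2.6.0-cdh5.6.0.jar",
        "protobuf-java-2.5.0.jar",
    ]
    found, missing = locate_jars(required, roots)
    for name, path in found.items():
        print(f"found   {name} -> {path}")
    for name in missing:
        print(f"missing {name}")
```

For the package downloaded from Cloudera, the same function can be pointed at the share/hadoop directories by changing `roots` to those locations inside the unpacked package.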