Libraries Needed for Hadoop

Hadoop components require the Hadoop libraries to be accessible from CloverETL. The libraries are needed by HadoopReader, HadoopWriter, ExecuteMapReduce, HDFS, and Hive.

The Hadoop libraries are also necessary to establish a Hadoop connection; see Hadoop connection.

There are two officially supported versions of Hadoop: Cloudera 4 version 4.1.2 and Cloudera 5 version 5.6.0. Other versions close to these might work, but we cannot guarantee it.
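
As a quick sanity check, the CDH version can be read from the names of Hadoop's own JAR files, which carry a "cdh" tag (for example hadoop-common-2.6.0-cdh5.6.0.jar). The following is a minimal sketch in Python; the function names cdh_version and is_supported are hypothetical helpers, not part of CloverETL or Hadoop, and third-party JARs without a "cdh" tag cannot be checked this way.

```python
import re

# Supported CDH releases as stated in the text above.
SUPPORTED_CDH = {"4.1.2", "5.6.0"}

# Matches the "cdh<major>.<minor>.<patch>" tag in names such as
# hadoop-common-2.6.0-cdh5.6.0.jar.
_CDH_TAG = re.compile(r"cdh(\d+\.\d+\.\d+)")


def cdh_version(jar_name):
    """Return the CDH version embedded in a JAR file name, or None."""
    match = _CDH_TAG.search(jar_name)
    return match.group(1) if match else None


def is_supported(jar_name):
    """True only if the name carries a tag for a supported CDH release.

    JARs without a "cdh" tag (e.g. guava-15.0.jar) return False here,
    which means "cannot tell", not "incompatible".
    """
    return cdh_version(jar_name) in SUPPORTED_CDH
```

For instance, cdh_version("hadoop-hdfs-2.0.0-cdh4.1.2.jar") yields "4.1.2", while a JAR tagged with another release is reported as unsupported.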

Cloudera 4
Cloudera 5

Cloudera 4

The following libraries are needed to connect to Cloudera 4.

Common libraries
  • hadoop-common-2.0.0-cdh4.1.2.jar

  • hadoop-auth-2.0.0-cdh4.1.2.jar

  • guava-11.0.2.jar

  • avro-1.7.1.cloudera.2.jar

  • commons-cli-1.2.jar

  • commons-configuration-1.6.jar

  • commons-lang-2.5.jar

HDFS
  • hadoop-hdfs-2.0.0-cdh4.1.2.jar

  • protobuf-java-2.4.0a.jar

MapReduce
  • aopalliance-1.0.jar

  • asm-3.2.jar

  • avro-1.7.1.cloudera.2.jar

  • commons-io-2.1.jar

  • guice-3.0.jar

  • guice-servlet-3.0.jar

  • hadoop-annotations-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-app-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-common-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-hs-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-jobclient-2.0.0-cdh4.1.2.jar

  • hadoop-mapreduce-client-shuffle-2.0.0-cdh4.1.2.jar

  • jackson-core-asl-1.8.8.jar

  • jackson-mapper-asl-1.8.8.jar

  • javax.inject-1.jar

  • jersey-core-1.8.jar

  • jersey-guice-1.8.jar

  • jersey-server-1.8.jar

  • log4j-1.2.17.jar

  • netty-3.2.4.Final.jar

  • paranamer-2.3.jar

  • protobuf-java-2.4.0a.jar

  • snappy-java-1.0.4.1.jar

  • hadoop-yarn-common-2.0.0-cdh4.1.2.jar

  • hadoop-yarn-api-2.0.0-cdh4.1.2.jar

Hive
  • hive-jdbc-0.8.1.jar

  • hadoop-core-0.20.205.jar

  • hive-exec-0.8.1.jar

  • hive-metastore-0.8.1.jar

  • hive-service-0.8.1.jar

  • libfb303-0.7.0.jar

  • slf4j-api-1.6.1.jar

  • slf4j-log4j12-1.6.1.jar
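
Some JARs appear in more than one of the lists above (avro-1.7.1.cloudera.2.jar in both Common libraries and MapReduce, protobuf-java-2.4.0a.jar in both HDFS and MapReduce). Assuming the common libraries are combined with the lists of the components you use, the sketch below merges several lists and drops the repeats. The helper name merged_jars is hypothetical, and the MapReduce list is abbreviated to three entries for illustration.

```python
# Cloudera 4 JAR names copied from the lists above.
COMMON = [
    "hadoop-common-2.0.0-cdh4.1.2.jar",
    "hadoop-auth-2.0.0-cdh4.1.2.jar",
    "guava-11.0.2.jar",
    "avro-1.7.1.cloudera.2.jar",
    "commons-cli-1.2.jar",
    "commons-configuration-1.6.jar",
    "commons-lang-2.5.jar",
]
HDFS = [
    "hadoop-hdfs-2.0.0-cdh4.1.2.jar",
    "protobuf-java-2.4.0a.jar",
]
# Abbreviated on purpose; the full MapReduce list is given above.
MAPREDUCE_SUBSET = [
    "avro-1.7.1.cloudera.2.jar",
    "protobuf-java-2.4.0a.jar",
    "hadoop-mapreduce-client-core-2.0.0-cdh4.1.2.jar",
]


def merged_jars(*groups):
    """Concatenate the groups, keeping only the first occurrence of each name."""
    seen = set()
    result = []
    for group in groups:
        for jar in group:
            if jar not in seen:
                seen.add(jar)
                result.append(jar)
    return result
```

Calling merged_jars(COMMON, HDFS, MAPREDUCE_SUBSET) keeps the original order and lists each JAR once.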

Cloudera 5

The following libraries are needed to connect to Cloudera 5.

Common libraries
  • hadoop-common-2.6.0-cdh5.6.0.jar

  • hadoop-auth-2.6.0-cdh5.6.0.jar

  • guava-15.0.jar

  • avro-1.7.6-cdh5.6.0.jar

  • htrace-core4-4.0.1-incubating.jar

  • servlet-api-3.0.jar

HDFS
  • hadoop-hdfs-2.6.0-cdh5.6.0.jar

  • protobuf-java-2.5.0.jar

MapReduce
  • hadoop-annotations-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-app-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-common-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-core-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-hs-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-jobclient-2.6.0-cdh5.6.0.jar

  • hadoop-mapreduce-client-shuffle-2.6.0-cdh5.6.0.jar

  • jackson-core-asl-1.9.2.jar

  • jackson-mapper-asl-1.9.12.jar

  • hadoop-yarn-api-2.6.0-cdh5.6.0.jar

  • hadoop-yarn-client-2.6.0-cdh5.6.0.jar

  • hadoop-yarn-common-2.6.0-cdh5.6.0.jar

Hive
  • hive-jdbc-1.1.0-cdh5.6.0.jar

  • hive-exec-1.1.0-cdh5.6.0.jar

  • hive-metastore-1.1.0-cdh5.6.0.jar

  • hive-service-1.1.0-cdh5.6.0.jar

  • libfb303-0.9.2.jar

  • slf4j-api-1.7.5.jar

  • slf4j-log4j12-1.7.5.jar

The libraries can be found in your CDH installation or in a package downloaded from Cloudera.

CDH installation

The required libraries from CDH reside in the following directories.

  • /usr/lib/hadoop

  • /usr/lib/hadoop-hdfs

  • /usr/lib/hadoop-mapreduce

  • /usr/lib/hadoop-yarn

  • third-party libraries are located in the lib subdirectories of these directories

Package downloaded from Cloudera

The files can also be found in a package downloaded from Cloudera, in the following locations.

  • share/hadoop/common

  • share/hadoop/hdfs

  • share/hadoop/mapreduce2

  • share/hadoop/yarn

  • the lib subdirectories of these locations
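
Locating the JARs by hand across these directories is tedious, so the search can be scripted. The following is a minimal sketch in Python that walks a set of root directories (including their lib subdirectories) and reports where each required JAR was found. The function names find_jars and missing_jars are hypothetical; the directory constants repeat the locations listed above, and the package paths are relative to wherever you unpacked the download.

```python
import os

# Locations from the lists above.
CDH_INSTALL_DIRS = [
    "/usr/lib/hadoop",
    "/usr/lib/hadoop-hdfs",
    "/usr/lib/hadoop-mapreduce",
    "/usr/lib/hadoop-yarn",
]
# Relative to the root of the unpacked Cloudera package.
PACKAGE_DIRS = [
    "share/hadoop/common",
    "share/hadoop/hdfs",
    "share/hadoop/mapreduce2",
    "share/hadoop/yarn",
]


def find_jars(required, roots):
    """Map each required JAR name to the first path found under roots.

    Every root is walked recursively, so JARs in lib subdirectories are
    found as well. Names that are not found map to None. Roots that do
    not exist are skipped silently.
    """
    found = {name: None for name in required}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename in found and found[filename] is None:
                    found[filename] = os.path.join(dirpath, filename)
    return found


def missing_jars(found):
    """Return the sorted names of JARs that were not located."""
    return sorted(name for name, path in found.items() if path is None)
```

For a CDH installation, call find_jars(required_names, CDH_INSTALL_DIRS); for a downloaded package, join each entry of PACKAGE_DIRS with the unpack directory first. An empty result from missing_jars means every listed file was located.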