One-time HDFS Protocol Installation
- Install Java 1.6 or later on all Greenplum Database hosts: master, segment, and standby master.
- Install a supported Hadoop distribution on all hosts. The distribution must be the same on all hosts. For Hadoop installation information, see the Hadoop distribution documentation.
Greenplum Database supports the following Hadoop distributions:
Table 1. Hadoop Distributions

| Hadoop Distribution       | Version                                 | gp_hadoop_target_version |
|---------------------------|-----------------------------------------|--------------------------|
| Pivotal HD                | Pivotal HD 3.0                          | gphd-3.0                 |
| Pivotal HD                | Pivotal HD 2.0, 2.1; Pivotal HD 1.0 (1) | gphd-2.0                 |
| Greenplum HD              | Greenplum HD 1.2                        | gphd-1.2                 |
| Greenplum HD              | Greenplum HD 1.1                        | gphd-1.1 (default)       |
| Cloudera                  | CDH 5.2, 5.3                            | cdh5                     |
| Cloudera                  | CDH 5.0, 5.1                            | cdh4.1                   |
| Cloudera                  | CDH 4.1 (2) - CDH 4.7                   | cdh4.1                   |
| Hortonworks Data Platform | HDP 2.1, 2.2                            | hdp2                     |
| MapR (3)                  | MapR 4.x                                | gpmr-1.2                 |
| MapR (3)                  | MapR 1.x, 2.x, 3.x                      | gpmr-1.0                 |
| Apache Hadoop             | 2.x                                     | hadoop2                  |

Note: For the latest information regarding supported Hadoop distributions, see the Greenplum Database Release Notes for your release.

1. Pivotal HD 1.0 is a distribution of Hadoop 2.0.
2. For CDH 4.1, only CDH4 with MRv1 is supported.
3. MapR requires the MapR client software.
- After installation, ensure that the Greenplum system user (gpadmin) has read and execute access to the Hadoop libraries or to the Greenplum MR client.
- Set the following environment variables on all segments:
- JAVA_HOME – the Java home directory
- HADOOP_HOME – the Hadoop home directory
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/lib/gphd
Set the variables in either the ~gpadmin/.bashrc or the ~gpadmin/.bash_profile file so that the gpadmin user's shell environment can locate the Java and Hadoop home directories.
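One way to confirm that the environment is set up as described is a small shell check, run as gpadmin on each host. The sketch below is illustrative only: check_dir_var and check_hadoop_env are hypothetical helper names, not Greenplum utilities, and the check assumes that both variables should point to directories the current user can read and enter.

```shell
# check_dir_var NAME VALUE
# Hypothetical helper: succeeds only if VALUE is a non-empty path to a
# directory that the current user can read and enter.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set"
        return 1
    elif [ ! -d "$2" ]; then
        echo "$1 does not point to a directory: $2"
        return 1
    elif [ ! -r "$2" ] || [ ! -x "$2" ]; then
        echo "$1 is not readable and executable by $(id -un): $2"
        return 1
    fi
    echo "$1=$2"
}

# check_hadoop_env
# Checks both variables and returns nonzero if either check fails.
check_hadoop_env() {
    rc=0
    check_dir_var JAVA_HOME "${JAVA_HOME:-}" || rc=1
    check_dir_var HADOOP_HOME "${HADOOP_HOME:-}" || rc=1
    return "$rc"
}
```

Running check_hadoop_env in a fresh gpadmin login shell shows whether the settings in .bashrc or .bash_profile took effect. A nonzero exit status means at least one variable needs attention.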
- Set the following Greenplum Database server configuration parameters
and restart Greenplum Database.
Table 2. Server Configuration Parameters for Hadoop Targets

| Configuration Parameter  | Description | Default Value | Set Classifications |
|--------------------------|-------------|---------------|---------------------|
| gp_hadoop_target_version | The Hadoop target. Choose one of the following: gphd-1.0, gphd-1.1, gphd-1.2, gphd-2.0, gpmr-1.0, gpmr-1.2, hdp2, cdh3u2, cdh4.1 | gphd-1.1 | master, session, reload |
| gp_hadoop_home           | When using Pivotal HD, specify the installation directory for Hadoop. For example, the default installation directory is /usr/lib/gphd. When using Greenplum HD 1.2 or earlier, specify the same value as the HADOOP_HOME environment variable. | NULL | master, session, reload |
gpconfig -c gp_hadoop_target_version -v "'gphd-2.0'"
gpconfig -c gp_hadoop_home -v "'/usr/lib/gphd'"
gpstop -u
For information about the Greenplum Database utilities gpconfig and gpstop, see the Greenplum Database Utility Guide.
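After the configuration is reloaded, the new values can be checked from the shell. The following is a minimal sketch, assuming that gpconfig is on the gpadmin user's PATH and that its -s option reports the current value of a parameter on the master and segments. show_hadoop_settings is a hypothetical wrapper name, not part of Greenplum Database. Confirm the option against the Greenplum Database Utility Guide for your release.

```shell
# show_hadoop_settings
# Hypothetical wrapper: asks gpconfig to show the current value of each
# Hadoop-related parameter, stopping at the first failure.
show_hadoop_settings() {
    for param in gp_hadoop_target_version gp_hadoop_home; do
        gpconfig -s "$param" || return 1
    done
}
```

Calling show_hadoop_settings after gpstop -u should report the values that were passed to gpconfig -c. If a value differs between the master and the segments, the setting was probably not applied on every host.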