One-time HDFS Protocol Installation
- Install Java 1.6 or later on all Greenplum Database hosts: master, segment, and standby master.
- Install a supported Hadoop distribution on all hosts. The distribution
must be the same on all hosts. For Hadoop installation information, see the Hadoop
distribution documentation.
Greenplum Database supports the following Hadoop distributions:
Table 1. Hadoop Distributions

Hadoop Distribution        Version                  gp_hadoop_target_version
-------------------------  -----------------------  ------------------------
Pivotal HD                 Pivotal HD 3.0           gphd-3.0
Pivotal HD                 Pivotal HD 2.0, 2.1      gphd-2.0
Pivotal HD                 Pivotal HD 1.0 [1]       gphd-2.0
Greenplum HD               Greenplum HD 1.2         gphd-1.2
Greenplum HD               Greenplum HD 1.1         gphd-1.1 (default)
Cloudera                   CDH 5.2, 5.3             cdh5
Cloudera                   CDH 5.0, 5.1             cdh4.1
Cloudera                   CDH 4.1 [2] - CDH 4.7    cdh4.1
Hortonworks Data Platform  HDP 2.1, 2.2             hdp2
MapR [3]                   MapR 4.x                 gpmr-1.2
MapR [3]                   MapR 1.x, 2.x, 3.x       gpmr-1.0
Apache Hadoop              2.x                      hadoop2

Note: For the latest information regarding supported Hadoop distributions, see the Greenplum Database Release Notes for your release.

[1] Pivotal HD 1.0 is a distribution of Hadoop 2.0.
[2] For CDH 4.1, only CDH4 with MRv1 is supported.
[3] MapR requires the MapR client software.
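The mapping in Table 1 can be sketched as a small shell lookup. This is an illustration only: gp_hadoop_target and its short distribution keys (pivotal, greenplum, cdh, hdp, mapr, apache) are hypothetical names, not part of Greenplum Database, and the mappings simply mirror the table above.

```shell
# Map a Hadoop distribution and version to a gp_hadoop_target_version value.
# gp_hadoop_target is a hypothetical helper; the mappings mirror Table 1 only.
gp_hadoop_target() {
  dist="$1"
  version="$2"
  case "$dist:$version" in
    pivotal:3.0)                          echo "gphd-3.0" ;;
    pivotal:2.0|pivotal:2.1|pivotal:1.0)  echo "gphd-2.0" ;;
    greenplum:1.2)                        echo "gphd-1.2" ;;
    greenplum:1.1)                        echo "gphd-1.1" ;;
    cdh:5.2|cdh:5.3)                      echo "cdh5" ;;
    cdh:5.0|cdh:5.1)                      echo "cdh4.1" ;;
    cdh:4.[1-7])                          echo "cdh4.1" ;;
    hdp:2.1|hdp:2.2)                      echo "hdp2" ;;
    mapr:4.*)                             echo "gpmr-1.2" ;;
    mapr:[1-3].*)                         echo "gpmr-1.0" ;;
    apache:2.*)                           echo "hadoop2" ;;
    *)
      # Anything not listed in Table 1 is reported as unsupported.
      echo "unsupported: $dist $version" >&2
      return 1
      ;;
  esac
}
```

For example, gp_hadoop_target cdh 5.2 prints cdh5. Versions not listed in the table make the function return a non-zero status.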
- After installation, ensure that the Greenplum system user (gpadmin) has read and execute access to the Hadoop libraries or to the Greenplum MR client.
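One way to spot-check this access is a short shell function run as gpadmin on each host. The sketch below is a minimal check under stated assumptions: check_rx is a hypothetical helper name, and it tests only the directory you pass in, not every library file beneath it.

```shell
# Report whether the current user can read and traverse a directory.
# check_rx is a hypothetical helper; run it as gpadmin on each host.
check_rx() {
  dir="$1"
  if [ -d "$dir" ] && [ -r "$dir" ] && [ -x "$dir" ]; then
    echo "ok: $dir"
  else
    echo "no read/execute access: $dir" >&2
    return 1
  fi
}

# Usage (as gpadmin), assuming HADOOP_HOME points at the Hadoop installation:
#   check_rx "$HADOOP_HOME"
```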
- Set the following environment variables on all segments:
- JAVA_HOME – the Java home directory
- HADOOP_HOME – the Hadoop home directory
    export JAVA_HOME=/usr/java/default
    export HADOOP_HOME=/usr/lib/gphd
The variables must be set in the ~gpadmin/.bashrc or the ~gpadmin/.bash_profile file so that the gpadmin user shell environment can locate the Java home and Hadoop home.
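If you script this step across hosts, it helps to add each export line only once so that repeated runs do not duplicate entries. The following is a minimal sketch; add_profile_line is a hypothetical helper, not a Greenplum utility, and the paths in the usage comment are the example values shown above.

```shell
# Append a line to a profile file only if it is not already present.
# add_profile_line is a hypothetical helper, not a Greenplum utility.
add_profile_line() {
  file="$1"
  line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as gpadmin):
#   add_profile_line "$HOME/.bashrc" 'export JAVA_HOME=/usr/java/default'
#   add_profile_line "$HOME/.bashrc" 'export HADOOP_HOME=/usr/lib/gphd'
```

After updating the file, start a new gpadmin shell or source the file so that the variables take effect in the current session.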
- Set the following Greenplum Database server configuration parameters
and reload the Greenplum Database configuration so that the changes take effect.

Table 2. Server Configuration Parameters for Hadoop Targets

gp_hadoop_target_version
    Description: The Hadoop target. Choose one of the following:
        gphd-1.0
        gphd-1.1
        gphd-1.2
        gphd-2.0
        gpmr-1.0
        gpmr-1.2
        hdp2
        cdh3u2
        cdh4.1
    Default Value: gphd-1.1
    Set Classifications: master, session, reload

gp_hadoop_home
    Description: When using Pivotal HD, specify the installation directory for Hadoop. For example, the default installation directory is /usr/lib/gphd. When using Greenplum HD 1.2 or earlier, specify the same value as the HADOOP_HOME environment variable.
    Default Value: NULL
    Set Classifications: master, session, reload

For example, the following commands use the Greenplum Database utilities gpconfig and gpstop to set the server configuration parameters and reload the configuration (gpstop -u reloads the configuration files without stopping the system):

    gpconfig -c gp_hadoop_target_version -v "'gphd-2.0'"
    gpconfig -c gp_hadoop_home -v "'/usr/lib/gphd'"
    gpstop -u
For information about the Greenplum Database utilities gpconfig and gpstop, see the Greenplum Database Utility Guide.
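The gpconfig values in the example are single-quoted inside double quotes, which is easy to get wrong when you substitute your own target or path. As a hedged sketch, the hypothetical helper below (hadoop_config_commands is not a Greenplum utility) prints the commands for review and runs nothing itself.

```shell
# Print the gpconfig and gpstop commands for a given target and Hadoop home.
# hadoop_config_commands is a hypothetical dry-run helper: it runs nothing.
hadoop_config_commands() {
  target="$1"
  hadoop_home="$2"
  # Each value is wrapped in single quotes inside double quotes,
  # matching the quoting used in the example commands.
  printf "gpconfig -c gp_hadoop_target_version -v \"'%s'\"\n" "$target"
  printf "gpconfig -c gp_hadoop_home -v \"'%s'\"\n" "$hadoop_home"
  printf "gpstop -u\n"
}

# Usage: hadoop_config_commands gphd-2.0 /usr/lib/gphd
```

Review the printed commands, then run them as gpadmin on the master host.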