Hortonworks Data Platform

Non-Ambari Cluster Installation Guide

2015-12-21


Contents

1. Getting Ready to Install
1. Meet Minimum System Requirements
1.1. Hardware Recommendations
1.2. Operating System Requirements
1.3. Software Requirements
1.4. JDK Requirements
1.5. Metastore Database Requirements
2. Virtualization and Cloud Platforms
3. Configure the Remote Repositories
4. Decide on Deployment Type
5. Collect Information
6. Prepare the Environment
6.1. Enable NTP on the Cluster
6.2. Disable SELinux
6.3. Disable IPTables
7. Download Companion Files
8. Define Environment Parameters
9. (Optional) Create System Users and Groups
10. Determine HDP Memory Configuration Settings
10.1. Running the YARN Utility Script
10.2. Manually Calculating YARN and MapReduce Memory Configuration Settings
11. Configuring NameNode Heap Size
12. Allocate Adequate Log Space for HDP
13. Download the HDP Maven Artifacts
2. Installing HDFS, YARN, and MapReduce
1. Set Default File and Directory Permissions
2. Install the Hadoop Packages
3. Install Compression Libraries
3.1. Install Snappy
3.2. Install LZO
4. Create Directories
4.1. Create the NameNode Directories
4.2. Create the SecondaryNameNode Directories
4.3. Create DataNode and YARN NodeManager Local Directories
4.4. Create the Log and PID Directories
4.5. Symlink Directories with hdp-select
3. Installing Apache ZooKeeper
1. Install the ZooKeeper Package
2. Securing ZooKeeper with Kerberos (Optional)
3. Securing ZooKeeper Access
3.1. ZooKeeper Configuration
3.2. YARN Configuration
3.3. HDFS Configuration
4. Set Directories and Permissions
5. Set Up the Configuration Files
6. Start ZooKeeper
4. Setting Up the Hadoop Configuration
5. Validating the Core Hadoop Installation
1. Format and Start HDFS
2. Smoke Test HDFS
3. Configure YARN and MapReduce
4. Start YARN
5. Start MapReduce JobHistory Server
6. Smoke Test MapReduce
6. Installing Apache HBase
1. Install the HBase Package
2. Set Directories and Permissions
3. Set Up the Configuration Files
4. Validate the Installation
5. Starting the HBase Thrift and REST APIs
7. Installing Apache Phoenix
1. Installing the Phoenix Package
2. Configuring HBase for Phoenix
3. Configuring Phoenix to Run in a Secure Cluster
4. Validating the Phoenix Installation
5. Best Practices for Setting Client-side Timeouts
6. Troubleshooting Phoenix
8. Installing and Configuring Apache Tez
1. Prerequisites
2. Installing the Tez Package
3. Configuring Tez
4. Creating a New Tez View Instance
5. Validating the Tez Installation
6. Troubleshooting
9. Installing Apache Hive and Apache HCatalog
1. Installing the Hive-HCatalog Package
2. Setting Directories and Permissions
3. Setting Up the Hive/HCatalog Configuration Files
3.1. HDP-Utility Script
3.2. Configure Hive and HiveServer2 for Tez
4. Setting Up the Database for the Hive Metastore
5. Setting Up an RDBMS for Use with the Hive Metastore
6. Creating Directories on HDFS
7. Enabling Tez for Hive Queries
8. Disabling Tez for Hive Queries
9. Configuring Tez with the Capacity Scheduler
10. Validating Hive-on-Tez Installation
10. Installing Apache Pig
1. Install the Pig Package
2. Validate the Installation
11. Installing Apache WebHCat
1. Install the WebHCat Package
2. Upload the Pig, Hive, and Sqoop Tarballs to HDFS
3. Set Directories and Permissions
4. Modify WebHCat Configuration Files
5. Set Up HDFS User and Prepare WebHCat Directories
6. Validate the Installation
12. Installing Apache Oozie
1. Install the Oozie Package
2. Set Directories and Permissions
3. Set Up the Oozie Configuration Files
3.1. For Derby
3.2. For MySQL
3.3. For PostgreSQL
3.4. For Oracle
4. Configure Your Database for Oozie
5. Set Up the Sharelib
6. Validate the Installation
13. Installing Apache Ranger
1. Installation Prerequisites
2. Installing Policy Manager
2.1. Install the Ranger Policy Manager
2.2. Install the Ranger Policy Administration Service
2.3. Start the Ranger Policy Administration Service
2.4. Configuring the Ranger Policy Administration Authentication Mode
2.5. Configuring Ranger Policy Administration High Availability
3. Installing UserSync
3.1. Using the LDAP Connection Check Tool
3.2. Install UserSync and Start the Service
4. Installing Ranger Plug-ins
4.1. Installing the Ranger HDFS Plug-in
4.2. Installing the Ranger YARN Plug-in
4.3. Installing the Ranger Kafka Plug-in
4.4. Installing the Ranger HBase Plug-in
4.5. Installing the Ranger Hive Plug-in
4.6. Installing the Ranger Knox Plug-in
4.7. Installing the Ranger Storm Plug-in
5. Enabling Audit Logging for HDFS and Solr
6. Verifying the Installation
14. Installing Hue
1. Prerequisites
2. Configure HDP
3. Install Hue
4. Configure Hue
5. Start Hue
6. Configuring Hue for an External Database
6.1. Using Hue with Oracle
6.2. Using Hue with MySQL
6.3. Using Hue with PostgreSQL
15. Installing Apache Sqoop
1. Install the Sqoop Package
2. Set Up the Sqoop Configuration
3. Validate the Sqoop Installation
16. Installing Apache Mahout
1. Install Mahout
2. Validate Mahout
17. Installing and Configuring Apache Flume
1. Understanding Flume
2. Installing Flume
3. Configuring Flume
4. Starting Flume
5. HDP and Flume
6. A Simple Example
18. Installing and Configuring Apache Storm
1. Install the Storm Package
2. Configure Storm
3. Configure a Process Controller
4. (Optional) Configure Kerberos Authentication for Storm
5. (Optional) Configure Authorization for Storm
6. Validate the Installation
19. Installing and Configuring Apache Spark
1. Spark Prerequisites
2. Installing Spark
3. Configuring Spark
4. (Optional) Starting the Spark Thrift Server
5. Validating Spark
20. Installing and Configuring Apache Kafka
1. Install Kafka
2. Configure Kafka
3. Validate Kafka
21. Installing Apache Accumulo
1. Installing the Accumulo Package
2. Configuring Accumulo
3. Configuring the "Hosts" Files
4. Validating Accumulo
5. Smoke Testing Accumulo
22. Installing Apache Falcon
1. Installing the Falcon Package
2. Setting Directories and Permissions
3. Configuring Proxy Settings
4. Configuring Falcon Entities
5. Configuring Oozie for Falcon
6. Configuring Hive for Falcon
7. Configuring for Secure Clusters
8. Validate Falcon
23. Installing Apache Knox
1. Install the Knox Package on the Knox Server
2. Set Up and Validate the Knox Gateway Installation
24. Installing Apache Slider
25. Installing and Configuring Apache Atlas
1. Atlas Prerequisites
2. Installing Atlas
3. Installing Atlas Metadata Hive Plugin
4. Configuring Hive Hook
5. Configuring the Graph Database
5.1. Choosing Between Storage Backends
5.2. Choosing Between Indexing Backends
5.3. Configure Atlas to Use HBase
5.4. Configure Atlas to Use SolrCloud
6. Configuring for Secure Clusters
7. Configuring Atlas in a Kerberized Cluster
8. Validating Atlas
26. Setting Up Kerberos Security for Manual Installs
27. Uninstalling HDP

List of Tables

1.1. Define Directories for Core Hadoop
1.2. Define Directories for Ecosystem Components
1.3. Define Users and Groups for Systems
1.4. Typical System Users and Groups
1.5. yarn-utils.py Options
1.6. Reserved Memory Recommendations
1.7. Recommended Values
1.8. YARN and MapReduce Configuration Setting Value Calculations
1.9. Example Value Calculations
1.10. Example Value Calculations
1.11. NameNode Heap Size Settings
8.1. Tez Configuration Parameters
9.1. Hive Configuration Parameters
11.1. Hadoop core-site.xml File Properties
13.1. install.properties Entries
13.2. Properties to Update in the install.properties File
13.3. Properties to Edit in the install.properties File
13.4. Properties to Edit in the install.properties File
13.5. Properties to Edit in the install.properties File
13.6. HBase Properties to Edit in the install.properties File
13.7. Hive-Related Properties to Edit in the install.properties File
13.8. Knox-Related Properties to Edit in the install.properties File
13.9. Storm-Related Properties to Edit in the install.properties File
14.1. Hue-Supported Browsers
14.2. Hue Dependencies on HDP Components
14.3. Variables to Configure HDFS Cluster
14.4. Variables to Configure the YARN Cluster
14.5. Beeswax Configuration Values
17.1. Flume 1.5.2 Dependencies
18.1. Required jaas.conf Sections for Cluster Nodes
18.2. Supported Authorizers
18.3. storm.yaml Configuration File Properties
18.4. worker-launcher.cfg File Configuration Properties
18.5. multitenant-scheduler.yaml Configuration File Properties
19.1. Prerequisites for Running Spark 1.5.2
20.1. Kafka Configuration Properties
25.1. Atlas Cluster Prerequisites
