HDFS Setup

Driverless AI allows you to explore HDFS data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with HDFS.

Description of Configuration Attributes

  • hdfs_auth_type: Selects HDFS authentication. Available values are:

    • principal
    • keytab
    • keytabimpersonation
    • noauth
  • hdfs_core_site_xml_path: The location of the core-site.xml configuration file.
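
Because hdfs_auth_type accepts only a fixed set of values, a launch script can reject a mistyped value before the container starts. The following is a minimal sketch rather than part of Driverless AI: the function name valid_hdfs_auth_type is hypothetical, and it assumes the value must match one of the documented spellings exactly (lowercase).

```shell
# Hypothetical helper (not part of Driverless AI): succeeds only when the
# argument is one of the documented hdfs_auth_type values.
# Assumption: values are matched exactly, in lowercase.
valid_hdfs_auth_type() {
  case "$1" in
    principal|keytab|keytabimpersonation|noauth)
      return 0
      ;;
    *)
      echo "unsupported hdfs_auth_type: $1" >&2
      return 1
      ;;
  esac
}
```

A script might call valid_hdfs_auth_type "$AUTH_TYPE" || exit 1 before invoking nvidia-docker run, where AUTH_TYPE is a variable of your own choosing.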

HDFS Setup in Docker

The following examples demonstrate how to configure the HDFS connector when Driverless AI is running inside Docker.

HDFS with No Authentication

This example enables the HDFS data connector and disables HDFS authentication. It does not pass any HDFS configuration file; however, it configures Docker DNS by passing the name and IP address of the HDFS name node. This allows users to reference data stored in HDFS directly by the name node address, for example: hdfs://name.node/datasets/iris.csv.

nvidia-docker run \
 --add-host name.node:172.16.2.186 \
 -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs" \
 -e DRIVERLESS_AI_HDFS_AUTH_TYPE='noauth'  \
 -p 12345:12345 \
 --init -it --rm \
 -v /tmp/dtmp/:/tmp \
 -v /tmp/dlog/:/log \
 -u $(id -u):$(id -g) \
 opsh2oai/h2oai-runtime
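
The host name given to --add-host is the same name that appears in the HDFS address users enter. As a small illustration, the sketch below builds such an address from a name node host and a file path. The function name hdfs_uri is hypothetical; the host and path are the ones used in the example above.

```shell
# Hypothetical helper: builds an HDFS URI from a name node host ($1)
# and a file path ($2). A leading slash on the path is optional.
hdfs_uri() {
  # "${2#/}" strips one leading slash so the result has exactly one
  # slash between the host and the path.
  printf 'hdfs://%s/%s\n' "$1" "${2#/}"
}

hdfs_uri name.node /datasets/iris.csv
# prints: hdfs://name.node/datasets/iris.csv
```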

HDFS with Keytab-Based Authentication

This example:

  • Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.
  • Configures the environment variable DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER to reference a user for whom the keytab was created (usually in the form of user@realm).
# Docker instructions
nvidia-docker run \
 -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs" \
 -e DRIVERLESS_AI_HDFS_AUTH_TYPE='keytab' \
 -e DRIVERLESS_AI_KEY_TAB_PATH='tmp/<<keytabname>>' \
 -e DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER='<<user@kerberosrealm>>' \
 -p 12345:12345 \
 --init -it --rm \
 -v /tmp/dtmp/:/tmp \
 -v /tmp/dlog/:/log \
 -u $(id -u):$(id -g) \
 opsh2oai/h2oai-runtime
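
Before starting the container, it can help to confirm that the keytab placed in /tmp/dtmp is readable and contains the expected principal. The following is a minimal sketch, not part of Driverless AI: the function name check_keytab is hypothetical, and the principal check assumes the MIT Kerberos klist utility is installed on the host (the check is skipped if it is not).

```shell
# Hypothetical pre-flight check (not part of Driverless AI).
# Usage: check_keytab <keytab-file-on-host> <principal>
check_keytab() {
  # The keytab must exist and be readable by the user launching Docker.
  if [ ! -r "$1" ]; then
    echo "keytab not readable: $1" >&2
    return 1
  fi
  # Assumption: MIT Kerberos klist; "klist -k -t" lists keytab entries
  # with timestamps. Skip the principal check if klist is unavailable.
  if command -v klist >/dev/null 2>&1; then
    if ! klist -k -t "$1" | grep -F -q "$2"; then
      echo "principal $2 not found in $1" >&2
      return 1
    fi
  fi
  return 0
}
```

For example, check_keytab /tmp/dtmp/<<keytabname>> '<<user@kerberosrealm>>' could be run on the host before the nvidia-docker run command, substituting your own keytab name and principal.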

HDFS with Keytab-Based Impersonation

This example:

  • Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.
  • Configures the DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER variable, which references a user for whom the keytab was created (usually in the form of user@realm).
  • Configures the DRIVERLESS_AI_HDFS_APP_LOGIN_USER variable, which references a user who is being impersonated (usually in the form of user@realm).
# Docker instructions
nvidia-docker run \
 -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs" \
 -e DRIVERLESS_AI_HDFS_AUTH_TYPE='keytabimpersonation' \
 -e DRIVERLESS_AI_KEY_TAB_PATH='tmp/<<keytabname>>' \
 -e DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER='<<appuser@kerberosrealm>>' \
 -e DRIVERLESS_AI_HDFS_APP_LOGIN_USER='<<thisuser@kerberosrealm>>' \
 -p 12345:12345 \
 --init -it --rm \
 -v /tmp/dtmp/:/tmp \
 -v /tmp/dlog/:/log \
 -u $(id -u):$(id -g) \
 opsh2oai/h2oai-runtime
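
The three Docker examples differ only in the authentication-related environment variables they set. The sketch below summarizes that difference, keyed by hdfs_auth_type value. The function name required_hdfs_vars is hypothetical, and the mapping reflects only the examples in this section; the principal mode is not covered here and is therefore reported as unknown.

```shell
# Hypothetical helper: prints the extra environment variables that the
# examples in this section set for a given hdfs_auth_type value.
required_hdfs_vars() {
  case "$1" in
    noauth)
      # The no-authentication example sets no extra variables.
      return 0
      ;;
    keytab)
      echo DRIVERLESS_AI_KEY_TAB_PATH
      echo DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER
      ;;
    keytabimpersonation)
      echo DRIVERLESS_AI_KEY_TAB_PATH
      echo DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER
      echo DRIVERLESS_AI_HDFS_APP_LOGIN_USER
      ;;
    *)
      echo "not covered by the examples in this section: $1" >&2
      return 1
      ;;
  esac
}
```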