This tutorial will get you up and running in minutes with HDFS. You will install and configure the DC/OS HDFS package and retrieve the core-site.xml and hdfs-site.xml files. These XML files are used to configure client nodes of the HDFS cluster.
Prerequisites:
- DC/OS and DC/OS CLI installed with a minimum of five private agent nodes, each with at least two CPU shares and eight GB of RAM available to the HDFS service.
-
Depending on your security mode, HDFS requires a service authentication token for access to DC/OS. For more information, see Configuring DC/OS Access for HDFS.
| Security mode | Service account |
| Disabled | Not available |
| Permissive | Optional |
| Strict | Required |
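The resource minimums above can be turned into a quick preflight check. This is a minimal sketch: `check_hdfs_minimums` is a hypothetical helper, not part of the DC/OS CLI, and it assumes you already know the number of private agents and the whole-number CPU and RAM figures available on each.

```shell
#!/usr/bin/env bash
# Minimums taken from the prerequisites above.
MIN_AGENTS=5
MIN_CPUS=2
MIN_MEM_GB=8

# check_hdfs_minimums AGENTS CPUS_PER_AGENT MEM_GB_PER_AGENT
# Hypothetical helper: prints one line per unmet requirement and
# returns non-zero if any requirement is unmet. Whole numbers only.
check_hdfs_minimums() {
  local agents="$1" cpus="$2" mem_gb="$3" failed=0
  if [ "$agents" -lt "$MIN_AGENTS" ]; then
    echo "need at least $MIN_AGENTS private agents, found $agents"
    failed=1
  fi
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    echo "need at least $MIN_CPUS CPU shares per agent, found $cpus"
    failed=1
  fi
  if [ "$mem_gb" -lt "$MIN_MEM_GB" ]; then
    echo "need at least $MIN_MEM_GB GB of RAM per agent, found $mem_gb"
    failed=1
  fi
  return "$failed"
}

# Example (not run here):
#   check_hdfs_minimums 5 4 16 && echo "cluster meets the HDFS minimums"
```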
-
Install the HDFS package.
$ dcos package install beta-hdfs
Tip: Type dcos beta-hdfs to view the HDFS CLI options.
-
Show the currently configured HDFS nodes.
$ dcos beta-hdfs --name=hdfs config list
The output should resemble:
[ "1773cced-0805-4b36-9022-ce5f08cf373a" ]
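Right after installation the service may not answer immediately, so it can help to retry the command above until it succeeds. The following is a minimal sketch: `wait_for_hdfs` is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it assumes the `config list` command exits with a non-zero status while the service is still unavailable.

```shell
#!/usr/bin/env bash
# wait_for_hdfs [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: reruns the "config list" command from the step above
# until it exits successfully or the attempts run out.
wait_for_hdfs() {
  local attempts="${1:-30}" delay="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if dcos beta-hdfs --name=hdfs config list >/dev/null 2>&1; then
      echo "HDFS service responded after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "HDFS service did not respond after $attempts attempt(s)" >&2
  return 1
}

# Example (not run here):
#   wait_for_hdfs 30 10 && dcos beta-hdfs --name=hdfs config list
```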
-
Configure HDFS on your nodes.
-
SSH to the leading master node.
$ dcos node ssh --leader --master-proxy
-
Pull the HDFS client Docker image to the node and start an interactive pseudo-TTY session in a container.
$ docker run -it mesosphere/hdfs-client:2.6.4 /bin/bash
The output should resemble:
Unable to find image 'mesosphere/hdfs-client:2.6.4' locally
2.6.4: Pulling from mesosphere/hdfs-client
6edcc89ed412: Pull complete
bdf37643ee24: Pull complete
ea0211d47051: Pull complete
a3ed95caeb02: Pull complete
12bd7c00b7e6: Pull complete
9a93505f2bac: Pull complete
9cc2baa935ae: Pull complete
88e8b845a891: Pull complete
9a84bc18aaba: Pull complete
Digest: sha256:02384bc96d770e3e1fc6102b2019cdceea74e81f8223b8cdc330a499f1df733e
Status: Downloaded newer image for mesosphere/hdfs-client:2.6.4
By default, the client is configured to connect to an HDFS service named hdfs, and no further client configuration is required. To connect to a service with a different name, run this command with the name (<hdfs-name>) specified:
$ HDFS_SERVICE_NAME=<hdfs-name> ./configure-hdfs.sh
-
List the contents.
$ ./bin/hdfs dfs -ls /
The output should be empty.
-
Create a file on HDFS.
$ echo "Test" | ./bin/hdfs dfs -put - /test.txt
-
List the contents again.
$ ./bin/hdfs dfs -ls /
The output should now resemble:
Found 1 items
-rw-r--r-- 3 root supergroup 5 2017-08-25 17:41 /test.txt
-
Read the file to ensure data integrity.
$ ./bin/hdfs dfs -cat /test.txt
The output should resemble:
Test
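The write, list, and read steps above can be combined into a single round-trip check. This is a minimal sketch: `hdfs_roundtrip` and the `HDFS_BIN` variable are hypothetical names, and it assumes it runs from the same directory inside the client container as the commands above. Because `-put` refuses to overwrite an existing file, pass a path that does not exist yet.

```shell
#!/usr/bin/env bash
# Path to the hdfs launcher inside the client container; override if it differs.
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# hdfs_roundtrip REMOTE_PATH TEXT
# Hypothetical helper: writes TEXT to REMOTE_PATH, reads it back, and
# compares the two. Returns non-zero on any failure or mismatch.
hdfs_roundtrip() {
  local remote="$1" expected="$2" actual
  # Write the text from stdin, as in the "Create a file" step.
  printf '%s\n' "$expected" | "$HDFS_BIN" dfs -put - "$remote" || return 1
  # Read it back; command substitution drops the trailing newline.
  actual=$("$HDFS_BIN" dfs -cat "$remote") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $remote round-trips intact"
  else
    echo "MISMATCH: wrote '$expected', read '$actual'" >&2
    return 1
  fi
}

# Example (not run here):
#   hdfs_roundtrip /roundtrip-check.txt "Test"
```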
-
To configure other clients, return to the DC/OS CLI and retrieve the hdfs-site.xml and core-site.xml files. Use these XML files to configure client nodes of the HDFS cluster.
-
Run this command to retrieve the hdfs-site.xml file.
$ dcos beta-hdfs --name=hdfs endpoints hdfs-site.xml
The output should resemble:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.nameservice.id</name>
    <value>hdfs</value>
  </property>
  ...
</configuration>
-
Run this command to retrieve the core-site.xml file.
$ dcos beta-hdfs --name=hdfs endpoints core-site.xml
The output should resemble:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.nameservice.id</name>
    <value>hdfs</value>
  </property>
  ...
</configuration>
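To use the two files on another client, one option is to save them into a single directory and point the Hadoop client at it. The sketch below makes a few assumptions: `fetch_hdfs_client_config` is a hypothetical helper, the destination directory is your choice, and the check for a `<configuration>` element is only a rough sanity test rather than full validation. Hadoop clients read their configuration from the directory named by the `HADOOP_CONF_DIR` environment variable.

```shell
#!/usr/bin/env bash
# fetch_hdfs_client_config DEST_DIR [SERVICE_NAME]
# Hypothetical helper: saves both XML files into DEST_DIR using the
# "endpoints" commands shown above.
fetch_hdfs_client_config() {
  local dest="$1" name="${2:-hdfs}" file
  mkdir -p "$dest" || return 1
  for file in hdfs-site.xml core-site.xml; do
    if ! dcos beta-hdfs --name="$name" endpoints "$file" > "$dest/$file"; then
      echo "failed to retrieve $file" >&2
      return 1
    fi
    # Rough sanity check: the saved file should contain a <configuration> element.
    if ! grep -q '<configuration>' "$dest/$file"; then
      echo "$file does not look like a Hadoop configuration file" >&2
      return 1
    fi
  done
  echo "saved hdfs-site.xml and core-site.xml to $dest"
}

# Example (not run here):
#   fetch_hdfs_client_config "$HOME/hdfs-conf" && export HADOOP_CONF_DIR="$HOME/hdfs-conf"
```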