HDFS is available in the Universe and can be installed by using either the web interface or the DC/OS CLI.
Prerequisites
- Depending on your security mode in Enterprise DC/OS, you may need to provision a service account before installing HDFS. Only someone with
superuser
permission can create the service account.strict
security mode requires a service account.permissive
security mode a service account is optional.disabled
security mode does not require a service account.
- A minimum of five agent nodes with eight GiB of memory and ten GiB of disk available on each agent.
- Each agent node must have these ports available: 8480, 8485, 9000, 9001, 9002, 9005, and 9006, and 9007.
Installation
Install HDFS from the DC/OS web interface. Find the package in Universe and perform an advanced installation. On the Service tab, scroll to the bottom and click the box next to AGREE TO BETA TERMS. Then, click REVIEW AND INSTALL.
This command creates a new HDFS cluster with two name nodes, three journal nodes, and five data nodes. Two clusters cannot share the same name. To install more than one HDFS cluster, customize the name
at install time for each additional instance. See the Custom Installation section for more information.
The default installation may not be sufficient for a production deployment, but all cluster operations will work. If you are planning a production deployment with 3 replicas of each value and with local quorum consistency for read and write operations (a very common use case), this configuration is sufficient for development and testing purposes, and it can be scaled to a production deployment.
Once you have installed HDFS, install the CLI.
$ dcos package install beta-hdfs --cli
Service Settings
Service Name
Each instance of HDFS in a given DC/OS cluster must be configured with a different service name. You can configure the service name in the service section of the advanced installation section of the DC/OS web interface or with a JSON options file when installing from the DC/OS CLI. See Multiple HDFS Cluster Installation for more information. The default service name (used in many examples here) is beta-hdfs
.
Custom Installation
If you are ready to ship into production, you will likely need to customize the deployment to suit the workload requirements of your application(s). Customize the default deployment by creating a JSON file, then pass it to dcos package install
using the --options
parameter.
Sample JSON options file named sample-hdfs.json
:
{
"data_node": {
"count": 10
}
}
The command below creates a cluster using sample-hdfs.json
:
$ dcos package install --options=sample-hdfs.json hdfs
Recommendation: Store your custom configuration in source control.
This cluster will have 10 data nodes instead of the default value of 3. See the Configuration section for a list of fields that can be customized via a options JSON file when the HDFS cluster is created.
Minimal Installation
Many of the other Infinity services currently support DC/OS Vagrant deployment. However, DC/OS HDFS currently only supports deployment with an HA name service managed by a Quorum Journal. The resource requirements for such a deployment make it prohibitive to install on a local development machine. The default deployment, is the minimal safe deployment for a DC/OS HDFS cluster. Community contributions to support deployment of a non-HA cluster, e.g. a single name node and data node with no failure detector, would be welcome.
Multiple HDFS cluster Installation
Installing multiple HDFS clusters is identical to installing an HDFS cluster with a custom configuration, as described above. Use a JSON options file to specify a unique name
for each installation:
$ cat hdfs1.json
{
"service": {
"name": "hdfs1"
}
}
$ dcos package install beta-hdfs --options=hdfs1.json
Use the --name
argument after install time to specify which HDFS instance to query. All dcos hdfs
CLI commands accept the --name
argument. If you do not specify a service name, the CLI assumes the default value, hdfs
.
Integration with DC/OS access controls
In Enterprise DC/OS 1.10 and above, you can integrate your SDK-based service with DC/OS ACLs to grant users and groups access to only certain services. You do this by installing your service into a folder, and then restricting access to some number of folders. Folders also allow you to namespace services. For instance, staging/hdfs
and production/hdfs
.
Steps:
- In the DC/OS GUI, create a group, then add a user to the group. Or, just create a user. Click Organization > Groups > + or Organization > Users > +. If you create a group, you must also create a user and add them to the group.
-
Give the user permissions for the folder where you will install your service. In this example, we are creating a user called
developer
, who will have access to the/testing
folder. Select the group or user you created. Select ADD PERMISSION and then toggle to INSERT PERMISSION STRING. Add each of the following permissions to your user or group, and then click ADD PERMISSIONS.dcos:adminrouter:service:marathon full dcos:service:marathon:marathon:services:/testing full dcos:adminrouter:ops:mesos full dcos:adminrouter:ops:slave full
- Install your service into a folder called
test
. Go to Catalog, then search for beta-hdfs. -
Click CONFIGURE and change the service name to
/testing/hdfs
, then deploy.The slashes in your service name are interpreted as folders. You are deploying HDFS in the
/testing
folder. Any user with access to the/testing
folder will have access to the service.
Important:
- Services cannot be renamed. Because the location of the service is specified in the name, you cannot move services between folders.
- DC/OS 1.9 and earlier does not accept slashes in service names. You may be able to create the service, but you will encounter unexpected problems.
Interacting with your foldered service
- Interact with your foldered service via the DC/OS CLI with this flag:
--name=/path/to/myservice
. - To interact with your foldered service over the web directly, use
http://<dcos-url>/service/path/to/myservice
. E.g.,http://<dcos-url>/service/testing/hdfs/v1/endpoints
.
Zones
Placement constraints can be applied to zones by referring to the @zone
key. For example, one could spread pods across a minimum of 3 different zones by specifying the constraint [["@zone", "GROUP_BY", "3"]]
.
Example
Suppose we have a Mesos cluster with zones a
,b
,c
.
Balanced Placement for a Single Region
{
...
"count": 6,
"placement": "[[\"@zone\", \"GROUP_BY\", \"3\"]]"
}
- Instances will all be evenly divided between zones
a
,b
,c
.
Colocation
An individual HDFS deployment will colocate name nodes with journal nodes, but it will not colocate two name nodes or two journal nodes on the same agent node in the cluster. Data nodes may be colocated with both name nodes and journal nodes. If multiple clusters are installed, they may share the same agent nodes in the cluster provided that no ports specified in the service configurations conflict for those node types.
Installation Plan
When the DC/OS HDFS service is initially installed, it generates an installation plan as shown below.
{
phases: [{
id: "0c64b701-6b8b-440e-93e1-6c43b05abfc2",
name: "jn-deploy",
steps: [{
id: "ba608f90-32c6-42e0-93c7-a6b51fc2e52b",
status: "COMPLETE",
name: "journal-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-0:[node] [ba608f90-32c6-42e0-93c7-a6b51fc2e52b]' has status: 'COMPLETE'."
}, {
id: "f29c13d8-a477-4e2b-84ac-046cd8a7d283",
status: "COMPLETE",
name: "journal-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-1:[node] [f29c13d8-a477-4e2b-84ac-046cd8a7d283]' has status: 'COMPLETE'."
}, {
id: "75da5719-9296-4872-887a-cc1ab157191b",
status: "COMPLETE",
name: "journal-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-2:[node] [75da5719-9296-4872-887a-cc1ab157191b]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "967aadc8-ef25-402e-b81a-9fcab0e57463",
name: "nn-deploy",
steps: [{
id: "33d02aa7-0927-428b-82d7-cec93cfde090",
status: "COMPLETE",
name: "name-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[format] [33d02aa7-0927-428b-82d7-cec93cfde090]' has status: 'COMPLETE'."
}, {
id: "62f048f4-7b6c-4ea0-8509-82b328d40e61",
status: "STARTING",
name: "name-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[node] [62f048f4-7b6c-4ea0-8509-82b328d40e61]' has status: 'STARTING'."
}, {
id: "e7b8aa27-39b2-4aba-9d87-3b47c752878f",
status: "PENDING",
name: "name-1:[bootstrap]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[bootstrap] [e7b8aa27-39b2-4aba-9d87-3b47c752878f]' has status: 'PENDING'."
}, {
id: "cd633e0d-6aac-45a9-b764-296676fc228d",
status: "PENDING",
name: "name-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[node] [cd633e0d-6aac-45a9-b764-296676fc228d]' has status: 'PENDING'."
}],
status: "IN_PROGRESS"
}, {
id: "248fa719-dc57-4ef9-9c48-5e6e1218b6c2",
name: "zkfc-deploy",
steps: [{
id: "812ae290-e6a4-4128-b486-7288e855bfe6",
status: "PENDING",
name: "zkfc-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[format] [812ae290-e6a4-4128-b486-7288e855bfe6]' has status: 'PENDING'."
}, {
id: "8a401585-a5af-4540-94f0-dada95093329",
status: "PENDING",
name: "zkfc-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[node] [8a401585-a5af-4540-94f0-dada95093329]' has status: 'PENDING'."
}, {
id: "7eabc15d-7feb-4546-8699-0fadac1f303b",
status: "PENDING",
name: "zkfc-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-1:[node] [7eabc15d-7feb-4546-8699-0fadac1f303b]' has status: 'PENDING'."
}],
status: "PENDING"
}, {
id: "ddb20a39-830b-417b-a901-5b9027e459f7",
name: "dn-deploy",
steps: [{
id: "018deb17-8853-4d3b-820c-6b2caa55743f",
status: "PENDING",
name: "data-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-0:[node] [018deb17-8853-4d3b-820c-6b2caa55743f]' has status: 'PENDING'."
}],
status: "PENDING"
}],
errors: [],
status: "IN_PROGRESS"
}
Viewing the Installation Plan
The plan can be viewed from the API via the REST endpoint. A curl example is provided below. See the REST API Authentication part of the REST API Reference section for information on how this request must be authenticated.
$ curl -v -H "Authorization: token=$(dcos config show core.dcos_acs_token)" http://<dcos_url>/service/hdfs/v1/plans/deploy
Plan Errors
The plan will display any errors that prevent installation in the errors list. The presence of any error indicates that the installation cannot progress. See the Troubleshooting section for information on resolving errors.
Quorum Journal
The first phase of the installation is the Quorum Journal phase. This phase will deploy three journal nodes to provide a Quorum Journal for the HA name service. Each step in the phase represents an individual journal node.
Name Service
The second phase of the installation is deployment of the HA name service. This phase deploys two name nodes. Needed format and bootstrap operations occur as necessary.
ZKFC
The third phase of the installation is deployment of the ZKFC nodes. This phase deploys two ZKFC nodes to enable ZooKeeper failure detection. Each step represents an individual ZKFC node, and there are always exactly two.
Distributed Storage
The final phase of the installation is deployment of the distributed storage service. This phase deploys the data nodes that are configured to act as storage for the cluster. The number of data nodes can be reconfigured post installation.
Pausing Installation
To pause installation, issue a REST API request as shown below. The installation will pause after completing installation of the current node and wait for user input.
$ curl -v -H "Authorization: token=$(dcos config show core.dcos_acs_token)" -X POST http://<dcos_url>/service/hdfs/v1/plans/deploy/interrupt
Resuming Installation
If the installation has been paused, the REST API request below will resume installation at the next pending node.
$ curl -v -H "Authorization: token=$(dcos config show core.dcos_acs_token)" -X POST http://<dcos_url>/service/hdfs/v1/plans/deploy/continue
Virtual networks
HDFS supports deployment on virtual networks on DC/OS (including the dcos
overlay network), allowing each container to have its own IP address and not use the ports resources on the agent. This can be specified by passing the following configuration during installation:
{
"service": {
"virtual_network_enabled": true
}
}
As mentioned in the developer guide once the service is deployed on a virtual network, it cannot be updated to use the host network.
TLS
HDFS can be launched with TLS encryption. Enabling TLS is only possible in permissive
and strict
cluster security modes on Enterprise DC/OS. Both modes require a service account. Additionally, a service account must have the dcos:superuser
permission. If the permission is missing the HDFS scheduler will not abe able to provision TLS artifacts.
Sample JSON options file named hdfs-tls.json
:
{
"service": {
"service_account_secret": "hdfs",
"service_account": "hdfs",
"security": {
"transport_encryption": {
"enabled": true
}
}
}
}
For more information about TLS in the SDK see the TLS documentation.
Placement Constraints
Placement constraints allow you to customize where a service is deployed in the DC/OS cluster. Depending on the service, some or all components may be configurable using Marathon operators (reference) with this syntax: field:OPERATOR[:parameter]
. For example, if the reference lists [["hostname", "UNIQUE"]]
, you should use hostname:UNIQUE
.
A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:
hostname:LIKE:10.0.0.159|10.0.1.202|10.0.3.3
You must include spare capacity in this list, so that if one of the whitelisted systems goes down, there is still enough room to repair your service (via pod replace
) without requiring that system.
Regions and Zones
Placement constraints can be applied to zones by referring to the @zone
key. For example, one could spread pods across a minimum of 3 different zones by specifying the constraint:
[["@zone", "GROUP_BY", "3"]]
When the region awareness feature is enabled (currently in beta), the @region
key can also be referenced for defining placement constraints. Any placement constraints that do not reference the @region
key are constrained to the local region.
Clients
Clients connecting to HDFS over a TLS connection must connect to an HTTPS specific port. Each node type (journal
, name
and data
) can be configured with different port numbers for TLS connections.
Clients can connect only over the TLS version 1.2.
Changing Configuration at Runtime
You can customize your cluster in-place when it is up and running.
The instructions below describe how to update the configuration for a running DC/OS service.
Enterprise DC/OS 1.10
Enterprise DC/OS 1.10 introduces a convenient command line option that allows for easier updates to a service’s configuration, as well as allowing users to inspect the status of an update, to pause and resume updates, and to restart or complete steps if necessary.
Prerequisites
- Enterprise DC/OS 1.10 or newer.
- Service with a version greater than 2.0.0-x.
- The DC/OS CLI installed and available.
- The service’s subcommand available and installed on your local machine.
- You can install just the subcommand CLI by running
dcos package install --cli beta-hdfs
. - If you are running an older version of the subcommand CLI that doesn’t have the
update
command, uninstall and reinstall your CLI.dcos package uninstall --cli beta-hdfs dcos package install --cli beta-hdfs
- You can install just the subcommand CLI by running
Preparing configuration
If you installed this service with Enterprise DC/OS 1.10, you can fetch the full configuration of a service (including any default values that were applied during installation). For example:
$ dcos beta-hdfs describe > options.json
Make any configuration changes to this options.json
file.
If you installed this service with a prior version of DC/OS, this configuration will not have been persisted by the the DC/OS package manager. You can instead use the options.json
file that was used when installing the service.
Note: You must specify all configuration values in the options.json
file when performing a configuration update. Any unspecified values will be reverted to the default values specified by the DC/OS service. See the “Recreating options.json
” section below for information on recovering these values.
Recreating options.json
(optional)
If the options.json
from when the service was last installed or updated is not available, you will need to manually recreate it using the following steps.
First, we’ll fetch the default application’s environment, current application’s environment, and the actual template that maps config values to the environment:
- Ensure you have jq installed.
- Set the service name that you’re using, for example:
$ SERVICE_NAME=beta-hdfs
- Get the version of the package that is currently installed:
$ PACKAGE_VERSION=$(dcos package list | grep $SERVICE_NAME | awk '{print $2}')
- Then fetch and save the environment variables that have been set for the service:
$ dcos marathon app show $SERVICE_NAME | jq .env > current_env.json
- To identify those values that are custom, we’ll get the default environment variables for this version of the service:
$ dcos package describe --package-version=$PACKAGE_VERSION --render --app $SERVICE_NAME | jq .env > default_env.json
- We’ll also get the entire application template:
$ dcos package describe $SERVICE_NAME --app > marathon.json.mustache
Now that you have these files, we’ll attempt to recreate the options.json
.
- Use JQ and
diff
to compare the two:$ diff <(jq -S . default_env.json) <(jq -S . current_env.json)
- Now compare these values to the values contained in the
env
section in application template:$ less marathon.json.mustache
- Use the variable names (e.g. ``) to create a new
options.json
file as described in Initial service configuration.
Starting the update
Once you are ready to begin, initiate an update using the DC/OS CLI, passing in the updated options.json
file:
$ dcos beta-hdfs update start --options=options.json
You will receive an acknowledgement message and the DC/OS package manager will restart the Scheduler in Marathon.
See Advanced update actions for commands you can use to inspect and manipulate an update after it has started.
Open Source DC/OS, Enterprise DC/OS 1.9 and Earlier
If you do not have Enterprise DC/OS 1.10 or later, the CLI commands above are not available. For Open Source DC/OS of any version, or Enterprise DC/OS 1.9 and earlier, you can perform changes from the DC/OS GUI.
These are the general steps to follow:
-
Go to the Services tab of the DC/OS GUI and click the name of the HDFS service to be updated.
-
Within the HDFS instance details view, click the vertical ellipsis menu in the upper right, then choose Edit.
-
Click the Environment tab and make your updates. For example, to increase the number of nodes, edit the value for
DATA_COUNT
. -
Click REVIEW & RUN to apply any changes and cleanly reload the HDFS scheduler. The HDFS cluster itself will persist across the change.
Configuration Deployment Strategy
Configuration updates are rolled out through execution of update plans. You can configure the way these plans are executed.
Configuration Update Plans
This configuration update strategy is analogous to the installation procedure above. If the configuration update is accepted, there will be no errors in the generated plan, and a rolling restart will be performed on all nodes to apply the updated configuration. However, the default strategy can be overridden by a strategy the user provides.
Configuration Update
Make the REST request below to view the current plan. See the REST API Authentication part of the REST API Reference section for information on how this request must be authenticated.
$ curl -v -H "Authorization: token=$(dcos config show core.dcos_acs_token)" "http://<dcos_url>/service/hdfs/v1/plans/deploy"
The response will look similar to this:
{
phases: [{
id: "77708b6f-52db-4361-a56f-4d2bd9d6bf09",
name: "jn-deploy",
steps: [{
id: "fe96b235-bf26-4b36-ad0e-8dd3494b5b63",
status: "COMPLETE",
name: "journal-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-0:[node] [fe96b235-bf26-4b36-ad0e-8dd3494b5b63]' has status: 'COMPLETE'."
}, {
id: "3a590b92-f2e8-439a-8951-4135b6c29b34",
status: "COMPLETE",
name: "journal-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-1:[node] [3a590b92-f2e8-439a-8951-4135b6c29b34]' has status: 'COMPLETE'."
}, {
id: "c079bfcc-b620-4e0b-93d5-604223c0538d",
status: "COMPLETE",
name: "journal-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-2:[node] [c079bfcc-b620-4e0b-93d5-604223c0538d]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "61a822e4-421d-4a46-b374-b0abaac6d2d4",
name: "nn-deploy",
steps: [{
id: "0361e2f5-6e5d-42c8-9696-420f38ba5398",
status: "COMPLETE",
name: "name-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[format] [0361e2f5-6e5d-42c8-9696-420f38ba5398]' has status: 'COMPLETE'."
}, {
id: "a3a07cd4-828d-4161-b607-a745f3845abf",
status: "COMPLETE",
name: "name-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[node] [a3a07cd4-828d-4161-b607-a745f3845abf]' has status: 'COMPLETE'."
}, {
id: "74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d",
status: "COMPLETE",
name: "name-1:[bootstrap]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[bootstrap] [74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d]' has status: 'COMPLETE'."
}, {
id: "d46eb578-eb55-40b7-a683-7e44543ce63e",
status: "COMPLETE",
name: "name-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[node] [d46eb578-eb55-40b7-a683-7e44543ce63e]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "f42c610b-e52b-4ce9-ab1c-887a123df234",
name: "zkfc-deploy",
steps: [{
id: "9a5be25e-9997-4be9-bc67-1722056c8e8a",
status: "COMPLETE",
name: "zkfc-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[format] [9a5be25e-9997-4be9-bc67-1722056c8e8a]' has status: 'COMPLETE'."
}, {
id: "2c8c9518-6d3c-48ed-bca1-58fd749da9c0",
status: "COMPLETE",
name: "zkfc-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[node] [2c8c9518-6d3c-48ed-bca1-58fd749da9c0]' has status: 'COMPLETE'."
}, {
id: "7e146767-21be-4d95-b83d-667a647b0503",
status: "COMPLETE",
name: "zkfc-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-1:[node] [7e146767-21be-4d95-b83d-667a647b0503]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "774c5fec-7195-4ffd-850f-df62cf629fa9",
name: "dn-deploy",
steps: [{
id: "f41b364f-5804-41ec-9d02-cd27f7b484ef",
status: "COMPLETE",
name: "data-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-0:[node] [f41b364f-5804-41ec-9d02-cd27f7b484ef]' has status: 'COMPLETE'."
}, {
id: "22d457dc-6ad4-4f6f-93f3-4c5071069503",
status: "STARTING",
name: "data-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-1:[node] [22d457dc-6ad4-4f6f-93f3-4c5071069503]' has status: 'STARTING'."
}, {
id: "a2798e72-83e2-4fad-a673-c5ff42ac9a0c",
status: "STARTING",
name: "data-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-2:[node] [a2798e72-83e2-4fad-a673-c5ff42ac9a0c]' has status: 'STARTING'."
}],
status: "STARTING"
}],
errors: [],
status: "STARTING"
}
If you want to interrupt a configuration update that is in progress, enter the interrupt
command.
$ curl -X -H "Authorization: token=$(dcos config show core.dcos_acs_token)" POST http:/<dcos_url>/service/hdfs/v1/plans/deploy/interrupt
If you query the plan again, the response will look like this (notice status: "Waiting"
):
{
phases: [{
id: "77708b6f-52db-4361-a56f-4d2bd9d6bf09",
name: "jn-deploy",
steps: [{
id: "fe96b235-bf26-4b36-ad0e-8dd3494b5b63",
status: "COMPLETE",
name: "journal-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-0:[node] [fe96b235-bf26-4b36-ad0e-8dd3494b5b63]' has status: 'COMPLETE'."
}, {
id: "3a590b92-f2e8-439a-8951-4135b6c29b34",
status: "COMPLETE",
name: "journal-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-1:[node] [3a590b92-f2e8-439a-8951-4135b6c29b34]' has status: 'COMPLETE'."
}, {
id: "c079bfcc-b620-4e0b-93d5-604223c0538d",
status: "COMPLETE",
name: "journal-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-2:[node] [c079bfcc-b620-4e0b-93d5-604223c0538d]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "61a822e4-421d-4a46-b374-b0abaac6d2d4",
name: "nn-deploy",
steps: [{
id: "0361e2f5-6e5d-42c8-9696-420f38ba5398",
status: "COMPLETE",
name: "name-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[format] [0361e2f5-6e5d-42c8-9696-420f38ba5398]' has status: 'COMPLETE'."
}, {
id: "a3a07cd4-828d-4161-b607-a745f3845abf",
status: "COMPLETE",
name: "name-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[node] [a3a07cd4-828d-4161-b607-a745f3845abf]' has status: 'COMPLETE'."
}, {
id: "74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d",
status: "COMPLETE",
name: "name-1:[bootstrap]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[bootstrap] [74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d]' has status: 'COMPLETE'."
}, {
id: "d46eb578-eb55-40b7-a683-7e44543ce63e",
status: "COMPLETE",
name: "name-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[node] [d46eb578-eb55-40b7-a683-7e44543ce63e]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "f42c610b-e52b-4ce9-ab1c-887a123df234",
name: "zkfc-deploy",
steps: [{
id: "9a5be25e-9997-4be9-bc67-1722056c8e8a",
status: "COMPLETE",
name: "zkfc-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[format] [9a5be25e-9997-4be9-bc67-1722056c8e8a]' has status: 'COMPLETE'."
}, {
id: "2c8c9518-6d3c-48ed-bca1-58fd749da9c0",
status: "COMPLETE",
name: "zkfc-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[node] [2c8c9518-6d3c-48ed-bca1-58fd749da9c0]' has status: 'COMPLETE'."
}, {
id: "7e146767-21be-4d95-b83d-667a647b0503",
status: "COMPLETE",
name: "zkfc-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-1:[node] [7e146767-21be-4d95-b83d-667a647b0503]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "774c5fec-7195-4ffd-850f-df62cf629fa9",
name: "dn-deploy",
steps: [{
id: "f41b364f-5804-41ec-9d02-cd27f7b484ef",
status: "COMPLETE",
name: "data-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-0:[node] [f41b364f-5804-41ec-9d02-cd27f7b484ef]' has status: 'COMPLETE'."
}, {
id: "22d457dc-6ad4-4f6f-93f3-4c5071069503",
status: "STARTING",
name: "data-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-1:[node] [22d457dc-6ad4-4f6f-93f3-4c5071069503]' has status: 'STARTING'."
}, {
id: "a2798e72-83e2-4fad-a673-c5ff42ac9a0c",
status: "PENDING",
name: "data-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-2:[node] [a2798e72-83e2-4fad-a673-c5ff42ac9a0c]' has status: 'PENDING'."
}],
status: "WAITING"
}],
errors: [],
status: "WAITING"
}
Note: The interrupt command can’t stop a block that is STARTING
, but it will stop the change on the subsequent blocks.
Enter the continue
command to resume the update process.
$ curl -X -H "Authorization: token=$(dcos config show core.dcos_acs_token)" POST http://<dcos_url>/service/hdfs/v1/plans/deploy/continue
After you execute the continue operation, the plan will look like this:
{
phases: [{
id: "77708b6f-52db-4361-a56f-4d2bd9d6bf09",
name: "jn-deploy",
steps: [{
id: "fe96b235-bf26-4b36-ad0e-8dd3494b5b63",
status: "COMPLETE",
name: "journal-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-0:[node] [fe96b235-bf26-4b36-ad0e-8dd3494b5b63]' has status: 'COMPLETE'."
}, {
id: "3a590b92-f2e8-439a-8951-4135b6c29b34",
status: "COMPLETE",
name: "journal-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-1:[node] [3a590b92-f2e8-439a-8951-4135b6c29b34]' has status: 'COMPLETE'."
}, {
id: "c079bfcc-b620-4e0b-93d5-604223c0538d",
status: "COMPLETE",
name: "journal-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'journal-2:[node] [c079bfcc-b620-4e0b-93d5-604223c0538d]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "61a822e4-421d-4a46-b374-b0abaac6d2d4",
name: "nn-deploy",
steps: [{
id: "0361e2f5-6e5d-42c8-9696-420f38ba5398",
status: "COMPLETE",
name: "name-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[format] [0361e2f5-6e5d-42c8-9696-420f38ba5398]' has status: 'COMPLETE'."
}, {
id: "a3a07cd4-828d-4161-b607-a745f3845abf",
status: "COMPLETE",
name: "name-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-0:[node] [a3a07cd4-828d-4161-b607-a745f3845abf]' has status: 'COMPLETE'."
}, {
id: "74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d",
status: "COMPLETE",
name: "name-1:[bootstrap]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[bootstrap] [74ce8cf4-63cf-4ba2-ae6d-3ca6a389c66d]' has status: 'COMPLETE'."
}, {
id: "d46eb578-eb55-40b7-a683-7e44543ce63e",
status: "COMPLETE",
name: "name-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'name-1:[node] [d46eb578-eb55-40b7-a683-7e44543ce63e]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "f42c610b-e52b-4ce9-ab1c-887a123df234",
name: "zkfc-deploy",
steps: [{
id: "9a5be25e-9997-4be9-bc67-1722056c8e8a",
status: "COMPLETE",
name: "zkfc-0:[format]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[format] [9a5be25e-9997-4be9-bc67-1722056c8e8a]' has status: 'COMPLETE'."
}, {
id: "2c8c9518-6d3c-48ed-bca1-58fd749da9c0",
status: "COMPLETE",
name: "zkfc-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-0:[node] [2c8c9518-6d3c-48ed-bca1-58fd749da9c0]' has status: 'COMPLETE'."
}, {
id: "7e146767-21be-4d95-b83d-667a647b0503",
status: "COMPLETE",
name: "zkfc-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'zkfc-1:[node] [7e146767-21be-4d95-b83d-667a647b0503]' has status: 'COMPLETE'."
}],
status: "COMPLETE"
}, {
id: "774c5fec-7195-4ffd-850f-df62cf629fa9",
name: "dn-deploy",
steps: [{
id: "f41b364f-5804-41ec-9d02-cd27f7b484ef",
status: "COMPLETE",
name: "data-0:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-0:[node] [f41b364f-5804-41ec-9d02-cd27f7b484ef]' has status: 'COMPLETE'."
}, {
id: "22d457dc-6ad4-4f6f-93f3-4c5071069503",
status: "STARTING",
name: "data-1:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-1:[node] [22d457dc-6ad4-4f6f-93f3-4c5071069503]' has status: 'STARTING'."
}, {
id: "a2798e72-83e2-4fad-a673-c5ff42ac9a0c",
status: "STARTING",
name: "data-2:[node]",
message: "com.mesosphere.sdk.scheduler.plan.DeploymentStep: 'data-2:[node] [a2798e72-83e2-4fad-a673-c5ff42ac9a0c]' has status: 'STARTING'."
}],
status: "STARTING"
}],
errors: [],
status: "STARTING"
}
Configuration Options
The following describes the most commonly used features of DC/OS Apache HDFS and how to configure them via the DC/OS CLI and the DC/OS GUI. There are two methods of configuring an HDFS cluster. The configuration may be specified using a JSON file during installation via the DC/OS command line (See the Installation section) or via modification to the Service Scheduler’s DC/OS environment at runtime (See the Configuration Update section). Note that some configuration options may only be specified at installation time.
Service Configuration
The service configuration object contains properties that MUST be specified during installation and CANNOT be modified after installation is in progress. This configuration object is similar across all DC/OS Infinity services. Service configuration example:
{
"service": {
"name": "hdfs",
"service_account": "hdfs-principal",
}
}
Property | Type | Description |
---|---|---|
name | string | The name of the HDFS service installation. This must be unique for each DC/OS HDFS service instance deployed on a DC/OS cluster. It will determine the ID of the HDFS nameservice, which must be unique within a DC/OS cluster. |
service_account | string | The DC/OS service account for the HDFS cluster. |
Change the Service Name
- In the DC/OS CLI, options.json:
name
= string (default:hdfs
)
Node Configuration
The node configuration objects correspond to the configuration for nodes in the HDFS cluster. Node configuration MUST be specified during installation and MAY be modified during configuration updates. All of the properties except disk
and disk_type
MAY be modified during the configuration update process.
Example node configuration:
"journal_node": {
"cpus": 0.5,
"mem": 4096,
"disk": 10240,
"disk_type": "ROOT",
"strategy": "parallel"
},
"name_node": {
"cpus": 0.5,
"mem": 4096,
"disk": 10240,
"disk_type": "ROOT"
},
"zkfc_node": {
"cpus": 0.5,
"mem": 4096
},
"data_node": {
"count": 3,
"cpus": 0.5,
"mem": 4096,
"disk": 10240,
"disk_type": "ROOT",
"strategy": "parallel"
}
Property | Type | Description |
---|---|---|
cpus | number | The number of cpu shares allocated to the node's process. |
mem | integer | The amount of memory, in MB, allocated to the node. This value MUST be larger than the specified max heap size. Make sure to allocate enough space for additional memory used by the JVM and other overhead. A good rule of thumb is allocate twice the heap size in MB for memory. |
disk | integer | The amount of disk, in MB, allocated to node. **Note:** Once this value is configured, it can not be changed. |
disk_type | string | The type of disk to use for storing data. Possible values: ROOT (default) and MOUNT. Note: Once this value is configured, it can not be changed.
|
strategy | string | The strategy used to deploy that node type. Possible values: parallel (default) and serial.
|
count | integer | The number of nodes of that node type for the cluster. There are always exactly two name nodes, so the name_node object has no count property. Users may select either 3 or 5 journal nodes. The default value of 3 is sufficient for most deployments and should only be overridden after careful thought. At least 3 data nodes should be configured, but this value may be increased to meet the storage needs of the deployment. |
HDFS File System Configuration
The HDFS file system network configuration, permissions, and compression is configured via the hdfs
JSON object. Once these properties are set at installation time they can not be reconfigured.
Example HDFS configuration:
{
"hdfs": {
"name_node_rpc_port": 9001,
"name_node_http_port": 9002,
"journal_node_rpc_port": 8485,
"journal_node_http_port": 8480,
"data_node_rpc_port": 9005,
"data_node_http_port": 9006,
"data_node_ipc_port": 9007,
"permissions_enabled": false,
"name_node_heartbeat_recheck_interval": 60000,
"compress_image": true,
"image_compression_codec": "org.apache.hadoop.io.compress.SnappyCodec"
}
}
Property | Type | Description |
---|---|---|
name_node_rpc_port | integer | The port on which the name nodes will listen for RPC connections. |
name_node_http_port | integer | The port on which the name nodes will listen for HTTP connections. |
journal_node_rpc_port | integer | The port on which the journal nodes will listen for RPC connections. |
journal_node_http_port | integer | The port on which the journal nodes will listen for HTTP connections. |
data_node_rpc_port | integer | The port on which the data nodes will listen for RPC connections. |
data_node_http_port | integer | The port on which the data nodes will listen for HTTP connections. |
data_node_ipc_port | integer | The port on which the data nodes will listen for IPC connections. This property is useful if you deploy a service that colocates with HDFS data nodes. It provides domain socket communication instead of RPC |
Operating System Configuration
In order for HDFS to function correctly, you must perform several important configuration modifications to the OS hosting the deployment.
Configuration Settings
HDFS requires OS-level configuration settings typical of a production storage server.
File | Setting | Value | Reason |
---|---|---|---|
/etc/sysctl.conf | vm.swappiness | 0 | If the OS swaps out the HDFS processes, they can fail to respond to RPC requests, resulting in the process being marked down by the cluster. This can be particularly troublesome for name nodes and journal nodes. |
/etc/security/limits.conf | nofile | unlimited | If this value is too low, a job that operate on the HDFS cluster may fail due to too may open file handles. |
/etc/security/limits.conf, /etc/security/limits.d/90-nproc.conf | nproc | 32768 | An HDFS node spawns many threads, which go towards kernel nproc count. If nproc is not set appropriately, the node will be killed. |