Running Tachyon on a Cluster

Standalone cluster

First, download the Tachyon tar file and extract it. The rest of this page refers to the extracted directory as tachyon.

$ wget https://github.com/amplab/tachyon/releases/download/v0.6.4/tachyon-0.6.4-bin.tar.gz
$ tar xvfz tachyon-0.6.4-bin.tar.gz

In the tachyon/conf directory, copy tachyon-env.sh.template to tachyon-env.sh, and make sure JAVA_HOME points to a valid Java 6 or 7 installation. Add the IP address of each worker node to the tachyon/conf/workers file. Finally, sync the tachyon directory, including this configuration, to all worker nodes.
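The configuration steps above can be scripted. This is a minimal sketch, assuming it runs from the directory that contains tachyon, that the workers file lists one host per line (blank lines and lines starting with # are skipped), that passwordless SSH to every worker is set up, and that rsync is installed on all nodes. The helpers list_workers and sync_to_workers are hypothetical, not part of Tachyon.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the standalone-cluster setup; not part of Tachyon.

# Print the worker hosts listed in a workers file, skipping blank lines
# and lines that start with '#'.
list_workers() {
  local workers_file="$1"
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$workers_file"
}

# Copy the local tachyon directory (including conf/) to the same absolute
# path on every worker. Assumes passwordless SSH and rsync on all nodes.
sync_to_workers() {
  local tachyon_dir="$1"
  local host
  for host in $(list_workers "$tachyon_dir/conf/workers"); do
    rsync -az "$tachyon_dir/" "$host:$tachyon_dir/"
  done
}

# Usage (run from the directory that contains tachyon):
#   cp tachyon/conf/tachyon-env.sh.template tachyon/conf/tachyon-env.sh
#   # edit tachyon/conf/tachyon-env.sh and check JAVA_HOME
#   # add one worker IP address per line to tachyon/conf/workers
#   sync_to_workers "$PWD/tachyon"
```

Using an absolute path for the source keeps the remote layout identical to the local one, so the same commands work on every node.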

Now you can format Tachyon and start it:

$ cd tachyon
$ ./bin/tachyon format
$ ./bin/tachyon-start.sh all Mount  # choose the arguments for your setup; "all Mount" is one example

To verify that Tachyon is running, visit http://<tachyon_master_hostname>:19999, check the logs in the tachyon/logs directory, or run a sample program:

$ ./bin/tachyon runTests
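The web UI check can also be scripted, for example as part of a health check. This sketch assumes curl is installed and that the master's web UI answers HTTP requests on port 19999, as above; check_web_ui is a hypothetical helper. A successful response only shows that the master's web server is up, not that every worker has registered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the Tachyon master web UI answers an
# HTTP request, non-zero otherwise.
check_web_ui() {
  local host="$1"
  local port="${2:-19999}"   # web UI port used in this guide
  # --fail: treat HTTP errors (4xx/5xx) as failures
  # --max-time: give up after 5 seconds instead of hanging
  curl --silent --fail --max-time 5 --output /dev/null "http://${host}:${port}/"
}

# Usage:
#   master_host="your-master-hostname"   # hypothetical placeholder
#   if check_web_ui "$master_host"; then
#     echo "Tachyon web UI is reachable"
#   else
#     echo "Tachyon web UI is not reachable; check tachyon/logs" >&2
#   fi
```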

Note: If you are using EC2, make sure the security group settings on the master node allow incoming connections on the Tachyon web UI port (19999 by default).
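With the AWS CLI, such a rule can be added from the command line. This is a sketch, assuming the AWS CLI is installed and configured with credentials and that you know the ID of the master's security group; the group ID, the client CIDR range, and the helper name open_web_ui_port are hypothetical placeholders. Restricting the CIDR to the addresses that actually need the web UI is safer than opening the port to 0.0.0.0/0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow inbound TCP traffic to the Tachyon web UI
# port on the master's EC2 security group.
# Set DRY_RUN=1 to print the AWS CLI command instead of running it.
open_web_ui_port() {
  local group_id="$1"        # e.g. sg-0123456789abcdef0 (placeholder)
  local cidr="$2"            # client address range allowed to connect
  local port="${3:-19999}"   # Tachyon web UI port used in this guide
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty
  ${DRY_RUN:+echo} aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port "$port" --cidr "$cidr"
}

# Usage (replace the placeholders with your own values):
#   DRY_RUN=1 open_web_ui_port sg-0123456789abcdef0 203.0.113.0/24   # preview
#   open_web_ui_port sg-0123456789abcdef0 203.0.113.0/24             # apply
```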

Using the bootstrap-conf argument to the bin/tachyon script

The bin/tachyon script can also create a basic configuration for a cluster. If you run:

$ cd tachyon
$ ./bin/tachyon bootstrap-conf <tachyon_master_hostname>

and there is no existing tachyon/conf/tachyon-env.sh file, then the script will create one with the appropriate settings for a cluster with a master node running at <tachyon_master_hostname>.

Run this command on each node you wish to configure.
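Because bootstrap-conf has to run on every node, it can be driven over SSH from a single machine. A minimal sketch, assuming passwordless SSH and that Tachyon has already been extracted to tachyon under the remote user's home directory on every node; bootstrap_all is a hypothetical helper, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "bin/tachyon bootstrap-conf" on several nodes over SSH.
# Arguments: the master hostname, then one or more node hostnames.
# Assumes Tachyon lives in ~/tachyon on every node; set DRY_RUN=1 to preview.
bootstrap_all() {
  local master="$1"
  shift
  local node
  for node in "$@"; do
    # ssh starts remote commands in the home directory, so "cd tachyon" works
    ${DRY_RUN:+echo} ssh "$node" "cd tachyon && ./bin/tachyon bootstrap-conf $master"
  done
}

# Usage (hostnames are placeholders); list the master too if it should be
# configured:
#   bootstrap_all master.example.com master.example.com worker1.example.com
```

Nodes that already have a tachyon/conf/tachyon-env.sh file keep it, since bootstrap-conf only creates the file when none exists.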

By default, the generated configuration gives each worker 2/3 of that machine's total memory. To change this amount, edit the generated tachyon/conf/tachyon-env.sh file on that worker.
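The 2/3 default is simple arithmetic on a machine's total memory. The sketch below reproduces that calculation on Linux by reading MemTotal (in kB) from /proc/meminfo; it approximates the idea rather than copying the bootstrap-conf implementation. The helper two_thirds_mem_mb is hypothetical, and the TACHYON_WORKER_MEMORY_SIZE name in the usage comment is an assumption to check against your generated tachyon-env.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 2/3 of total memory in MB, using integer
# arithmetic (rounded down). Reads /proc/meminfo by default (Linux);
# another file in the same format can be passed instead.
two_thirds_mem_mb() {
  local meminfo="${1:-/proc/meminfo}"
  local total_kb
  total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  echo $(( total_kb / 1024 * 2 / 3 ))
}

# Usage: if MemTotal is 15728640 kB (15360 MB), this prints 10240, which
# could then be set in tachyon/conf/tachyon-env.sh, for example:
#   export TACHYON_WORKER_MEMORY_SIZE=10240MB   # variable name is an assumption
```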

EC2 cluster with Spark

If you use Spark to launch an EC2 cluster, Tachyon will be installed and configured by default.