Configure Underlayer Storage System

Tachyon can run with different underlayer storage systems. This guide describes how to accomplish the following:

Set up supported underlayer storage systems

HDFS

The additional prerequisite for this part is Hadoop HDFS. By default, Tachyon is built against Hadoop version 1.0.4. You can use another Hadoop version by changing the hadoop.version tag in pom.xml and recompiling Tachyon. You can also set the Hadoop version when compiling with Maven:

$ mvn -Dhadoop.version=2.2.0 clean package
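Alternatively, if you change the hadoop.version tag in pom.xml instead, the edit would look like the following (the exact location of the property may vary between Tachyon versions):

<hadoop.version>2.2.0</hadoop.version>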

Edit the tachyon-env.sh file. Set TACHYON_UNDERFS_ADDRESS=hdfs://HDFS_HOSTNAME:HDFS_PORT. You may also need to set JAVA_HOME in the same file.
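For example, assuming a hypothetical NameNode at namenode.example.com on port 9000 and a hypothetical JDK location, the relevant lines in tachyon-env.sh might look like:

JAVA_HOME=/usr/lib/jvm/java-7-oracle
TACHYON_UNDERFS_ADDRESS=hdfs://namenode.example.com:9000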

Amazon S3

Edit the tachyon-env.sh file. Set TACHYON_UNDERFS_ADDRESS=s3n://S3_BUCKET/s3_directory and pass the necessary credentials, such as fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey, under TACHYON_JAVA_OPTS. You may also need to set JAVA_HOME in the same file. Make sure the underfs address points to a bucket and directory that already exist (you can create new buckets and directories from the S3 web interface).
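For example, with a placeholder bucket, directory, and credentials (substitute your own values), the relevant lines in tachyon-env.sh might look like:

TACHYON_UNDERFS_ADDRESS=s3n://MY_BUCKET/tachyon
TACHYON_JAVA_OPTS="$TACHYON_JAVA_OPTS -Dfs.s3n.awsAccessKeyId=MY_ACCESS_KEY -Dfs.s3n.awsSecretAccessKey=MY_SECRET_KEY"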

Some users have run into additional issues getting Tachyon working with S3. The hadoop-client package requires the jets3t package to use S3, but for some reason does not pull it in as a dependency. One way to fix this is to repackage Tachyon, adding jets3t as a dependency. For example, the following should work with Hadoop version 2.3.0, although depending on your version of Hadoop, you may need an older version of jets3t:

<dependency>
  <groupId>net.java.dev.jets3t</groupId>
  <artifactId>jets3t</artifactId>
  <version>0.9.0</version>
  <exclusions>
    <exclusion>
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
      <!-- <version>1.3</version> -->
    </exclusion>
  </exclusions>
</dependency>
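After adding the dependency to Tachyon's pom.xml, rebuild the project as before, for example:

$ mvn -Dhadoop.version=2.3.0 clean package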

If you are using a Tachyon client that is running separately from the Tachyon master/workers (in a separate JVM), then you need to make sure that your AWS credentials are provided to the Tachyon client JVM processes as well. The easiest way to do this is to add them as command line options when starting your client JVM process. For example:

$ java -Xmx3g -Dfs.s3n.awsAccessKeyId=MY_ACCESS_KEY -Dfs.s3n.awsSecretAccessKey=MY_SECRET_KEY -cp my_jar_path.jar com.MyClass myArgs

GlusterFS

Prerequisites

You need to install GlusterFS on your cluster and create a GlusterFS volume (see the GlusterFS documentation for details).
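As a rough sketch (the hostnames and brick paths are placeholders; consult the GlusterFS documentation for the options appropriate to your cluster), a two-way replicated volume named gsvol could be created and started with:

$ gluster volume create gsvol replica 2 node1:/data/brick1/gsvol node2:/data/brick1/gsvol
$ gluster volume start gsvol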

Compile Tachyon with GlusterFS support:

$ mvn clean install -Dtest.profile=glusterfs -Dhadoop.version=2.3.0 -Dtachyon.underfs.glusterfs.mounts=/vol -Dtachyon.underfs.glusterfs.volumes=testvol

Here, /vol is a valid GlusterFS mount point.

Mount a GlusterFS filesystem

Assume the GlusterFS bricks are co-located with Tachyon nodes, the GlusterFS volume name is gsvol, and the mount point is /vol.

On each Tachyon node, edit /etc/fstab and add the following line:

localhost:gsvol /vol glusterfs

Then mount the GlusterFS filesystem:

$ mount -a -t glusterfs

Configure Tachyon to use GlusterFS filesystem

Next, configure Tachyon in the conf folder:

$ cd /root/tachyon/conf
$ cp tachyon-glusterfs-env.sh.template tachyon-env.sh

Add the following lines to the tachyon-env.sh file:

TACHYON_UNDERFS_GLUSTER_VOLUMES=gsvol
TACHYON_UNDERFS_GLUSTER_MOUNTS=/vol

Sync the configuration to all nodes.
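For example, assuming hypothetical worker hostnames node1 and node2, the configuration directory could be copied with rsync:

$ rsync -az /root/tachyon/conf/ node1:/root/tachyon/conf/
$ rsync -az /root/tachyon/conf/ node2:/root/tachyon/conf/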

Format the filesystem

$ cd /root/tachyon/
$ ./bin/tachyon format

Verify that GlusterFS is ready to use

$ cd /vol
$ ls

You should find that a tachyon folder has been created.

Add other underlayer storage systems

Besides the under file systems described above, Tachyon can run on top of other under file systems. To add support for one, a user should implement the UnderFileSystem interface. The existing HDFS and local file system implementations can be used as references.