Running Apache Flink on Tachyon
This guide describes how to get Tachyon running with Apache Flink, so that you can easily work with files stored in Tachyon.
Prerequisites
The prerequisite for this guide is that you have Java installed. We also assume that you have set up Tachyon and Flink in accordance with the Local Mode or Cluster Mode guides.
Please find the guides for setting up Flink on the Apache Flink website.
Configuration
Apache Flink can use Tachyon through a generic file system wrapper for Hadoop file systems. Therefore, Tachyon is configured mostly in Hadoop configuration files.
Set property in core-site.xml
If you have a Hadoop setup next to the Flink installation, add the following property to the core-site.xml configuration file:
<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>
In case you don’t have a Hadoop setup, you have to create a file called core-site.xml with the following contents:
<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
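As a minimal sketch, the file can also be created from the shell. The `TACHYON_HADOOP_CONF` variable and its default directory are hypothetical placeholders, not names defined by Hadoop, Tachyon, or Flink; adjust them to your environment.

```shell
# Hypothetical directory for the stand-alone Hadoop configuration (an assumption;
# any directory that Flink can read should work).
TACHYON_HADOOP_CONF="${TACHYON_HADOOP_CONF:-$PWD/tachyon-hadoop-conf}"
mkdir -p "$TACHYON_HADOOP_CONF"

# Write the minimal core-site.xml that maps the tachyon:// scheme to the TFS class.
# Note: this overwrites an existing core-site.xml in that directory.
cat > "$TACHYON_HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
EOF
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the XML.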
Specify path to core-site.xml in conf/flink-conf.yaml
Next, you have to specify the path to the Hadoop configuration in Flink. To do so, open the conf/flink-conf.yaml file in the Flink root directory and set the fs.hdfs.hadoopconf configuration value to the directory containing core-site.xml. For newer Hadoop versions, the directory usually ends with etc/hadoop/.
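The setting described above could be applied from the shell roughly as follows. This is a sketch under assumptions: `FLINK_HOME` and `TACHYON_HADOOP_CONF` are placeholder variables with hypothetical default paths, and the file is taken to be Flink's standard `conf/flink-conf.yaml`.

```shell
# Placeholder locations (assumptions); point these at your real installation.
FLINK_HOME="${FLINK_HOME:-$PWD/flink}"
TACHYON_HADOOP_CONF="${TACHYON_HADOOP_CONF:-$PWD/tachyon-hadoop-conf}"
FLINK_CONF_FILE="$FLINK_HOME/conf/flink-conf.yaml"

# In a real Flink installation conf/ already exists, so mkdir -p is a no-op there.
mkdir -p "$FLINK_HOME/conf"

# Append the key only if it is not set yet, to avoid duplicate entries.
if ! grep -q '^fs.hdfs.hadoopconf:' "$FLINK_CONF_FILE" 2>/dev/null; then
  echo "fs.hdfs.hadoopconf: $TACHYON_HADOOP_CONF" >> "$FLINK_CONF_FILE"
fi
```

If the key is already present with a different value, the sketch leaves it untouched, so check the file by hand in that case.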
Make tachyon-0.6.4.jar available to Flink
In the last step, we need to make the Tachyon jar file available to Flink, because it contains the configured tachyon.hadoop.TFS class.
There are different ways to achieve that:
- Put the tachyon-0.6.4.jar file into the lib/ directory of Flink (for local and standalone cluster setups).
- Put the tachyon-0.6.4.jar file into the ship/ directory for Flink on YARN.
- Specify the location of the jar file in the HADOOP_CLASSPATH environment variable (make sure it is available on all cluster nodes as well). For example:
export HADOOP_CLASSPATH=/pathToTachyon/client/target/tachyon-client-0.6.4-jar-with-dependencies.jar
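The first option in the list above can be sketched as a small shell function. The name `install_tachyon_jar` and its arguments are illustrative and not part of Flink or Tachyon; the function only copies the jar into Flink's `lib/` directory and fails if the jar is missing.

```shell
# install_tachyon_jar JAR FLINK_HOME
# Hypothetical helper: copies the Tachyon jar into FLINK_HOME/lib.
install_tachyon_jar() {
  tachyon_jar="$1"
  flink_home="$2"

  # Refuse to continue if the jar path is wrong.
  if [ ! -f "$tachyon_jar" ]; then
    echo "Tachyon jar not found: $tachyon_jar" >&2
    return 1
  fi

  # lib/ already exists in a real Flink installation; mkdir -p is then a no-op.
  mkdir -p "$flink_home/lib"
  cp "$tachyon_jar" "$flink_home/lib/"
}
```

A call might look like `install_tachyon_jar /path/to/tachyon-0.6.4.jar "$FLINK_HOME"`, where both arguments are placeholders for your own paths.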
Using Tachyon with Flink
To use Tachyon with Flink, just specify paths with the tachyon:// scheme. If Tachyon is installed locally, a valid path would look like this: tachyon://localhost:19998/user/hduser/gutenberg.
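To make the structure of such a path explicit, here is a small sketch that assembles it from its parts. The variable names are illustrative; the host, port, and file path are the ones from the example above.

```shell
# Parts of a Tachyon path: master host, master port, and the file path inside Tachyon.
TACHYON_MASTER_HOST="localhost"
TACHYON_MASTER_PORT="19998"
DATA_PATH="/user/hduser/gutenberg"

# Assemble the full URI with the tachyon:// scheme.
TACHYON_URI="tachyon://${TACHYON_MASTER_HOST}:${TACHYON_MASTER_PORT}${DATA_PATH}"
echo "$TACHYON_URI"
# prints: tachyon://localhost:19998/user/hduser/gutenberg
```

The resulting string can be used wherever a Flink program expects an input or output path.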