Quickstart
You can get up and running with the full Confluent platform quickly on a single server. In this quickstart we’ll show how to run ZooKeeper, Kafka, the Schema Registry, Kafka Connect, and Control Center, and then write data to and read data from Kafka.
Download and install the Confluent platform. In this quickstart we’ll use the zip archive, but there are many other installation options.
$ wget http://packages.confluent.io/archive/3.0/confluent-3.0.1-2.11.zip
$ unzip confluent-3.0.1-2.11.zip
$ cd confluent-3.0.1
Here is a high-level view of the contents of the package:
confluent-3.0.1/bin/        # Driver scripts for starting/stopping services
confluent-3.0.1/etc/        # Configuration files
confluent-3.0.1/share/java/ # Jars
If you installed from deb or rpm packages, the contents are installed globally and you’ll need to adjust the paths used below:
/usr/bin/                  # Driver scripts for starting/stopping services, prefixed with <package> names
/etc/<package>/            # Configuration files
/usr/share/java/<package>/ # Jars
Start ZooKeeper. Since this is a long-running service, you should run it in its own terminal (or at least run it in the background and redirect output to a file):
# The following commands assume you exactly followed the instructions above.
# This means, for example, that at this point your current working directory
# must be confluent-3.0.1/.
$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
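If you would rather not keep a terminal open for each service, the step above suggests running it in the background and redirecting its output to a file. The following is a minimal bash sketch of that approach. It assumes a hypothetical helper named start_bg and a log directory that defaults to /tmp/confluent-logs; both are illustrative choices and not part of the Confluent Platform. The helper also records each PID so the service can be stopped later:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a long-running command in the background, send its
# stdout and stderr to <LOG_DIR>/<name>.log, and save its PID to <LOG_DIR>/<name>.pid.
# LOG_DIR and the file naming scheme are illustrative assumptions.
start_bg() {
  local name=$1
  shift
  local log_dir=${LOG_DIR:-/tmp/confluent-logs}
  mkdir -p "$log_dir"
  # Create/truncate the log first so it exists as soon as this function returns.
  : > "$log_dir/$name.log"
  # nohup keeps the process running if the terminal that started it is closed.
  nohup "$@" >> "$log_dir/$name.log" 2>&1 &
  local pid=$!
  echo "$pid" > "$log_dir/$name.pid"
  echo "Started $name (PID $pid); output in $log_dir/$name.log"
}
```

For example, from the confluent-3.0.1/ directory, run `start_bg zookeeper ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties` and then `tail -f /tmp/confluent-logs/zookeeper.log` to follow its output. The same call pattern works for the other services in this guide.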
Start Kafka, also in its own terminal.
$ ./bin/kafka-server-start ./etc/kafka/server.properties
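Kafka connects to ZooKeeper when it starts, and every service started after it connects to Kafka. When you script these steps instead of running them by hand, it therefore helps to wait for each port to open before moving on. The following is a minimal sketch using a hypothetical wait_for_port function and bash's /dev/tcp redirection, which most bash builds support but which can be compiled out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port once per second until a TCP connection
# succeeds, or give up after timeout_s seconds (default 60).
# Relies on bash's /dev/tcp/<host>/<port> redirection.
wait_for_port() {
  local host=$1 port=$2 timeout_s=${3:-60} elapsed=0
  # The subshell opens (and immediately drops) a connection; it fails if nothing is listening.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if (( elapsed >= timeout_s )); then
      echo "Timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$host:$port is accepting connections"
}
```

This assumes the default ports from the shipped configuration files: ZooKeeper 2181, Kafka 9092, Schema Registry 8081, Kafka Connect 8083, and Control Center 9021. With those defaults, `wait_for_port localhost 2181 && ./bin/kafka-server-start ./etc/kafka/server.properties` starts Kafka only once ZooKeeper is listening. An open port means the process accepts connections, not necessarily that it has finished initializing.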
Start the Schema Registry, also in its own terminal.
$ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
Copy the settings for Kafka Connect, and add the Confluent monitoring interceptors, which report producer and consumer metrics to Control Center:
$ cp etc/schema-registry/connect-avro-distributed.properties /tmp/connect-distributed.properties
$ echo "" >> /tmp/connect-distributed.properties
$ cat <<EOF >> /tmp/connect-distributed.properties
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
EOF
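The `cat <<EOF >>` step appends the interceptor lines every time it runs, so repeating it leaves duplicate keys. With java.util.Properties the last occurrence wins, so duplicates are untidy rather than broken. The following is a minimal sketch of an idempotent alternative using a hypothetical set_prop helper. It replaces a key if the key is already present and appends it otherwise. It only handles the plain key=value form used in this guide:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a .properties file.
# Every existing line starting with "key=" is replaced; if none exists, the line is appended.
# Simplifications: no spaces around "=", no ":" separators, no line continuations,
# and awk -v interprets backslash escapes in the value.
set_prop() {
  local file=$1 key=$2 value=$3
  touch "$file"
  # index(...) == 1 is a literal prefix test, so dots in the key are not regex wildcards.
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `set_prop /tmp/connect-distributed.properties consumer.interceptor.classes io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor`. Because awk terminates every line it prints, the separate `echo ""` step is unnecessary with this approach. The same helper can also set the Control Center properties.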
Start Kafka Connect in its own terminal.
$ ./bin/connect-distributed /tmp/connect-distributed.properties
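Start Control Center in its own terminal. Because this quickstart runs a single Kafka broker, the commands below set Control Center's internal topics and the monitoring interceptor topic to one partition with a replication factor of 1: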
$ cp etc/confluent-control-center/control-center.properties /tmp/control-center.properties
$ cat <<EOF >> /tmp/control-center.properties
confluent.controlcenter.internal.topics.partitions=1
confluent.controlcenter.internal.topics.replication=1
confluent.monitoring.interceptor.topic.partitions=1
confluent.monitoring.interceptor.topic.replication=1
EOF
$ ./bin/control-center-start /tmp/control-center.properties
Now we have all the services running and can start building a data pipeline. As an example, let’s write a small script that generates data. Open an editor, enter the following text (our apologies to William Carlos Williams), and save it as “totail.sh”.
#!/usr/bin/env bash
file=/tmp/totail.txt
while true; do
    echo This is just to say >> ${file}
    echo >> ${file}
    echo I have eaten >> ${file}
    echo the plums >> ${file}
    echo that were in >> ${file}
    echo the icebox >> ${file}
    echo >> ${file}
    echo and which >> ${file}
    echo you were probably >> ${file}
    echo saving >> ${file}
    echo for breakfast >> ${file}
    echo >> ${file}
    echo Forgive me >> ${file}
    echo they were delicious >> ${file}
    echo so sweet >> ${file}
    echo and so cold >> ${file}
    sleep 1
done
Start this script. It appends a copy of the poem to /tmp/totail.txt every second, and we will use Kafka Connect to load that file into a Kafka topic. Because the script loops forever, run it in its own terminal as well:
$ bash totail.sh
Use the Kafka Topics tool to create a new topic:
$ ./bin/kafka-topics --zookeeper localhost:2181 --create --topic poem \
    --partitions 1 --replication-factor 1
Now, open your web browser, and go to the URL http://localhost:9021/. This will open up the web interface for Control Center.
Click the Kafka Connect button on the left side. You will see a list of sources. Click the “new source” button and create a new source: the class is FileSource, the input file is /tmp/totail.txt, and the topic is “poem”. Save the new source, giving it a name such as “Test Poem Source”. It will then appear in the list of sources.
Click the “sinks” tab, then click the “new sink” button. Create a new sink: the class is FileSink, the output file is /tmp/sunk.txt, the topic is “poem”, and max tasks is 1. Give it the name “Test Poem Sink” and save it. It will then appear in the list of sinks.
In a terminal window, open the file /tmp/sunk.txt. It should have almost the same contents as /tmp/totail.txt (it may be a few lines behind, depending on when you check).
Now that you have data flowing into and out of Kafka, let’s monitor what’s going on! Click the button on the left side labeled “Stream Monitoring”. Very soon (a couple of seconds on a fast server, longer on an overworked laptop), a chart will appear showing the total number of messages produced and consumed on the cluster. If you scroll down, you will see more details about the consumer group for your sink.
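To see how far /tmp/sunk.txt trails /tmp/totail.txt without opening both files, the following minimal sketch prints the difference in their line counts. It uses a hypothetical sink_lag function whose arguments default to this quickstart's paths. Both files keep growing while totail.sh and the connectors run, and the two counts are read a moment apart, so treat the result as approximate:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (lines in source file) - (lines in sink file).
# Defaults are the paths used by totail.sh and the FileSink connector in this guide.
sink_lag() {
  local source_file=${1:-/tmp/totail.txt}
  local sink_file=${2:-/tmp/sunk.txt}
  local src_lines sink_lines
  # Reading from stdin makes wc print only the count (possibly space-padded on some systems).
  src_lines=$(wc -l < "$source_file")
  sink_lines=$(wc -l < "$sink_file")
  # Arithmetic expansion strips any padding from the counts.
  echo $(( src_lines - sink_lines ))
}
```

If the sink is keeping up, running `sink_lag` a few times should give a small number that does not keep growing. To watch the sink output directly, `tail -f /tmp/sunk.txt` also works.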
When you’re done testing, you can use Ctrl+C to shut down each service (and the totail.sh script), in the reverse of the order in which you started them.
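If you started the services in the background instead of in separate terminals, you cannot stop them with Ctrl+C. The following minimal sketch stops them in the reverse of their start order. It assumes you recorded each PID at startup. The function name stop_in_reverse and the 60-second limit are illustrative. The sketch sends SIGTERM rather than SIGINT for two reasons. First, bash makes background commands ignore SIGINT when job control is off, as it is in scripts. Second, the JVM runs its shutdown hooks on SIGTERM just as it does on Ctrl+C:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop processes in the reverse of the order given,
# waiting (up to roughly 60 seconds each) for every process to exit before the next.
# Pass PIDs in the order the services were started.
stop_in_reverse() {
  local i pid waited
  for (( i = $#; i >= 1; i-- )); do
    pid=${!i}                                  # i-th positional argument
    kill -0 "$pid" 2>/dev/null || continue     # already gone
    echo "Stopping PID $pid"
    # SIGTERM triggers JVM shutdown hooks, like Ctrl+C (SIGINT) does.
    kill -TERM "$pid"
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
      if (( waited >= 60 )); then
        echo "PID $pid still running after about ${waited}s; moving on" >&2
        break
      fi
      # wait reaps children of this shell; for other processes it fails fast, so sleep and poll.
      wait "$pid" 2>/dev/null || sleep 1
      waited=$((waited + 1))
    done
  done
}
```

For example, suppose you saved the PIDs in hypothetical variables at startup. Then `stop_in_reverse "$ZK_PID" "$KAFKA_PID" "$SR_PID" "$CONNECT_PID" "$C3_PID"` stops Control Center first and ZooKeeper last.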
This simple guide only covered Kafka, Kafka Connect, and Control Center. See the documentation for each component for a quickstart guide specific to that component: