Quickstart

You can get up and running with the full Confluent Platform quickly on a single server. If you are interested in deploying with Docker, please refer to our Docker Quickstart.

In this quickstart we’ll show how to run ZooKeeper, Kafka, Kafka Connect, and Control Center, and then write some data to a Kafka topic and read it back.

  1. Download and install the Confluent Platform. In this quickstart we’ll use the zip archive, but there are many other installation options.

    Here is a high-level view of the contents of the package:

    confluent-3.1.1/bin/        # Driver scripts for starting/stopping services
    confluent-3.1.1/etc/        # Configuration files
    confluent-3.1.1/share/java/ # Jars
    

    If you installed from deb or rpm packages, the contents are installed globally and you’ll need to adjust the paths used below:

    /usr/bin/                  # Driver scripts for starting/stopping services, prefixed with <package> names
    /etc/<package>/            # Configuration files
    /usr/share/java/<package>/ # Jars
    
  2. Start ZooKeeper. Since this is a long-running service, you should run it in its own terminal (or at least run it in the background and redirect output to a file):

    # The following commands assume you exactly followed the instructions above.
    # This means, for example, that at this point your current working directory
    # must be confluent-3.1.1/.
    $ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
    
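    ZooKeeper listens on port 2181 by default. If you want to confirm it is up before moving on, you can send it the “ruok” health-check command (this assumes the nc utility is available on your machine); a healthy server replies “imok”:

    $ echo ruok | nc localhost 2181
    imok
    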
  3. Start Kafka, also in its own terminal.

    $ ./bin/kafka-server-start ./etc/kafka/server.properties
    
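    Once the broker is up, you can sanity-check it by asking for the topic list (which will be empty on a fresh install):

    $ ./bin/kafka-topics --zookeeper localhost:2181 --list
    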
  4. Copy the settings for Kafka Connect, and add support for the monitoring interceptors, which report client metrics to Control Center for stream monitoring:

    $ cp etc/kafka/connect-distributed.properties /tmp/connect-distributed.properties
    $ echo "" >> /tmp/connect-distributed.properties
    $ cat <<EOF >> /tmp/connect-distributed.properties
    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    EOF
    
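    If you want to double-check the result, the two interceptor settings should now be the last two lines of the copied file:

    $ tail -n 2 /tmp/connect-distributed.properties
    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    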
  5. Start Kafka Connect in its own terminal.

    $ ./bin/connect-distributed /tmp/connect-distributed.properties
    
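    Kafka Connect exposes a REST API, by default on port 8083. Once the worker has started, it should return an empty list of connectors:

    $ curl http://localhost:8083/connectors
    []
    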
  6. Start Control Center in its own terminal. Since this quickstart runs a single broker, the settings appended below configure Control Center’s internal topics with one partition and one replica:

    $ cp etc/confluent-control-center/control-center.properties /tmp/control-center.properties
    $ cat <<EOF >> /tmp/control-center.properties
    confluent.controlcenter.internal.topics.partitions=1
    confluent.controlcenter.internal.topics.replication=1
    confluent.controlcenter.command.topic.replication=1
    confluent.monitoring.interceptor.topic.partitions=1
    confluent.monitoring.interceptor.topic.replication=1
    EOF
    $ ./bin/control-center-start /tmp/control-center.properties
    
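    Control Center can take a little while to start. Once it is ready, it serves its web interface on port 9021, which you can check with curl (it should report HTTP 200):

    $ curl -sL -o /dev/null -w "%{http_code}\n" http://localhost:9021/
    200
    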
  7. Now we have all the services running and can start building a data pipeline. As an example, let’s write a small script that generates data. Open an editor, enter the following text (our apologies to William Carlos Williams), and save it as “totail.sh”.

    #!/usr/bin/env bash
    
    file=/tmp/totail.txt
    
    # Append one copy of the poem to the file every second until interrupted.
    while true; do
        {
            echo "This is just to say"
            echo
            echo "I have eaten"
            echo "the plums"
            echo "that were in"
            echo "the icebox"
            echo
            echo "and which"
            echo "you were probably"
            echo "saving"
            echo "for breakfast"
            echo
            echo "Forgive me"
            echo "they were delicious"
            echo "so sweet"
            echo "and so cold"
        } >> "${file}"
        sleep 1
    done
    
  8. Start this script. (It writes the poem to /tmp/totail.txt once per second. We will use Kafka Connect to load that into a Kafka topic.)

    $ bash totail.sh
    
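    If you would rather not dedicate a terminal to the script, you can run it in the background instead and stop it later with kill %1:

    $ bash totail.sh &
    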
  9. Use the Kafka Topics tool to create a new topic:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --create --topic poem \
       --partitions 1 --replication-factor 1
    
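    You can verify that the topic was created with the --describe flag:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --describe --topic poem
    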
  10. Now, open your web browser and go to the URL http://localhost:9021/. This opens the web interface for Control Center.

  11. Click on the Kafka Connect button on the left side. You will see a list of sources. Click the “new source” button. Create a new source: class is FileSource, input file is /tmp/totail.txt, topic is “poem”. Give it a name like “Test Poem Source,” then save it. You will see it in the list of sources.

  12. Click the “sinks” tab. Click the “new sink” button. Create a new sink: class is FileSink, output file is /tmp/sunk.txt, topic is “poem”, max tasks is 1. Give it the name “Test Poem Sink.” You will see it in the list of sinks.

  13. In a terminal window, open the file /tmp/sunk.txt. This file will have almost the same contents as /tmp/totail.txt (it may be a few lines behind, depending on when you check).
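
    For example, you can follow the file as Kafka Connect writes to it:

    $ tail -f /tmp/sunk.txt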

  14. Now that you have data flowing into and out of Kafka, let’s monitor what’s going on! Click on the button on the left side that says “Stream Monitoring.” Very soon (a couple of seconds on a fast server, longer on an overworked laptop), a chart will appear showing the total number of messages produced and consumed on the cluster. If you scroll down, you will see more details on the consumer group for your sink.

When you’re done testing, you can use Ctrl+C to shut down each service, in the reverse order in which you started them.
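
The Kafka and ZooKeeper packages also include stop scripts, if you prefer them to Ctrl+C for those two services:

    $ ./bin/kafka-server-stop
    $ ./bin/zookeeper-server-stop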

This simple guide covered only Kafka, Kafka Connect, and Control Center. See the documentation for each component for a quickstart guide specific to that component: