Quickstart

You can get up and running with the full Confluent platform quickly on a single server. If you are interested in deploying with Docker, please refer to our Docker Quickstart.

In this quickstart we’ll show how to run ZooKeeper, Kafka, Kafka Connect, and Control Center and then write and read some data to/from Kafka.

Base Installation

  1. Check the prerequisites. The operating system used in this quickstart is Ubuntu, and Java must be installed. If you’re unsure whether you have Java installed, run

    $ java -version

    If you receive an error, run

    $ sudo apt-get update && sudo apt-get install default-jre

  2. Download and extract the zip archive of the Confluent platform and place it in your home directory.

    Note

    In this quickstart we will use the zip archive, but there are many other installation options.

    Here is a high-level view of the contents of the package:

    confluent-3.2.1/bin/        # Driver scripts for starting/stopping services
    confluent-3.2.1/etc/        # Configuration files
    confluent-3.2.1/share/java/ # Jars
    

ZooKeeper Configuration

  1. Start ZooKeeper. Because this is a long-running service, you should run it in its own terminal, run it in the background with its output redirected to a file, or use screen. For our purposes here, we’ll use screen.

    Note

    Your current working directory should be the unzipped archive directory.

    $ screen -dmS zookeeper bash -c './bin/zookeeper-server-start ./etc/kafka/zookeeper.properties; exec bash'
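
    If you want to confirm that ZooKeeper is up before moving on, you can send it the ruok health-check command (this assumes netcat is installed; a healthy server answers imok):

    $ echo ruok | nc localhost 2181
    imok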
    

Broker Configuration

  1. Configure the Confluent Metrics Reporter. Rather than editing ./etc/kafka/server.properties in place, copy it to /tmp and uncomment the following settings in the copy:

    $ cp ./etc/kafka/server.properties /tmp/kafka-server.properties && \
    sed -i 's/#metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter/metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter/g' /tmp/kafka-server.properties && \
    sed -i 's/#confluent.metrics.reporter.bootstrap.servers=localhost:9092/confluent.metrics.reporter.bootstrap.servers=localhost:9092/g' /tmp/kafka-server.properties && \
    sed -i 's/#confluent.metrics.reporter.zookeeper.connect=localhost:2181/confluent.metrics.reporter.zookeeper.connect=localhost:2181/g' /tmp/kafka-server.properties && \
    sed -i 's/#confluent.metrics.reporter.topic.replicas=1/confluent.metrics.reporter.topic.replicas=1/g' /tmp/kafka-server.properties
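
    You can verify the edit by checking that the four settings now appear without a leading # in the copied file:

    $ grep -E '^(metric\.reporters|confluent\.metrics\.reporter)' /tmp/kafka-server.properties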
    
  2. Start Kafka in a new screen.

    $ screen -dmS kafka bash -c "./bin/kafka-server-start /tmp/kafka-server.properties; exec bash"
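
    Give the broker a few seconds to start, then confirm it is reachable by listing topics through ZooKeeper (the list may be empty or contain only internal topics at this point):

    $ ./bin/kafka-topics --zookeeper localhost:2181 --list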
    
  3. Copy the settings for Kafka Connect and add support for the interceptors:

    $ cp ./etc/kafka/connect-distributed.properties /tmp/connect-distributed.properties
    $ cat <<EOF >> /tmp/connect-distributed.properties
    
    # Interceptor setup
    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    EOF
    
  4. Start Kafka Connect in its own screen.

    $ screen -dmS connect-distributed bash -c "./bin/connect-distributed /tmp/connect-distributed.properties; exec bash"
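
    The Connect worker exposes a REST API on port 8083 by default. Once it finishes starting, listing the configured connectors should return an empty array:

    $ curl http://localhost:8083/connectors
    []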
    

Control Center Configuration

  1. Configure Control Center to run with a single partition and replica for its internal topics, then start it in its own screen:

    $ cp ./etc/confluent-control-center/control-center.properties /tmp/control-center.properties
    $ cat <<EOF >> /tmp/control-center.properties
    
    # Quickstart partition and replication values
    confluent.controlcenter.internal.topics.partitions=1
    confluent.controlcenter.internal.topics.replication=1
    confluent.controlcenter.command.topic.replication=1
    confluent.monitoring.interceptor.topic.partitions=1
    confluent.monitoring.interceptor.topic.replication=1
    EOF
    $ screen -dmS control-center bash -c "./bin/control-center-start /tmp/control-center.properties; exec bash"
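
    Control Center can take a minute or two to create its internal topics and begin serving. One way to check readiness (assuming curl is installed) is to poll the web port until it returns an HTTP status code instead of a connection error:

    $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9021/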
    

Stream Monitoring Setup

  1. Now that all the services are running, we can start building a data pipeline. As an example, let’s create a small job that generates data. The following command (our apologies to William Carlos Williams) writes the script to /tmp/totail.sh:

    cat <<EOF > /tmp/totail.sh
    #!/usr/bin/env bash
    
    file=/tmp/totail.txt
    
    while true; do
        echo This is just to say >> \${file}
        echo >> \${file}
        echo I have eaten >> \${file}
        echo the plums >> \${file}
        echo that were in >> \${file}
        echo the icebox >> \${file}
        echo >> \${file}
        echo and which >> \${file}
        echo you were probably >> \${file}
        echo saving >> \${file}
        echo for breakfast >> \${file}
        echo >> \${file}
        echo Forgive me >> \${file}
        echo they were delicious >> \${file}
        echo so sweet >> \${file}
        echo and so cold >> \${file}
        sleep 1
    done
    EOF
    
  2. Start this script. (It writes the poem to /tmp/totail.txt once per second. We will use Kafka Connect to load that into a Kafka topic.)

    $ chmod u+x /tmp/totail.sh
    $ screen -dmS totail bash -c "/tmp/totail.sh; exec bash"
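
    You can confirm the script is writing by checking the end of the output file; the last lines of the poem should appear, and the file should keep growing:

    $ tail -n 4 /tmp/totail.txt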
    
  3. Use the Kafka Topics tool to create a new topic:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --create --topic poem \
       --partitions 1 --replication-factor 1
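
    You can verify the topic was created with the expected settings:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --describe --topic poem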
    
  4. If everything has been done correctly to this point, you should have five screen sessions running, each named for the service it runs. The numbers beside the names will vary.

    $ screen -ls
      There are screens on:
        11369.totail
        11360.control-center
        11296.connect-distributed
        11223.kafka
        11184.zookeeper
      5 Sockets in /var/run/screen/S-ubuntu.
    
  5. Open your web browser and go to http://localhost:9021/. This will open up the web interface for Control Center. Click on the Kafka Connect button on the left side. On this page you can see a list of sources that have been configured; by default it will be empty. Click the “New source” button.

  6. From the Connection Class dropdown menu, select FileStreamSourceConnector. Specify the Connection Name as Poem File Source. Once you have specified a name for the connection, a set of other configuration options will appear.

  7. In the General section specify the file as /tmp/totail.txt and the topic as poem.

  8. Click on the Continue button, and then Save & Finish to apply the new configuration.
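
    If you prefer the command line, an equivalent source connector can be created through the Connect REST API instead of the UI. This is a sketch using a hypothetical connector name (poem-file-source), assuming the worker is on its default port 8083:

    $ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
        -d '{"name": "poem-file-source",
             "config": {
               "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
               "tasks.max": "1",
               "file": "/tmp/totail.txt",
               "topic": "poem"}}'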

  9. From the Kafka Connect tab click on the “Sinks” tab. Click the “New sink” button. From the Topics dropdown list, choose “poem”. Click Continue.

  10. In the next screen, set the Connection Class to “FileStreamSinkConnector” and the Connection Name to Poem File Sink. Once you have specified a name for the connection, a set of other configuration options will appear.

  11. In the General section specify the file as /tmp/sunk.txt. Click Continue and then “Save & Finish”.
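
    The matching sink can likewise be created over REST (again a sketch, with the hypothetical name poem-file-sink):

    $ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
        -d '{"name": "poem-file-sink",
             "config": {
               "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
               "tasks.max": "1",
               "topics": "poem",
               "file": "/tmp/sunk.txt"}}'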

  12. Now that you have data flowing into and out of Kafka, let’s monitor what’s going on! Return to the Control Center web interface at http://localhost:9021/.

  13. Click on the button on the left side that says “Stream Monitoring.” Very soon (a couple seconds on a fast server, longer on an overworked laptop), a chart will appear showing the total number of messages produced and consumed on the cluster. If you scroll down, you will see more details on the consumer group for your sink.
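
    You can also verify the pipeline end to end from the shell: the sink connector should be writing the same lines into its output file.

    $ tail -n 4 /tmp/sunk.txt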

When you’re done testing, you can run pkill screen to shut down all of the screen sessions.

This simple guide only covered Kafka, Kafka Connect, and Control Center. See the documentation for each component for a quickstart guide specific to that component: