Introduction

Kafka Connect is a framework for scalably and reliably streaming data between Apache Kafka and other data systems. Connect makes it simple to use existing connector implementations for common data sources and sinks to move data into and out of Kafka. Kafka Connect’s applications are wide-ranging. A source connector can ingest entire databases and stream table updates to Kafka topics, or collect metrics from all of your application servers into Kafka topics, making the data available for stream processing with low latency. A sink connector can deliver data from Kafka topics into secondary indexes such as Elasticsearch or into batch systems such as Hadoop for offline analysis.
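
For example, each connector instance is defined by a small set of configuration properties naming the connector class and its inputs and outputs. The sketch below configures the FileStreamSource connector that ships with Kafka to stream lines of a file into a topic; the file path and topic name are illustrative values, not defaults:

    name=local-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    # Illustrative values: the file to tail and the topic to publish its lines to
    file=/var/log/app.log
    topic=app-log-events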

Kafka Connect is focused on streaming data to and from Kafka. This focus makes it much simpler for developers to write high-quality, reliable, and high-performance connector plugins, and it lets the framework offer guarantees that are difficult to achieve in other frameworks. Combined with Kafka and a stream processing framework, Kafka Connect forms an integral component of an ETL pipeline.

Kafka Connect can run either as a standalone process for testing and one-off jobs, or as a distributed, scalable, fault-tolerant service supporting an entire organization. This allows it to scale down to development, testing, and small production deployments with a low barrier to entry and low operational overhead, and to scale up to support a large organization’s data pipeline.
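
As a sketch (script names and paths vary by installation; Apache Kafka ships them as bin/connect-standalone.sh and bin/connect-distributed.sh), a standalone worker is started with a worker configuration plus one or more connector configuration files, while a distributed worker takes only the worker configuration and receives connectors through its REST API:

    # Standalone mode: worker config plus connector config files on the command line
    bin/connect-standalone worker.properties file-source.properties

    # Distributed mode: start one or more workers sharing the same group configuration...
    bin/connect-distributed worker.properties

    # ...then submit connector configs to any worker via the REST API (port 8083 by default)
    curl -X POST -H "Content-Type: application/json" \
         --data '{"name": "local-file-source", "config": {
           "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
           "tasks.max": "1", "file": "/var/log/app.log", "topic": "app-log-events"}}' \
         http://localhost:8083/connectors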

The main benefits of using Kafka Connect are:

  • Data Centric Pipeline – use meaningful data abstractions to pull or push data to Kafka.
  • Flexibility and Scalability – run with streaming and batch-oriented systems on a single node or scaled to an organization-wide service.
  • Reusability and Extensibility – leverage existing connectors or extend them to fit your needs, shortening time to production (a minimal connector skeleton is sketched after this list).
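
To make the extensibility point concrete, a connector plugin is a Java class built against the Connect API. The following is a minimal, hypothetical source connector skeleton (the class names and the emitted record are ours, not a real connector); a production connector would partition work across tasks in taskConfigs() and report real source offsets from poll():

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceConnector;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class ExampleSourceConnector extends SourceConnector {
        private Map<String, String> config;

        @Override
        public void start(Map<String, String> props) {
            this.config = props;  // keep the connector-level configuration
        }

        @Override
        public Class<? extends Task> taskClass() {
            return ExampleSourceTask.class;  // the class that actually moves data
        }

        @Override
        public List<Map<String, String>> taskConfigs(int maxTasks) {
            // Divide the work across tasks; this sketch gives every task the same config
            List<Map<String, String>> configs = new ArrayList<>();
            for (int i = 0; i < maxTasks; i++) {
                configs.add(new HashMap<>(config));
            }
            return configs;
        }

        @Override
        public void stop() { /* release resources acquired in start() */ }

        @Override
        public ConfigDef config() {
            // Declare the configuration so the framework can validate it
            return new ConfigDef().define("topic", ConfigDef.Type.STRING,
                    ConfigDef.Importance.HIGH, "Topic to write to");
        }

        @Override
        public String version() {
            return "0.1.0";
        }

        public static class ExampleSourceTask extends SourceTask {
            private String topic;

            @Override
            public void start(Map<String, String> props) {
                topic = props.get("topic");
            }

            @Override
            public List<SourceRecord> poll() throws InterruptedException {
                // A real task would read from the external system and track its
                // actual position; this sketch emits one static record per second.
                Thread.sleep(1000);
                Map<String, Object> partition = Collections.singletonMap("source", "example");
                Map<String, Object> offset = Collections.singletonMap("position", 0L);
                return Collections.singletonList(new SourceRecord(
                        partition, offset, topic, Schema.STRING_SCHEMA, "hello, connect"));
            }

            @Override
            public void stop() { }

            @Override
            public String version() {
                return "0.1.0";
            }
        }
    }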

Requirements

  • Kafka 0.10.0.1-cp1
  • Schema Registry 3.0.1 (required for Avro support)