Upgrade Guide

Upgrading from CP 3.0.x (Kafka 0.10.0.x-cp2) to CP 3.1.2 (Kafka 0.10.1.1-cp1)

Compatibility

Kafka Streams applications built with CP 3.1.2 (Kafka 0.10.1.1-cp1) are only compatible with Kafka clusters running CP 3.1.2 (Kafka 0.10.1.1-cp1).

  • Forward-compatible to CP 3.1.2 clusters (Kafka 0.10.0.x-cp2): Existing Kafka Streams applications built with CP 3.0.x (Kafka 0.10.0.x-cp2) will work with upgraded Kafka clusters running CP 3.1.2 (Kafka 0.10.1.1-cp1).

Upgrading your Kafka Streams applications to CP 3.1.2

To make use of CP 3.1.2 (Kafka 0.10.1.1-cp1), you just need to update the Kafka Streams dependency of your application to use the version number 0.10.1.1-cp1, and then recompile your application.

For example, in your pom.xml file:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <!-- update version from 0.10.0.x-cp1 to 0.10.1.1-cp1 -->
    <version>0.10.1.1-cp1</version>
</dependency>

API changes

Kafka Streams and its API were improved and modified since the release of CP 3.0.x. Some of these changes are breaking changes that require you to update the code of your Kafka Streams applications. In this section we focus on only these breaking changes.

Stream grouping and aggregation

Grouping (i.e., repartitioning) and aggregation of the KStream API was significantly changed to be aligned with the KTable API. Instead of using a single method with many parameters, grouping and aggregation is now split into two steps. First, a KStream is transformed into a KGroupedStream that is a repartitioned copy of the original KStream. Afterwards, an aggregation can be performed on the KGroupedStream, resulting in a new KTable that contains the result of the aggregation.

Thus, the methods KStream#aggregateByKey(...), KStream#reduceByKey(...), and KStream#countByKey(...) were replaced by KStream#groupBy(...) and KStream#groupByKey(...) which return a KGroupedStream. While KStream#groupByKey(...) groups on the current key, KStream#groupBy(...) sets a new key and re-partitions the data to build groups on the new key. The new class KGroupedStream provides the corresponding methods aggregate(...), reduce(...), and count(...).

KStream stream = builder.stream(...);
Reducer reducer = new Reducer() { /* ... */ };

// old API
KTable newTable = stream.reduceByKey(reducer, name);

// new API, Group by existing key
KTable newTable = stream.groupByKey().reduce(reducer, name);
// or Group by a different key
KTable otherTable = stream.groupBy((key,value) -> value).reduce(reducer, name);

Auto Repartitioning

Previously when performing KStream#join(...), KStream#outerJoin(...) or KStream#leftJoin(...) operations after a key changing operation, i.e, KStream#map(...), KStream#flatMap(...), KStream#selectKey(...) the developer was required to call KStream#through(...) to repartition the mapped KStream This is no longer required. Repartitioning now happens automatically for all join operations.

KStream streamOne = builder.stream(...);
KStream streamTwo = builder.stream(...);
KeyValueMapper selector = new KeyValueMapper() { /* ... */ };
ValueJoiner joiner = new ValueJoiner { /* ... */ };
JoinWindows windows = JoinWindows.of("the-join").within(60 * 1000);

// old API
KStream oldJoined = streamOne.selectKey(selector)
                             .through("repartitioned-topic")
                             .join(streamTwo,
                                   joiner,
                                   windows);

// new API
KStream newJoined = streamOne.selectKey((key,value) -> value)
                             .join(streamTwo,
                                   joiner
                                   windows);

TopologyBuilder

Two public method signatures have been changed on TopologyBuilder, TopologyBuilder#sourceTopics(String applicationId) and TopologyBuilder#topicGroups(String applicationId). These methods no longer take applicationId as a parameter and instead you should call TopologyBuilder#setApplicationId(String applicationId) before calling one of these methods.

TopologyBuilder builder = new TopologyBuilder();
...

// old API
Set<String> topics = topologyBuilder.sourceTopics("applicationId");
Map<Integer, TopicsInfo> topicGroups = topologyBuilder.topicGroups("applicationId");

// new API
topologyBuilder.setApplicationId("applicationId");
Set<String> topics = topologyBuilder.sourceTopics();
Map<Integer, TopicsInfo> topicGroups = topologyBuilder.topicGroups();

DSL: New parameters to specify state store names

Apache Kafka 0.10.1 introduces Interactive Queries, which allow you to directly query state stores of a Kafka Streams application. This new feature required a few changes to the operators in the DSL. Starting with Kafka 0.10.1, state stores must be always be “named”, which includes both explicitly used state stores (e.g., defined by the user) and internally used state stores (e.g., created behind the scenes by operations such as count()). This naming is a prerequisite to make state stores queryable. As a result of this, the previous “operator name” is now the state store name. This change affects KStreamBuilder#table(...) and windowed aggregates KGroupedStream#count(...), #reduce(...), and #aggregate(...).

// old API
builder.table("topic");
builder.table(keySerde, valSerde, "topic");

table2 = table1.through("topic");

stream.countByKey(TimeWindows.of("windowName", 1000)); // window has a name

// new API
builder.table("topic", "storeName"); // requires to provide a store name to make KTable queryable
builder.table(keySerde, valSerde, "topic", "storeName"); // requires to provide a store name to make KTable queryable

table2 = table1.through("topic", "storeName"); // requires to provide a store name to make KTable queryable

// for changes of countByKey() -> groupByKey().count(...), please see example above
// for changes of TimeWindows.of(...), please see example below
stream.groupByKey().count(TimeWindows.of(1000), "countStoreName"); // window name removed, store name added

Windowing

The API for JoinWindows was improved. It is not longer possible to define a window with a default size (of zero). Furthermore, windows are not named anymore. Rather, any such naming is now done for state stores. See section DSL: New parameters to specify state store names above).

// old API
JoinWindows.of("name"); // defines window with size zero
JoinWindows.of("name").within(60 * 1000L);

TimeWindows.of("name", 60 * 1000L);
UnlimitedWindows.of("name", 60 * 1000L);

// new API, no name, requires window size
JoinWindows.of(0); // no name; set window size explicitly to zero
JoinWindows.of(60 * 1000L); // no name

TimeWindows.of(60 * 1000L); // not required to specify a name anymore
UnlimitedWindows.of(); // not required to specify a name anymore

Full upgrade workflow

A typical workflow for upgrading Kafka Streams applications from CP 3.0.x to CP 3.1.2 has the following steps:

  1. Upgrade your application: See upgrade instructions above.
  2. Stop the old application: Stop the old version of your application, i.e. stop all the application instances that are still running the old version of the application.
  3. Upgrade your Kafka cluster: See kafka updrade instructions.
  4. Start the upgraded application: Start the upgraded version of your application, with as many instances as needed. By default, the upgraded application will resume processing its input data from the point where the old version was stopped (see previous step).