The Kafka service is listed as “Unhealthy” when it detects any under-replicated partitions. This error condition usually indicates a malfunctioning broker. Use the `dcos beta-kafka topic under_replicated_partitions` and `dcos beta-kafka topic describe <topic-name>` commands to find the problem broker and determine what actions are required.
Possible repair actions include `dcos beta-kafka broker restart <broker-id>` and `dcos beta-kafka broker replace <broker-id>`. The replace operation is destructive and irrevocably loses all data associated with the broker. The restart operation is not destructive; it only attempts to restart the broker process.
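The diagnosis and the non-destructive repair described above can be sketched as two small shell functions. This is a minimal sketch, not part of the Kafka CLI: the function names `inspect_topic` and `restart_broker` are hypothetical, and the functions only call the commands named above without parsing their output.

```bash
# Sketch only: inspect_topic and restart_broker are hypothetical helper names.
# They wrap the dcos beta-kafka commands named in the text and do not parse output.

# List under-replicated partitions, then describe one topic.
# $1: name of the topic to inspect.
inspect_topic() {
  local topic="$1"
  if [ -z "$topic" ]; then
    echo "usage: inspect_topic <topic-name>" >&2
    return 2
  fi
  dcos beta-kafka topic under_replicated_partitions || return 1
  dcos beta-kafka topic describe "$topic"
}

# Restart a broker process (non-destructive).
# $1: id of the broker to restart.
# The destructive "broker replace" command is deliberately not wrapped here.
restart_broker() {
  local broker_id="$1"
  if [ -z "$broker_id" ]; then
    echo "usage: restart_broker <broker-id>" >&2
    return 2
  fi
  dcos beta-kafka broker restart "$broker_id"
}
```

For example, `inspect_topic topic1` followed by `restart_broker 1` inspects a topic and then restarts broker 1. Leaving `broker replace` out of the helpers means the data-destroying action always has to be typed out explicitly.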
Configuration Update Errors
The `errors` entry at the end of the plan output below indicates the change needed to create a valid configuration:

    $ curl -H "Authorization: token=$AUTH_TOKEN" "$DCOS_URI/service/kafka/v1/plan"
    GET /service/kafka/v1/plan HTTP/1.1
    {
      "phases": [
        {
          "id": "c26bec40-3290-4501-b3da-945d0abef55f",
          "name": "Reconciliation",
          "steps": [
            {
              "id": "e56d2e4a-e05b-42ad-b4a0-d74b68d206af",
              "message": "Reconciliation complete",
              "name": "Reconciliation",
              "status": "COMPLETE"
            }
          ],
          "status": "COMPLETE"
        },
        {
          "id": "226a780e-132f-4fea-b584-7712b07cf357",
          "name": "Update to: 72cecf77-dbc5-4ae6-8f91-c88702b9a6a8",
          "steps": [
            {
              "id": "d4e72ee8-4608-423a-9566-1632ff0ab211",
              "message": "Broker-0 is COMPLETE",
              "name": "broker-0",
              "status": "COMPLETE"
            },
            {
              "id": "3ea30deb-9660-42f1-ad23-bd418d718999",
              "message": "Broker-1 is COMPLETE",
              "name": "broker-1",
              "status": "COMPLETE"
            },
            {
              "id": "4da21440-de73-4772-9c85-877f2677e62a",
              "message": "Broker-2 is COMPLETE",
              "name": "broker-2",
              "status": "COMPLETE"
            }
          ],
          "status": "COMPLETE"
        }
      ],
      "errors": [
        "Validation error on field \"BROKER_COUNT\": Decreasing this value (from 3 to 2) is not supported."
      ],
      "status": "Error"
    }
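When the plan response is long, it can help to print only its `errors` array. The following is a minimal sketch that assumes the `jq` JSON processor is installed; `fetch_plan` and `plan_errors` are hypothetical helper names, and `AUTH_TOKEN` and `DCOS_URI` are the same variables used in the `curl` call above.

```bash
# Sketch only: fetch_plan and plan_errors are hypothetical helper names.
# Assumes jq is installed and AUTH_TOKEN / DCOS_URI are set as in the curl call above.

# Fetch the current plan from the service endpoint shown in the text.
fetch_plan() {
  curl -s -H "Authorization: token=$AUTH_TOKEN" "$DCOS_URI/service/kafka/v1/plan"
}

# Read a plan document (JSON) on stdin and print each entry of its
# top-level "errors" array on its own line. Prints nothing if the
# array is absent or empty.
plan_errors() {
  jq -r '.errors[]?'
}
```

Running `fetch_plan | plan_errors` against the response shown above would print the `BROKER_COUNT` validation message on its own line.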
Replacing a Permanently Failed Server
If a machine has permanently failed, manual intervention is required to replace the broker or brokers that resided on that machine. Because DC/OS Kafka uses persistent volumes, the service keeps trying to relaunch each broker on the machine where its data was persisted. When that machine has permanently failed, use the Kafka CLI to replace the brokers.
In the example below, the broker with id `0` will be replaced on a new machine, as long as cluster resources are sufficient to satisfy the service’s placement constraints and resource requirements.
```bash
$ dcos beta-kafka broker replace 0
```
Extending the Kill Grace Period
If the Kafka brokers do not complete a clean shutdown within the configured `brokers.kill_grace_period` (Kill Grace Period), extend the Kill Grace Period. See Managing - Extend the Kill Grace Period.