The Kafka service will be listed as “Unhealthy” when it detects any underreplicated partitions. This error condition usually indicates a malfunctioning broker. Use the dcos beta-kafka topic under_replicated_partitions and dcos beta-kafka topic describe <topic-name> commands to find the problem broker and determine what actions are required.
Possible repair actions include dcos beta-kafka broker restart <broker-id> and dcos beta-kafka broker replace <broker-id>. The replace operation is destructive and will irrevocably lose all data associated with the broker. The restart operation is not destructive and indicates an attempt to restart a broker process.
Configuration Update Errors
The bolded entries below indicate the necessary changes needed to create a valid configuration:
$ curl -H "Authorization: token=$AUTH_TOKEN" "$DCOS_URI/service/kafka/v1/plan"
GET /service/kafka/v1/plan HTTP/1.1
{
"phases": [
{
"id": "c26bec40-3290-4501-b3da-945d0abef55f",
"name": "Reconciliation",
"steps": [
{
"id": "e56d2e4a-e05b-42ad-b4a0-d74b68d206af",
"message": "Reconciliation complete",
"name": "Reconciliation",
"status": "COMPLETE"
},
"status": "COMPLETE"
]
},
{
"id": "226a780e-132f-4fea-b584-7712b07cf357",
"name": "Update to: 72cecf77-dbc5-4ae6-8f91-c88702b9a6a8",
"steps": [
{
"id": "d4e72ee8-4608-423a-9566-1632ff0ab211",
"message": "Broker-0 is COMPLETE",
"name": "broker-0",
"status": "COMPLETE"
},
{
"id": "3ea30deb-9660-42f1-ad23-bd418d718999",
"message": "Broker-1 is COMPLETE",
"name": "broker-1",
"status": "COMPLETE"
},
{
"id": "4da21440-de73-4772-9c85-877f2677e62a",
"message": "Broker-2 is COMPLETE",
"name": "broker-2",
"status": "COMPLETE"
}
],
"status": "COMPLETE"
}
],
"errors": [
"Validation error on field \"BROKER_COUNT\": Decreasing this value (from 3 to 2) is not supported."
],
"status": "Error"
}
Replacing a Permanently Failed Server
If a machine has permanently failed, manual intervention is required to replace the broker or brokers that resided on that machine. Because DC/OS Kafka uses persistent volumes, the service continuously attempts to replace brokers where their data has been persisted. In the case where a machine has permanently failed, use the Kafka CLI to replace the brokers.
In the example below, the broker with id 0 will be replaced on new machine as long as cluster resources are sufficient to satisfy the service’s placement constraints and resource requirements.
```bash
$ dcos beta-kafka broker replace 0
```
Extending the Kill Grace Period
If the Kafka brokers are not completing the clean shutdown within the configured
brokers.kill_grace_period (Kill Grace Period), extend the Kill Grace Period, see Managing - Extend the Kill Grace Period.