Description
Description
The issue is seen when a node is brought down in a 3-node cluster wherein one of the topic partitions had an ISR count of 1 against a min-ISR setting of 2. The Kafka producer client is configured to report errors using error_cb
callback setting. It is observed that the error code is reported as _MSG_TIMED_OUT
instead of a more intuitive _ISR_INSUFF
leading me to believe that a message was being sent to the node that was down.
How to reproduce
- Bring down a node in a minimal 3 node cluster and to a state where the ISR count for one/more topic partitions is less than the minimum configured ISR setting for the topic/broker.
- Create a producer with
acks=all
setting - Try to produce messages to a topic-partition with the min-ISR insufficiency
Checklist
Please provide the following information:
- confluent-kafka-python and librdkafka version (
confluent_kafka.version()
andconfluent_kafka.libversion()
):
confluent_kafka.version()
('2.3.0', 33751040)
confluent_kafka.libversion()
('2.3.0', 33751295)
-
Apache Kafka broker version:
3.6.0
-
Client configuration:
{ # Error reporting 'error_cb': self._error_notification, #'debug': 'broker,topic,msg', 'debug': 'broker,topic,msg', # Use compression 'compression.codec': 'lz4', # Set acks for reliability 'acks': 'all', 'max.in.flight.requests.per.connection': 1, # Configuration settings for HA 'topic.metadata.refresh.interval.ms': 120000 }
-
Operating system:
Debian GNU/Linux 11 (bullseye)
-
Provide client logs (with
'debug': '..'
as necessary)
2024-02-18 09:14:18,997 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_error_notification:788] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-controller-2.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/2: Connection setup timed out in state CONNECT (after 30025ms in state CONNECT, 1 identical error(s) suppressed)"}
2024-02-18 09:14:18,998 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_delivery_notification:759] ERROR - Message delivery failed: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}! Message=b'{"type": "SOURCE-EVENT", "sourceEventId": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "id": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "sourceSystemId": "56795ec7-f19b-429b-913b-b69bdc4a6baf", "sourceSystemName": "Kafka", "projectId": "337ddf08-cc96-11ee-8d32-0a26cc0b6dec", "eventCategory": "alerts", "customerId": "ac0c16424518460c8c0ef0f632515731", "createdat": 1708244619015.706, "sourceReceivedAt": 1708244619015.6702, "parentEventId": null, "payload": "{\"customerId\": \"58c5fe78-ca58-46a8-8db9-a172fec7d2bc\", \"assetId\": \"021456a7-b56a-4e0b-93a0-017f60a5d75f\", \"componentId\": \"f21056f7-cd02-43f6-b44c-c2fe2bcd12d1\", \"alertCategory\": \"Disk Partition\", \"alertType\": \"Disk Partition\", \"sourceSystemId\": \"CFX_PULSE\", \"projectId\": \"814ab092-83c3-4d0c-8805-063936ba3666\", \"environmentId\": \"a74b7e03-5d48-45fc-95fc-8df9ec47f3b2\", \"severity\": \"MAJOR\", \"assetType\": \"infra-account\", \"assetIpAddress\": \"10.199.0.118\", \"assetName\": \"SIM-m0.0-1-250\", \"componentName\": \"1.1\", \"sourceMechanism\": \"SNMP\", \"message\": \"Disk Partition \\\"flash[Disk:flash]\\\" Utilization [80.95%] exceeded configured value 25.0%\", \"raisedAt\": 1708241018415.1145, \"minimumOccurrence\": 0, \"sourceSystemName\": \"Syslog_udp_Notifications\"}", "status": "Completed"}', Key=b'f2b4d3de-46a6-4c7c-9243-f57c2489e18b'
.................
2024-02-19 05:26:45,329 [PID=1:TID=Thread-70:cfx.rda_messaging.kafka_client:_error_notification:789] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-broker-2.rda-kafka-broker-headless.rda-fabric.svc.cluster.local:9093/bootstrap: Connection setup timed out in state CONNECT (after 30028ms in state CONNECT)"} {"sourceevent": {"id": "06497124-e2fb-4cd3-a348-2cf14b3d849a", "received": 1708320405328.9392}}
%7|1708320417.127|PRODUCE|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: Produce MessageSet with 1 message(s) (1025 bytes, ApiVersion 7, MsgVersion 2, MsgId 0, BaseSeq -1, PID{Invalid}, lz4)
%7|1708320417.170|REQERR|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ProduceRequest failed: Broker: Not enough in-sync replicas: explicit actions Retry,MsgNotPersisted
%7|1708320417.170|MSGSET|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: MessageSet with 1 message(s) (MsgId 0, BaseSeq -1) encountered error: Broker: Not enough in-sync replicas (actions Retry,MsgNotPersisted) -
Provide broker log excerpts
-
Critical issue