Skip to content

Kafka producer does not report _ISR_INSUFF as error code when ISR count is less; instead reports _MSG_TIMED_OUT #1711

Open
@prashantochatterjee

Description

@prashantochatterjee

Description

The issue is seen when a node is brought down in a 3-node cluster wherein one of the topic partitions had an ISR count of 1 against a min-ISR setting of 2. The Kafka producer client is configured to report errors using error_cb callback setting. It is observed that the error code is reported as _MSG_TIMED_OUT instead of a more intuitive _ISR_INSUFF leading me to believe that a message was being sent to the node that was down.

How to reproduce

  • Bring down a node in a minimal 3 node cluster and to a state where the ISR count for one/more topic partitions is less than the minimum configured ISR setting for the topic/broker.
  • Create a producer with acks=all setting
  • Try to produce messages to a topic-partition with the min-ISR insufficiency

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):

confluent_kafka.version()
('2.3.0', 33751040)
confluent_kafka.libversion()
('2.3.0', 33751295)

  • Apache Kafka broker version: 3.6.0

  • Client configuration: { # Error reporting 'error_cb': self._error_notification, #'debug': 'broker,topic,msg', 'debug': 'broker,topic,msg', # Use compression 'compression.codec': 'lz4', # Set acks for reliability 'acks': 'all', 'max.in.flight.requests.per.connection': 1, # Configuration settings for HA 'topic.metadata.refresh.interval.ms': 120000 }

  • Operating system: Debian GNU/Linux 11 (bullseye)

  • Provide client logs (with 'debug': '..' as necessary)
    2024-02-18 09:14:18,997 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_error_notification:788] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-controller-2.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/2: Connection setup timed out in state CONNECT (after 30025ms in state CONNECT, 1 identical error(s) suppressed)"}
    2024-02-18 09:14:18,998 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_delivery_notification:759] ERROR - Message delivery failed: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}! Message=b'{"type": "SOURCE-EVENT", "sourceEventId": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "id": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "sourceSystemId": "56795ec7-f19b-429b-913b-b69bdc4a6baf", "sourceSystemName": "Kafka", "projectId": "337ddf08-cc96-11ee-8d32-0a26cc0b6dec", "eventCategory": "alerts", "customerId": "ac0c16424518460c8c0ef0f632515731", "createdat": 1708244619015.706, "sourceReceivedAt": 1708244619015.6702, "parentEventId": null, "payload": "{\"customerId\": \"58c5fe78-ca58-46a8-8db9-a172fec7d2bc\", \"assetId\": \"021456a7-b56a-4e0b-93a0-017f60a5d75f\", \"componentId\": \"f21056f7-cd02-43f6-b44c-c2fe2bcd12d1\", \"alertCategory\": \"Disk Partition\", \"alertType\": \"Disk Partition\", \"sourceSystemId\": \"CFX_PULSE\", \"projectId\": \"814ab092-83c3-4d0c-8805-063936ba3666\", \"environmentId\": \"a74b7e03-5d48-45fc-95fc-8df9ec47f3b2\", \"severity\": \"MAJOR\", \"assetType\": \"infra-account\", \"assetIpAddress\": \"10.199.0.118\", \"assetName\": \"SIM-m0.0-1-250\", \"componentName\": \"1.1\", \"sourceMechanism\": \"SNMP\", \"message\": \"Disk Partition \\\"flash[Disk:flash]\\\" Utilization [80.95%] exceeded configured value 25.0%\", \"raisedAt\": 1708241018415.1145, \"minimumOccurrence\": 0, \"sourceSystemName\": \"Syslog_udp_Notifications\"}", "status": "Completed"}', Key=b'f2b4d3de-46a6-4c7c-9243-f57c2489e18b'
    .................
    2024-02-19 05:26:45,329 [PID=1:TID=Thread-70:cfx.rda_messaging.kafka_client:_error_notification:789] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-broker-2.rda-kafka-broker-headless.rda-fabric.svc.cluster.local:9093/bootstrap: Connection setup timed out in state CONNECT (after 30028ms in state CONNECT)"} {"sourceevent": {"id": "06497124-e2fb-4cd3-a348-2cf14b3d849a", "received": 1708320405328.9392}}
    %7|1708320417.127|PRODUCE|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: Produce MessageSet with 1 message(s) (1025 bytes, ApiVersion 7, MsgVersion 2, MsgId 0, BaseSeq -1, PID{Invalid}, lz4)
    %7|1708320417.170|REQERR|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ProduceRequest failed: Broker: Not enough in-sync replicas: explicit actions Retry,MsgNotPersisted
    %7|1708320417.170|MSGSET|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: MessageSet with 1 message(s) (MsgId 0, BaseSeq -1) encountered error: Broker: Not enough in-sync replicas (actions Retry,MsgNotPersisted)

  • Provide broker log excerpts

  • Critical issue

Metadata

Metadata

Assignees

Labels

enhancementRequesting a feature change

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions