Description
I was working on code to produce messages to a Kafka topic. The messages are protobuf bytes, and I used SerializingProducer to pass the schema information. I also tried a separate method that imitated what was done here.
It was able to produce and flush messages at a rate of about 12 messages per second. For my use case, this is way too slow.
When I just used Producer and took out any schema information, the rate jumped to hundreds of messages per second.
How to reproduce
- Write a job to put thousands of messages onto a Kafka topic
- Have the job put schema information into each message and time it
- Compare it to the same job that does not put in schema information
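The comparison was timed with a small helper along these lines. This is a minimal sketch: the commented-out setup assumes a reachable cluster, a Schema Registry, and a compiled protobuf message class, and all of those names (`producer_conf`, `user_pb2.User`, `messages`) are illustrative, not the actual job code.

```python
import time

def timed_produce_and_flush(producer, topic, payloads):
    """Produce every payload to `topic`, flush, and return elapsed seconds."""
    start = time.perf_counter()
    for payload in payloads:
        producer.produce(topic, value=payload)
    producer.flush()
    return time.perf_counter() - start

# Schema-aware path (slow, ~12 msg/s in my runs) -- hypothetical setup:
#   from confluent_kafka import SerializingProducer
#   from confluent_kafka.schema_registry import SchemaRegistryClient
#   from confluent_kafka.schema_registry.protobuf import ProtobufSerializer
#   sr_client = SchemaRegistryClient({'url': '...'})
#   serializing_producer = SerializingProducer({
#       **producer_conf,
#       'value.serializer': ProtobufSerializer(user_pb2.User, sr_client),
#   })
#   elapsed = timed_produce_and_flush(serializing_producer, topic, messages)
#
# Plain path (fast) -- serialize up front, produce raw bytes with no schema info:
#   from confluent_kafka import Producer
#   plain_producer = Producer(producer_conf)
#   elapsed = timed_produce_and_flush(
#       plain_producer, topic, [m.SerializeToString() for m in messages])
```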
Checklist
Please provide the following information:
- confluent-kafka-python and librdkafka version (`confluent_kafka.version()` and `confluent_kafka.libversion()`):
From requirements.txt:
confluent-kafka==1.7.0
From console:
>>> import confluent_kafka
>>> confluent_kafka.libversion()
('1.7.0', 17236223)
>>> confluent_kafka.version()
('1.7.0', 17235968)
- Apache Kafka broker version: Confluent Cloud
- Client configuration:
{...}
Producer config:
{'bootstrap.servers': '...',
'error_cb': <function error_cb at 0x7fd2dc01f820>,
'sasl.mechanism': 'PLAIN',
'sasl.password': '***************************',
'sasl.username': '***************',
'security.protocol': 'SASL_SSL'}
- Operating system:
Run from a Docker container derived from the Python 3.8.8 base image.
First line of Dockerfile:
FROM python:3.8.8
- Provide client logs (with 'debug': '..' as necessary):
Using SerializingProducer:
INFO:root:Now adding 221 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:05:42 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:05:42.031972+00:00 : Adding message starting `user_i_d: "******` onto Kafka buffer under topic `***`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 221 messages: 34.54440498352051 seconds
Using Producer:
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now adding 54 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:06:16.675951+00:00 : Adding message starting `b'\n\****\x1` onto Kafka buffer under topic `****`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 54 messages: 0.18948936462402344 seconds
- Critical issue: Not critical, have a workaround