SerializingProducer is much slower than Producer in Python #1440

Open
@zacharydestefano89

Description

I was working on code that produces messages to a Kafka topic. The messages are protobuf bytes, and I used SerializingProducer to pass the schema information. I also tried a separate method in which I imitated what was done here.

The SerializingProducer produced and flushed messages at a rate of about 12 messages per second, which is far too slow for my use case.

When I used a plain Producer and took out any schema information, the rate jumped to hundreds of messages per second.
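For reference, here is a minimal sketch of the two paths being compared. The broker address, Schema Registry URL, topic name, and the generated protobuf module (user_pb2.User) are placeholders I am assuming, not values from this report; newer library versions may also expect a conf argument (e.g. {'use.deprecated.format': False}) when constructing ProtobufSerializer.

from confluent_kafka import Producer, SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.protobuf import ProtobufSerializer

import user_pb2  # hypothetical protoc-generated module

# Slow path from this report: SerializingProducer runs the protobuf /
# Schema Registry serializer inside every produce() call.
schema_registry = SchemaRegistryClient({'url': 'https://<schema-registry>'})
protobuf_serializer = ProtobufSerializer(user_pb2.User, schema_registry)

serializing_producer = SerializingProducer({
    'bootstrap.servers': '<broker>',
    'value.serializer': protobuf_serializer,
})

# Fast path from this report: plain Producer handed bytes that were
# serialized up front, with no schema handling at all.
plain_producer = Producer({'bootstrap.servers': '<broker>'})

message = user_pb2.User()  # placeholder protobuf message

serializing_producer.produce('my-topic', value=message)
plain_producer.produce('my-topic', value=message.SerializeToString())

serializing_producer.flush()
plain_producer.flush()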

How to reproduce

  1. Write a job to put thousands of messages onto a Kafka topic
  2. Have the job put schema information into each message and time it
  3. Compare it against the same job that does not put in schema information (a rough timing sketch follows this list)
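A timing harness for these steps might look like the sketch below; producer, topic, and values are assumed to come from the caller (protobuf messages for a SerializingProducer, pre-serialized bytes for a plain Producer).

import time

def produce_and_flush(producer, topic, values):
    # Time one produce-all-then-flush cycle and report the message rate.
    start = time.monotonic()
    for value in values:
        producer.produce(topic, value=value)
    producer.flush()
    elapsed = time.monotonic() - start
    print(f'{len(values)} messages in {elapsed:.2f} s '
          f'({len(values) / elapsed:.1f} msg/s)')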

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):

From requirements.txt with the Python library:
confluent-kafka==1.7.0

From console:

>>> import confluent_kafka
>>> confluent_kafka.libversion()
('1.7.0', 17236223)
>>> confluent_kafka.version()
('1.7.0', 17235968)
>>> 
  • Apache Kafka broker version:
    Confluent Cloud

  • Client configuration: {...}

Producer config (a sketch of the corresponding SerializingProducer config follows the checklist):

{'bootstrap.servers': '...',
 'error_cb': <function error_cb at 0x7fd2dc01f820>,
 'sasl.mechanism': 'PLAIN',
 'sasl.password': '***************************',
 'sasl.username': '***************',
 'security.protocol': 'SASL_SSL'}
  • Operating system:

Run from a Docker container derived from the python:3.8.8 base image

First line of Dockerfile:
FROM python:3.8.8

  • Provide client logs (with 'debug': '..' as necessary)

Using SerializingProducer:

INFO:root:Now adding 221 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:05:42 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:05:42.031972+00:00 : Adding message starting `user_i_d: "******` onto Kafka buffer under topic `***`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 221 messages: 34.54440498352051 seconds

Using Producer:

[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now adding 54 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:06:16.675951+00:00 : Adding message starting `b'\n\****\x1` onto Kafka buffer under topic `****`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 54 messages: 0.18948936462402344 seconds
  • Critical issue: Not critical, have a workaround
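Regarding the client configuration above: only the plain Producer config is shown. Presumably the SerializingProducer run reused the same connection settings and only added a serializer, roughly as sketched here (producer_config and protobuf_serializer are the placeholders from the earlier sketch, not values from this report).

# Assumption: the SerializingProducer run reused the connection settings shown
# under "Producer config" and only added a value serializer.
serializing_config = {
    **producer_config,                        # the dict shown above
    'value.serializer': protobuf_serializer,  # ProtobufSerializer from the earlier sketch
}
serializing_producer = SerializingProducer(serializing_config)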

Labels

enhancement (Requesting a feature change), question (A question about how to use or about expected behavior of the library)
