suspicious memory leak in producer (rdkafka) #1361

@tigerinus

Description

I have a microservice that consumes messages from Kafka, does some work with them, and publishes the result back to Kafka.

However, it quickly gets OOMKilled after it starts.

With the help of a memory profile, I managed to figure out that rdk:broker0 contributes the biggest memory usage (in my example it's a 384MiB pod in Kubernetes).

[screenshot: memray memory report]

As seen in this report, no Python object visible to the GC holds anything larger than 2MB; it's rdk:broker0 that is holding 4460 allocations and 165MiB of unreleased memory.

Here is the KafkaProducerService code that calls Producer:

# irrelevant code omitted

import logging
from datetime import datetime

from confluent_kafka import KafkaException, Producer

log = logging.getLogger(__name__)


class KafkaProducerService:
    '''
    main class
    '''
    def __init__(self, bootstrap_servers: str, topic: str) -> None:
        self.__producer = Producer({
            'bootstrap.servers': bootstrap_servers
        })

        self.__topic = topic
        self.__count = 0
        self.__previous_count_timestamp = datetime.now().timestamp()

    def publish(self, key: str, value: bytes, headers: dict) -> None:
        '''
        publish message
        '''
        try:
            self.__producer.produce(
                self.__topic,
                key=key,
                value=value,
                headers=headers
            )
        except BufferError as error:
            raise RuntimeError(
                "internal producer message queue is full"
            ) from error
        except KafkaException as error:
            raise RuntimeError(
                "error adding to producer message queue"
            ) from error

        num_messages_to_be_delivered = len(self.__producer)
        if num_messages_to_be_delivered > 1000:
            log.debug("wait for %d messages to be delivered to Kafka...",
                      num_messages_to_be_delivered)
            try:
                num_messages = self.__producer.flush()
            except KafkaException as error:
                raise RuntimeError(
                    "error when flushing producer message queue to Kafka"
                ) from error

            log.debug("%d messages still in queue after flush", num_messages)

        self.__count += 1
        self.__count_published()

    def __count_published(self) -> None:
        current_count_timestamp = datetime.now().timestamp()
        if current_count_timestamp - self.__previous_count_timestamp >= 1:
            self.__previous_count_timestamp = current_count_timestamp

            if self.__count == 0:
                return

            log.info("%d messages published (%d messages pending for delivery)",
                     self.__count, len(self.__producer))
            self.__count = 0
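
For context, and only as a guess rather than a confirmed root cause (the service above does flush periodically): librdkafka queues a delivery report for every message passed to produce(), and that per-message state is only released once the application serves the reports via poll() or flush(). A minimal sketch of a produce loop that serves delivery callbacks on every iteration, with placeholder broker address and topic name (not my actual setup), looks like this:

# Sketch only: broker address and topic are placeholders.
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})

def on_delivery(err, msg):
    # Served from poll()/flush(); err is None when the broker acked the message.
    if err is not None:
        print(f"delivery failed: {err}")

for _ in range(100_000):
    producer.produce('test-topic', value=b'x' * 50_000, on_delivery=on_delivery)
    producer.poll(0)  # serve queued delivery callbacks without blocking

producer.flush()  # wait for outstanding messages before exiting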

How to reproduce

  1. Install https://bloomberg.github.io/memray
  2. Using the code snippet above, build a Python script that keeps re-publishing Kafka messages back to the same topic (each message should be larger than 50 KB).
  3. Run the script under memray for about 5 minutes (see the sketch below).
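
A rough sketch of such a reproduction script; the broker address, topic name and consumer group are placeholders, and error handling is omitted:

# repro_sketch.py -- rough sketch only
from confluent_kafka import Consumer, Producer

TOPIC = 'test-topic'
CONF = {'bootstrap.servers': 'localhost:9092'}

consumer = Consumer({**CONF, 'group.id': 'leak-repro', 'auto.offset.reset': 'earliest'})
consumer.subscribe([TOPIC])
producer = Producer(CONF)

# Seed the topic with one message larger than 50 KB, then keep consuming
# and re-publishing everything back to the same topic.
producer.produce(TOPIC, value=b'x' * 60_000)
producer.flush()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.produce(TOPIC, key=msg.key(), value=msg.value())
    if len(producer) > 1000:
        producer.flush()

Run it under memray for roughly five minutes and render the report, for example:

    memray run -o repro.bin repro_sketch.py
    memray flamegraph repro.bin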

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
>>> confluent_kafka.version()
('1.8.2', 17302016)
>>> confluent_kafka.libversion()
('1.6.0', 17170687)
  • Apache Kafka broker version:
3.0.0
  • Client configuration: {...}

default, except bootstrap.servers

  • Operating system:

Reproduced on both Alpine and Debian (Bullseye)

Labels

    investigate further (it's unclear what the issue is at this time, but there is enough interest to look into it)
    priority:low (maintainer triage tag indicating low impact or criticality)
