Conversation

@izzyharker (Contributor) commented Oct 7, 2025

The group coordinator has been having issues with unknown errors. The
theory is that this is caused by optimistic compression estimates which
cause unchecked batch overflows when trying to write.

This PR adds a check for uncompressed record size to flush batches more
eagerly and avoid overfilling partially-full batches. This should make
the group coordinator errors less frequent.

Also added tests to ensure this change does not impact desired behavior
for large compressible records.

Reviewers: Sean Quah [email protected], David Jacot [email protected]
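
A minimal sketch of the eager-flush idea described above; only AbstractRecords.estimateSizeInBytes, CompressionType.NONE, currentBatch, and recordsToAppend come from the actual change, while maxBatchSize, bytesAlreadyInBatch, and flushCurrentBatch are placeholder names:

```java
// Sketch only: size the records as if they were uncompressed. This is an upper
// bound, so a flush decision based on it cannot be fooled by an optimistic
// compression estimate that later turns out to be wrong.
int uncompressedSizeUpperBound = AbstractRecords.estimateSizeInBytes(
    currentBatch.builder.magic(),
    CompressionType.NONE,
    recordsToAppend
);

// Hypothetical names (maxBatchSize, bytesAlreadyInBatch, flushCurrentBatch):
// if the upper bound does not fit in the partially-full batch, flush it eagerly
// and start a fresh batch before appending.
if (uncompressedSizeUpperBound > maxBatchSize - bytesAlreadyInBatch) {
    flushCurrentBatch(currentTimeMs);
}
```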

@github-actions bot added the triage (PRs from the community) and group-coordinator labels Oct 7, 2025
@izzyharker marked this pull request as ready for review October 7, 2025 14:58
@dajac self-requested a review October 7, 2025 15:00
@dajac added the ci-approved label and removed the triage (PRs from the community) label Oct 7, 2025
@dajac changed the title from "[KAFKA-19760]: RecordTooLargeExceptions in group coordinator when offsets.topic.compression.codec is used" to "KAFKA-19760: RecordTooLargeExceptions in group coordinator when offsets.topic.compression.codec is used" Oct 7, 2025
@squah-confluent (Contributor) left a comment:

Thanks for the patch! I left a few comments.

@squah-confluent (Contributor) left a comment:

Thanks for updating the PR!

```java
}

@Test
public void testLargeCompressibleRecordTriggersFlushAndSucceeds() throws Exception {
```
Contributor:

Could we add a variant of this test that would try to pack the large record into the same batch as write1 under the old implementation? 3x of the max batch size is too large for that and we'd need something like 0.75 of the max batch size. We can use @ParameterizedTest, @ValueSource and add a double parameter for the fraction of the max batch size to use.

Contributor Author:

Added a variant but separated it out into a different test because they behave slightly differently. When the record is smaller than the batch size but won't fit in the current batch, it flushes the current batch but doesn't trigger a second flush on the new batch. The 3x-batch-size record triggers a second flush as well, so having two tests to check both behaviors is good.
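
For reference, the @ParameterizedTest / @ValueSource pattern suggested above would look roughly like the sketch below; the class name, size constant, fractions, and assertion are illustrative only and are not the tests added in this PR:

```java
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class LargeCompressibleRecordExample {
    // Illustrative stand-in for the coordinator's configured max batch size.
    private static final int MAX_BATCH_SIZE = 16 * 1024;

    @ParameterizedTest
    @ValueSource(doubles = {0.75, 3.0}) // fractions of the max batch size to exercise
    void largeCompressibleRecordTriggersFlush(double fraction) {
        int recordSize = (int) (MAX_BATCH_SIZE * fraction);
        // A real test would append a record of this size after write1 and assert
        // how many flushes occur; here we only demonstrate the parameterization.
        Assertions.assertTrue(recordSize > 0);
    }
}
```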

Comment on lines 1003 to 1007
```java
int estimatedSizeUpperBound = AbstractRecords.estimateSizeInBytes(
    currentBatch.builder.magic(),
    CompressionType.NONE,
    recordsToAppend
);
```
Member:

It is kind of annoying that we compute the size twice, especially since estimatedSize is just estimatedSizeUpperBound scaled by a fixed factor. Could we combine them?

Otherwise, I wonder whether we should just remove the check on estimatedSize below and rely on the check from the log layer. What do you think?

Contributor:

I don't think we can combine them. But we can remove the estimatedSize check and rely on the log layer. Then when we have an overly large atomic write, we will 1) flush the current batch, 2) write a new batch, 3) flush it immediately, which fails. The downside is that we will do a bunch of extra work for oversized writes.

@izzyharker what do you think about removing the estimatedSize check?

@izzyharker (Contributor Author) commented Oct 14, 2025:

It's probably fine either way? If the issue is doing the size check twice, we could use the estimated ratio on the uncompressed size as an estimate rather than making two method calls.

Member:

The estimatedSize is likely wrong anyway. I am for removing it. @squah-confluent Is it ok for you?

Contributor:

Yes, I'm fine with removing the check against estimatedSize entirely.

Contributor Author:

Sounds good.
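
For illustration, the alternative floated above (before the thread settled on dropping the check entirely) would derive the compressed-size estimate from the single uncompressed upper bound instead of a second estimateSizeInBytes call; the ratio value below is made up:

```java
// Made-up constant for this sketch: an assumed average compression ratio.
final double estimatedCompressionRatio = 0.5;

int estimatedSizeUpperBound = AbstractRecords.estimateSizeInBytes(
    currentBatch.builder.magic(),
    CompressionType.NONE,
    recordsToAppend
);
// Scale the uncompressed upper bound instead of computing the size a second
// time with the configured compression type.
int estimatedSize = (int) (estimatedSizeUpperBound * estimatedCompressionRatio);
```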

```java
// If flushing fails, we don't catch the exception in order to let
// the caller fail the current operation.
maybeFlushCurrentBatch(currentTimeMs);
if (isAtomic && !currentBatch.builder.hasRoomFor(0)) {
```
Member:

I am not sure I understand the isAtomic here. Let's say that we have two records that must be written atomically, and we still have space in the batch for others. Why do we force a flush?

Contributor:

We don't force a flush. We only flush if we took the atomic path and uncompressed batch size >= max.message.size, which means the next atomic write will flush the current batch before writing.

Member:

Ok. Would it be possible to put !currentBatch.builder.hasRoomFor(0) within maybeFlushCurrentBatch or do we really need to rely on isAtomic?

Contributor Author:

I don't see the harm in moving it to maybeFlushCurrentBatch.

@github-actions bot added the streams, core (Kafka Broker), tools, connect, dependencies (Pull requests that update a dependency file), storage (Pull requests that target the storage module), build (Gradle build or GitHub Actions), and docker (Official Docker image) labels Oct 14, 2025
@airlock-confluentinc bot force-pushed the unknown_server_errors_iharker branch from 9bfdfdf to d17a9ff October 14, 2025 15:26
@izzyharker requested a review from dajac October 15, 2025 15:27
@dajac (Member) left a comment:

lgtm, thanks

@dajac merged commit 388739f into apache:trunk Oct 16, 2025 (22 checks passed)
@dajac deleted the unknown_server_errors_iharker branch October 16, 2025 09:10
dajac pushed a commit that referenced this pull request Oct 16, 2025
…ts.topic.compression.codec is used (#20653)

@dajac (Member) commented Oct 16, 2025:

Merged to trunk and 4.1. I was not able to cherry-pick it to 4.0 due to some conflicts. @izzyharker Could you please open a PR for 4.0 branch?

dajac pushed a commit that referenced this pull request Oct 17, 2025
…ts.topic.compression.codec is used (4.0) (#20715)

This PR backports the change from #20653 to 4.0.

```diff
 private void maybeFlushCurrentBatch(long currentTimeMs) {
     if (currentBatch != null) {
-        if (currentBatch.builder.isTransactional() || (currentBatch.appendTimeMs - currentTimeMs) >= appendLingerMs) {
+        if (currentBatch.builder.isTransactional() || (currentBatch.appendTimeMs - currentTimeMs) >= appendLingerMs || !currentBatch.builder.hasRoomFor(0)) {
```
Member:

@majialoong and I were discussing the condition (currentBatch.appendTimeMs - currentTimeMs) >= appendLingerMs. The correct version seems to be (currentTimeMs - currentBatch.appendTimeMs) >= appendLingerMs. Or we can remove it, since a lingerTimeoutTask already exists

WDYT?
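
A quick numeric check of the point above, with made-up values:

```java
public class LingerConditionExample {
    public static void main(String[] args) {
        // Made-up numbers: a batch appended at t=1000ms, evaluated at t=1015ms,
        // with appendLingerMs = 10.
        long appendTimeMs = 1000, currentTimeMs = 1015, appendLingerMs = 10;
        // Condition as written: appendTimeMs - currentTimeMs is -15, which can
        // never reach a positive linger, so this clause never fires.
        System.out.println((appendTimeMs - currentTimeMs) >= appendLingerMs); // false
        // Swapped orientation suggested above: 15 >= 10, so the batch flushes
        // once it has lingered long enough.
        System.out.println((currentTimeMs - appendTimeMs) >= appendLingerMs); // true
    }
}
```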
