-
Notifications
You must be signed in to change notification settings - Fork 0
Clone squah fix group coordinator deleting groups with pending transactional offsets #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
When deleting pending transactional offsets, we must preserve the list of groups associated with the producer ID, otherwise we cannot clean up the list of pending transactions for the group once the transaction is committed or aborted.
When a group has pending transactional offsets but no committed offsets, we can accidentally delete it while cleaning up expired offsets. Add a check to avoid this case.
Summary of ChangesHello @visz11, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where the group coordinator could prematurely delete groups that still had ongoing transactional offset commits. By introducing a mechanism to explicitly preserve group metadata for pending transactions and updating the cleanup logic to account for these transactions, the change ensures that groups are only removed when all associated transactional activity has concluded. This enhancement maintains data consistency and prevents potential data loss in transactional offset management. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Clone squash fix group coordinator deleting groups with pending transactional offsetsTL;DR: Prevents group deletion when pending transactional offsets exist, ensuring data consistency during transactions. Refacto PR SummaryFixes a critical bug in the OffsetMetadataManager that was prematurely deleting groups with pending transactional offsets. Change HighlightsClick to expand
Sequence DiagramsequenceDiagram
participant Client as Client
participant Coordinator as GroupCoordinator
participant OffsetManager as OffsetMetadataManager
participant Storage as Storage
Client->>Coordinator: Begin Transaction (producerId=10)
Coordinator->>OffsetManager: commitOffset(transactional)
OffsetManager->>OffsetManager: Store in pendingTransactionalOffsets with preserveGroups=true
Client->>Coordinator: Delete regular offsets
Coordinator->>OffsetManager: deleteOffsets(groupId)
OffsetManager->>Storage: Delete regular offsets
Note over OffsetManager: Group retained due to pending transaction
Client->>Coordinator: Commit/Abort Transaction
Coordinator->>OffsetManager: replayEndTransactionMarker
OffsetManager->>OffsetManager: Process transaction completion
Note over OffsetManager: Group can now be deleted if empty
Testing GuideClick to expand
|
/refacto-bot |
Pre-merge checks and finishing touches❌ Failed checks (3 warnings)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (1)group-coordinator/src/test/java/org/apache/kafka/coordinator/group/OffsetMetadataManagerTest.java (1)
🔇 Additional comments (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Refacto is reviewing this PR. Please wait for the review comments to be posted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request fixes a bug where the group coordinator could prematurely delete a group that still had pending transactional offsets. The fix correctly prevents group cleanup if open transactions exist and ensures that the internal data structures for pending offsets preserve group information, even if all pending offsets for that group are deleted within the transaction. The production code changes are sound and well-targeted. The new tests validate the fix, but I've suggested a simplification for one of them to improve clarity and maintainability.
// The group should be deletable now. | ||
context.commitOffset("group-id", "foo", 0, 100L, 0, commitTimestamp); | ||
context.time.sleep(Duration.ofMinutes(1).toMillis()); | ||
|
||
records = new ArrayList<>(); | ||
assertTrue(context.cleanupExpiredOffsets("group-id", records)); | ||
assertEquals(expectedRecords, records); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second half of this test is a bit confusing. After the transaction is committed on line 2646, the group should be deletable. However, the test then re-commits and re-expires the same offset to verify this. This is redundant and makes the test harder to understand.
A simpler approach would be to directly assert that cleanupExpiredOffsets
returns true
right after the transaction is committed. Since no more offsets are left to expire, the records list should be empty. This would more clearly test the condition for the group being deletable.
// The group should be deletable now. | |
context.commitOffset("group-id", "foo", 0, 100L, 0, commitTimestamp); | |
context.time.sleep(Duration.ofMinutes(1).toMillis()); | |
records = new ArrayList<>(); | |
assertTrue(context.cleanupExpiredOffsets("group-id", records)); | |
assertEquals(expectedRecords, records); | |
// The group should be deletable now as there are no more offsets and no pending transactions. | |
records = new ArrayList<>(); | |
assertTrue(context.cleanupExpiredOffsets("group-id", records)); | |
// No more offsets are expired, so the records list should be empty. | |
assertEquals(List.of(), records); |
Code Review: Transaction Management in Offset Handling👍 Well Done
📌 Files Processed
📝 Additional Comments
|
offsets.offsetsByGroup.get(groupId); | ||
if (offsetsByTopic == null) { | ||
return true; | ||
return !openTransactionsByGroup.containsKey(groupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group Deletion Logic
The logic only checks openTransactionsByGroup but doesn't verify if there are pending transactional offsets for the group. This could lead to data loss if a group has pending transactional offsets but no open transactions.
return !openTransactionsByGroup.containsKey(groupId) && !hasPendingTransactionalOffsets(groupId);
Commitable Suggestion
return !openTransactionsByGroup.containsKey(groupId); | |
return !openTransactionsByGroup.containsKey(groupId) && !hasPendingTransactionalOffsets(groupId); |
Standards
- ISO-IEC-25010-Reliability-Fault-Tolerance
- ISO-IEC-25010-Functional-Correctness-Appropriateness
- DbC-State-Consistency
offsets.offsetsByGroup.get(groupId); | ||
if (offsetsByTopic == null) { | ||
return true; | ||
return !openTransactionsByGroup.containsKey(groupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group Deletion Logic
The original code unconditionally returns true when offsetsByTopic is null, indicating the group can be deleted. This is incorrect when the group has pending transactional offsets, as it would lead to premature group deletion. The fix correctly checks if there are open transactions for the group.
return !openTransactionsByGroup.containsKey(groupId) && !hasPendingTransactionalOffsets(groupId);
Commitable Suggestion
return !openTransactionsByGroup.containsKey(groupId); | |
return !openTransactionsByGroup.containsKey(groupId) && !hasPendingTransactionalOffsets(groupId); |
Standards
- Business-Rule-State-Consistency
- Logic-Verification-Control-Flow
- Algorithm-Correctness-Condition-Logic
topicOffsets.remove(topic); | ||
|
||
if (topicOffsets.isEmpty()) | ||
if (!preserveGroups && topicOffsets.isEmpty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group Removal Condition
The original code unconditionally removes groups when their topic offsets are empty, ignoring pending transactions. The fix adds a conditional check (!preserveGroups) to prevent removing groups that have pending transactional offsets, maintaining data consistency during transaction processing.
if (!preserveGroups && topicOffsets.isEmpty())
Commitable Suggestion
if (!preserveGroups && topicOffsets.isEmpty()) | |
if (!preserveGroups && topicOffsets.isEmpty()) |
Standards
- Business-Rule-State-Consistency
- Logic-Verification-Control-Flow
- Algorithm-Correctness-Condition-Logic
// the transaction is committed; or removed when the transaction is aborted. | ||
pendingTransactionalOffsets | ||
.computeIfAbsent(producerId, __ -> new Offsets()) | ||
.computeIfAbsent(producerId, __ -> new Offsets(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transactional Offsets Handling
The original code creates new Offsets instances without preserving empty groups for pending transactions. The fix creates Offsets with preserveGroups=true for pending transactional offsets, ensuring groups with pending transactions aren't prematurely deleted when offsets are removed.
.computeIfAbsent(producerId, __ -> new Offsets(true))
Commitable Suggestion
.computeIfAbsent(producerId, __ -> new Offsets(true)) | |
.computeIfAbsent(producerId, __ -> new Offsets(true)) |
Standards
- Business-Rule-State-Consistency
- Logic-Verification-Data-Flow
- Algorithm-Correctness-Parameter-Logic
this.config = config; | ||
this.metrics = metrics; | ||
this.offsets = new Offsets(); | ||
this.offsets = new Offsets(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default Offsets Configuration
The original code creates the main offsets store without explicitly setting group preservation behavior. The fix creates Offsets with preserveGroups=false for the main store, making the behavior explicit and consistent with the intended design where empty groups should be removed from the main store.
this.offsets = new Offsets(false);
Commitable Suggestion
this.offsets = new Offsets(false); | |
this.offsets = new Offsets(false); |
Standards
- Business-Rule-State-Consistency
- Logic-Verification-Data-Flow
- Algorithm-Correctness-Parameter-Logic
private Offsets(boolean preserveGroups) { | ||
this.preserveGroups = preserveGroups; | ||
this.offsetsByGroup = new TimelineHashMap<>(snapshotRegistry, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group Preservation Logic
The original Offsets constructor lacks a parameter to control group preservation behavior. This leads to inconsistent handling of groups with pending transactions. The fix adds a preserveGroups parameter to explicitly control whether empty groups should be preserved during offset removal.
private Offsets(boolean preserveGroups) {
this.preserveGroups = preserveGroups;
this.offsetsByGroup = new TimelineHashMap<>(snapshotRegistry, 0);
}
/**
* Default constructor that doesn't preserve empty groups.
*/
private Offsets() {
this(false);
}
Commitable Suggestion
private Offsets(boolean preserveGroups) { | |
this.preserveGroups = preserveGroups; | |
this.offsetsByGroup = new TimelineHashMap<>(snapshotRegistry, 0); | |
private Offsets(boolean preserveGroups) { | |
this.preserveGroups = preserveGroups; | |
this.offsetsByGroup = new TimelineHashMap<>(snapshotRegistry, 0); | |
} | |
/** | |
* Default constructor that doesn't preserve empty groups. | |
*/ | |
private Offsets() { | |
this(false); | |
} |
Standards
- Business-Rule-State-Consistency
- Logic-Verification-Data-Flow
- Algorithm-Correctness-Parameter-Logic
offsets.offsetsByGroup.get(groupId); | ||
if (offsetsByTopic == null) { | ||
return true; | ||
return !openTransactionsByGroup.containsKey(groupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient Group Check
The group existence check has been modified to check openTransactionsByGroup which is more expensive than the previous empty check. This operation now requires a hash lookup for every group check, potentially affecting performance in high-throughput scenarios.
Standards
- ISO-IEC-25010-Performance-Efficiency-Time-Behavior
- Optimization-Pattern-Conditional-Efficiency
- Algorithmic-Complexity-Lookup-Performance
private Offsets(boolean preserveGroups) { | ||
this.preserveGroups = preserveGroups; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boolean Parameter Clarity
The boolean parameter creates a primitive obsession code smell. Consider creating an explicit enum or constant for the preserveGroups parameter to improve code readability and make the intention clearer at instantiation sites.
Standards
- Clean-Code-Parameters
- Refactoring-Replace-Primitive
- Maintainability-Quality-Readability
this.config = config; | ||
this.metrics = metrics; | ||
this.offsets = new Offsets(); | ||
this.offsets = new Offsets(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default Parameter Value
Using a literal false value without context makes the code's intent unclear. Consider creating a named constant like DEFAULT_PRESERVE_GROUPS = false to document the default behavior and improve code readability.
Standards
- Clean-Code-Constants
- Maintainability-Quality-Self-Documentation
- Clean-Code-Naming
offsets.offsetsByGroup.get(groupId); | ||
if (offsetsByTopic == null) { | ||
return true; | ||
return !openTransactionsByGroup.containsKey(groupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transaction State Check
The negated condition makes the logic harder to follow. Consider extracting this check into a descriptive method like hasOpenTransactions(groupId) and inverting the logic to make the intent clearer and improve maintainability.
Standards
- Clean-Code-Functions
- Maintainability-Quality-Readability
- Clean-Code-Conditionals
// the transaction is committed; or removed when the transaction is aborted. | ||
pendingTransactionalOffsets | ||
.computeIfAbsent(producerId, __ -> new Offsets()) | ||
.computeIfAbsent(producerId, __ -> new Offsets(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve Flag Documentation
The true literal lacks context at the call site. Consider creating a named constant like PRESERVE_GROUPS_FOR_TRANSACTIONS = true to document the behavior and improve maintainability when reading transactional offset handling code.
Standards
- Clean-Code-Constants
- Maintainability-Quality-Self-Documentation
- Clean-Code-Naming
Delete this text and replace it with a detailed description of your change. The
PR title and body will become the squashed commit message.
If you would like to tag individuals, add some commentary, upload images, or
include other supplemental information that should not be part of the eventual
commit message, please use a separate comment.
If applicable, please include a summary of the testing strategy (including
rationale) for the proposed change. Unit and/or integration tests are expected
for any behavior change and system tests should be considered for larger
changes.
Summary by CodeRabbit
Bug Fixes
Tests