Conversation

@visz11
Collaborator

@visz11 visz11 commented Dec 15, 2025

CodeAnt-AI Description

Add topic and group hash computations to Group

What Changed

  • Added a deterministic topic-hash function that fingerprints a topic by its id, name, partition count, and per-partition racks (sorted and joined) using Murmur3.
  • Added a group-hash function that combines per-topic hashes in lexicographic topic-name order to produce a stable group fingerprint.
  • Added unit tests covering hash determinism and sensitivity to changes in topic id, name, partition count, partition order, and rack ordering.
  • Updated import control to allow required hashing library dependencies for the new functions and tests.
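
As a rough illustration of the "sorted and joined" rack handling described above, the sketch below normalizes one partition's rack list into a single string. It is not the PR's code: the class and method names are hypothetical, and the `;` separator is an assumption taken from the expected strings such as "rack0;rack1" quoted later in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Group.java.
class RackFingerprintSketch {
    // Sort the rack names, then join them with ';' (assumed separator), so the
    // same set of racks yields the same string whatever order it arrives in.
    static String joinRacks(List<String> racks) {
        return racks.stream().sorted().collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        // Both calls print the same normalized string.
        System.out.println(joinRacks(List.of("rack1", "rack0")));
        System.out.println(joinRacks(List.of("rack0", "rack1")));
    }
}
```

Feeding the normalized string to the hasher, rather than the racks in arrival order, is what would keep the topic hash stable when replica order changes but rack membership does not.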

Impact

✅ Faster detection of topic changes affecting consumer groups
✅ Fewer missed group metadata changes caused by unordered topic fields
✅ Verified topic/group hashing behavior via new unit tests

@visz11
Collaborator Author

visz11 commented Dec 15, 2025

@refacto-visz

@github-actions github-actions bot added the build label Dec 15, 2025
@gemini-code-assist

Summary of Changes

Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces core hashing capabilities to the Kafka group coordinator, enabling the computation of unique identifiers for topics and consumer groups. By integrating the Guava hashing library, it provides a standardized and efficient way to generate hashes based on critical metadata such as topic IDs, names, partition configurations, and broker rack assignments. This enhancement is crucial for internal consistency checks or optimization strategies that rely on stable and reproducible identifiers.

Highlights

  • Hashing Functionality: Introduced computeTopicHash and computeGroupHash methods in the Group class to generate unique hashes for topics and groups, respectively.
  • Guava Integration: Added the Guava library as a new dependency to leverage its robust hashing utilities, specifically Murmur3.
  • Comprehensive Testing: Included new unit tests in GroupTest.java to thoroughly validate the correctness and consistency of the newly implemented hashing functions under various scenarios.
@coderabbitai

coderabbitai bot commented Dec 15, 2025

Warning

Rate limit exceeded

@visz11 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 2 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ed79212 and 36cb999.

📒 Files selected for processing (5)
  • build.gradle (1 hunks)
  • checkstyle/import-control-group-coordinator.xml (1 hunks)
  • gradle/dependencies.gradle (2 hunks)
  • group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2 hunks)
  • group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1 hunks)
@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 15, 2025
@refacto-visz

refacto-visz bot commented Dec 15, 2025

Refacto is reviewing this PR. Please wait for the review comments to be posted.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces hashing mechanisms for topics and consumer groups, which is a valuable addition. The implementation uses Guava's hashing library and includes a comprehensive set of unit tests to validate the new logic. My main feedback is to improve the robustness of the topic hash by using the full 128 bits of the topic UUID instead of its 32-bit hash code. This will significantly reduce the probability of hash collisions.

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id

high

Using topicImage.id().hashCode() for hashing the topic ID is not ideal as it loses information from the 128-bit UUID, increasing the risk of hash collisions. Two different UUIDs could potentially have the same integer hash code. To create a more robust and collision-resistant hash, it's better to use the full 128 bits of the UUID by hashing the most and least significant bits.

Suggested change
- .putLong(topicImage.id().hashCode()) // topic Id
+ .putLong(topicImage.id().getMostSignificantBits()).putLong(topicImage.id().getLeastSignificantBits()) // topic Id
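
To make the information loss concrete, here is a small sketch that uses `java.util.UUID` as a stand-in for the topic id type (an assumption; the PR hashes Kafka's own id class). It shows two distinct ids that share the same `hashCode()`, while their full 128-bit serializations differ. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical demo class; java.util.UUID stands in for the topic id type.
class UuidBitsSketch {
    // Serialize all 128 bits, in the same order two putLong calls would feed a hasher.
    static byte[] fullBits(UUID id) {
        return ByteBuffer.allocate(16)
            .putLong(id.getMostSignificantBits())
            .putLong(id.getLeastSignificantBits())
            .array();
    }

    public static void main(String[] args) {
        UUID a = new UUID(1L, 0L);
        UUID b = new UUID(0L, 1L);
        System.out.println(a.equals(b));                          // false: different ids
        System.out.println(a.hashCode() == b.hashCode());         // true: the 32-bit codes collide
        System.out.println(Arrays.equals(fullBits(a), fullBits(b))); // false: the full bits differ
    }
}
```

Any hasher that only ever sees `hashCode()` cannot tell `a` from `b`, which is why the suggestion above feeds both 64-bit halves instead.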

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id

high

In line with the suggested change in Group.java to use the full UUID for hashing, this test should be updated to use getMostSignificantBits() and getLeastSignificantBits() instead of hashCode() when constructing the expected hash. This ensures the tests remain consistent with the improved hashing logic. Please apply the same fix to testComputeTopicHashWithDifferentMagicByte, testComputeTopicHashWithDifferentPartitionOrder, and testComputeTopicHashWithDifferentRackOrder.

Suggested change
- .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+ .putLong(FOO_TOPIC_ID.getMostSignificantBits()).putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id

@refacto-visz

refacto-visz bot commented Dec 15, 2025

Code Review: Group Hashing Implementation

PR Confidence Score: 🟨 4 / 5

👍 Well Done
Deterministic Hashing Logic

The new hashing logic sorts its inputs to ensure deterministic output, which is critical for correctness.

Comprehensive Test Coverage

The added tests thoroughly cover various scenarios including determinism, ordering, and metadata changes, ensuring robust validation of the hashing functions.

📁 Selected files for review (5)
  • build.gradle
  • checkstyle/import-control-group-coordinator.xml
  • gradle/dependencies.gradle
  • group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java
  • group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java
📝 Additional Comments
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (4)
Inefficient String Creation

A new stream is created and collected into an intermediate string for each partition within the loop. This allocates a new string object per partition, which could lead to increased memory pressure and garbage collection if topics have a very large number of partitions.

Standards:

  • ISO-IEC-25010-Performance-Efficiency-Resource-Utilization
  • Optimization-Pattern-Object-Allocation
Information Disclosure Risk

The hash computation includes sensitive topology information like topic names, IDs, and partition counts in plaintext. This metadata could be extracted through hash analysis techniques, potentially revealing internal cluster structure to unauthorized parties.

Standards:

  • CWE-200
  • OWASP-A01
  • NIST-SSDF-PW.1
Verbose Optional Unwrapping

The stream operation uses filter(Optional::isPresent).map(Optional::get) to handle Optional values. This pattern is functional but verbose. If the project's Java version is 9 or higher, this can be simplified to flatMap(Optional::stream) for better readability.

Standards:

  • Clean-Code-Functions
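
The Optional-unwrapping simplification mentioned above can be sketched as follows. This is an illustrative comparison only, with hypothetical class and method names; it assumes a Java 9+ runtime, since `Optional::stream` was added in Java 9.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class comparing the two unwrapping styles.
class OptionalUnwrapSketch {
    // Java 8 style: keep only present values, then unwrap them.
    static List<String> verbose(List<Optional<String>> racks) {
        return racks.stream()
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
    }

    // Java 9+ style: each Optional becomes a stream of zero or one element.
    static List<String> concise(List<Optional<String>> racks) {
        return racks.stream()
            .flatMap(Optional::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Optional<String>> racks =
            List.of(Optional.of("rack0"), Optional.empty(), Optional.of("rack1"));
        System.out.println(verbose(racks).equals(concise(racks))); // true
    }
}
```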
Missing High-Level Explanation

The method's Javadoc explains what fields are included in the hash but omits the high-level purpose. Adding context on why this hash is computed would improve long-term maintainability by clarifying the impact of future changes to this logic.

Standards:

  • Clean-Code-Comments

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id

Incomplete UUID Hashing

The hashing algorithm uses the 32-bit integer hash code of the 128-bit topic UUID. This discards 96 bits of information from the UUID, significantly increasing the likelihood of hash collisions for different topics. This could lead to metadata changes being missed.

Standards
  • Logic-Verification-Data-Integrity
  • Algorithm-Correctness-Hashing

Comment on lines +229 to +236
static long computeGroupHash(Map<String, Long> topicHashes) {
    return Hashing.combineOrdered(
        topicHashes.entrySet()
            .stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> HashCode.fromLong(e.getValue()))
            .toList()
    ).asLong();

Hash Collision Risk

The computeGroupHash method combines hash values without salt or additional entropy, creating predictable hash patterns. Attackers could exploit hash collision vulnerabilities to cause hash table attacks or denial of service through algorithmic complexity attacks.

Standards
  • CWE-328
  • OWASP-A02
  • NIST-SSDF-PW.1

 * @return The hash of the topic.
 */
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();

Weak Hash Algorithm

MurmurHash3 is a non-cryptographic hash function designed for speed rather than security. While suitable for hash tables, it's vulnerable to hash collision attacks where malicious input can be crafted to produce identical hash values.

Standards
  • CWE-328
  • OWASP-A02
  • NIST-SSDF-PW.1

@refacto-visz

refacto-visz bot commented Dec 15, 2025

PR already reviewed at the latest commit: 36cb999.
Please try again with new changes.

@codeant-ai

codeant-ai bot commented Dec 15, 2025

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Topic ID hashing
    The topic id is hashed using topicImage.id().hashCode() which reduces a UUID (or other id type) to an int and loses entropy. This may cause unnecessary collisions and reduce the effectiveness of the Murmur3 hash. Prefer incorporating the full UUID bits or a stable byte representation instead of its hashCode().

  • Null/compatibility handling in group hash
    The group hash uses Stream.toList() and maps values directly with HashCode.fromLong(e.getValue()). Stream.toList() was added in Java 16, so this will not compile if the codebase targets an older Java version. Values in topicHashes may also be null; unboxing a null Long for fromLong would throw a NullPointerException. Consider using collect(Collectors.toList()) and handling null values defensively.

  • Determinism Risk
    The tests use Map.of(...) and then assert an ordered hash result. The iteration order of maps produced by Map.of is unspecified in the Java API (not guaranteed to be insertion-ordered). If Group.computeGroupHash expects a specific ordering or sorts keys internally, the test may be brittle or misleading. Use an explicitly ordered map (e.g., LinkedHashMap) to make the test deterministic and clear.

  • Possible Bug
    The tests build the expected topic hash by calling .putLong(FOO_TOPIC_ID.hashCode()). Using an int hashCode() and putting it as a long may not match how Group.computeTopicHash serializes the Uuid (it might use the raw 128-bit UUID, its string form, or both most/least significant bits). This can make the test pass or fail depending on the actual implementation and hides a mismatch risk — the expected hash should be computed using the same serialization used by the production code.

  • Fragile Rack String Assumption
    The expected hasher input uses hard-coded rack strings like "rack0;rack1" and "rack1;rack2". This assumes the MetadataImageBuilder().addRacks() produces those exact rack lists and formatting. If the rack serialization or ordering changes in MetadataImage or MetadataImageBuilder, the tests will break even if Group.computeTopicHash is correct. Ensure the expected string is derived from the same source/format logic used by the production code.
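
Two of the points above (older-Java compatibility and null values, plus making map ordering explicit) can be illustrated with a small stdlib-only sketch. It is not the PR's code: the class and method names are hypothetical, and it only extracts the topic hashes in topic-name order rather than combining them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Group.java.
class SortedTopicHashesSketch {
    // Returns the per-topic hashes ordered by topic name. Collectors.toList()
    // works on Java 8+, and a null value fails fast with a descriptive message
    // instead of an unboxing NullPointerException further down the pipeline.
    static List<Long> sortedHashes(Map<String, Long> topicHashes) {
        return topicHashes.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> Objects.requireNonNull(e.getValue(), "missing hash for topic " + e.getKey()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // LinkedHashMap makes the two insertion orders explicit.
        Map<String, Long> fooFirst = new LinkedHashMap<>();
        fooFirst.put("foo", 456L);
        fooFirst.put("bar", 123L);
        Map<String, Long> barFirst = new LinkedHashMap<>();
        barFirst.put("bar", 123L);
        barFirst.put("foo", 456L);
        // Sorting by key removes any dependence on insertion order.
        System.out.println(sortedHashes(fooFirst).equals(sortedHashes(barFirst))); // true
    }
}
```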

Comment on lines +230 to +235
    return Hashing.combineOrdered(
        topicHashes.entrySet()
            .stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> HashCode.fromLong(e.getValue()))
            .toList()

Suggestion: The computeGroupHash method passes the list of topic hashes directly to Hashing.combineOrdered, which in Guava throws an IllegalArgumentException when the iterable is empty; this will cause a runtime failure if the method is ever called with an empty topicHashes map (e.g., for a group with no topics), so it should explicitly handle the empty case with a defined hash value. [logic error]

Severity Level: Minor ⚠️

Suggested change
- return Hashing.combineOrdered(
-     topicHashes.entrySet()
-         .stream()
-         .sorted(Map.Entry.comparingByKey())
-         .map(e -> HashCode.fromLong(e.getValue()))
-         .toList()
+ List<HashCode> topicHashCodes = topicHashes.entrySet()
+     .stream()
+     .sorted(Map.Entry.comparingByKey())
+     .map(e -> HashCode.fromLong(e.getValue()))
+     .toList();
+ if (topicHashCodes.isEmpty()) {
+     // Define a stable hash value for groups without topics.
+     return 0L;
+ }
+ return Hashing.combineOrdered(topicHashCodes).asLong();
Why it matters? ⭐

The suggestion highlights a real runtime edge case: Guava's Hashing.combineOrdered(Iterable) throws IllegalArgumentException when given an empty iterable. The current implementation passes the stream result directly and thus will fail if topicHashes is empty (possible for a group with no topics). The improved code provides a safe, stable behavior (returns a defined hash for empty input) and prevents a potential crash. This is a functional bugfix, not just cosmetic.
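
The empty-input edge case can be reproduced without Guava using a stand-in combiner. Everything in the sketch below is hypothetical: `combineOrdered` here is a simplified placeholder that only mirrors the reject-empty behaviour described above, not Guava's algorithm, and returning `0L` for an empty group is the value proposed in the suggestion, not an established convention.

```java
import java.util.List;

// Hypothetical demo class; combineOrdered is a stand-in, NOT Guava's implementation.
class EmptyGroupHashSketch {
    // Order-sensitive combiner that rejects empty input, like the behaviour described above.
    static long combineOrdered(List<Long> hashes) {
        if (hashes.isEmpty()) {
            throw new IllegalArgumentException("Must be at least 1 hash code to combine.");
        }
        long result = 0L;
        for (long h : hashes) {
            result = result * 37 ^ h; // multiply first, then XOR in the next hash
        }
        return result;
    }

    // Guarded wrapper: a group with no topics gets a fixed, documented value.
    static long groupHash(List<Long> sortedTopicHashes) {
        if (sortedTopicHashes.isEmpty()) {
            return 0L;
        }
        return combineOrdered(sortedTopicHashes);
    }

    public static void main(String[] args) {
        System.out.println(groupHash(List.of())); // prints 0 instead of throwing
        System.out.println(groupHash(List.of(123L, 456L)) != groupHash(List.of(456L, 123L))); // true
    }
}
```

Whether an empty group should hash to `0L` or to some other sentinel is a design choice; what matters for callers is that the value is fixed and documented.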

Comment on lines +178 to +187
long result = Group.computeGroupHash(Map.of(
    BAR_TOPIC_NAME, 123L,
    FOO_TOPIC_NAME, 456L
));

long unexpected = Hashing.combineOrdered(List.of(
    HashCode.fromLong(456L),
    HashCode.fromLong(123L)
)).asLong();
assertNotEquals(unexpected, result);

Suggestion: The test named to verify behavior with different topic order does not actually call the hash function with a differently ordered map, it only changes the order in a manually computed hash, so the test would still pass even if the production implementation were incorrectly sensitive to map iteration order, giving a false sense that ordering is handled correctly. [logic error]

Severity Level: Minor ⚠️

Suggested change
- long result = Group.computeGroupHash(Map.of(
-     BAR_TOPIC_NAME, 123L,
-     FOO_TOPIC_NAME, 456L
- ));
- long unexpected = Hashing.combineOrdered(List.of(
-     HashCode.fromLong(456L),
-     HashCode.fromLong(123L)
- )).asLong();
- assertNotEquals(unexpected, result);
+ long result1 = Group.computeGroupHash(Map.of(
+     BAR_TOPIC_NAME, 123L,
+     FOO_TOPIC_NAME, 456L
+ ));
+ long result2 = Group.computeGroupHash(Map.of(
+     FOO_TOPIC_NAME, 456L,
+     BAR_TOPIC_NAME, 123L
+ ));
+ assertEquals(result1, result2);
Why it matters? ⭐

The current test only constructs one input map and then compares the result to a manually built ordered HashCode list; that doesn't exercise the production code with a differently-ordered input map. The suggested change calls Group.computeGroupHash with the same entries in the opposite insertion order and asserts equality, which directly verifies that the implementation is order-independent. This is a meaningful behavioral test improvement (not just stylistic) and will catch real bugs where the implementation improperly depends on map iteration order.

@codeant-ai

codeant-ai bot commented Dec 15, 2025

CodeAnt AI finished reviewing your PR.
