KAFKA-17747: Add compute topic and group hash #54
base: trunk
Conversation
Signed-off-by: PoAn Yang <[email protected]>
Summary of Changes

Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces core hashing capabilities to the Kafka group coordinator, enabling the computation of unique identifiers for topics and consumer groups. By integrating the Guava hashing library, it provides a standardized and efficient way to generate hashes based on critical metadata such as topic IDs, names, partition configurations, and broker rack assignments. This enhancement supports internal consistency checks and optimization strategies that rely on stable, reproducible identifiers.
Code Review
This pull request introduces hashing mechanisms for topics and consumer groups, which is a valuable addition. The implementation uses Guava's hashing library and includes a comprehensive set of unit tests to validate the new logic. My main feedback is to improve the robustness of the topic hash by using the full 128 bits of the topic UUID instead of its 32-bit hash code. This will significantly reduce the probability of hash collisions.
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(topicImage.id().hashCode()) // topic Id
```
Using topicImage.id().hashCode() for hashing the topic ID is not ideal as it loses information from the 128-bit UUID, increasing the risk of hash collisions. Two different UUIDs could potentially have the same integer hash code. To create a more robust and collision-resistant hash, it's better to use the full 128 bits of the UUID by hashing the most and least significant bits.
```diff
-        .putLong(topicImage.id().hashCode()) // topic Id
+        .putLong(topicImage.id().getMostSignificantBits()).putLong(topicImage.id().getLeastSignificantBits()) // topic Id
```
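The collision risk is easy to demonstrate with the standard library. The sketch below uses `java.util.UUID` rather than Kafka's own `Uuid` class (an assumption made purely to keep the example self-contained): `java.util.UUID.hashCode()` XOR-folds the 128 bits down to 32, so distinct UUIDs can share a hash code.

```java
import java.util.UUID;

public class UuidHashCollisionDemo {
    public static void main(String[] args) {
        // java.util.UUID.hashCode() XOR-folds the high and low longs,
        // so any two UUIDs whose two halves XOR to the same value
        // collide. UUID(0, 0) and UUID(1, 1) both fold to 0.
        UUID a = new UUID(0L, 0L);
        UUID b = new UUID(1L, 1L);
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Kafka's `Uuid` folds differently, but any 128-to-32-bit reduction admits such collisions, which is why feeding both long halves into the hasher is the safer choice.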
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
```
In line with the suggested change in Group.java to use the full UUID for hashing, this test should be updated to use getMostSignificantBits() and getLeastSignificantBits() instead of hashCode() when constructing the expected hash. This ensures the tests remain consistent with the improved hashing logic. Please apply the same fix to testComputeTopicHashWithDifferentMagicByte, testComputeTopicHashWithDifferentPartitionOrder, and testComputeTopicHashWithDifferentRackOrder.
```diff
-        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+        .putLong(FOO_TOPIC_ID.getMostSignificantBits()).putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id
```
Code Review: Group Hashing Implementation
PR Confidence Score: 🟨 4 / 5
👍 Well Done
📝 Additional Comments
```java
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(topicImage.id().hashCode()) // topic Id
```
Incomplete UUID Hashing
The hashing algorithm uses the 32-bit integer hash code of the 128-bit topic UUID. This discards 96 bits of information from the UUID, significantly increasing the likelihood of hash collisions for different topics. This could lead to metadata changes being missed.
Standards
- Logic-Verification-Data-Integrity
- Algorithm-Correctness-Hashing
```java
static long computeGroupHash(Map<String, Long> topicHashes) {
    return Hashing.combineOrdered(
        topicHashes.entrySet()
            .stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> HashCode.fromLong(e.getValue()))
            .toList()
    ).asLong();
}
```
Hash Collision Risk
The computeGroupHash method combines hash values without a salt or any additional entropy, so the resulting hashes are predictable. An attacker able to influence the inputs could craft collisions and mount hash-flooding or other algorithmic-complexity denial-of-service attacks.
Standards
- CWE-328
- OWASP-A02
- NIST-SSDF-PW.1
```java
 * @return The hash of the topic.
 */
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
```
Weak Hash Algorithm
MurmurHash3 is a non-cryptographic hash function designed for speed rather than security. While suitable for hash tables, it's vulnerable to hash collision attacks where malicious input can be crafted to produce identical hash values.
Standards
- CWE-328
- OWASP-A02
- NIST-SSDF-PW.1
PR already reviewed at the latest commit: 36cb999.
Nitpicks 🔍
```java
return Hashing.combineOrdered(
    topicHashes.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByKey())
        .map(e -> HashCode.fromLong(e.getValue()))
        .toList()
).asLong();
```
Suggestion: The computeGroupHash method passes the list of topic hashes directly to Hashing.combineOrdered, which in Guava throws an IllegalArgumentException when the iterable is empty. This causes a runtime failure if the method is ever called with an empty topicHashes map (e.g., for a group with no topics), so the empty case should be handled explicitly with a defined hash value. [logic error]
Severity Level: Minor
```diff
-        return Hashing.combineOrdered(
-            topicHashes.entrySet()
-                .stream()
-                .sorted(Map.Entry.comparingByKey())
-                .map(e -> HashCode.fromLong(e.getValue()))
-                .toList()
-        ).asLong();
+        List<HashCode> topicHashCodes = topicHashes.entrySet()
+            .stream()
+            .sorted(Map.Entry.comparingByKey())
+            .map(e -> HashCode.fromLong(e.getValue()))
+            .toList();
+        if (topicHashCodes.isEmpty()) {
+            // Define a stable hash value for groups without topics.
+            return 0L;
+        }
+        return Hashing.combineOrdered(topicHashCodes).asLong();
```
Why it matters? ⭐
The suggestion highlights a real runtime edge case: Guava's Hashing.combineOrdered(Iterable) throws IllegalArgumentException when given an empty iterable. The current implementation passes the stream result directly and thus will fail if topicHashes is empty (possible for a group with no topics). The improved code provides a safe, stable behavior (returns a defined hash for empty input) and prevents a potential crash. This is a functional bugfix, not just cosmetic.
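For illustration, the guard pattern can be sketched with the standard library alone. `GroupHashSketch` below is a hypothetical stand-in: a simple polynomial fold replaces Guava's `combineOrdered` (not available in a stdlib-only example), but the shape of the fix — sort by key, check for emptiness, then combine in order — is the same.

```java
import java.util.List;
import java.util.Map;

public class GroupHashSketch {
    // Stdlib-only stand-in for the guarded version: sort by topic name,
    // then fold the per-topic hashes in that order. The polynomial fold
    // substitutes for Guava's combineOrdered purely for illustration.
    static long computeGroupHash(Map<String, Long> topicHashes) {
        List<Long> ordered = topicHashes.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(Map.Entry::getValue)
                .toList();
        if (ordered.isEmpty()) {
            return 0L; // stable, defined value for a group with no topics
        }
        long combined = 1L;
        for (long h : ordered) {
            combined = 31L * combined + h;
        }
        return combined;
    }

    public static void main(String[] args) {
        // The empty map no longer throws, and insertion order is irrelevant.
        System.out.println(computeGroupHash(Map.of()));               // 0
        long r1 = computeGroupHash(Map.of("bar", 123L, "foo", 456L));
        long r2 = computeGroupHash(Map.of("foo", 456L, "bar", 123L));
        System.out.println(r1 == r2);                                 // true
    }
}
```

Returning a fixed 0L for the empty case keeps the hash stable across calls; any constant works as long as it is defined and documented.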
Path: group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (lines 230–235)
```java
long result = Group.computeGroupHash(Map.of(
    BAR_TOPIC_NAME, 123L,
    FOO_TOPIC_NAME, 456L
));

long unexpected = Hashing.combineOrdered(List.of(
    HashCode.fromLong(456L),
    HashCode.fromLong(123L)
)).asLong();
assertNotEquals(unexpected, result);
```
Suggestion: The test named for verifying different topic order never calls the hash function with a differently ordered map; it only changes the order in a manually computed hash. The test would therefore still pass even if the production implementation were incorrectly sensitive to map iteration order, giving a false sense that ordering is handled correctly. [logic error]
Severity Level: Minor
```diff
-        long result = Group.computeGroupHash(Map.of(
-            BAR_TOPIC_NAME, 123L,
-            FOO_TOPIC_NAME, 456L
-        ));
-        long unexpected = Hashing.combineOrdered(List.of(
-            HashCode.fromLong(456L),
-            HashCode.fromLong(123L)
-        )).asLong();
-        assertNotEquals(unexpected, result);
+        long result1 = Group.computeGroupHash(Map.of(
+            BAR_TOPIC_NAME, 123L,
+            FOO_TOPIC_NAME, 456L
+        ));
+        long result2 = Group.computeGroupHash(Map.of(
+            FOO_TOPIC_NAME, 456L,
+            BAR_TOPIC_NAME, 123L
+        ));
+        assertEquals(result1, result2);
```
Why it matters? ⭐
The current test only constructs one input map and then compares the result to a manually built ordered HashCode list; that doesn't exercise the producer code with a differently-ordered input map. The suggested change calls Group.computeGroupHash with the same entries in the opposite insertion order and asserts equality, which directly verifies that the implementation is order-independent. This is a meaningful behavioral test improvement (not just stylistic) and will catch real bugs where the implementation improperly depends on map iteration order.
Path: group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (lines 178–187)
CodeAnt AI finished reviewing your PR.
CodeAnt-AI Description
Add topic and group hash computations to Group
Impact
✅ Faster detection of topic changes affecting consumer groups
✅ Fewer missed group metadata changes caused by unordered topic fields
✅ Verified topic/group hashing behavior via new unit tests