KAFKA-19019: Add support for remote storage fetch for share groups #12
…line build failure
Walkthrough
The changes introduce remote storage fetch support to the
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant DelayedShareFetch
participant ReplicaManager
participant RemoteLogManager
Client->>DelayedShareFetch: Initiate share fetch
alt Remote fetch required
DelayedShareFetch->>ReplicaManager: Schedule remote fetch
ReplicaManager->>RemoteLogManager: Start remote fetch task
RemoteLogManager-->>ReplicaManager: Remote fetch result (async)
ReplicaManager-->>DelayedShareFetch: Remote fetch completion
DelayedShareFetch->>Client: Complete fetch (with remote data)
else Only local fetch required
DelayedShareFetch->>ReplicaManager: Read from local log
ReplicaManager-->>DelayedShareFetch: Local log data
DelayedShareFetch->>Client: Complete fetch (with local data)
end
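The two branches of the diagram can be sketched in plain Java as follows. This is a minimal illustration only: `ShareFetchFlowSketch`, `RemoteReader`, and `fetch` are hypothetical names, partitions are modelled as strings, and none of this is the actual `DelayedShareFetch` or `RemoteLogManager` API. It only shows a response future being completed either from an asynchronous remote callback or directly from a local read.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class ShareFetchFlowSketch {
    // Hypothetical stand-in for a remote log reader; NOT Kafka's RemoteLogManager API.
    public interface RemoteReader {
        void asyncRead(String partition, Consumer<String> callback);
    }

    // Returns a future that is completed from the remote callback when a remote
    // fetch is required, or immediately from the (simulated) local log otherwise.
    public static CompletableFuture<String> fetch(String partition,
                                                  boolean remoteRequired,
                                                  RemoteReader remoteReader) {
        CompletableFuture<String> response = new CompletableFuture<>();
        if (remoteRequired) {
            // Remote path: the callback fires when the remote read finishes.
            remoteReader.asyncRead(partition, data -> response.complete("remote:" + data));
        } else {
            // Local path: data is available right away.
            response.complete("local:" + partition);
        }
        return response;
    }

    public static void main(String[] args) {
        // Simulated remote reader that answers synchronously for the demo.
        RemoteReader reader = (partition, callback) -> callback.accept(partition + "-segment");
        System.out.println(fetch("topic-0", true, reader).join());
        System.out.println(fetch("topic-0", false, reader).join());
    }
}
```

In the real change the remote callback would also have to trigger completion of the delayed operation; the sketch leaves that out.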
@coderabbitai full review
✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (3)
core/src/main/java/kafka/server/share/DelayedShareFetch.java (2)
724-733: Pass an immutable copy when queueing follow‑up actions
topicIdPartitions is passed directly to the lambda queued via replicaManager.addToActionQueue. Because the original Set is typically a mutable LinkedHashSet owned by the caller, later mutations (e.g. clear()) will change the contents seen by the queued runnable, producing stale or empty work.
-replicaManager.addToActionQueue(() -> topicIdPartitions.forEach(topicIdPartition ->
+Set<TopicIdPartition> tpCopy = new LinkedHashSet<>(topicIdPartitions);
+replicaManager.addToActionQueue(() -> tpCopy.forEach(topicIdPartition ->
     replicaManager.completeDelayedShareFetchRequest(
         new DelayedShareFetchGroupKey(shareFetch.groupId(), topicIdPartition.topicId(), topicIdPartition.partition()))));
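The hazard described in this comment can be reproduced in isolation. The sketch below is illustrative only: `ActionQueueCopySketch` and `runQueued` are hypothetical names, the queue is a plain `ArrayDeque` rather than the broker's action queue, and partitions are modelled as strings. It queues a runnable that captures a set, clears the caller's set, then runs the runnable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class ActionQueueCopySketch {
    // Queues a runnable over a set of partitions, clears the caller's set,
    // then runs the queued work and reports which partitions it saw.
    public static List<String> runQueued(boolean copyBeforeQueueing) {
        // Hypothetical stand-in for the action queue; not the ReplicaManager API.
        Queue<Runnable> actionQueue = new ArrayDeque<>();
        Set<String> partitions = new LinkedHashSet<>(List.of("topic-0", "topic-1"));
        List<String> seen = new ArrayList<>();

        // Either capture the caller's set directly or take a defensive copy first.
        Set<String> captured = copyBeforeQueueing ? new LinkedHashSet<>(partitions) : partitions;
        actionQueue.add(() -> seen.addAll(captured));

        // The caller mutates its set before the queued action runs.
        partitions.clear();
        actionQueue.remove().run();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + runQueued(false)); // prints "without copy: []"
        System.out.println("with copy:    " + runQueued(true));  // prints "with copy:    [topic-0, topic-1]"
    }
}
```

Whether the caller in the PR actually mutates the set after queueing is not established here; the sketch only shows why a defensive copy makes the queued work independent of later mutations.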
594-597: Expose remoteFetch as Optional instead of nullable
remoteFetch() collapses the Optional and returns null when absent, re‑introducing a nullable contract that callers must special‑case.
-RemoteFetch remoteFetch() {
-    return remoteFetchOpt.orElse(null);
-}
+Optional<RemoteFetch> remoteFetch() {
+    return remoteFetchOpt;
+}
This keeps the null‑free invariant and simplifies call sites (ifPresent, orElse…).
core/src/test/java/kafka/server/share/DelayedShareFetchTest.java (1)
1760-1764: remoteFetch field in test‑builder is never set
The builder keeps a final Optional<DelayedShareFetch.RemoteFetch> remoteFetch = Optional.empty() and always passes that to DelayedShareFetch. All tests therefore exercise only the production code path that creates the RemoteFetch internally.
If you intended to inject a pre‑fabricated RemoteFetch for deterministic testing, expose a withRemoteFetch(...) setter; otherwise drop the field to avoid confusion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
core/src/main/java/kafka/server/share/DelayedShareFetch.java (11 hunks)
core/src/test/java/kafka/server/share/DelayedShareFetchTest.java (11 hunks)
🔇 Additional comments (1)
core/src/main/java/kafka/server/share/DelayedShareFetch.java (1)
682-708: ❌ Incorrect review comment
Potential data race on remoteFetchOpt / remoteStorageFetchException
remoteFetchOpt and remoteStorageFetchException are mutated in tryComplete (thread‑pool threads) and read in maybeCompletePendingRemoteFetch as well as onComplete without any synchronisation or volatile declaration.
DelayedOperation.tryComplete(...) may run concurrently on multiple threads, so these two fields can be observed in a half‑initialised state causing:
- NullPointerException (remoteFetchOpt.get() when another thread cleared it),
- lost wake‑up (one thread completes the future while another still sees !isDone()).
Guard the two fields with the inherited lock or mark them volatile / AtomicReference.
🏁 Script executed:
#!/bin/bash
grep -n "class DelayedShareFetch" core/src/main/java/kafka/server/share/DelayedShareFetch.java
grep -n "remoteFetchOpt" core/src/main/java/kafka/server/share/DelayedShareFetch.java
grep -n "remoteStorageFetchException" core/src/main/java/kafka/server/share/DelayedShareFetch.java
grep -n "tryComplete" core/src/main/java/kafka/server/share/DelayedShareFetch.java
grep -n "maybeCompletePendingRemoteFetch" core/src/main/java/kafka/server/share/DelayedShareFetch.java
grep -n "onComplete" core/src/main/java/kafka/server/share/DelayedShareFetch.java
Length of output: 3971

#!/bin/bash
# Locate DelayedOperation class
find core/src -type f -name "DelayedOperation.java"
# Display the beginning to inspect locking and structure
DELAYED_OP_PATH=$(find core/src -type f -name "DelayedOperation.java" | head -n1)
echo "=== $DELAYED_OP_PATH ==="
sed -n '1,200p' "$DELAYED_OP_PATH"
# Show where tryComplete is implemented in DelayedOperation
grep -R "tryComplete" -n "$DELAYED_OP_PATH"
sed -n '200,350p' "$DELAYED_OP_PATH"
# Inspect usage of 'lock' in DelayedOperation
grep -R "lock" -n "$DELAYED_OP_PATH"

#!/bin/bash
set -e
FILE="core/src/main/java/kafka/server/share/DelayedShareFetch.java"
echo "=== maybeCompletePendingRemoteFetch (lines 660-740) ==="
sed -n '660,740p' "$FILE"
echo
echo "=== tryComplete (lines 270-360) ==="
sed -n '270,360p' "$FILE"
echo
echo "=== onComplete (lines 180-260) ==="
sed -n '180,260p' "$FILE"
echo
echo "Searching for lock usage in DelayedShareFetch:"
grep -R "lock" -n "$FILE"

#!/bin/bash
set -e
# 1. Show mutations of remoteFetchOpt
FILE="core/src/main/java/kafka/server/share/DelayedShareFetch.java"
echo "=== remoteFetchOpt assignments and updates (lines 140-240) ==="
sed -n '140,240p' "$FILE"
echo
# 2. Show where remoteStorageFetchException is set
echo "=== remoteStorageFetchException mutations (lines 600-700) ==="
sed -n '600,700p' "$FILE"
echo
# 3. Inspect DelayedOperation for lock usage
OP_PATH=$(find core/src -type f -name "DelayedOperation.java" | head -n1)
echo "=== DelayedOperation file: $OP_PATH (first 200 lines) ==="
sed -n '1,200p' "$OP_PATH"
echo
echo "=== 'lock' declarations and usage in DelayedOperation ==="
grep -n "lock" "$OP_PATH"
No data race – fields are already guarded by DelayedOperation’s lock
All accesses and mutations of remoteFetchOpt and remoteStorageFetchException occur inside the overridden tryComplete, maybeCompletePendingRemoteFetch, and onComplete methods, which DelayedOperation invokes under its internal ReentrantLock. Since these methods (and their helper calls) run only while holding that lock, there is no risk of observing a half‑initialized state or a lost wake‑up. Please disregard the synchronization suggestion.
Likely an incorrect or invalid review comment.
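The pattern this reply relies on, plain fields that are only touched while a `ReentrantLock` is held, looks roughly like the sketch below. `LockGuardedStateSketch` and its members are hypothetical and are not taken from `DelayedOperation`; the sketch assumes every access path takes the same lock, which is the property the reply attributes to the real class.

```java
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

public class LockGuardedStateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Both fields are deliberately non-volatile: they are only read or written
    // while holding `lock`, which provides mutual exclusion and visibility.
    private Optional<String> remoteFetch = Optional.empty();
    private boolean completed = false;

    // Records the fetch result once; later calls see the completed state and back off.
    public boolean tryComplete(String fetchResult) {
        lock.lock();
        try {
            if (completed) {
                return false;
            }
            remoteFetch = Optional.of(fetchResult);
            completed = true;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Readers take the same lock, so they never observe a partially updated state.
    public Optional<String> remoteFetch() {
        lock.lock();
        try {
            return remoteFetch;
        } finally {
            lock.unlock();
        }
    }
}
```

The guarantee holds only as long as no code path reads or writes the fields without the lock, which is what the scripts above were checking for.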
remoteFetchTask = replicaManager.remoteLogManager().get().asyncRead(
    remoteStorageFetchInfo,
    result -> {
        remoteFetchResult.complete(result);
        replicaManager.completeDelayedShareFetchRequest(new DelayedShareFetchGroupKey(shareFetch.groupId(), remoteFetchTopicIdPartition.topicId(), remoteFetchTopicIdPartition.partition()));
    }
);
Handle absence of RemoteLogManager to avoid NoSuchElementException
replicaManager.remoteLogManager().get() is called without checking whether the Option is empty.
If tiered storage is disabled – or it is disabled temporarily because the broker transitions to offline – this will throw NoSuchElementException, short‑circuit the fetch and bubble up as an unhandled runtime exception.
-remoteFetchTask = replicaManager.remoteLogManager().get().asyncRead(
+if (replicaManager.remoteLogManager().isEmpty()) {
+ remoteStorageFetchException = Optional.of(
+ new IllegalStateException("RemoteLogManager not available for remote fetch"));
+ throw remoteStorageFetchException.get();
+}
+remoteFetchTask = replicaManager.remoteLogManager().get().asyncRead(
Fail fast with a meaningful error or return an error‐code for the partition instead of crashing.
Committable suggestion skipped: line range outside the PR's diff.
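The "return an error for the partition" alternative mentioned above could look like the following sketch. It is illustrative only: `RemoteManagerGuardSketch`, `RemoteReader`, and `readOrError` are hypothetical, the result is a plain string rather than a Kafka error code, and `java.util.Optional` is used here as an assumption in place of whatever option type `remoteLogManager()` actually returns.

```java
import java.util.Optional;

public class RemoteManagerGuardSketch {
    // Hypothetical stand-in for the remote log manager; NOT Kafka's API.
    public interface RemoteReader {
        String read(String partition);
    }

    // Checks for absence before calling get(), so a missing manager produces a
    // per-partition error value instead of a NoSuchElementException.
    public static String readOrError(Optional<RemoteReader> maybeReader, String partition) {
        if (maybeReader.isEmpty()) {
            return "error: remote storage unavailable for " + partition;
        }
        return maybeReader.get().read(partition);
    }

    public static void main(String[] args) {
        RemoteReader reader = partition -> "data:" + partition;
        System.out.println(readOrError(Optional.of(reader), "topic-0"));
        System.out.println(readOrError(Optional.empty(), "topic-0"));
    }
}
```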
What
This PR adds support for remote storage fetch for share groups.
Limitation
Remote storage fetch for consumer groups has a limitation: a fetch
request can perform a remote fetch for only a single topic partition.
Since the logic of share fetch requests is largely based on how consumer
groups work, this PR follows similar logic in implementing remote
storage fetch. This limitation should be addressed as part of
KAFKA-19133, which should allow a single share fetch request to perform
remote fetches for multiple topic partitions.
Testing
I tested the change locally by following the AK documentation (adopting
LocalTieredStorage.java) and with the help of unit tests.