fix: claim clone approach to prevent duplicate clones#1101
fix: claim clone approach to prevent duplicate clones#1101
Conversation
Amp-Thread-ID: https://ampcode.com/threads/T-019d2ec4-6418-72cd-a5a9-8b7e73604d27 Co-authored-by: Amp <amp@ampcode.com>
Manual Deploy AvailableYou can trigger a manual deploy of this PR branch to testnet: Alternative: Comment
Comment updated automatically when the PR is synchronized. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis pull request implements an in-flight clone coordination mechanism in Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 183-206: The mutex lock in check_dedup_cache uses .unwrap() on
self.dedup_cache.lock(), which must be either justified or handled; replace the
.unwrap() with explicit handling of PoisonError (e.g., match
self.dedup_cache.lock() { Ok(guard) => guard, Err(poison) => { /* recover: log
warning about poisoned mutex and continue with poison.into_inner() */
poison.into_inner() } }) or, if you decide to keep the assumption, add a short
justification comment immediately above the lock call referencing the invariant
that the mutex can only be poisoned by a panic and that recovery via
into_inner() is acceptable in this context; reference the function name
check_dedup_cache and the symbol self.dedup_cache.lock() when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 83702121-3c76-458b-a67e-4dd053afa2f9
📒 Files selected for processing (11)
magicblock-chainlink/src/chainlink/fetch_cloner/mod.rsmagicblock-chainlink/src/chainlink/fetch_cloner/pipeline.rsmagicblock-chainlink/src/chainlink/fetch_cloner/tests.rsmagicblock-chainlink/src/chainlink/mod.rsmagicblock-chainlink/src/cloner/mod.rsmagicblock-chainlink/src/remote_account_provider/mod.rsmagicblock-chainlink/src/submux/mod.rsmagicblock-chainlink/src/testing/cloner_stub.rsmagicblock-chainlink/tests/utils/test_context.rstest-integration/test-chainlink/src/ixtest_context.rstest-integration/test-chainlink/src/test_context.rs
👮 Files not reviewed due to content moderation or server errors (8)
- magicblock-chainlink/src/cloner/mod.rs
- magicblock-chainlink/tests/utils/test_context.rs
- test-integration/test-chainlink/src/ixtest_context.rs
- test-integration/test-chainlink/src/test_context.rs
- magicblock-chainlink/src/chainlink/mod.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/pipeline.rs
- magicblock-chainlink/src/submux/mod.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs
There was a problem hiding this comment.
♻️ Duplicate comments (1)
magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs (1)
195-197:⚠️ Potential issue | 🟠 MajorHandle
dedup_cachelock poisoning explicitly.Line 197 still panics on a poisoned mutex. Please recover or document the invariant instead of unwrapping.
🛠️ Minimal fix
let now = Instant::now(); - let mut cache = self.dedup_cache.lock().unwrap(); + let mut cache = match self.dedup_cache.lock() { + Ok(cache) => cache, + Err(poisoned) => { + warn!( + pubkey = %pubkey, + slot, + "dedup cache lock poisoned; recovering" + ); + poisoned.into_inner() + } + };As per coding guidelines: "Treat any usage of
.unwrap()or.expect()in production Rust code as a MAJOR issue."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs` around lines 195 - 197, The check_dedup_cache function currently calls self.dedup_cache.lock().unwrap(), which will panic on a poisoned mutex; change this to explicitly handle Mutex::lock() returning Err(PoisonError) by matching the result (e.g., match self.dedup_cache.lock() { Ok(mut cache) => ..., Err(poisoned) => { let mut cache = poisoned.into_inner(); /* log a warning about poison */ ... } }) so the function recovers and continues using the inner cache instead of panicking; ensure the code in check_dedup_cache still uses the recovered cache to perform the dedup logic and consider logging the mutex poisoning event for observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 195-197: The check_dedup_cache function currently calls
self.dedup_cache.lock().unwrap(), which will panic on a poisoned mutex; change
this to explicitly handle Mutex::lock() returning Err(PoisonError) by matching
the result (e.g., match self.dedup_cache.lock() { Ok(mut cache) => ...,
Err(poisoned) => { let mut cache = poisoned.into_inner(); /* log a warning about
poison */ ... } }) so the function recovers and continues using the inner cache
instead of panicking; ensure the code in check_dedup_cache still uses the
recovered cache to perform the dedup logic and consider logging the mutex
poisoning event for observability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8d473a45-068c-4d09-8302-3394029a7dc2
📒 Files selected for processing (4)
magicblock-chainlink/src/chainlink/fetch_cloner/mod.rsmagicblock-chainlink/src/chainlink/mod.rsmagicblock-chainlink/src/remote_account_provider/mod.rsmagicblock-chainlink/src/submux/mod.rs
* master: feat: action callbacks (#1061)
* master: fix: allow not existing accounts in actions (#1120) fix: published npm wrapper (#1115) release: 0.8.5 (#1113) fix: actions dropping on eata cold cloning (#1112) fix (CloneAccount): enforce incoming_slot > current_slot when cloning an account (#1110) fix: flaky restore_ledger integration test (#1107) Fix: remove conditions that allow to overrides undelegating ata (trough mapped eata) (#1105) feat: add logs for startup time on chainlink and committer service (#1106)
Amp-Thread-ID: https://ampcode.com/threads/T-019d42ff-82a6-7116-abae-51c4fa8edbd7 Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4302-ab64-70d3-9e6b-a165de6885a7 Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4d64-bf55-741b-a335-5bcebf7399ea Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4d64-bf55-741b-a335-5bcebf7399ea Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4d64-bf55-741b-a335-5bcebf7399ea Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d4d64-bf55-741b-a335-5bcebf7399ea Co-authored-by: Amp <amp@ampcode.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs (1)
217-250:⚠️ Potential issue | 🟠 MajorHandle poisoned
pending_cloneslocks withoutexpect().Lines 219 and 246 currently panic the clone-coordination path on mutex poison. This is production code, so please recover/report explicitly instead of using
expect().Suggested handling
- let mut map = self - .pending_clones - .lock() - .expect("pending_clones mutex poisoned"); + let mut map = match self.pending_clones.lock() { + Ok(guard) => guard, + Err(poison) => { + warn!("pending_clones mutex poisoned in claim_pending_clone; recovering"); + poison.into_inner() + } + }; @@ - let mut map = self - .pending_clones - .lock() - .expect("pending_clones mutex poisoned"); + let mut map = match self.pending_clones.lock() { + Ok(guard) => guard, + Err(poison) => { + warn!("pending_clones mutex poisoned in finish_pending_clone; recovering"); + poison.into_inner() + } + };As per coding guidelines: "Treat any usage of
.unwrap()or.expect()in production Rust code as a MAJOR issue. These should not be categorized as trivial or nit-level concerns. Request proper error handling or explicit justification with invariants."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs` around lines 217 - 250, Replace the panic-on-poisoned-mutex usage in claim_pending_clone and finish_pending_clone: instead of .lock().expect("pending_clones mutex poisoned") match on the Result from .lock(), log/report the PoisonError and recover by calling PoisonError::into_inner() to obtain the guard so the method can continue (for claim_pending_clone continue with creating/inserting the Vec and returning CloneClaim::Owner or CloneClaim::Waiter as before; for finish_pending_clone obtain the guard, remove the key and notify waiters). Reference the pending_clones mutex and the functions claim_pending_clone and finish_pending_clone when making the change so you handle both lock sites consistently and avoid any .expect()/unwrap() usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 261-296: clone_account_with_ownership currently dedupes based only
on (pubkey, slot) via claim_pending_clone, which discards differing
AccountCloneRequest fields (e.g., commit_frequency_ms) and can lose commit
scheduling metadata; fix by changing the dedupe key or call to
claim_pending_clone to include the request's materially significant fields
(e.g., pass AccountCloneRequest or a computed dedupe_key that includes pubkey,
slot and commit_frequency_ms/other relevant fields) and/or merge differing
requests before claiming ownership (compute a merged AccountCloneRequest and use
that for cloning and for finish_pending_clone) so that CloneClaim::Waiter
receives equivalent semantics to CloneClaim::Owner; update references to
claim_pending_clone, finish_pending_clone, clone_account_with_ownership, and the
AccountCloneRequest handling to use the richer key/merged request.
- Around line 268-276: Wrap the clone task’s critical section with a drop guard
type (e.g., PendingCloneGuard) that on Drop calls finish_pending_clone(pubkey,
slot, CloneCompletion::Failed) unless it has been explicitly marked completed;
construct the guard immediately after claim_pending_clone(pubkey, slot) and
dismiss it after you call finish_pending_clone(...) on success/failure so that
if the async await of self.cloner.clone_account(request).await is cancelled or
panics the guard will notify waiters. Also replace the pending_clones mutex uses
that currently call .expect() (the calls around claim_pending_clone and
finish_pending_clone) with proper poisoning handling (e.g., propagate the lock
error as a Result or recover via into_inner() with explicit documentation) or
add an explicit comment documenting the invariant if you intentionally allow
poisoning—do not use .expect() so mutex poison is handled explicitly.
---
Duplicate comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 217-250: Replace the panic-on-poisoned-mutex usage in
claim_pending_clone and finish_pending_clone: instead of
.lock().expect("pending_clones mutex poisoned") match on the Result from
.lock(), log/report the PoisonError and recover by calling
PoisonError::into_inner() to obtain the guard so the method can continue (for
claim_pending_clone continue with creating/inserting the Vec and returning
CloneClaim::Owner or CloneClaim::Waiter as before; for finish_pending_clone
obtain the guard, remove the key and notify waiters). Reference the
pending_clones mutex and the functions claim_pending_clone and
finish_pending_clone when making the change so you handle both lock sites
consistently and avoid any .expect()/unwrap() usage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 63c8e43c-1724-4925-9c83-71834563e080
📒 Files selected for processing (8)
magicblock-chainlink/src/chainlink/fetch_cloner/mod.rsmagicblock-chainlink/src/chainlink/fetch_cloner/pipeline.rsmagicblock-chainlink/src/chainlink/fetch_cloner/tests.rsmagicblock-chainlink/src/remote_account_provider/mod.rsmagicblock-chainlink/src/testing/cloner_stub.rsmagicblock-chainlink/tests/utils/test_context.rstest-integration/test-chainlink/src/test_context.rstest-integration/test-chainlink/tests/ix_remote_account_provider.rs
* master: chore: improve fees claiming (#1129)
Amp-Thread-ID: https://ampcode.com/threads/T-019d5151-75ea-7328-b115-9b33be6525b9 Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d5158-aadf-7594-bdc4-6e2da97080b4 Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d516d-56c7-77e9-91c0-c5b38aa0af7f Co-authored-by: Amp <amp@ampcode.com>
Summary
Fixes a race condition where both the on-demand fetch path and the subscription update path independently submit clone requests for the same (pubkey, slot), resulting in duplicate clone transactions. The fix adds a pending-clone ownership mechanism inside FetchCloner so the first caller performs the clone while subsequent callers wait for the result.
Details
Problem
When an account is being fetched on-demand while a subscription update arrives for the same account at the same slot, both paths pass their independent checks and call
clone_account(), producing duplicate work.Initial Attempt to Fix
The original approach (shared SubMux dedup cache part of first commits in this PR, practically now reverted) did not properly work:
A cache hit only means "the subscription path touched this entry," not "the account was successfully cloned into bank."
This caused delegated accounts to silently never materialize.
Solution
FetchCloner now maintains a
pending_clones: HashMap<(Pubkey, u64), Vec<Sender<CloneCompletion>>>. The first caller toclaim_pending_clonefor a given (pubkey, slot) becomes the owner and performs the actual clone. Any subsequent caller for the same key becomes a waiter and receives the outcome (success/failure) via a oneshot channel. After the clone completes,finish_pending_cloneremoves the entry and notifies all waiters.Details
Summary
Fixes a race condition where both the on-demand fetch path and the subscription update path independently submit clone requests for the same (pubkey, slot), resulting in duplicate clone transactions. The fix shares SubMuxClient's existing dedup cache with FetchCloner so both paths coordinate on (pubkey, slot) deduplication.
Details
Problem
When an account is being fetched on-demand while a subscription update arrives for the same account at the same slot, both paths pass their independent checks and call
clone_account(), producing duplicate work.Solution
SubMuxClient already maintains a
(Pubkey, u64) → Instantdedup cache and inserts entries viashould_forward_dedup()before forwarding subscription updates. This PR exposes that cache and passes it into FetchCloner, which checks it inclone_accounts_and_programs()before cloning. If the subscription path already inserted an entry, the fetch path skips its clone (and vice versa).Key Changes
dedup_cache()anddedup_window()accessors to expose the shared cachetry_from_urls_and_configandtry_new_from_endpointsnow return the dedup cache and window alongside the providerDedupCacheanddedup_window, with acheck_dedup_cache()method that does atomic check-and-insert. The dedup filter runs in the pipeline'sclone_accounts_and_programs()stepCloneto support test assertionstest_fetch_subscription_race_duplicate_clonethat reproduces the race and verifies only one clone request is submittedSummary by CodeRabbit