Conversation
Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet (alternative: comment on the PR).

Comment updated automatically when the PR is synchronized.

This stack of pull requests is managed by Graphite. Learn more about stacking.
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the corresponding setting.

Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough

This PR removes the accountsdb snapshot-frequency config and migrates snapshot persistence to per-slot compressed archives.

Assessment against linked issues

Out-of-scope changes

Suggested reviewers

✨ Finishing Touches

🧪 Generate unit tests (beta)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Around line 34-35: The .gitignore currently uses global basename patterns
"config.json" and "config.toml" which will ignore those files anywhere in the
repo; change these to more specific, local-scoped patterns (for example a
specific directory like "secrets/config.json" or a dev-only filename like
"config.local.json" or prefix with a directory such as "dev/config.toml") so
that tracked templates (e.g., config.json in example/ or templates/) are not
masked—update the entries that reference "config.json" and "config.toml" in the
.gitignore to narrowed names or paths that reflect the intended secret/dev
override files.
In `@magicblock-replicator/README.md`:
- Line 49: Replace the misspelled word "replys" in the standby behavior sentence
("Consumes events from the stream and replys them locally") with the correct
term "replays" so the sentence reads "Consumes events from the stream and
replays them locally" in README.md.
- Around line 7-22: The README's fenced ASCII-art code block is missing a
language tag (triggering MD040); update the opening triple-backtick fence in
README.md to include a plain-language tag such as ```text (or ```plaintext) so
the block is explicitly marked as text—locate the ASCII diagram fenced by ```
and add the language identifier to the opening fence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 922ddb0c-6e71-4b11-81bb-5ccb43acf562
📒 Files selected for processing (2)

- `.gitignore`
- `magicblock-replicator/README.md`
```
config.json
config.toml
```
Scope these ignore rules to avoid masking valid tracked config files.
config.json and config.toml here are global basename patterns, so they ignore matching files anywhere in the repo. That can silently block committing legitimate config files (examples, fixtures, package configs). Prefer local-only patterns (for secrets/dev overrides) and keep templates tracked.
Suggested narrowing

```diff
 # AI related
 **/CLAUDE.md
-config.json
-config.toml
+/config.local.json
+/config.local.toml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore around lines 34 - 35, The .gitignore currently uses global
basename patterns "config.json" and "config.toml" which will ignore those files
anywhere in the repo; change these to more specific, local-scoped patterns (for
example a specific directory like "secrets/config.json" or a dev-only filename
like "config.local.json" or prefix with a directory such as "dev/config.toml")
so that tracked templates (e.g., config.json in example/ or templates/) are not
masked—update the entries that reference "config.json" and "config.toml" in the
.gitignore to narrowed names or paths that reflect the intended secret/dev
override files.
```text
           ┌─────────────┐
           │   Service   │
           └──────┬──────┘
     ┌────────────┴────────────┐
     ▼                         ▼
┌─────────┐               ┌─────────┐
│ Primary │ ← ─ ─ ─ ─ ─ → │ Standby │
└────┬────┘               └────┬────┘
     │                         │
 ┌───┴───┐                 ┌───┴───┐
 │Publish│                 │Consume│
 │Upload │                 │Apply  │
 │Refresh│                 │Verify │
 └───────┘                 └───────┘
```
Add a language to the fenced block to satisfy Markdown linting.
Line 7 opens a fenced code block without a language tag (```), which triggers MD040. Use text (or plaintext) for the ASCII diagram.
Suggested patch

````diff
-```
+```text
            ┌─────────────┐
            │   Service   │
            └──────┬──────┘
      ┌────────────┴────────────┐
      ▼                         ▼
 ┌─────────┐               ┌─────────┐
 │ Primary │ ← ─ ─ ─ ─ ─ → │ Standby │
 └────┬────┘               └────┬────┘
      │                         │
  ┌───┴───┐                 ┌───┴───┐
  │Publish│                 │Consume│
  │Upload │                 │Apply  │
  │Refresh│                 │Verify │
  └───────┘                 └───────┘
````

📝 Committable suggestion

> ‼️ IMPORTANT
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-replicator/README.md` around lines 7 - 22, The README's fenced
ASCII-art code block is missing a language tag (triggering MD040); update the
opening triple-backtick fence in README.md to include a plain-language tag such
as ```text (or ```plaintext) so the block is explicitly marked as text—locate
the ASCII diagram fenced by ``` and add the language identifier to the opening
fence.
```markdown
### Standby Node

- Consumes events from the stream and replys them locally
```
Fix typo in standby behavior description.
Line 49 says “replys”; this should be “replays” (or “replies,” depending on intent). Based on surrounding text (“consume and replay events”), “replays” appears correct.
Suggested patch

```diff
-- Consumes events from the stream and replys them locally
+- Consumes events from the stream and replays them locally
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
- Consumes events from the stream and replays them locally
```
🧰 Tools
🪛 LanguageTool
[grammar] ~49-~49: Ensure spelling is correct
Context: ... - Consumes events from the stream and replys them locally - Watches the leader lock ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-replicator/README.md` at line 49, Replace the misspelled word
"replys" in the standby behavior sentence ("Consumes events from the stream and
replys them locally") with the correct term "replays" so the sentence reads
"Consumes events from the stream and replays them locally" in README.md.
Force-pushed 3238fb7 to cb7cff5 (Compare)
♻️ Duplicate comments (3)
.gitignore (1)

34-35: ⚠️ Potential issue | 🟠 Major — Scope ignore patterns to avoid hiding legitimate config files repo-wide.

Line 34 and Line 35 use global basename rules, so any `config.json`/`config.toml` anywhere in the repo will be ignored. Narrow these to intended local/dev-secret filenames or paths.

Suggested narrowing

```diff
 # AI related
 **/CLAUDE.md
-config.json
-config.toml
+/config.local.json
+/config.local.toml
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 34 - 35, The .gitignore currently uses broad basename patterns "config.json" and "config.toml" which ignore those filenames repository-wide; replace them with scoped paths or more specific names (for example a local/dev-secret filename like "config.local.json", a folder-specific path like "secrets/config.toml" or "./.env/config.json") so only the intended local/dev secret files are ignored—update the entries that currently list "config.json" and "config.toml" to the narrowed names/paths used by your tooling.

magicblock-replicator/README.md (2)
7-7: ⚠️ Potential issue | 🟡 Minor — Add a language tag to the fenced code block.

The fenced code block is missing a language specifier, which triggers the MD040 linting rule. Add `text` (or `plaintext`) to the opening fence.

📝 Proposed fix

````diff
-```
+```text
            ┌─────────────┐
            │   Service   │
            └──────┬──────┘
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/README.md` at line 7, The fenced code block in README.md is missing a language tag and triggers MD040; update the opening triple-backtick fence for the ASCII diagram to include a language specifier (e.g., change ``` to ```text or ```plaintext) so the markdown linter accepts the block and preserves plain text formatting.
49-49: ⚠️ Potential issue | 🟡 Minor — Fix the typo in the standby behavior description.

Line 49 contains "replys" which should be "replays" (consuming and replaying events locally).

📝 Proposed fix

```diff
-- Consumes events from the stream and replys them locally
+- Consumes events from the stream and replays them locally
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/README.md` at line 49, Replace the misspelled word "replys" with "replays" in the README sentence that currently reads "Consumes events from the stream and replys them locally" so it becomes "Consumes events from the stream and replays them locally"; search for that exact phrase in the README to locate and update the standby behavior description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.gitignore:
- Around line 34-35: The .gitignore currently uses broad basename patterns
"config.json" and "config.toml" which ignore those filenames repository-wide;
replace them with scoped paths or more specific names (for example a
local/dev-secret filename like "config.local.json", a folder-specific path like
"secrets/config.toml" or "./.env/config.json") so only the intended local/dev
secret files are ignored—update the entries that currently list "config.json"
and "config.toml" to the narrowed names/paths used by your tooling.
In `@magicblock-replicator/README.md`:
- Line 7: The fenced code block in README.md is missing a language tag and
triggers MD040; update the opening triple-backtick fence for the ASCII diagram
to include a language specifier (e.g., change ``` to ```text or ```plaintext) so
the markdown linter accepts the block and preserves plain text formatting.
- Line 49: Replace the misspelled word "replys" with "replays" in the README
sentence that currently reads "Consumes events from the stream and replys them
locally" so it becomes "Consumes events from the stream and replays them
locally"; search for that exact phrase in the README to locate and update the
standby behavior description.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1627e5bf-a6f2-4907-83dc-e59e87a79e12
📒 Files selected for processing (2)
- `.gitignore`
- `magicblock-replicator/README.md`
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 235-246: The validate_archive function only verifies gzip/tar
structure and not AccountsDb content integrity; update the function
(validate_archive) to include a clear TODO comment explaining that content
validation / checksums should be added in future when the snapshot format is
extended (e.g., include per-file or archive-wide checksums and verify entries
via Archive::entries()/read) and reference relevant types used (GzDecoder,
Archive, AccountsDbResult) so future implementers know where to add checksum
verification logic.
- Around line 248-268: The code uses unwrap() on registry.remove(index) in
find_and_remove_snapshot which must be replaced with explicit error handling;
before removing, verify the index is within registry.len() and return
AccountsDbError::SnapshotMissing(target_slot) if out of bounds, then safely
remove (or clone via registry.get(index).cloned()) into chosen_archive and
proceed with the existing parse_slot/error flow; reference the function
find_and_remove_snapshot and the registry.remove(index) call to locate the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e33ed757-35cf-4d57-b14e-0e1e808870a1
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `test-integration/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (22)
- `config.example.toml`
- `magicblock-accounts-db/Cargo.toml`
- `magicblock-accounts-db/src/lib.rs`
- `magicblock-accounts-db/src/snapshot.rs`
- `magicblock-accounts-db/src/tests.rs`
- `magicblock-config/src/config/accounts.rs`
- `magicblock-config/src/consts.rs`
- `magicblock-config/src/tests.rs`
- `test-integration/configs/api-conf.ephem.toml`
- `test-integration/configs/chainlink-conf.devnet.toml`
- `test-integration/configs/claim-fees-test.toml`
- `test-integration/configs/cloning-conf.devnet.toml`
- `test-integration/configs/cloning-conf.ephem.toml`
- `test-integration/configs/committor-conf.devnet.toml`
- `test-integration/configs/config-conf.devnet.toml`
- `test-integration/configs/restore-ledger-conf.devnet.toml`
- `test-integration/configs/schedulecommit-conf-fees.ephem.toml`
- `test-integration/configs/schedulecommit-conf.devnet.toml`
- `test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml`
- `test-integration/configs/schedulecommit-conf.ephem.toml`
- `test-integration/configs/validator-offline.devnet.toml`
- `test-integration/test-ledger-restore/src/lib.rs`
💤 Files with no reviewable changes (17)
- test-integration/configs/chainlink-conf.devnet.toml
- test-integration/configs/config-conf.devnet.toml
- test-integration/configs/committor-conf.devnet.toml
- test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml
- test-integration/configs/schedulecommit-conf.devnet.toml
- test-integration/configs/cloning-conf.ephem.toml
- test-integration/configs/claim-fees-test.toml
- test-integration/configs/api-conf.ephem.toml
- test-integration/configs/schedulecommit-conf.ephem.toml
- test-integration/configs/restore-ledger-conf.devnet.toml
- config.example.toml
- test-integration/configs/cloning-conf.devnet.toml
- magicblock-config/src/consts.rs
- magicblock-config/src/tests.rs
- test-integration/configs/validator-offline.devnet.toml
- test-integration/configs/schedulecommit-conf-fees.ephem.toml
- magicblock-config/src/config/accounts.rs
```rust
/// Finds the best snapshot for target_slot, removes from registry.
/// Returns (archive_path, slot, index).
fn find_and_remove_snapshot(
    &self,
    target_slot: u64,
) -> AccountsDbResult<(PathBuf, u64, usize)> {
    let mut registry = self.registry.lock();
    let search_key = self.slot_to_archive_path(target_slot);
    let index = match registry.binary_search(&search_key) {
        Ok(i) => i,
        Err(i) if i > 0 => i - 1,
        _ => return Err(AccountsDbError::SnapshotMissing(target_slot)),
    };

    let chosen_archive = registry.remove(index).unwrap();
    let chosen_slot = Self::parse_slot(&chosen_archive)
        .ok_or(AccountsDbError::SnapshotMissing(target_slot))?;

    info!(chosen_slot, target_slot, "Restoring snapshot");
    Ok((chosen_archive, chosen_slot, index))
}
```
Use proper error handling instead of .unwrap() on line 262.
The .unwrap() on registry.remove(index) violates the coding guidelines for production Rust code in magicblock-*. While the index is validated by the preceding binary search logic, the guidelines require explicit error handling.
🔧 Proposed fix

```diff
-    let chosen_archive = registry.remove(index).unwrap();
+    let chosen_archive = registry.remove(index).ok_or_else(|| {
+        AccountsDbError::Internal("Snapshot registry index out of bounds".into())
+    })?;
```

As per coding guidelines: `{magicblock-*,programs,storage-proto}/**`: Treat any usage of `.unwrap()` or `.expect()` in production Rust code as a MAJOR issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 248 - 268, The code uses
unwrap() on registry.remove(index) in find_and_remove_snapshot which must be
replaced with explicit error handling; before removing, verify the index is
within registry.len() and return AccountsDbError::SnapshotMissing(target_slot)
if out of bounds, then safely remove (or clone via registry.get(index).cloned())
into chosen_archive and proceed with the existing parse_slot/error flow;
reference the function find_and_remove_snapshot and the registry.remove(index)
call to locate the change.
Force-pushed 2ebffec to 6af735c (Compare)

Force-pushed 6af735c to 0adfe05 (Compare)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-accounts-db/src/lib.rs (1)
424-433: ⚠️ Potential issue | 🟠 Major — Don't hide LMDB scan failures behind a checksum.

This walks `iter_all()`, and `iter_all()` turns `get_all_accounts()` errors into an empty iterator. A failed index scan therefore produces a valid-looking hash of zero accounts instead of surfacing the error. For divergence detection, this path should be fallible.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/lib.rs` around lines 424 - 433, The checksum function currently hides LMDB scan failures because iter_all() converts get_all_accounts() errors into an empty iterator; change checksum (unsafe fn checksum(&self) -> u64) to return a fallible result (e.g. -> Result<u64, E>) and propagate any error from the underlying scan instead of treating it as empty. Locate checksum and the iterator producer iter_all()/get_all_accounts(), make iter_all propagate or return an iterator over Result items (or have checksum call get_all_accounts() directly and return Err on failure), and only compute/finish the xxhash3_64::Hasher when the scan succeeds so real LMDB errors surface to callers.
♻️ Duplicate comments (3)
magicblock-accounts-db/src/snapshot.rs (1)

250-263: ⚠️ Potential issue | 🟠 Major — Replace the `unwrap()` in snapshot selection.

The search narrows the candidate index, but this still needs an explicit error path in production code.

Suggested patch

```diff
-    let chosen_archive = registry.remove(index).unwrap();
+    let chosen_archive = registry
+        .remove(index)
+        .ok_or(AccountsDbError::SnapshotMissing(target_slot))?;
```

As per coding guidelines: `{magicblock-*,programs,storage-proto}/**`: Treat any usage of `.unwrap()` or `.expect()` in production Rust code as a MAJOR issue. These should not be categorized as trivial or nit-level concerns. Request proper error handling or explicit justification with invariants.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 250 - 263, The code in find_and_remove_snapshot uses registry.remove(index).unwrap(), which can panic in production; replace this unwrap with explicit error handling: perform registry.remove(index) into an Option, return Err(AccountsDbError::SnapshotMissing(target_slot)) (or a more specific AccountsDbError if appropriate) when None, otherwise bind the removed value to chosen_archive and continue to call Self::parse_slot(&chosen_archive) as before; ensure the error path covers both removal failure and any parse_slot failure by propagating/parsing errors consistently.

magicblock-replicator/README.md (2)
49-49: ⚠️ Potential issue | 🟡 Minor — Fix the standby behavior typo.

`replys` should be `replays` here.

Suggested patch

```diff
-- Consumes events from the stream and replys them locally
+- Consumes events from the stream and replays them locally
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/README.md` at line 49, Update the README sentence containing the phrase "Consumes events from the stream and replys them locally" to correct the typo by replacing "replys" with "replays" so the line reads "Consumes events from the stream and replays them locally"; ensure any other occurrences of "replys" in the repository are similarly corrected.
7-22: ⚠️ Potential issue | 🟡 Minor — Add a language tag to the ASCII-art fence.

The opening fence is still bare, so Markdown lint will keep flagging it.

Suggested patch

````diff
-```
+```text
            ┌─────────────┐
            │   Service   │
            └──────┬──────┘
      ┌────────────┴────────────┐
      ▼                         ▼
 ┌─────────┐               ┌─────────┐
 │ Primary │ ← ─ ─ ─ ─ ─ → │ Standby │
 └────┬────┘               └────┬────┘
      │                         │
  ┌───┴───┐                 ┌───┴───┐
  │Publish│                 │Consume│
  │Upload │                 │Apply  │
  │Refresh│                 │Verify │
  └───────┘                 └───────┘
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/README.md` around lines 7 - 22, The Markdown code fence for the ASCII-art diagram currently has no language tag which triggers lint warnings; update the opening fence (the triple backticks that begin the ASCII block) to include a language tag (e.g., "text") so it reads ```text, leaving the ASCII art content and the closing triple backticks unchanged to satisfy the linter.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/lib.rs`:
- Around line 295-318: take_snapshot currently computes and returns a checksum
even when snapshot_manager.create_snapshot_dir(slot, used_storage) fails; make
take_snapshot fallible (e.g., return Result<u64, E> or Option) and return
early on phase‑1 errors instead of a successful checksum. Specifically, call
snapshot_manager.create_snapshot_dir(slot, used_storage) while you still hold
the write lock (or otherwise check its Result) and if it returns Err propagate
that error immediately; only compute the checksum (unsafe { this.checksum() })
and spawn the background thread that calls archive_and_register(&dir) when
create_snapshot_dir succeeded, and update callers of take_snapshot to handle the
new Result/Option return type.

In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 220-230: The code currently calls find_and_remove_snapshot() which
mutates the in-memory registry before extract_archive() and atomic_swap(),
causing the snapshot to be evicted even if restore fails; change the flow so
restore_from_snapshot finds the snapshot without removing it (or modify
find_and_remove_snapshot to only mark/return the candidate but not mutate) and
only remove/prune the registry entry and fs::remove_file(&chosen_archive) after
atomic_swap() completes successfully; keep using the same identifiers
(chosen_archive, chosen_slot, index) but defer
prune_invalidated_snapshots(index) and the file removal until after
extract_archive() and atomic_swap() succeed so failed restores leave the
snapshot selectable for retry.

- Around line 103-119: create_snapshot_dir is currently calling prune_registry
before reliably inserting the new snapshot into the registry, and the registry
is appended unsorted which breaks find_and_remove_snapshot's binary_search when
archives complete out of order; change the registration step to insert the new
snapshot into the registry at the correct sorted position (use binary_search on
slot to compute insertion index rather than push_back) and only call
prune_registry after the insertion/reporting succeeds (i.e., after execute
returns Ok and after adding the snapshot to the registry), enforcing
max_snapshots post-registration so the registry remains sorted and pruning never
removes the last valid snapshot prematurely.

- Around line 239-245: The validate_archive function currently only constructs
the tar::Archive iterator which is lazy; update validate_archive(bytes: &[u8])
to fully iterate and drain the archive so gzip/tar decoding errors surface:
create the Cursor and GzDecoder as before (Cursor::new(bytes),
GzDecoder::new(cursor)), call tar.entries()? to get the iterator, then loop over
entries (e.g., for entry in tar.entries()? { let mut e = entry.map_err(...)?;
read or drain each entry's contents to EOF such as by copying to /dev/null or
read_to_end into a buffer) so any corrupt/truncated data causes an error which
you propagate (using the same AccountsDbResult error path and log_err wrapper
used today). Ensure you preserve the current error message context ("Invalid
snapshot archive: not a valid gzip tar") when mapping/propagating errors from
entries or reading entry bodies.
Outside diff comments:
In `@magicblock-accounts-db/src/lib.rs`:
- Around line 424-433: The checksum function currently hides LMDB scan failures
because iter_all() converts get_all_accounts() errors into an empty iterator;
change checksum (unsafe fn checksum(&self) -> u64) to return a fallible result
(e.g. -> Result<u64, E>) and propagate any error from the underlying scan
instead of treating it as empty. Locate checksum and the iterator producer
iter_all()/get_all_accounts(), make iter_all propagate or return an iterator
over Result items (or have checksum call get_all_accounts() directly and return
Err on failure), and only compute/finish the xxhash3_64::Hasher when the scan
succeeds so real LMDB errors surface to callers.
Duplicate comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 250-263: The code in find_and_remove_snapshot uses
registry.remove(index).unwrap(), which can panic in production; replace this
unwrap with explicit error handling: perform registry.remove(index) into an
Option, return Err(AccountsDbError::SnapshotMissing(target_slot)) (or a more
specific AccountsDbError if appropriate) when None, otherwise bind the removed
value to chosen_archive and continue to call Self::parse_slot(&chosen_archive)
as before; ensure the error path covers both removal failure and any parse_slot
failure by propagating/parsing errors consistently.

In `@magicblock-replicator/README.md`:
- Line 49: Update the README sentence containing the phrase "Consumes events
from the stream and replys them locally" to correct the typo by replacing
"replys" with "replays" so the line reads "Consumes events from the stream and
replays them locally"; ensure any other occurrences of "replys" in the
repository are similarly corrected.

- Around line 7-22: The Markdown code fence for the ASCII-art diagram currently
has no language tag which triggers lint warnings; update the opening fence (the
triple backticks that begin the ASCII block) to include a language tag (e.g.,
"text") so it reads ```text, leaving the ASCII art content and the closing
triple backticks unchanged to satisfy the linter.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 41a69455-5c17-4cf8-8d93-c56fec1092eb

📥 Commits

Reviewing files that changed from the base of the PR and between 2ebffec80ea133496ade8e4e550c24ebad2ffea4 and 0adfe05eed385ba44458ce1fd92efc7c35e3633c.

⛔ Files ignored due to path filters (2)

- `Cargo.lock` is excluded by `!**/*.lock`
- `test-integration/Cargo.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (24)

- `.gitignore`
- `config.example.toml`
- `magicblock-accounts-db/Cargo.toml`
- `magicblock-accounts-db/src/lib.rs`
- `magicblock-accounts-db/src/snapshot.rs`
- `magicblock-accounts-db/src/tests.rs`
- `magicblock-config/src/config/accounts.rs`
- `magicblock-config/src/consts.rs`
- `magicblock-config/src/tests.rs`
- `magicblock-replicator/README.md`
- `test-integration/configs/api-conf.ephem.toml`
- `test-integration/configs/chainlink-conf.devnet.toml`
- `test-integration/configs/claim-fees-test.toml`
- `test-integration/configs/cloning-conf.devnet.toml`
- `test-integration/configs/cloning-conf.ephem.toml`
- `test-integration/configs/committor-conf.devnet.toml`
- `test-integration/configs/config-conf.devnet.toml`
- `test-integration/configs/restore-ledger-conf.devnet.toml`
- `test-integration/configs/schedulecommit-conf-fees.ephem.toml`
- `test-integration/configs/schedulecommit-conf.devnet.toml`
- `test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml`
- `test-integration/configs/schedulecommit-conf.ephem.toml`
- `test-integration/configs/validator-offline.devnet.toml`
- `test-integration/test-ledger-restore/src/lib.rs`

💤 Files with no reviewable changes (17)

- test-integration/configs/committor-conf.devnet.toml
- test-integration/configs/schedulecommit-conf.devnet.toml
- test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml
- test-integration/configs/config-conf.devnet.toml
- test-integration/configs/restore-ledger-conf.devnet.toml
- test-integration/configs/api-conf.ephem.toml
- test-integration/configs/chainlink-conf.devnet.toml
- test-integration/configs/cloning-conf.devnet.toml
- magicblock-config/src/consts.rs
- config.example.toml
- test-integration/configs/schedulecommit-conf-fees.ephem.toml
- test-integration/configs/validator-offline.devnet.toml
- test-integration/configs/claim-fees-test.toml
- test-integration/configs/schedulecommit-conf.ephem.toml
- magicblock-config/src/config/accounts.rs
- test-integration/configs/cloning-conf.ephem.toml
- magicblock-config/src/tests.rs
```rust
pub fn take_snapshot(self: &Arc<Self>, slot: u64) -> u64 {
    let this = self.clone();

    // Phase 1: Create snapshot directory (with write lock)
    let locked = this.write_lock.write();
    this.flush();
    // SAFETY:
    // we have acquired the write lock above
    let checksum = unsafe { this.checksum() };
    let used_storage = this.storage.active_segment();

    let snapshot_dir = this
        .snapshot_manager
        .create_snapshot_dir(slot, used_storage);
    drop(locked);
    thread::spawn(move || {
        // Phase 2: Archive directory (no lock needed)
        let _ = snapshot_dir
            .and_then(|dir| {
                this.snapshot_manager.archive_and_register(&dir)
            })
            .log_err(|| "failed to create accountsdb snapshot");
    });
    checksum
}
```
Don't return a checksum when phase 1 already failed.
create_snapshot_dir() can fail before the background thread is spawned, but take_snapshot() still returns a checksum as if the snapshot request succeeded. That hides local snapshot-I/O failures from the replication flow. Please make this API fallible and return early on phase-1 errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/lib.rs` around lines 295 - 318, take_snapshot
currently computes and returns a checksum even when
snapshot_manager.create_snapshot_dir(slot, used_storage) fails; make
take_snapshot fallible (e.g., return Result<u64, E> or Option<u64>) and return
early on phase-1 errors instead of reporting a successful checksum. Specifically,
call snapshot_manager.create_snapshot_dir(slot, used_storage) while you still
hold the write lock (or otherwise check its Result) and, if it returns Err,
propagate that error immediately; only compute the checksum (unsafe {
this.checksum() }) and spawn the background thread that calls
archive_and_register(&dir) when create_snapshot_dir succeeded, and update
callers of take_snapshot to handle the new Result/Option return type.
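The fallible two-phase shape this prompt asks for can be sketched in plain, std-only Rust. This is a minimal sketch, not the real AccountsDb: the `Db` struct, the placeholder checksum, and the stub `create_snapshot_dir`/`archive_and_register` methods are assumptions standing in for the actual `SnapshotManager` API.

```rust
use std::{
    io,
    sync::{Arc, RwLock},
    thread,
};

struct Db {
    write_lock: RwLock<()>,
    fail_phase_one: bool, // test hook standing in for real snapshot-I/O failures
}

impl Db {
    // Phase 1 runs under the write lock and is fallible; only on success is the
    // background archiver (phase 2) spawned and a checksum returned.
    fn take_snapshot(self: &Arc<Self>, slot: u64) -> io::Result<u64> {
        let this = self.clone();
        let locked = this.write_lock.write().unwrap();
        // Early return: a phase-1 error is surfaced to the caller instead of
        // being logged from a detached thread.
        let dir = this.create_snapshot_dir(slot)?;
        let checksum = slot.wrapping_mul(31); // placeholder for the real checksum
        drop(locked);
        thread::spawn(move || {
            // Phase 2: archive without holding the write lock.
            let _ = this.archive_and_register(&dir);
        });
        Ok(checksum)
    }

    fn create_snapshot_dir(&self, slot: u64) -> io::Result<String> {
        if self.fail_phase_one {
            return Err(io::Error::new(io::ErrorKind::Other, "disk full"));
        }
        Ok(format!("snapshot-{slot}"))
    }

    fn archive_and_register(&self, _dir: &str) -> io::Result<()> {
        Ok(())
    }
}
```

A caller that previously ignored failures now has to decide between retrying and propagating the error, which is the point of the suggested signature change.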
```diff
 pub fn create_snapshot_dir(
     &self,
     slot: u64,
     active_mem: &[u8],
-    lock: RwLockWriteGuard<()>,
-) -> AccountsDbResult<()> {
-    let snap_path = self.generate_path(slot);
-
-    // 1. Maintain retention policy
-    self.prune_registry();
-
-    // 2. Prepare Data Capture
-    // If legacy copy, we must capture state while lock is held.
+) -> AccountsDbResult<PathBuf> {
+    let snap_path = self.slot_to_dir_path(slot);
     let memory_capture =
         matches!(self.strategy, SnapshotStrategy::LegacyCopy)
             .then(|| active_mem.to_vec())
             .unwrap_or_default();

-    // 3. Execute Snapshot
     self.strategy
-        .execute(&self.db_path, &snap_path, memory_capture, lock)
+        .execute(&self.db_path, &snap_path, memory_capture)
         .log_err(|| "Snapshot failed")?;
+    self.prune_registry();

-    // 4. Register success
-    self.registry.lock().push_back(snap_path);
-    Ok(())
+    Ok(snap_path)
```
Keep the registry sorted and prune only after successful registration.
find_and_remove_snapshot() relies on binary_search(), but push_back() only stays ordered if archives register in slot order. Background archiving can complete out of order, and the current pre-archive prune can delete the last good snapshot before the new archive is durable. Insert by sorted position and enforce max_snapshots after registration succeeds.
Suggested direction

```diff
 pub fn create_snapshot_dir(
     &self,
     slot: u64,
     active_mem: &[u8],
 ) -> AccountsDbResult<PathBuf> {
@@
     self.strategy
         .execute(&self.db_path, &snap_path, memory_capture)
         .log_err(|| "Snapshot failed")?;
-    self.prune_registry();
     Ok(snap_path)
 }
@@
 fn register_archive(&self, archive_path: PathBuf) {
     info!(archive_path = %archive_path.display(), "Snapshot registered");
-    self.registry.lock().push_back(archive_path);
+    let mut registry = self.registry.lock();
+    let index = registry.binary_search(&archive_path).unwrap_or_else(|i| i);
+    registry.insert(index, archive_path);
+    while registry.len() > self.max_snapshots {
+        let Some(path) = registry.pop_front() else {
+            break;
+        };
+        if let Err(e) = fs::remove_file(&path) {
+            warn!(path = %path.display(), error = ?e, "Failed to prune snapshot archive");
+        }
+    }
 }
```

Also applies to: 326-329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 103 - 119,
create_snapshot_dir is currently calling prune_registry before reliably
inserting the new snapshot into the registry, and the registry is appended
unsorted which breaks find_and_remove_snapshot's binary_search when archives
complete out of order; change the registration step to insert the new snapshot
into the registry at the correct sorted position (use binary_search on slot to
compute insertion index rather than push_back) and only call prune_registry
after the insertion/reporting succeeds (i.e., after execute returns Ok and after
adding the snapshot to the registry), enforcing max_snapshots post-registration
so the registry remains sorted and pruning never removes the last valid snapshot
prematurely.
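The ordering invariant behind this suggestion can be demonstrated with a std-only sketch. Here the registry holds bare slot numbers rather than archive paths, and `register_archive` is a simplified stand-in for the real method: insert at the sorted position so later binary searches stay valid even when background archiving completes out of order, and enforce the retention limit only after registration.

```rust
use std::collections::VecDeque;

// Returns the slots that were pruned, mirroring where the real code would
// delete the corresponding archive files from disk.
fn register_archive(
    registry: &mut VecDeque<u64>,
    slot: u64,
    max_snapshots: usize,
) -> Vec<u64> {
    // Sorted insertion: binary_search yields the insertion point on a miss.
    let index = registry.binary_search(&slot).unwrap_or_else(|i| i);
    registry.insert(index, slot);

    // Retention is enforced only after the new archive is registered, so the
    // last good snapshot is never dropped before its replacement is durable.
    let mut pruned = Vec::new();
    while registry.len() > max_snapshots {
        if let Some(old) = registry.pop_front() {
            pruned.push(old);
        }
    }
    pruned
}
```

Because the deque stays sorted, `binary_search`-based lookups elsewhere (such as `find_and_remove_snapshot`) remain correct regardless of archive completion order.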
```diff
 pub fn restore_from_snapshot(
     &self,
     target_slot: u64,
 ) -> AccountsDbResult<u64> {
-    let mut registry = self.registry.lock();
+    let (chosen_archive, chosen_slot, index) =
+        self.find_and_remove_snapshot(target_slot)?;

     let extracted_dir = self.extract_archive(&chosen_archive)?;
     self.atomic_swap(&extracted_dir)?;
+    self.prune_invalidated_snapshots(index);
+    let _ = fs::remove_file(&chosen_archive);
```
Don't evict the chosen snapshot before restore commits.
find_and_remove_snapshot() mutates the in-memory registry before extract_archive() and atomic_swap() run. If either step fails, the archive is still on disk but the current process can no longer pick it for another restore attempt. Remove it from the registry only after the swap succeeds.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 220 - 230, The code
currently calls find_and_remove_snapshot() which mutates the in-memory registry
before extract_archive() and atomic_swap(), causing the snapshot to be evicted
even if restore fails; change the flow so restore_from_snapshot finds the
snapshot without removing it (or modify find_and_remove_snapshot to only
mark/return the candidate but not mutate) and only remove/prune the registry
entry and fs::remove_file(&chosen_archive) after atomic_swap() completes
successfully; keep using the same identifiers (chosen_archive, chosen_slot,
index) but defer prune_invalidated_snapshots(index) and the file removal until
after extract_archive() and atomic_swap() succeed so failed restores leave the
snapshot selectable for retry.
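A registry that stays sorted also makes non-destructive selection cheap. The sketch below is hypothetical (slot numbers stand in for archive paths, and the function name is made up): it finds the newest snapshot at or below the target slot without mutating anything, so the eviction and file removal can wait until the atomic swap has actually committed.

```rust
// Given a sorted slice of snapshot slots, return the index and slot of the
// newest snapshot <= target_slot, or None if no snapshot qualifies.
fn choose_snapshot(sorted_slots: &[u64], target_slot: u64) -> Option<(usize, u64)> {
    // partition_point gives the count of slots <= target_slot in O(log n).
    let end = sorted_slots.partition_point(|&s| s <= target_slot);
    end.checked_sub(1).map(|i| (i, sorted_slots[i]))
}
```

On a failed restore the registry is untouched, so the same candidate stays selectable for a retry; only after a successful swap would the caller remove index `i` and everything after it.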
```rust
fn validate_archive(bytes: &[u8]) -> AccountsDbResult<()> {
    let cursor = Cursor::new(bytes);
    let dec = GzDecoder::new(cursor);
    let mut tar = Archive::new(dec);
    tar.entries()
        .log_err(|| "Invalid snapshot archive: not a valid gzip tar")?;
    Ok(())
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
tmp="$(mktemp)"
if ! curl -fsSL 'https://docs.rs/crate/tar/latest/source/src/archive.rs?plain=1' > "$tmp"; then
  curl -fsSL https://raw.githubusercontent.com/alexcrichton/tar-rs/master/src/archive.rs > "$tmp"
fi
rg -n -A6 -B2 'pub fn entries' "$tmp"
```

Repository: magicblock-labs/magicblock-validator

Length of output: 62
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Try alternative URLs for tar-rs source code
echo "Attempting to fetch tar-rs source from alternative locations..."
tmp1="$(mktemp)"
tmp2="$(mktemp)"
tmp3="$(mktemp)"
# Try GitHub releases/main branch
curl -fsSL "https://raw.githubusercontent.com/alexcrichton/tar-rs/main/src/archive.rs" > "$tmp1" 2>/dev/null || echo "main branch failed"
# Try crates.io source link
curl -fsSL "https://docs.rs/tar/*/source/src/archive.rs" > "$tmp2" 2>/dev/null || echo "docs.rs failed"
# Check if any succeeded and search for entries
for tmp in "$tmp1" "$tmp2"; do
  if [ -s "$tmp" ]; then
    echo "=== Found tar source, searching for entries() ==="
    rg -n -A10 'pub fn entries' "$tmp" || echo "No match found"
    break
  fi
done
```

Repository: magicblock-labs/magicblock-validator

Length of output: 1021
🌐 Web query:
tar-rs Archive::entries() implementation lazy evaluation
💡 Result:
Yes, the tar-rs Archive::entries implementation uses lazy evaluation. It returns a struct Entries<'a, R> that implements Iterator<Item = io::Result<Entry<'a, R>>>. The iterator's next method is implemented by delegating to self.fields.next, where EntriesFields holds references to the Archive's reader (via RefCell) and position tracking state. Entries are yielded sequentially as the iterator advances, reading headers and preparing lazy Entry readers (EntryFields with EntryIo variants like Data(EntryData) that read from the shared underlying reader on demand via Read impl). This design ensures streaming/lazy processing without loading the entire archive into memory upfront, but requires sequential consumption to avoid corrupting entry contents due to shared mutable reader state. For seekable readers, entries_with_seek uses seeking to skip data more efficiently while maintaining laziness.
Citations:
- 1: https://docs.rs/tar/0.4.13/tar/struct.Entries.html
- 2: https://docs.rs/tar/latest/tar/struct.Archive.html
- 3: https://docs.rs/tar/latest/src/tar/archive.rs.html
- 4: https://docs.rs/tar/latest/tar/struct.Entries.html
- 5: https://github.com/alexcrichton/tar-rs
validate_archive() doesn't actually walk the archive.
tar::Archive::entries() is lazy; constructing the iterator does not force a full tar/gzip read. Truncated or corrupt archives can still pass this check and only fail later during extract/restore. Iterate the entries and drain them before accepting the bytes.
Possible fix

```diff
 fn validate_archive(bytes: &[u8]) -> AccountsDbResult<()> {
     let cursor = Cursor::new(bytes);
     let dec = GzDecoder::new(cursor);
     let mut tar = Archive::new(dec);
-    tar.entries()
-        .log_err(|| "Invalid snapshot archive: not a valid gzip tar")?;
+    let entries = tar
+        .entries()
+        .log_err(|| "Invalid snapshot archive: not a valid gzip tar")?;
+    for entry in entries {
+        let mut entry = entry
+            .log_err(|| "Invalid snapshot archive: failed to read tar entry")?;
+        io::copy(&mut entry, &mut io::sink())
+            .log_err(|| "Invalid snapshot archive: truncated tar entry")?;
+    }
     Ok(())
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
fn validate_archive(bytes: &[u8]) -> AccountsDbResult<()> {
    let cursor = Cursor::new(bytes);
    let dec = GzDecoder::new(cursor);
    let mut tar = Archive::new(dec);
    let entries = tar
        .entries()
        .log_err(|| "Invalid snapshot archive: not a valid gzip tar")?;
    for entry in entries {
        let mut entry = entry
            .log_err(|| "Invalid snapshot archive: failed to read tar entry")?;
        io::copy(&mut entry, &mut io::sink())
            .log_err(|| "Invalid snapshot archive: truncated tar entry")?;
    }
    Ok(())
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 239 - 245, The
validate_archive function currently only constructs the tar::Archive iterator
which is lazy; update validate_archive(bytes: &[u8]) to fully iterate and drain
the archive so gzip/tar decoding errors surface: create the Cursor and GzDecoder
as before (Cursor::new(bytes), GzDecoder::new(cursor)), call tar.entries()? to
get the iterator, then loop over entries (e.g., for entry in tar.entries()? {
let mut e = entry.map_err(...)?; read or drain each entry's contents to EOF such
as by copying to /dev/null or read_to_end into a buffer) so any
corrupt/truncated data causes an error which you propagate (using the same
AccountsDbResult error path and log_err wrapper used today). Ensure you preserve
the current error message context ("Invalid snapshot archive: not a valid gzip
tar") when mapping/propagating errors from entries or reading entry bodies.
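The underlying pitfall — constructing a lazy iterator is not the same as consuming it — can be reproduced without the tar crate at all. The `Result`-yielding "archive" below is a hypothetical stand-in for `tar::Archive::entries()`; errors only surface while entries are drained, never at construction time.

```rust
// A stand-in "archive" whose corruption only shows up while entries are read,
// mirroring how tar::Archive::entries() reads lazily from the underlying stream.
fn entries(archive: &[Result<u8, String>]) -> impl Iterator<Item = Result<u8, String>> + '_ {
    archive.iter().cloned()
}

// Shallow check: merely constructing the iterator never observes the error,
// so a truncated archive passes.
fn validate_shallow(archive: &[Result<u8, String>]) -> bool {
    let _iter = entries(archive);
    true
}

// Deep check: draining every entry surfaces the corruption, which is exactly
// what the review's fix does by copying each tar entry to a sink.
fn validate_drained(archive: &[Result<u8, String>]) -> bool {
    entries(archive).all(|e| e.is_ok())
}
```

The same distinction is why the suggested `validate_archive` loops over the entries and copies each one to `io::sink()` instead of stopping at `tar.entries()`.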
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-config/src/config/validator.rs (1)
27-37: ⚠️ Potential issue | 🟠 Major

Preserve backward compatibility for ReplicationMode config values.

This serde rename policy changes the accepted enum strings (Standalone/StandBy/ReplicateOnly → kebab-case forms). Existing configs can fail deserialization and prevent startup. Please add aliases (or migration handling) for legacy values.

Proposed compatibility patch

```diff
 #[derive(Deserialize, Serialize, Debug, Clone)]
 #[serde(rename_all = "kebab-case")]
 pub enum ReplicationMode {
+    #[serde(alias = "Standalone")]
     Standalone,
+    #[serde(alias = "StandBy")]
     StandBy(Url, SerdePubkey),
+    #[serde(alias = "ReplicateOnly")]
     ReplicateOnly(Url, SerdePubkey),
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-config/src/config/validator.rs` around lines 27 - 37, The enum ReplicationMode now uses serde(rename_all = "kebab-case") which breaks configs that use the legacy variant names; update ReplicationMode to accept legacy names by adding serde alias attributes to each variant (e.g., add #[serde(alias = "Standalone")] to the Standalone variant, #[serde(alias = "StandBy")] to StandBy(Url, SerdePubkey), and #[serde(alias = "ReplicateOnly")] to ReplicateOnly(Url, SerdePubkey)) so deserialization accepts both the new kebab-case names and the old legacy names; this keeps backward compatibility without a custom deserializer.
♻️ Duplicate comments (1)
magicblock-accounts-db/src/lib.rs (1)
294-317: ⚠️ Potential issue | 🟠 Major

Propagate phase-1 snapshot failures instead of returning a checksum.

create_snapshot_dir() can fail before the archiver starts, but take_snapshot() still returns a successful checksum and only logs the error later from the spawned thread. That hides local snapshot-I/O failures from the replication path.

🤖 Prompt for AI Agents
create_snapshot_dir()can fail before the archiver starts, buttake_snapshot()still returns a successful checksum and only logs the error later from the spawned thread. That hides local snapshot-I/O failures from the replication path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/lib.rs` around lines 294 - 317, The current take_snapshot(self: &Arc<Self>, slot: u64) -> u64 hides phase-1 errors because create_snapshot_dir can fail but the function still returns checksum; change take_snapshot to return a Result<u64, E> (or appropriate SnapshotError) so phase-1 I/O failures are propagated. Concretely: call this.snapshot_manager.create_snapshot_dir(slot, used_storage) and if it returns Err, return Err immediately instead of spawning the thread; if Ok(dir) then spawn the thread that calls this.snapshot_manager.archive_and_register(&dir) and finally return Ok(checksum). Update all callers of take_snapshot to handle the Result. Use the existing symbols take_snapshot, create_snapshot_dir, snapshot_manager.archive_and_register to locate and apply the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/ISSUE_TEMPLATE/bug.yml:
- Around line 26-37: The deployment dropdown (type: dropdown, id: deployment)
should capture extra details when "Other" is selected; add an optional free-text
field (e.g., type: input, id: deployment_other, attributes.label: "If Other,
describe deployment") and wire it so it's shown/available when deployment ==
"Other" with no required validation, ensuring the new field's id and label are
descriptive for triage and easy to locate in the template.
In @.github/ISSUE_TEMPLATE/good-first-issue.yml:
- Around line 22-30: The acceptance criteria placeholder under the form field
with id "acceptance" is too minimal; replace the generic "- [ ]" placeholder
with a brief, explicit example checklist to help contributors (e.g., "- [ ]
Reproduced the bug; - [ ] Added tests; - [ ] Updated README"). Update the
"placeholder" attribute for id: acceptance to include that concrete example text
so new issues show a useful, copyable acceptance-criteria template.
In @.github/PULL_REQUEST_TEMPLATE.md:
- Around line 1-10: The file begins with an HTML comment before the first
markdown heading which triggers markdownlint MD041; fix this by ensuring the
document starts with a top-level H1 before any comments—either add a leading "#
Pull Request" (or similar) as the first line, or move the existing title-rule
HTML comment to follow the "## Summary" line; update the file around the
existing "## Summary" heading and the HTML comment so the first non-empty line
is the H1.
- Around line 13-15: The markdown heading "## Breaking Changes" is immediately
followed by list items; insert a single blank line after the "## Breaking
Changes" heading so the heading and the subsequent checklist are separated
(i.e., add one empty line between the "## Breaking Changes" line and the "- [ ]
None" list item).
In `@magicblock-accounts-db/src/tests.rs`:
- Around line 231-242: The test contains a duplicated archive
existence/assertion block that reconstructs archive_path using
env.snapshot_manager.database_path() and SNAPSHOT_SLOT and repeats the same
assert!(archive_path.exists()) and assert!(archive_path.is_file()); remove the
redundant second block so only the original assertions remain (locate the
duplicated code that defines archive_path and the two asserts and delete that
duplicate).
In `@magicblock-replicator/src/nats/broker.rs`:
- Around line 137-144: The current code in broker.rs materializes the entire
snapshot into memory by creating a Vec with capacity info.size and calling
object.read_to_end(&mut data) in the Snapshot construction; instead, change
Snapshot handling to stream the object to disk (or provide a streaming reader)
rather than allocating info.size bytes: open a temporary file (or return a file
path/AsyncRead impl) and use async streaming copy (e.g., tokio::fs::File +
tokio::io::copy or tokio_util::io helpers) to write the object body
incrementally, set self.sequence = meta.sequence as before, and construct
Snapshot to reference the temp file/reader rather than an in-memory Vec (update
the Snapshot type accordingly where used).
- Around line 85-101: The init_resources() function currently always calls
create_object_store() and create_key_value(), which fail if the buckets already
exist; change it to use a get-first-or-create pattern: call
ctx.get_object_store(cfg::SNAPSHOTS) and only call ctx.create_object_store(...)
if the get returns a NotFound error (propagate other errors), and likewise call
ctx.get_key_value(cfg::PRODUCER_LOCK) and only call ctx.create_key_value(...)
when NotFound is returned; also update the init_resources() docstring (around
the current line 34) to state that resources are created idempotently by first
attempting to get the existing object store and KV and creating them only on
NotFound.
In `@magicblock-replicator/src/nats/consumer.rs`:
- Around line 61-75: The messages() loop currently retries
self.inner.stream().max_messages_per_batch(cfg::BATCH_SIZE).messages().await in
a tight loop and will spam warnings on JetStream outages; change it to either
(A) return a Result<MessageStream, Error> so the caller can decide
backoff/shutdown by updating the signature of messages() and propagating the
Err(error) instead of swallowing it, or (B) implement an exponential backoff
with jitter inside messages() (use tokio::time::sleep, start with a small delay
and cap it) and log backoff attempts before retrying; locate the retry logic in
messages(), reference self.inner.stream() and cfg::BATCH_SIZE, and apply one of
these fixes consistently (do not both swallow errors and busy-loop).
In `@magicblock-replicator/src/nats/lock_watcher.rs`:
- Around line 20-37: The new(...) constructor in lock_watcher.rs currently
hot-loops on errors from broker.ctx.get_key_value and store.watch; add a backoff
delay between retries to avoid log flooding and CPU spin. Modify the loop in
pub(crate) async fn new(broker: &Broker) to track a retry delay (e.g., start
250–500ms, multiply on each failure up to a max like 5–10s, optionally add small
jitter), and await tokio::time::sleep(delay) before continuing after any Err
branch (both the get_key_value and watch error handling), resetting the delay to
the base value on success (e.g., after Ok(s) or Ok(w)). Ensure you reference the
same function name new and the variables broker, store, and watch so the change
is localized to this retry loop.
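Several of the prompts above ask for the same capped exponential backoff on retry loops. The schedule itself can be sketched as a pure function (this is an illustrative sketch: jitter is left out because std has no RNG, and the real loops would sleep between attempts via `tokio::time::sleep(delay)` and reset `attempt` to zero on success).

```rust
use std::time::Duration;

// Capped exponential backoff: base * 2^attempt, clamped to `max` so a long
// JetStream outage never produces unbounded delays, while early retries stay fast.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Clamp the exponent so the shift cannot overflow for large attempt counts.
    let exp = base.saturating_mul(1u32 << attempt.min(16));
    exp.min(max)
}
```

With a 250 ms base and a 5 s cap, the schedule is 250 ms, 500 ms, 1 s, 2 s, 4 s, then 5 s for every subsequent attempt — enough to stop the warn-log flood without delaying recovery much.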
- Around line 41-59: The method wait_for_expiry currently treats the watch
stream ending as a successful expiry; instead change its contract to surface
stream termination as an error so callers can distinguish transport failure from
a real Delete/Purge. Update pub async fn wait_for_expiry(&mut self) to return a
Result<(), E> (or a concrete error type you use) and, inside the loop, return
Ok(()) only on matches!(operation, Operation::Delete | Operation::Purge) while
returning Err(...) when self.watch.next().await yields None (and include context
in the error); keep the existing warn! log but do not return success on stream
end. Use the symbols wait_for_expiry, self.watch.next(), and Operation::Delete |
Operation::Purge to locate and modify the code.
In `@magicblock-replicator/src/service/context.rs`:
- Around line 98-101: The formatted error message in context.rs constructs msg
with expected and actual checksum reversed; update the format string in the let
msg = format!(...) call (referencing msg, sb.slot and sb.checksum) so it prints
"expected {expected}, got {actual}" with the expected value being the known good
checksum and the actual value coming from sb.checksum (i.e., swap the
placeholders/arguments or reorder the interpolation) to correctly reflect
expected vs. observed values.
- Around line 121-128: The methods enter_replica_mode and enter_primary_mode
currently ignore send errors (let _ = ...) which can silently fail; change both
to return Result<(), crate::ErrorType> (or anyhow::Result) and propagate the
error from self.mode_tx.send(...).await so callers can handle failures; update
callers (into_primary and into_standby) to await and propagate or handle the
Result (i.e., bubble up errors from enter_replica_mode/enter_primary_mode and
adjust into_primary/into_standby signatures to return Result as needed) so
mode-change send failures are not dropped.
In `@magicblock-replicator/src/service/mod.rs`:
- Around line 110-115: The current-thread Tokio runtime is built without drivers
and the build error is unconditionally panicked; update the runtime construction
(where Builder::new_current_thread() is used) to enable required drivers (at
minimum enable_time() and enable_io() or simply enable_all()) so Primary::run()
and Standby::run() can safely use tokio::time::interval and broker I/O, and
change the .expect("Failed to build replication service runtime") to propagate
the Result error (return Err(...) or map the error into the function's Result)
rather than panicking so callers can handle build failures instead of aborting;
locate and modify the block that builds the runtime and the call to
runtime.block_on(self.run()) accordingly.
In `@magicblock-replicator/src/service/primary.rs`:
- Around line 82-90: The code advances the replay position unconditionally
(calling ctx.update_position(slot, index)) even for messages that require a
durable JetStream ack (ack == true when Message::SuperBlock), but
Broker::publish only waits for PubAck when ack is true; move the call to
ctx.update_position so it only happens after a successful durable publish
acknowledgement for messages with ack==true (i.e., await publish and, if ack was
requested and publish succeeded, then call ctx.update_position(slot, index));
use the existing ack variable, msg.slot_and_index(), Broker::publish() result,
and Message::SuperBlock to locate and implement this conditional update.
- Around line 42-47: The code currently drops the message when publish() fails
because msg has already been popped from self.messages; before calling
self.ctx.into_standby(...), requeue or persist the failed msg so it isn't lost:
inside the Some(msg) = self.messages.recv() branch, on Err(error) from
self.publish(msg).await attempt to push the same msg back onto self.messages
(e.g., using its async send or try_send equivalent) or write it to a durable
retry queue, logging any requeue error, and only then call
self.ctx.into_standby(self.messages, true). Ensure you reference the
recv()/publish() failure path and use the same msg instance (or clone it if
necessary) so the event stream can be replayed after demotion.
In `@magicblock-replicator/src/service/standby.rs`:
- Around line 70-75: The watcher currently conflates a real lock expiry with an
unexpected watch-stream end; update LockWatcher::wait_for_expiry() to return a
discriminated result (e.g., Result<ExpiryKind, WatcherError> or an enum like
Expiry::Deleted | Expiry::StreamEnded) so callers can distinguish a genuine
Delete/Purge from stream termination, then modify the standby loop that calls
self.watcher.wait_for_expiry() to match on that result: on a genuine expiry,
proceed to call self.ctx.try_acquire_producer().await and on success call
self.ctx.into_primary(...).await as before, but on StreamEnded/Err log
appropriately and stop/recreate the watcher (do not spin on
try_acquire_producer()); reference symbols: LockWatcher::wait_for_expiry,
self.watcher, self.ctx.try_acquire_producer(), and self.ctx.into_primary.
---
Outside diff comments:
In `@magicblock-config/src/config/validator.rs`:
- Around line 27-37: The enum ReplicationMode now uses serde(rename_all =
"kebab-case") which breaks configs that use the legacy variant names; update
ReplicationMode to accept legacy names by adding serde alias attributes to each
variant (e.g., add #[serde(alias = "Standalone")] to the Standalone variant,
#[serde(alias = "StandBy")] to StandBy(Url, SerdePubkey), and #[serde(alias =
"ReplicateOnly")] to ReplicateOnly(Url, SerdePubkey)) so deserialization accepts
both the new kebab-case names and the old legacy names; this keeps backward
compatibility without a custom deserializer.
---
Duplicate comments:
In `@magicblock-accounts-db/src/lib.rs`:
- Around line 294-317: The current take_snapshot(self: &Arc<Self>, slot: u64) ->
u64 hides phase-1 errors because create_snapshot_dir can fail but the function
still returns checksum; change take_snapshot to return a Result<u64, E> (or
appropriate SnapshotError) so phase-1 I/O failures are propagated. Concretely:
call this.snapshot_manager.create_snapshot_dir(slot, used_storage) and if it
returns Err, return Err immediately instead of spawning the thread; if Ok(dir)
then spawn the thread that calls
this.snapshot_manager.archive_and_register(&dir) and finally return
Ok(checksum). Update all callers of take_snapshot to handle the Result. Use the
existing symbols take_snapshot, create_snapshot_dir,
snapshot_manager.archive_and_register to locate and apply the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b98b3800-f56d-4a35-94ad-664b6bb69c80
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
- .github/ISSUE_TEMPLATE/bug.yml
- .github/ISSUE_TEMPLATE/feature.yml
- .github/ISSUE_TEMPLATE/good-first-issue.yml
- .github/PULL_REQUEST_TEMPLATE.md
- Cargo.toml
- magicblock-accounts-db/src/lib.rs
- magicblock-accounts-db/src/tests.rs
- magicblock-api/src/magic_validator.rs
- magicblock-config/src/config/validator.rs
- magicblock-core/src/link/transactions.rs
- magicblock-processor/src/scheduler/mod.rs
- magicblock-processor/src/scheduler/state.rs
- magicblock-replicator/Cargo.toml
- magicblock-replicator/src/error.rs
- magicblock-replicator/src/lib.rs
- magicblock-replicator/src/nats/broker.rs
- magicblock-replicator/src/nats/consumer.rs
- magicblock-replicator/src/nats/lock_watcher.rs
- magicblock-replicator/src/nats/mod.rs
- magicblock-replicator/src/nats/producer.rs
- magicblock-replicator/src/nats/snapshot.rs
- magicblock-replicator/src/proto.rs
- magicblock-replicator/src/service/context.rs
- magicblock-replicator/src/service/mod.rs
- magicblock-replicator/src/service/primary.rs
- magicblock-replicator/src/service/standby.rs
- magicblock-replicator/src/tests.rs
- magicblock-replicator/src/watcher.rs
- test-kit/src/lib.rs
```yaml
  - type: dropdown
    id: deployment
    attributes:
      label: Deployment
      options:
        - Local (native)
        - Local (Docker)
        - Testnet
        - Mainnet
        - Other
    validations:
      required: true
```
🧹 Nitpick | 🔵 Trivial
Optional: capture details when deployment is “Other”.
Adding a small free-text field can reduce back-and-forth during triage for non-standard environments.
Suggested addition

```diff
   - type: dropdown
     id: deployment
     attributes:
       label: Deployment
       options:
         - Local (native)
         - Local (Docker)
         - Testnet
         - Mainnet
         - Other
     validations:
       required: true
+
+  - type: input
+    id: deployment_details
+    attributes:
+      label: Deployment details (if Other)
+      placeholder: "Describe custom environment"
+    validations:
+      required: false
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
  - type: dropdown
    id: deployment
    attributes:
      label: Deployment
      options:
        - Local (native)
        - Local (Docker)
        - Testnet
        - Mainnet
        - Other
    validations:
      required: true

  - type: input
    id: deployment_details
    attributes:
      label: Deployment details (if Other)
      placeholder: "Describe custom environment"
    validations:
      required: false
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/ISSUE_TEMPLATE/bug.yml around lines 26 - 37, The deployment dropdown
(type: dropdown, id: deployment) should capture extra details when "Other" is
selected; add an optional free-text field (e.g., type: input, id:
deployment_other, attributes.label: "If Other, describe deployment") and wire it
so it's shown/available when deployment == "Other" with no required validation,
ensuring the new field's id and label are descriptive for triage and easy to
locate in the template.
```yaml
  - type: textarea
    id: acceptance
    attributes:
      label: Acceptance Criteria
      description: How do we know this is done?
      placeholder: |
        - [ ]
    validations:
      required: false
```
🧹 Nitpick | 🔵 Trivial
Consider a more explicit acceptance-criteria example.
The current placeholder (- [ ]) is valid, but adding one concrete example usually improves issue quality for first-time contributors.
Suggested tweak
  - type: textarea
    id: acceptance
    attributes:
      label: Acceptance Criteria
      description: How do we know this is done?
      placeholder: |
-        - [ ]
+        - [ ] Added/updated tests for the changed behavior
+        - [ ] Documentation updated (if applicable)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/ISSUE_TEMPLATE/good-first-issue.yml around lines 22 - 30, The
acceptance criteria placeholder under the form field with id "acceptance" is too
minimal; replace the generic "- [ ]" placeholder with a brief, explicit example
checklist to help contributors (e.g., "- [ ] Reproduced the bug; - [ ] Added
tests; - [ ] Updated README"). Update the "placeholder" attribute for id:
acceptance to include that concrete example text so new issues show a useful,
copyable acceptance-criteria template.
<!--
PR title must match: type(scope): summary
Types: feat|fix|docs|chore|refactor|test|perf|ci|build
Examples:
  fix: avoid panic on empty slot
  feat(rpc): add getFoo endpoint
-->

## Summary
-
<!-- One sentence. Link to issue if applicable. -->
Make the first line a top-level heading to satisfy markdownlint
markdownlint warns that the file should start with an H1. Right now, an HTML comment appears before ## Summary, which triggers MD041. Move the title-rule comment below the first heading (or add a top-level heading on line 1).
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 9-9: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/PULL_REQUEST_TEMPLATE.md around lines 1 - 10, The file begins with
an HTML comment before the first markdown heading which triggers markdownlint
MD041; fix this by ensuring the document starts with a top-level H1 before any
comments—either add a leading "# Pull Request" (or similar) as the first line,
or move the existing title-rule HTML comment to follow the "## Summary" line;
update the file around the existing "## Summary" heading and the HTML comment so
the first non-empty line is the H1.
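For reference, a minimal sketch of an MD041-compliant top of the file (the "# Pull Request" heading text is illustrative, not taken from the repo):

```markdown
# Pull Request

<!--
PR title must match: type(scope): summary
Types: feat|fix|docs|chore|refactor|test|perf|ci|build
-->

## Summary
```

The only requirement is that the first rendered line is a single top-level heading; the title-rule comment can live anywhere below it.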
## Breaking Changes
- [ ] None
- [ ] Yes — migration path described below
Add a blank line after ## Breaking Changes
On line 13, the heading is immediately followed by list items. Add one blank line below the heading to satisfy MD022 and keep formatting consistent.
Suggested patch
 ## Breaking Changes
+
 - [ ] None
 - [ ] Yes — migration path described below
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/PULL_REQUEST_TEMPLATE.md around lines 13 - 15, The markdown heading
"## Breaking Changes" is immediately followed by list items; insert a single
blank line after the "## Breaking Changes" heading so the heading and the
subsequent checklist are separated (i.e., add one empty line between the "##
Breaking Changes" line and the "- [ ] None" list item).
    // Verify archive file exists (not directory)
    let archive_path = env
        .snapshot_manager
        .database_path()
        .parent()
        .unwrap()
        .join(format!("snapshot-{:0>12}.tar.gz", SNAPSHOT_SLOT));
    assert!(archive_path.exists(), "Archive file should exist");
    assert!(
        archive_path.is_file(),
        "Snapshot should be a file, not directory"
    );
🧹 Nitpick | 🔵 Trivial
Remove the duplicated archive assertion block.
Lines 231-242 repeat the exact same path construction and assertions from Lines 218-229, so this adds no coverage and will drift on future filename changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/tests.rs` around lines 231 - 242, The test
contains a duplicated archive existence/assertion block that reconstructs
archive_path using env.snapshot_manager.database_path() and SNAPSHOT_SLOT and
repeats the same assert!(archive_path.exists()) and
assert!(archive_path.is_file()); remove the redundant second block so only the
original assertions remain (locate the duplicated code that defines archive_path
and the two asserts and delete that duplicate).
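Incidentally, both duplicated blocks hinge on the same `{:0>12}` format specifier; a std-only sketch of what it produces (the slot values here are illustrative):

```rust
/// Build the snapshot archive name for a slot, zero-padded to 12 digits
/// so that names sort lexicographically in slot order.
fn archive_name(slot: u64) -> String {
    format!("snapshot-{:0>12}.tar.gz", slot)
}

fn main() {
    // 42 is left-padded with zeros to a fixed width of 12 digits.
    println!("{}", archive_name(42)); // snapshot-000000000042.tar.gz
}
```

Because the padding is fixed-width, a plain directory listing orders archives by slot, which is also why a future filename change only needs to touch one helper if the assertions are deduplicated.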
    let runtime = Builder::new_current_thread()
        .thread_name("replication-service")
        .build()
        .expect("Failed to build replication service runtime");

    runtime.block_on(tokio::task::unconstrained(self.run()))
🧩 Analysis chain
Repository: magicblock-labs/magicblock-validator

🏁 Script executed:
# First, check the file and lines in question
cat -n magicblock-replicator/src/service/mod.rs | sed -n '100,125p'

🏁 Script executed:
# Check the Primary::run() method for tokio::time::interval usage
fd -t f "primary.rs" magicblock-replicator/ | head -5

🏁 Script executed:
# Check the Standby::run() method
fd -t f "standby.rs" magicblock-replicator/ | head -5

🏁 Script executed:
# Check the broker code for Tokio I/O
fd -t f "broker.rs" magicblock-replicator/ | head -5

🏁 Script executed:
# Check Primary::run() for tokio::time::interval usage
cat -n magicblock-replicator/src/service/primary.rs | head -100

🏁 Script executed:
# Check Standby::run() for tokio::time::interval usage
cat -n magicblock-replicator/src/service/standby.rs | head -100

🏁 Script executed:
# Check broker.rs for async I/O
cat -n magicblock-replicator/src/nats/broker.rs | head -80

🏁 Script executed:
# Verify the spawn() return type and context
rg -A 5 "pub fn spawn" magicblock-replicator/src/service/mod.rs
Build this runtime with Tokio drivers enabled and return the build failure.
This current-thread runtime is built without enable_all(), but Primary::run() (line 38) and Standby::run() (line 51) both use tokio::time::interval, and the broker relies on Tokio I/O. Without the time and I/O drivers, these will panic at runtime. Additionally, line 113's .expect() converts a recoverable error into a panic despite spawn() returning JoinHandle<Result<()>>, violating the MAJOR guideline for this codebase.
Suggested fix
 let runtime = Builder::new_current_thread()
     .thread_name("replication-service")
-    .build()
-    .expect("Failed to build replication service runtime");
+    .enable_all()
+    .build()
+    .map_err(|error| crate::Error::Internal(format!(
+        "failed to build replication service runtime: {error}"
+    )))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-replicator/src/service/mod.rs` around lines 110 - 115, The
current-thread Tokio runtime is built without drivers and the build error is
unconditionally panicked; update the runtime construction (where
Builder::new_current_thread() is used) to enable required drivers (at minimum
enable_time() and enable_io() or simply enable_all()) so Primary::run() and
Standby::run() can safely use tokio::time::interval and broker I/O, and change
the .expect("Failed to build replication service runtime") to propagate the
Result error (return Err(...) or map the error into the function's Result)
rather than panicking so callers can handle build failures instead of aborting;
locate and modify the block that builds the runtime and the call to
runtime.block_on(self.run()) accordingly.
    Some(msg) = self.messages.recv() => {
        if let Err(error) = self.publish(msg).await {
            // publish should not easily fail, if that happens, it means
            // the message broker has become unrecoverably unreacheable
            warn!(%error, "failed to publish the message");
            return self.ctx.into_standby(self.messages, true).await;
Don't drop the message that triggered demotion.
Line 42 has already removed msg from the channel. If publish() fails here, Line 47 demotes without retrying or requeueing it, so recovery now depends on a later full snapshot instead of the event stream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-replicator/src/service/primary.rs` around lines 42 - 47, The code
currently drops the message when publish() fails because msg has already been
popped from self.messages; before calling self.ctx.into_standby(...), requeue or
persist the failed msg so it isn't lost: inside the Some(msg) =
self.messages.recv() branch, on Err(error) from self.publish(msg).await attempt
to push the same msg back onto self.messages (e.g., using its async send or
try_send equivalent) or write it to a durable retry queue, logging any requeue
error, and only then call self.ctx.into_standby(self.messages, true). Ensure you
reference the recv()/publish() failure path and use the same msg instance (or
clone it if necessary) so the event stream can be replayed after demotion.
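A std-only sketch of the shape the fix could take (`Msg`, `publish`, and the pending slot are illustrative stand-ins; the real code uses a tokio channel and an async publish): on failure, keep the message instead of letting it fall out of scope at demotion time.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Msg(u64);

/// Stand-in for the broker publish; hardwired to fail for the demo.
fn publish(_msg: &Msg) -> Result<(), &'static str> {
    Err("broker unreachable")
}

/// On publish failure, return the message so the caller can hand it to
/// standby (or requeue it) rather than dropping it silently.
fn handle(msg: Msg) -> Option<Msg> {
    match publish(&msg) {
        Ok(()) => None,
        Err(_) => Some(msg), // preserved for replay after demotion
    }
}

fn main() {
    let pending = handle(Msg(7));
    println!("pending for standby: {:?}", pending);
}
```

Carrying the failed message across the demotion boundary means recovery can resume from the event stream instead of waiting for the next full snapshot.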
    _ = self.watcher.wait_for_expiry() => {
        info!("leader lock expired, attempting takeover");
        if let Ok(Some(producer)) = self.ctx.try_acquire_producer().await {
            info!("acquired leadership, promoting");
            return self.ctx.into_primary(producer, self.messages).await;
        }
Don't treat a dead watch stream as lock expiry.
LockWatcher::wait_for_expiry() also returns when its watch stream ends unexpectedly (magicblock-replicator/src/nats/lock_watcher.rs:41-59). From this branch, that case is indistinguishable from a real Delete/Purge, and once the watcher is exhausted you'll spin on try_acquire_producer() every loop iteration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-replicator/src/service/standby.rs` around lines 70 - 75, The
watcher currently conflates a real lock expiry with an unexpected watch-stream
end; update LockWatcher::wait_for_expiry() to return a discriminated result
(e.g., Result<ExpiryKind, WatcherError> or an enum like Expiry::Deleted |
Expiry::StreamEnded) so callers can distinguish a genuine Delete/Purge from
stream termination, then modify the standby loop that calls
self.watcher.wait_for_expiry() to match on that result: on a genuine expiry,
proceed to call self.ctx.try_acquire_producer().await and on success call
self.ctx.into_primary(...).await as before, but on StreamEnded/Err log
appropriately and stop/recreate the watcher (do not spin on
try_acquire_producer()); reference symbols: LockWatcher::wait_for_expiry,
self.watcher, self.ctx.try_acquire_producer(), and self.ctx.into_primary.
    let result = match message {
        Message::Transaction(tx) => {
            self.replay_tx(tx.slot, tx.index, tx.payload).await
        }
        Message::Block(block) => self.ctx.write_block(&block).await,
        Message::SuperBlock(sb) => {
            self.ctx.verify_checksum(&sb).inspect_err(|error|
                error!(slot, %error, "accountsdb state has diverged")
            )
        }
    };

    if let Err(error) = result {
        warn!(slot, index, %error, "message processing error");
        return;
    }
    self.ctx.update_position(slot, index);
Stop this standby on replay/apply failures.
Lines 116-119 only warn and keep consuming. A failed transaction replay, block write, or superblock verification leaves a known-bad replica running and still eligible for promotion later. Bubble this out of run() and force rebootstrap instead of continuing.
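A std-only sketch of that control flow (types and messages are illustrative): processing returns a Result, and the loop propagates the first failure instead of warning and continuing with known-bad state.

```rust
enum Msg {
    Ok,
    Diverged,
}

fn process(msg: &Msg) -> Result<(), String> {
    match msg {
        Msg::Ok => Ok(()),
        // A failed replay/verification is fatal for this replica.
        Msg::Diverged => Err("accountsdb state has diverged".to_string()),
    }
}

/// Stops at the first failure and surfaces it so the caller can tear
/// down and rebootstrap, rather than consuming past the divergence.
fn run(messages: &[Msg]) -> Result<usize, String> {
    let mut applied = 0;
    for msg in messages {
        process(msg)?; // bubble up instead of warn-and-continue
        applied += 1;
    }
    Ok(applied)
}

fn main() {
    println!("{:?}", run(&[Msg::Ok, Msg::Diverged, Msg::Ok]));
}
```

Crucially, a replica that stops consuming at the first failure also stops advancing its position, so it can never be promoted from a diverged state.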

Summary
An accumulator branch for replication service PRs
Checklist
Summary by CodeRabbit