feat: move blockhash calculation to the scheduler #1072
bmuddha wants to merge 20 commits into bmuddha/replicator/integration from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Refactors block-update signaling to use a ledger-broadcasted LatestBlockInner via tokio::broadcast and removes the flume BlockUpdate channel/types. Moves slot advancement and superblock/snapshot control into the scheduler: the scheduler maintains a streaming blake3 hasher, finalizes blockhashes at slot boundaries, emits Block/Transaction/SuperBlock replication messages, and triggers superblock snapshot publication. Introduces execution-permit semaphore coordination (for executors and coordinator wait-for-idle), adds a configurable ledger.superblock_size, updates tests to poll/wait for scheduler-driven slot progress, and adjusts various consumers/producers and replication plumbing accordingly.

Assessment against linked issues
Assessment — Out-of-scope changes
Suggested reviewers
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
magicblock-processor/src/scheduler/state.rs (1)

109-113: ⚠️ Potential issue | 🟠 Major

Replace .expect() with proper error handling.

The .expect() on line 111 violates the coding guidelines for production Rust code in the magicblock-* paths. If register_function fails, this will panic instead of gracefully handling the error.

🛠️ Proposed fix

```diff
 let runtime_v2 = {
     let mut runtime = create_program_runtime_environment_v2(
         &Default::default(),
         false,
     );
-    runtime
+    if let Err(e) = runtime
         .register_function("sol_matmul_i8", SyscallMatmulI8::vm)
-        .expect(
-            "failed to register sol_matmul_i8 in runtime environment v2",
-        );
+    {
+        tracing::error!("failed to register sol_matmul_i8 in runtime environment v2: {e}");
+    }
     runtime
 };
```

As per coding guidelines: {magicblock-*,programs,storage-proto}/**: Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/state.rs` around lines 109-113: the use of .expect() when registering the function "sol_matmul_i8" via runtime.register_function (SyscallMatmulI8::vm) should be replaced with proper error handling: propagate the error from register_function (change the surrounding function to return Result if needed) or handle it by logging the failure and returning an Err instead of panicking, ensuring the runtime registration failure is handled gracefully and reported rather than calling .expect().

magicblock-aperture/tests/websocket.rs (1)
28-50: ⚠️ Potential issue | 🟡 Minor

Restore a lower-bound check for the notification slot.

This only checks that the slot is not ahead of current_slot. A stale notification from far in the past still passes. Capture the slot before Line 29 and assert the notification is >= that baseline.

Proposed fix

```diff
 // Trigger an update by sending lamports to the account.
+let slot_before_transfer = env.latest_slot();
 env.transfer_lamports(account, amount).await;
@@
 let current_slot = env.latest_slot();
 assert!(
-    notification.context.slot <= current_slot + 1,
+    notification.context.slot >= slot_before_transfer
+        && notification.context.slot <= current_slot + 1,
     "notification slot {} should be reasonable compared to current slot {}",
     notification.context.slot,
     current_slot
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-aperture/tests/websocket.rs` around lines 28-50: before triggering the transfer, capture the current slot into a baseline (call env.latest_slot() and store it, e.g., baseline_slot) so you can assert the notification slot is not stale; after receiving the notification (notification.context.slot) assert it is >= baseline_slot and also keep the existing upper-bound check against current_slot + 1. Update the test around env.transfer_lamports and the assertions (referencing env.transfer_lamports, env.latest_slot, and notification.context.slot) to use the baseline_slot lower-bound check and the existing upper-bound check.

magicblock-processor/src/executor/mod.rs (1)
176-184: ⚠️ Potential issue | 🔴 Critical

Acquire the semaphore before transition_to_slot().

Line 181 is too late. wait_for_idle() uses this semaphore to make the superblock snapshot path exclusive, but transition_to_slot() on Line 179 mutates per-slot executor state before the permit is held. That leaves a window where accountsdb.take_snapshot() can run concurrently with a slot/sysvar transition.

Proposed fix

```diff
 txn = self.rx.recv() => {
     let Some(transaction) = txn else { break };
+    let Ok(_permit) = self.execution_permits.acquire().await else {
+        warn!("execution permit semaphore closed");
+        break;
+    };
     if transaction.slot != self.processor.slot {
         self.transition_to_slot(transaction.slot);
     }
-    let _permit = self.execution_permits.acquire().await;
     match transaction.txn.mode {
         TransactionProcessingMode::Execution(_) => {
             self.execute(transaction, None);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/executor/mod.rs` around lines 176-184: the code currently calls transition_to_slot(transaction.slot) before acquiring the execution_permits, which permits wait_for_idle()/accountsdb.take_snapshot() to run concurrently with the slot/sysvar transition; move the semaphore acquisition so the permit is awaited and held (call self.execution_permits.acquire().await and bind _permit) before checking and calling transition_to_slot(transaction.slot), ensuring the permit is held across transition_to_slot and subsequent execute() calls (symbols: transition_to_slot, execution_permits, wait_for_idle, take_snapshot, execute, rx.recv, processor.slot).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config.example.toml`:
- Around line 192-197: The comment above the superblock-size setting contains a
grammar error; update the sentence "accountsdb checksum/snapshot takes at
superblock boundary" to "accountsdb checksum/snapshot is taken at superblock
boundary" in the comment block immediately preceding the superblock-size key so
the message reads clearly and correctly.
In `@magicblock-aperture/tests/blocks.rs`:
- Around line 187-201: The current assertions around the get_blocks result
(blocks from env.rpc.get_blocks(start, Some(end))) only check boundaries; update
the test to iterate the returned blocks and assert for each slot that start <=
slot <= end, that each slot is strictly greater than the previous slot (strictly
increasing), and — if the invariant is one block per slot — that each slot
equals previous_slot + 1 to enforce contiguity; apply the same checks in the
other occurrence of these assertions (the block range at lines 239-252).
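The strengthened assertions described above can be modeled as a small std-only helper. This is a hypothetical sketch: `check_block_range` and its inputs stand in for the slots returned by `env.rpc.get_blocks(start, Some(end))`, and the `contiguous` flag captures the optional one-block-per-slot invariant.

```rust
// Every returned slot must lie in [start, end] and the sequence must be
// strictly increasing; if the ledger guarantees one block per slot,
// consecutive slots must additionally differ by exactly 1.
fn check_block_range(blocks: &[u64], start: u64, end: u64, contiguous: bool) -> bool {
    let mut prev: Option<u64> = None;
    for &slot in blocks {
        if slot < start || slot > end {
            return false; // out of requested range
        }
        if let Some(p) = prev {
            if slot <= p {
                return false; // not strictly increasing
            }
            if contiguous && slot != p + 1 {
                return false; // gap between consecutive blocks
            }
        }
        prev = Some(slot);
    }
    true
}

fn main() {
    assert!(check_block_range(&[5, 6, 7, 8], 5, 8, true));
    assert!(!check_block_range(&[5, 7, 8], 5, 8, true)); // gap at slot 6
    assert!(check_block_range(&[5, 7, 8], 5, 8, false)); // gaps allowed
    assert!(!check_block_range(&[5, 6, 9], 5, 8, false)); // 9 out of range
}
```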
In `@magicblock-aperture/tests/transactions.rs`:
- Around line 468-469: Replace the fixed 100ms sleep with a polling loop that
repeatedly checks env.execution.get_transaction(fail_sig) until it returns
Some(...) or a short timeout/iteration limit is reached; use small sleeps (e.g.
20ms) between polls via tokio::time::sleep and after the loop assert that the
transaction is present to fail fast if it never appears. Locate the wait near
the test that references fail_sig and env.execution.get_transaction and
implement the loop there instead of the single
tokio::time::sleep(Duration::from_millis(100)).await call.
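The suggested polling loop can be sketched generically. This is a std-only model, assuming a blocking context: the real test would use `tokio::time::sleep` between polls, and the `lookup` closure stands in for `env.execution.get_transaction(fail_sig)`.

```rust
use std::thread::sleep;
use std::time::Duration;

// Retry a lookup at a fixed interval up to a bounded number of attempts,
// returning Some(value) as soon as it is available, or None on timeout so
// the caller can assert and fail fast.
fn poll_until<T>(
    mut lookup: impl FnMut() -> Option<T>,
    attempts: u32,
    interval: Duration,
) -> Option<T> {
    for _ in 0..attempts {
        if let Some(value) = lookup() {
            return Some(value);
        }
        sleep(interval);
    }
    None
}

fn main() {
    // Stand-in for the transaction lookup: becomes ready on the 3rd poll.
    let mut calls = 0;
    let found = poll_until(
        || {
            calls += 1;
            (calls >= 3).then_some("txn")
        },
        10,
        Duration::from_millis(1),
    );
    assert_eq!(found, Some("txn"));
}
```

The key difference from the fixed 100ms sleep is the explicit attempt bound: the test either observes the transaction promptly or fails with a clear assertion instead of racing the scheduler.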
In `@magicblock-api/src/tickers.rs`:
- Around line 22-29: init_slot_ticker() used to increment SLOT_COUNT via
metrics::inc_slot() but that call was removed during the refactor, so the
scheduler-owned slot transition path must now perform the metric increment;
update the scheduler transition logic (the code that advances slots inside
TransactionSchedulerHandle / the scheduler's slot transition function) to call
metrics::inc_slot() whenever a new slot is accepted/advanced, ensuring
ScheduledCommitsProcessor-related paths still advance slots and trigger the same
metrics increment so SLOT_COUNT reflects real slot progress.
In `@magicblock-config/src/config/ledger.rs`:
- Around line 16-17: The superblock_size field in the LedgerConfig (pub
superblock_size: u64) lacks constraints and can be zero, which leads to a
division-by-zero in handle_superblock; update the code by either (a) adding
documentation to the superblock_size field stating it must be > 0 and include an
example/rationale, or (b) adding serde/runtime validation (e.g., in
LedgerConfig::validate or a custom Deserialize) to reject values == 0 and return
a clear error; reference the superblock_size field in ledger.rs and the
handle_superblock function when adding the check or doc so readers and callers
know the valid range.
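Option (b) above, runtime validation, can be sketched as follows. The struct mirrors the field under discussion; the `validate` method and its `String` error type are illustrative, not the crate's actual API.

```rust
// Minimal validation sketch: reject superblock_size == 0 at config load time
// so the `slot % superblock_size` check in handle_superblock can never
// divide by zero.
struct LedgerConfig {
    /// Must be > 0; used as a divisor when detecting superblock boundaries.
    superblock_size: u64,
}

impl LedgerConfig {
    fn validate(&self) -> Result<(), String> {
        if self.superblock_size == 0 {
            return Err(
                "ledger.superblock_size must be > 0 (it is used as a divisor \
                 at superblock boundaries)"
                    .to_string(),
            );
        }
        Ok(())
    }
}

fn main() {
    assert!(LedgerConfig { superblock_size: 0 }.validate().is_err());
    assert!(LedgerConfig { superblock_size: 1024 }.validate().is_ok());
}
```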
- Around line 39-40: The constant name is misspelled: update the definition and
all uses from DEFAULT_SUPEBLOCK_SIZE to DEFAULT_SUPERBLOCK_SIZE; specifically
rename the constant in the consts definition (where DEFAULT_SUPEBLOCK_SIZE is
declared) and update its usage in the ledger config where the struct field
superblock_size is assigned, ensuring all references (e.g.,
DEFAULT_SUPEBLOCK_SIZE → DEFAULT_SUPERBLOCK_SIZE) are renamed identically to
avoid compilation errors.
In `@magicblock-config/src/consts.rs`:
- Around line 57-58: Rename the mis-typed constant DEFAULT_SUPEBLOCK_SIZE to
DEFAULT_SUPERBLOCK_SIZE in magicblock-config/src/consts.rs and update all usages
(e.g., the reference in magicblock-config/src/config/ledger.rs where
DEFAULT_SUPEBLOCK_SIZE is read) to the new identifier; ensure you update the
public constant name and any places that import or reference it so compilation
and tests succeed.
In `@magicblock-core/src/link/blocks.rs`:
- Around line 7-10: Update the doc comment for BlockUpdateRx to match its
generic payload: replace the statement that the notification payload is empty
with a description that BlockUpdateRx<T> is a generic broadcast::Receiver<T> and
that callers typically instantiate it as BlockUpdateRx<LatestBlockInner> to
receive the full block data (slot, blockhash, timestamp); ensure the comment
mentions the generic T and the common usage pattern (e.g., LatestBlockInner) so
readers know the payload carries block details rather than being empty.
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 345-350: Change wait_for_idle to return
Result<OwnedSemaphorePermit, tokio::sync::AcquireError> instead of
unconditionally unwrapping: replace the .expect(...) on the acquire_many_owned
await with a ?-style propagation of the AcquireError (i.e., return the Err when
acquire_many_owned fails). Update the signature of pub(super) async fn
wait_for_idle(&self) to reflect the Result and adjust all callers of
wait_for_idle to handle the Err case (handle the tokio::sync::AcquireError)
rather than assuming success; reference the execution_permits,
blocked_transactions, and OwnedSemaphorePermit symbols when locating the change.
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 476-491: The finalize_block path must abort advancing local slot
when ledger.write_block fails: in async fn finalize_block(&self, block:
&LatestBlockInner) call self.ledger.write_block(...).await and match its Result
instead of just inspect_err; if self.coordinator.is_primary() and write_block
returns Err(error) then log the error (using error!(%error, ...)) and return
early from finalize_block so you do not call
self.accountsdb.set_slot(block.slot); only call set_slot when either not primary
or when write_block succeeded.
- Around line 162-165: The Hasher is created empty (Hasher::new()) but must be
seeded with the last persisted block hash so the first slot uses
H(previous_blockhash || txs...); before starting the slot loop (after computing
latest_block and slot_ticker) load the previous block hash from
latest_block.load().block_hash and feed it into the hasher (via the appropriate
method on Hasher, e.g., seed/update/absorb) so that hasher reflects the previous
block state before processing the first slot.
- Around line 437-442: The current call to
SystemTime::now().duration_since(UNIX_EPOCH).unwrap() (the computation that
assigns unix_timestamp) can panic if the host clock is behind UNIX_EPOCH;
replace the unwrap with proper error handling: match or map_err the Result, log
the SystemTimeError (using the module's logger), and recover instead of
panicking (e.g., use a safe fallback value such as the previous slot timestamp,
0, or SystemTime::UNIX_EPOCH.elapsed() equivalent) so the scheduler continues
running; locate the unix_timestamp assignment in mod.rs and update that
expression to handle Err by logging and returning/using the chosen fallback
rather than unwrap().
In `@magicblock-processor/src/scheduler/state.rs`:
- Around line 69-71: Validate that TransactionSchedulerState::superblock_size is
> 0 to avoid divide-by-zero in handle_superblock where
slot.is_multiple_of(self.superblock_size) is called; add a check in the
TransactionSchedulerState constructor (or in config deserialization) that
returns an error or panics with a clear message if superblock_size == 0, or use
an assert ensuring the invariant, and document the requirement so callers cannot
pass 0.
In `@magicblock-replicator/src/nats/consumer.rs`:
- Around line 88-107: The pending() method currently busy-loops on
self.inner.get_info() failures causing unbounded retries; change pending(&self,
cancel: &CancellationToken) -> Option<u64> to return a Result<Option<u64>, E>
(propagate the underlying error type) and stop retrying on get_info() errors —
instead return Err(error) so callers like Standby::has_pending() can decide how
to handle/backoff; keep the tokio::select! that respects cancel.cancelled() and
only loop if you intend to implement an explicit backoff strategy (sleep with
jitter) before retrying.
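The revised `pending()` control flow can be modeled without the NATS and tokio machinery. In this hypothetical sketch, the `get_info` closure stands in for `self.inner.get_info()`, `cancelled` models the `cancel.cancelled()` branch of the `select!`, and `String` stands in for the real error type.

```rust
// A single attempt instead of a busy loop: on failure the error is
// propagated so the caller (e.g. Standby::has_pending) can decide whether
// to back off and retry.
fn pending(
    mut get_info: impl FnMut() -> Result<u64, String>,
    cancelled: bool,
) -> Result<Option<u64>, String> {
    if cancelled {
        // Mirrors the cancellation branch: no pending count is reported.
        return Ok(None);
    }
    let num_pending = get_info()?; // error reaches the caller, no retry here
    Ok(Some(num_pending))
}

fn main() {
    assert_eq!(pending(|| Ok(3), false), Ok(Some(3)));
    assert_eq!(pending(|| Ok(3), true), Ok(None));
    assert!(pending(|| Err("stream info unavailable".to_string()), false).is_err());
}
```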
In `@magicblock-replicator/src/service/standby.rs`:
- Around line 63-66: The expiry signal from watcher.wait_for_expiry() is being
dropped by the immediate continue after has_pending() returns true; instead set
and use a lock_expired flag to remember the expiry and retry takeover/promotion
as soon as has_pending() becomes false. Modify the loop around
watcher.wait_for_expiry() and has_pending() to: on expiry set lock_expired =
true (do not discard the event), when has_pending() is false and lock_expired is
true call the promotion/takeover routine (e.g., the existing promotion
function), reset lock_expired on successful takeover, and only fall back to the
idle-timeout branch if no expiry is recorded—this preserves the expiry signal
until takeover succeeds.
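The latched-expiry behavior can be captured in a synchronous model. This sketch replaces `watcher.wait_for_expiry()` and `has_pending()` with a hypothetical event sequence; the point is the control flow, not the async plumbing.

```rust
#[derive(Clone, Copy)]
enum Event {
    LockExpired,    // watcher.wait_for_expiry() fired
    PendingDrained, // has_pending() flipped to false
}

// An expiry observed while the backlog is non-empty is latched, and takeover
// fires as soon as the backlog drains. Returns whether takeover ran.
fn run(events: &[Event], mut has_pending: bool) -> bool {
    let mut lock_expired = false;
    for event in events {
        match event {
            Event::LockExpired => lock_expired = true, // latch, never drop
            Event::PendingDrained => has_pending = false,
        }
        if lock_expired && !has_pending {
            lock_expired = false; // reset only after a successful takeover
            return true;
        }
    }
    false
}

fn main() {
    // Expiry arrives while backlog exists; takeover still happens after drain.
    assert!(run(&[Event::LockExpired, Event::PendingDrained], true));
    // No expiry recorded: draining the backlog alone never promotes.
    assert!(!run(&[Event::PendingDrained], true));
}
```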
- Around line 95-101: The standby path currently warns when self.can_promote is
false but still calls self.ctx.try_acquire_producer() and may promote via
self.ctx.into_primary(...), bypassing ReplicaOnly; update the logic in the
function that contains can_promote/try_acquire_producer (in standby.rs) so
try_acquire_producer() and the subsequent self.ctx.into_primary(producer,
self.messages).await.map(Some) are only executed when self.can_promote is true
(e.g., change the sequence to check if self.can_promote before attempting
try_acquire_producer or early-continue/return when can_promote is false).
---
Outside diff comments:
In `@magicblock-aperture/tests/websocket.rs`:
- Around line 28-50: Before triggering the transfer, capture the current slot
into a baseline (call env.latest_slot() and store it, e.g., baseline_slot) so
you can assert the notification slot is not stale; after receiving the
notification (notification.context.slot) assert it is >= baseline_slot and also
keep the existing upper-bound check against current_slot + 1. Update the test
around env.transfer_lamports and the assertions (referencing
env.transfer_lamports, env.latest_slot, and notification.context.slot) to use
the baseline_slot lower-bound check and the existing upper-bound check.
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 176-184: The code currently calls
transition_to_slot(transaction.slot) before acquiring the execution_permits,
which permits wait_for_idle()/accountsdb.take_snapshot() to run concurrently
with the slot/sysvar transition; move the semaphore acquisition so the permit is
awaited and held (call self.execution_permits.acquire().await and bind _permit)
before checking and calling transition_to_slot(transaction.slot), ensuring the
permit is held across transition_to_slot and subsequent execute() calls
(symbols: transition_to_slot, execution_permits, wait_for_idle, take_snapshot,
execute, rx.recv, processor.slot).
In `@magicblock-processor/src/scheduler/state.rs`:
- Around line 109-113: The use of .expect() when registering the function
"sol_matmul_i8" via runtime.register_function (SyscallMatmulI8::vm) should be
replaced with proper error handling: propagate the error from register_function
(change the surrounding function to return Result if needed) or handle it by
logging the failure and returning an Err instead of panicking, ensuring the
runtime registration failure is handled gracefully and reported rather than
calling .expect().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f261bb4-4f29-4a40-bef4-8f169de6dc91
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (34)
- Cargo.toml
- config.example.toml
- magicblock-aperture/src/geyser.rs
- magicblock-aperture/src/processor.rs
- magicblock-aperture/src/state/blocks.rs
- magicblock-aperture/tests/blocks.rs
- magicblock-aperture/tests/mocked.rs
- magicblock-aperture/tests/setup.rs
- magicblock-aperture/tests/transactions.rs
- magicblock-aperture/tests/websocket.rs
- magicblock-api/src/lib.rs
- magicblock-api/src/magic_validator.rs
- magicblock-api/src/slot.rs
- magicblock-api/src/tickers.rs
- magicblock-config/src/config/ledger.rs
- magicblock-config/src/consts.rs
- magicblock-core/src/link.rs
- magicblock-core/src/link/blocks.rs
- magicblock-core/src/link/replication.rs
- magicblock-ledger/src/blockstore_processor/mod.rs
- magicblock-ledger/src/lib.rs
- magicblock-processor/Cargo.toml
- magicblock-processor/src/executor/mod.rs
- magicblock-processor/src/scheduler/coordinator.rs
- magicblock-processor/src/scheduler/mod.rs
- magicblock-processor/src/scheduler/state.rs
- magicblock-processor/src/scheduler/tests.rs
- magicblock-processor/tests/fees.rs
- magicblock-processor/tests/replay.rs
- magicblock-replicator/src/nats/consumer.rs
- magicblock-replicator/src/nats/mod.rs
- magicblock-replicator/src/service/context.rs
- magicblock-replicator/src/service/standby.rs
- test-kit/src/lib.rs
💤 Files with no reviewable changes (3)
- magicblock-api/src/lib.rs
- magicblock-api/src/slot.rs
- magicblock-core/src/link.rs
873881e to fd44715 (Compare)
8d1b35d to 45462c8 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (6)
magicblock-processor/src/scheduler/coordinator.rs (1)

345-350: ⚠️ Potential issue | 🟠 Major

Replace the panicking .expect() in the idle barrier with fallible error propagation.

Line 350 can panic coordinator/replication flows on semaphore closure. This should return Result and let callers handle shutdown/error paths.

Proposed fix

```diff
-    pub(super) async fn wait_for_idle(&self) -> OwnedSemaphorePermit {
-        self.execution_permits
-            .clone()
-            .acquire_many_owned(self.blocked_transactions.len() as u32)
-            .await
-            .expect("semaphore can never be closed")
-    }
+    pub(super) async fn wait_for_idle(
+        &self,
+    ) -> Result<OwnedSemaphorePermit, tokio::sync::AcquireError> {
+        self.execution_permits
+            .clone()
+            .acquire_many_owned(self.blocked_transactions.len() as u32)
+            .await
+    }
```

Please verify all .wait_for_idle().await call sites are updated to handle Err(AcquireError) explicitly:

```bash
#!/bin/bash
set -euo pipefail
rg -n -C2 --type rust 'pub\(super\)\s+async\s+fn\s+wait_for_idle|acquire_many_owned|expect\("semaphore can never be closed"\)'
rg -n -C2 --type rust '\bwait_for_idle\s*\(\)\.await'
```

As per coding guidelines: Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/coordinator.rs` around lines 345-350: change wait_for_idle to return Result<OwnedSemaphorePermit, tokio::sync::AcquireError> instead of panicking: replace the expect call on acquire_many_owned with ? (i.e., propagate the AcquireError) and update the signature of pub(super) async fn wait_for_idle to reflect Result. Update all call sites of wait_for_idle().await to handle the Err(AcquireError) path explicitly (return Err, map to a coordinator shutdown error, or continue as appropriate) so callers of execution_permits.acquire_many_owned and users of OwnedSemaphorePermit handle semaphore closure gracefully; references to execution_permits, blocked_transactions, wait_for_idle, OwnedSemaphorePermit and AcquireError will help locate affected code.

magicblock-config/src/config/ledger.rs (1)
16-17: 🧹 Nitpick | 🔵 Trivial

Fix typo in documentation.

The doc comment has a grammatical error: "before they accountsdb" should be "before the accountsdb".

📝 Proposed fix

```diff
-    /// The number of slots that must elapse before they accountsdb snapshot/checksum is taken
+    /// The number of slots that must elapse before the accountsdb snapshot/checksum is taken
     pub superblock_size: u64,
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-config/src/config/ledger.rs` around lines 16-17: update the doc comment for the struct field superblock_size to fix the typo: change "before they accountsdb snapshot/checksum is taken" to "before the accountsdb snapshot/checksum is taken." Locate the triple-slash doc comment immediately above the pub superblock_size: u64 field in ledger.rs and edit the text to use "the accountsdb" so the comment reads correctly.

magicblock-processor/src/scheduler/mod.rs (3)
477-492: ⚠️ Potential issue | 🟠 Major

Abort the slot transition when write_block() fails.

This still advances accountsdb after a ledger write error. Once that happens, local slot/Clock/SlotHashes move ahead of persisted ledger state and RPC/replication can observe a block that never landed.

Suggested fix

```diff
 async fn finalize_block(&self, block: &LatestBlockInner) {
     if self.coordinator.is_primary() {
-        let _ = self
-            .ledger
-            .write_block(
-                block.slot,
-                block.clock.unix_timestamp,
-                block.blockhash,
-            )
-            .inspect_err(|error| {
-                error!(%error, "failed to write block to the ledger")
-            });
+        if let Err(error) = self.ledger.write_block(
+            block.slot,
+            block.clock.unix_timestamp,
+            block.blockhash,
+        ) {
+            error!(%error, "failed to write block to the ledger");
+            return;
+        }
     }
     self.accountsdb.set_slot(block.slot);
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 477-492: finalize_block currently advances accountsdb.set_slot even when ledger.write_block fails; change the logic so that when self.coordinator.is_primary() and ledger.write_block(...) returns an Err the function returns early (aborting the slot transition) instead of calling self.accountsdb.set_slot. Concretely, replace the current inspect_err usage with explicit error handling around self.ledger.write_block (in finalize_block): call ledger.write_block, log the error via error!(%error, ...) on failure and return immediately; only call self.accountsdb.set_slot(block.slot) when write_block succeeded or when not primary.
438-443: ⚠️ Potential issue | 🟠 Major

Replace this unwrap() on the clock read.

A host clock before UNIX_EPOCH will panic the scheduler at Line 440 during slot finalization. Handle SystemTimeError and choose a fallback timestamp instead.

Suggested fix

```diff
-        let unix_timestamp = SystemTime::now()
-            .duration_since(UNIX_EPOCH)
-            .unwrap()
-            // NOTE: since we can tick very frequently, a lot
-            // of blocks might have identical timestamps
-            .as_secs() as i64;
+        let unix_timestamp = match SystemTime::now().duration_since(UNIX_EPOCH) {
+            Ok(duration) => duration.as_secs() as i64,
+            Err(error) => {
+                warn!(%error, "system clock is before UNIX_EPOCH; using 0");
+                0
+            }
+        };
```

As per coding guidelines, "Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue. These should not be categorized as trivial or nit-level concerns. Request proper error handling or explicit justification with invariants."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 438-443: the current call to SystemTime::now().duration_since(UNIX_EPOCH).unwrap() can panic if the host clock is before UNIX_EPOCH; replace the unwrap by explicitly handling the Result from duration_since(UNIX_EPOCH) (around the unix_timestamp calculation), e.g., match the Ok(duration) branch to use duration.as_secs() as i64 and the Err(_) branch to choose a safe fallback timestamp (for example 0 or another agreed sentinel) and emit a warning/log; update the code that assigns unix_timestamp accordingly so slot finalization no longer panics on SystemTimeError.
163-167: ⚠️ Potential issue | 🔴 Critical

Make the hasher mutable and seed it with bytes, or this code will not compile.

blake3::Hasher::update requires &mut self and &[u8] input. The current code declares hasher as immutable and attempts to pass a Hash type directly, causing compilation failure. Other uses of hasher in this file confirm the pattern: self.hasher.update(block.blockhash.as_ref()).

Suggested fix

```diff
-    let hasher = Hasher::new();
+    let mut hasher = Hasher::new();
     let slot_ticker = interval(state.block_time);
     let latest_block = state.ledger.latest_block().clone();
-    hasher.update(latest_block.load().blockhash);
-    let slot = latest_block.load().slot;
+    let latest = latest_block.load();
+    hasher.update(latest.blockhash.as_ref());
+    let slot = latest.slot;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 163-167: the Hasher is created immutably and update is called with a non-byte type; change to create a mutable hasher (e.g., use let mut hasher = Hasher::new()) and call hasher.update(...) with a byte slice from the block hash (use latest_block.load().blockhash.as_ref() or equivalent) so the update receives &[u8] and compiles like other usages in this file.

magicblock-replicator/src/service/standby.rs (1)
63-66: ⚠️ Potential issue | 🟠 Major

Keep the lock-expiry signal latched until takeover actually runs.
Lines 64-65 drop the expiry event when backlog still exists. Once those pending messages drain, this loop no longer retries takeover until another watcher event or the idle-timeout path fires.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/src/service/standby.rs` around lines 63 - 66, When watcher.wait_for_expiry() fires, don't drop that signal if has_pending().await returns true; instead set a latched flag (e.g., expiry_latched) when wait_for_expiry() returns, continue draining pending items while keeping expiry_latched true, and only allow the takeover path to run once has_pending().await is false and expiry_latched is set; clear expiry_latched after takeover completes. Update the loop in standby.rs around watcher.wait_for_expiry(), has_pending(), and the takeover invocation to use this latched boolean so the expiry event is remembered across the backlog drain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 195-209: The recv() loop in the executor treats any recv error as
terminal (the pattern using Ok(latest) => ... else => break), which causes the
executor to exit on tokio::broadcast::error::RecvError::Lagged; change the loop
in the run/processing task that calls block_updated.recv() to match on the
Result and explicitly handle Err(RecvError::Lagged(_)) by resynchronizing (e.g.,
call or implement a resync method to repair/refresh self.block_history or fetch
the latest missing blocks) and continue the loop, handle Err(RecvError::Closed)
by breaking/terminating, and keep the existing path of calling
register_new_block(latest) for Ok(latest); reference the recv call and
register_new_block(…) so you update only the error handling branch rather than
exiting on any recv error.
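The intended branching can be shown with local stand-ins for the tokio types, since the real loop depends on `tokio::sync::broadcast`. In this sketch, `Recv` models the receiver's result and `Action` models what the run loop should do; only the mapping is the point.

```rust
// Local stand-ins for tokio::sync::broadcast's receiver outcomes.
enum Recv {
    Ok(u64),    // a block update arrived
    Lagged(u64), // receiver fell behind by n messages
    Closed,     // all senders dropped
}

enum Action {
    Register(u64), // register_new_block(latest)
    Resync,        // repair block_history, then continue the loop
    Shutdown,      // break out of the run loop
}

// Lagged must resync and continue; only Closed is terminal.
fn handle(recv: Recv) -> Action {
    match recv {
        Recv::Ok(latest) => Action::Register(latest),
        Recv::Lagged(_) => Action::Resync, // NOT a reason to exit
        Recv::Closed => Action::Shutdown,
    }
}

fn main() {
    assert!(matches!(handle(Recv::Ok(42)), Action::Register(42)));
    assert!(matches!(handle(Recv::Lagged(5)), Action::Resync));
    assert!(matches!(handle(Recv::Closed), Action::Shutdown));
}
```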
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 362-376: The streaming blockhash must include replayed
transactions, but current code only updates self.hasher when is_execution is
true, causing replicas (TransactionProcessingMode::Replay) to diverge in
verify_block_as_replica; change the logic so that replay and execution both
contribute to the hasher: keep the self.index += 1 behavior only inside the
TransactionProcessingMode::Execution arm, but ensure you call
self.hasher.update(txn.transaction.signature().as_ref()) for both
TransactionProcessingMode::Replay and TransactionProcessingMode::Execution (or
set a should_hash flag in those two arms), leaving
TransactionProcessingMode::Simulation un-hashed.
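The per-mode rule above can be made explicit as a small decision table. This is a sketch with a simplified, unit-variant version of the mode enum; the real variants carry payloads and the hashing itself uses the blake3 hasher.

```rust
enum TransactionProcessingMode {
    Execution,
    Replay,
    Simulation,
}

// Which modes feed the streaming blockhash and which bump the per-slot
// index, per the suggested fix: execution and replay both hash the
// signature (so replicas converge), only execution bumps the index, and
// simulation touches neither.
fn effects(mode: &TransactionProcessingMode) -> (bool, bool) {
    // (should_hash, should_bump_index)
    match mode {
        TransactionProcessingMode::Execution => (true, true),
        TransactionProcessingMode::Replay => (true, false),
        TransactionProcessingMode::Simulation => (false, false),
    }
}

fn main() {
    assert_eq!(effects(&TransactionProcessingMode::Execution), (true, true));
    assert_eq!(effects(&TransactionProcessingMode::Replay), (true, false));
    assert_eq!(effects(&TransactionProcessingMode::Simulation), (false, false));
}
```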
- Around line 368-375: The code advances self.index and updates self.hasher
before sending work to the executor, so if try_send fails the slot index and
hash move forward incorrectly; change the flow in the scheduling path that uses
is_execution and try_send so you compute the candidate index/slot and hash input
without mutating self, then attempt try_send, and only on successful send update
self.index and call self.hasher.update(...) (and similarly ensure the
replica-path mutations around try_send at the 392-396 region are delayed until
after a successful enqueue). Reference self.index, self.hasher, is_execution,
and try_send when making these changes.
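The "mutate only after a successful enqueue" ordering can be sketched with the channel modeled as a bounded Vec. Everything here is a hypothetical stand-in for the scheduler's state: the point is that a failed `try_send` leaves `index` and the hash input untouched.

```rust
struct Scheduler {
    index: u64,
    hashed: Vec<[u8; 4]>, // stand-in for self.hasher.update(signature)
    queue: Vec<[u8; 4]>,  // stand-in for the executor channel
    capacity: usize,
}

impl Scheduler {
    // Attempt the enqueue first; advance index/hash state only on success.
    fn try_send(&mut self, sig: [u8; 4]) -> bool {
        if self.queue.len() >= self.capacity {
            return false; // queue full: no streaming state was mutated
        }
        self.queue.push(sig);
        // Only now is it safe to advance the per-slot index and hash input.
        self.index += 1;
        self.hashed.push(sig);
        true
    }
}

fn main() {
    let mut s = Scheduler { index: 0, hashed: vec![], queue: vec![], capacity: 1 };
    assert!(s.try_send([1; 4]));
    assert!(!s.try_send([2; 4])); // rejected: channel full
    assert_eq!(s.index, 1);       // index did not advance on failure
    assert_eq!(s.hashed.len(), 1);
}
```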
---
Duplicate comments:
In `@magicblock-config/src/config/ledger.rs`:
- Around line 16-17: Update the doc comment for the struct field superblock_size
to fix the typo: change "before they accountsdb snapshot/checksum is taken" to
"before the accountsdb snapshot/checksum is taken." Locate the triple-slash doc
comment immediately above the pub superblock_size: u64 field in ledger.rs and
edit the text to use "the accountsdb" so the comment reads correctly.
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 345-350: Change wait_for_idle to return
Result<OwnedSemaphorePermit, tokio::sync::AcquireError> instead of panicking:
replace the expect call on acquire_many_owned with ? (i.e., propagate the
AcquireError) and update the signature of pub(super) async fn wait_for_idle to
reflect Result. Update all call sites of wait_for_idle().await to handle the
Err(AcquireError) path explicitly (return Err, map to a coordinator shutdown
error, or continue as appropriate) so callers of
execution_permits.acquire_many_owned and users of OwnedSemaphorePermit handle
semaphore closure gracefully; references to execution_permits,
blocked_transactions, wait_for_idle, OwnedSemaphorePermit and AcquireError will
help locate affected code.
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 477-492: finalize_block currently advances accountsdb.set_slot
even when ledger.write_block fails; change the logic so that when
self.coordinator.is_primary() and ledger.write_block(...) returns an Err the
function returns early (aborting the slot transition) instead of calling
self.accountsdb.set_slot. Concretely, replace the current inspect_err usage with
explicit error handling around self.ledger.write_block (in finalize_block) —
call ledger.write_block, log the error via error!(%error, ...) on failure and
return immediately; only call self.accountsdb.set_slot(block.slot) when
write_block succeeded or when not primary.
- Around line 438-443: The current call to
SystemTime::now().duration_since(UNIX_EPOCH).unwrap() can panic if the host
clock is before UNIX_EPOCH; replace the unwrap by explicitly handling the Result
from duration_since(UNIX_EPOCH) (around the unix_timestamp calculation), e.g.,
match the Ok(duration) branch to use duration.as_secs() as i64 and the Err(_)
branch to choose a safe fallback timestamp (for example 0 or another agreed
sentinel) and emit a warning/log; update the code that assigns unix_timestamp
accordingly so slot finalization no longer panics on SystemTimeError.
- Around line 163-167: The Hasher is created immutably and update is called with
a non-byte type; change to create a mutable hasher (e.g., use let mut hasher =
Hasher::new()) and call hasher.update(...) with a byte slice from the block hash
(use latest_block.load().blockhash.as_ref() or equivalent) so the update
receives &[u8] and compiles like other usages in this file.
In `@magicblock-replicator/src/service/standby.rs`:
- Around line 63-66: When watcher.wait_for_expiry() fires, don't drop that
signal if has_pending().await returns true; instead set a latched flag (e.g.,
expiry_latched) when wait_for_expiry() returns, continue draining pending items
while keeping expiry_latched true, and only allow the takeover path to run once
has_pending().await is false and expiry_latched is set; clear expiry_latched
after takeover completes. Update the loop in standby.rs around
watcher.wait_for_expiry(), has_pending(), and the takeover invocation to use
this latched boolean so the expiry event is remembered across the backlog drain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8813f34-7cb6-4d10-9c4e-82d0108a714f
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
- Cargo.toml
- config.example.toml
- magicblock-aperture/src/geyser.rs
- magicblock-aperture/src/processor.rs
- magicblock-aperture/src/state/blocks.rs
- magicblock-aperture/tests/blocks.rs
- magicblock-aperture/tests/mocked.rs
- magicblock-aperture/tests/setup.rs
- magicblock-aperture/tests/transactions.rs
- magicblock-aperture/tests/websocket.rs
- magicblock-api/src/lib.rs
- magicblock-api/src/magic_validator.rs
- magicblock-api/src/slot.rs
- magicblock-api/src/tickers.rs
- magicblock-config/src/config/ledger.rs
- magicblock-config/src/consts.rs
- magicblock-core/src/link.rs
- magicblock-core/src/link/blocks.rs
- magicblock-core/src/link/replication.rs
- magicblock-ledger/src/blockstore_processor/mod.rs
- magicblock-ledger/src/lib.rs
- magicblock-metrics/src/metrics/mod.rs
- magicblock-processor/Cargo.toml
- magicblock-processor/src/executor/mod.rs
- magicblock-processor/src/scheduler/coordinator.rs
- magicblock-processor/src/scheduler/mod.rs
- magicblock-processor/src/scheduler/state.rs
- magicblock-processor/src/scheduler/tests.rs
- magicblock-processor/tests/fees.rs
- magicblock-processor/tests/replay.rs
- magicblock-replicator/src/nats/consumer.rs
- magicblock-replicator/src/nats/mod.rs
- magicblock-replicator/src/service/context.rs
- magicblock-replicator/src/service/standby.rs
- test-kit/src/lib.rs
💤 Files with no reviewable changes (3)
- magicblock-api/src/lib.rs
- magicblock-api/src/slot.rs
- magicblock-core/src/link.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-aperture/tests/blocks.rs`:
- Around line 125-163: The single-pass scan from slot_before..=current_slot can
miss a block that appears after current_slot; modify the logic around
env.wait_for_slot_progress / env.rpc.get_slot so you implement a bounded polling
loop that repeatedly refreshes current_slot and rescans slots (using
env.rpc.get_block_with_config and RpcBlockConfig with
UiTransactionEncoding::Base64) until you find the signature or a timeout/attempt
limit is reached; use a short delay between iterations, keep the scan window
bounded (e.g., last N slots or from slot_before to refreshed current_slot), and
return a clear error if the transaction isn't found within the timeout.
- Around line 46-77: The test incorrectly compares the second value returned by
get_latest_blockhash_with_commitment (captured as last_valid_slot) to the slot
from get_slot(); that value is actually lastValidBlockHeight. Update the test to
rename last_valid_slot to last_valid_block_height (or similar) and fetch the
current block height via env.rpc.get_block_height().await.expect(...) and assert
that last_valid_block_height > current_block_height, updating the assert message
to reference block heights rather than slots; keep the BlockHash non-default
assertions unchanged.
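A std-only sketch of such a bounded polling helper; this is a synchronous stand-in for the suggested async test loop, and `poll_until` with its parameters is a hypothetical name (in the real test, `probe` would refresh the current slot and rescan recent blocks for the signature):

```rust
use std::time::{Duration, Instant};

// Poll `probe` until it yields Some(T) or the timeout elapses, sleeping
// `delay` between attempts to keep the scan loop bounded.
fn poll_until<T>(
    mut probe: impl FnMut() -> Option<T>,
    timeout: Duration,
    delay: Duration,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = probe() {
            return Ok(found);
        }
        if Instant::now() >= deadline {
            return Err("not found within the timeout");
        }
        std::thread::sleep(delay);
    }
}
```

Returning a clear error on timeout (rather than panicking mid-scan) keeps the test failure message actionable.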
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 58e3857e-c980-4479-a411-f32ffea77aa5
📒 Files selected for processing (1)
magicblock-aperture/tests/blocks.rs
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
magicblock-api/src/tickers.rs (1)
43-44:⚠️ Potential issue | 🟠 Major
`.expect()` usage flagged per coding guidelines. The `.expect("Validator found to be running without MagicContext account!")` on line 44 is flagged per coding guidelines, which treat any `.unwrap()` or `.expect()` in `magicblock-*/**` as a MAJOR issue.
While this invariant (MagicContext always exists) is fundamental to validator operation and the error message is descriptive, consider returning an error or logging and continuing rather than panicking in the async loop.
🛡️ Suggested defensive handling
```diff
-let magic_context_acc = accountsdb.get_account(&magic_program::MAGIC_CONTEXT_PUBKEY)
-    .expect("Validator found to be running without MagicContext account!");
+let Some(magic_context_acc) = accountsdb.get_account(&magic_program::MAGIC_CONTEXT_PUBKEY) else {
+    error!("Validator found to be running without MagicContext account!");
+    continue;
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-api/src/tickers.rs` around lines 43-44: replace the panic-causing expect call when fetching the MagicContext account; instead of calling accountsdb.get_account(&magic_program::MAGIC_CONTEXT_PUBKEY).expect(...), handle the missing account defensively in the surrounding async loop by checking the Option returned, logging a clear error via the existing logger, and either returning an Err from the containing function or skipping/continuing the loop iteration while preserving state; update code paths that use magic_context_acc to accept the Option or the Result so the function signature (e.g., the ticker handler) returns Result<..., Error> or handles the None case gracefully rather than panicking.

magicblock-processor/src/executor/mod.rs (1)
224-234:⚠️ Potential issue | 🟠 Major
`.unwrap()` on RwLock write guard flagged per coding guidelines. Line 226 uses `.unwrap()` on `writable_sysvar_cache().write()`. While lock poisoning is rare and this pattern is common in Rust, the coding guidelines treat any `.unwrap()` in `magicblock-*/**` as a MAJOR issue.
Consider whether this should propagate an error or use `.expect()` with an explicit invariant justification.
🛡️ Suggested fix with explicit invariant
```diff
 fn set_sysvars(&self, block: &LatestBlockInner) {
-    let mut cache = self.processor.writable_sysvar_cache().write().unwrap();
+    let mut cache = self.processor
+        .writable_sysvar_cache()
+        .write()
+        .expect("sysvar cache lock should never be poisoned in single-threaded executor");
     cache.set_sysvar_for_tests(&block.clock);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/executor/mod.rs` around lines 224 - 234, The write() call on writable_sysvar_cache used in set_sysvars currently calls .unwrap() which is disallowed; replace the .unwrap() with a non-panicking handling: either change set_sysvars to return a Result and propagate the PoisonError from writable_sysvar_cache().write(), or (if you can assert the lock poisoning is an unrecoverable invariant) replace .unwrap() with .expect("invariant: writable_sysvar_cache RwLock must not be poisoned in set_sysvars; if poisoned, recovery is undefined") to provide an explicit justification; update references to set_sysvars/LatestBlockInner accordingly when changing the signature.
♻️ Duplicate comments (6)
magicblock-replicator/src/nats/consumer.rs (1)
88-107: ⚠️ Potential issue | 🟠 Major
Propagate cancellation distinctly from "no pending messages."
`magicblock-replicator/src/service/standby.rs:151-158` turns `None` into `false` with `unwrap_or_default()`. If `cancel` fires while `get_info()` is in flight, the expiry branch in `magicblock-replicator/src/service/standby.rs:63-76` can keep going and attempt promotion during shutdown. This method needs a distinct cancelled/error state instead of overloading `Option<u64>`.
🤖 Prompt for AI Agents
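One way to model the distinct outcomes this comment asks for; the names are illustrative, and the real method would be async and query the NATS consumer via `get_info()`:

```rust
// A dedicated outcome type so cancellation is not conflated with
// "zero pending messages".
#[derive(Debug, PartialEq)]
enum PendingOutcome {
    Cancelled,    // the cancellation token fired mid-query
    Pending(u64), // the consumer reported this many pending messages
}

// Caller-side handling: promote only on a fully drained backlog,
// never while shutting down.
fn may_promote(outcome: PendingOutcome) -> bool {
    match outcome {
        PendingOutcome::Cancelled => false,
        PendingOutcome::Pending(n) => n == 0,
    }
}
```

With `Option<u64>` the `Cancelled` and `Pending(0)` cases collapse into the same `false`, which is exactly the ambiguity the review flags.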
Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/src/nats/consumer.rs` around lines 88 - 107, The pending method currently returns Option<u64> and conflates "cancelled" with "no pending messages"; change its signature and behaviour to return a distinct result for cancellation/errors (e.g. Result<Option<u64>, PendingCancelled> or a small enum like PendingOutcome::{Cancelled, Pending(u64), Error}), update the loop in pending (referencing pending, self.inner.get_info(), CancellationToken) to return the appropriate variant when cancel.cancelled() fires versus when get_info() errors or returns a count, and then update the callers (e.g. the code in standby.rs that currently calls pending(...).unwrap_or_default()) to match on the new return type and handle Cancelled distinctly from zero pending so promotion is not attempted during shutdown.magicblock-replicator/src/service/standby.rs (1)
63-76: ⚠️ Potential issue | 🟠 Major
Lock-expiry signal may be lost if pending messages exist.
When `has_pending()` returns true at line 64, the code continues without preserving the fact that the lock expired. Once catch-up finishes and `has_pending()` becomes false, the code won't automatically retry takeover; it will only respond to a new expiry event or fall back to the idle timeout (adding at least `LEADER_TIMEOUT` delay).
Consider tracking the expiry state with a flag and retrying promotion as soon as
+ lock_expired: bool, } impl Standby { pub fn new(...) -> Self { ... Self { ... + lock_expired: false, } } pub async fn run(mut self) -> Result<Option<Primary>> { ... loop { tokio::select! { biased; _ = self.watcher.wait_for_expiry() => { + self.lock_expired = true; if self.has_pending().await { continue; } - if !self.can_promote { - warn!("leader lock expired, but takeover disabled (ReplicaOnly mode)"); - continue - } - info!("leader lock expired, attempting takeover"); - if let Ok(Some(producer)) = self.ctx.try_acquire_producer().await { - info!("acquired leadership, promoting"); - return self.ctx.into_primary(producer, self.messages).await.map(Some); - } + self.try_promote_if_ready().await?; } result = stream.next() => { // ... handle message ... + // After processing, check if we can promote now + if self.lock_expired && !self.has_pending().await { + self.try_promote_if_ready().await?; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-replicator/src/service/standby.rs` around lines 63 - 76, The expiry event handled in the watcher.wait_for_expiry branch is currently discarded when has_pending().await is true, so promotion isn't retried once pending clears; add a flag (e.g., pending_expired_lock or expiry_pending) in the Standby state that is set when watcher.wait_for_expiry fires and cleared only after a successful promotion attempt or when can_promote is false, then modify the loop so that after has_pending().await becomes false it checks this flag and immediately retries the same promotion logic (calling ctx.try_acquire_producer().await and ctx.into_primary(...).await with messages) instead of waiting for the next expiry; ensure the flag is also checked/cleared on other early exits to avoid stuck state and keep existing logging (info/warn) paths.magicblock-processor/src/scheduler/mod.rs (4)
362-376: ⚠️ Potential issue | 🔴 Critical
Hash replayed transactions too.
Replicas schedule `TransactionProcessingMode::Replay`, but that arm never updates `self.hasher`. `verify_block_as_replica()` will diverge on any non-empty replicated slot.
⚙️ Minimal fix
```diff
-    let mut is_execution = false;
+    let mut is_execution = false;
+    let mut should_hash = false;
     let (slot, index) = match txn.mode {
-        TransactionProcessingMode::Replay(ctx) => (ctx.slot, ctx.index),
+        TransactionProcessingMode::Replay(ctx) => {
+            should_hash = true;
+            (ctx.slot, ctx.index)
+        }
         TransactionProcessingMode::Simulation(_) => (self.slot, 0),
         TransactionProcessingMode::Execution(_) => {
             is_execution = true;
+            should_hash = true;
             let index = self.index;
             // we only advance the index if we are executing (primary mode)
             self.index += 1;
             (self.slot, index)
         }
     };
-    if is_execution {
+    if should_hash {
         self.hasher.update(txn.transaction.signature().as_ref());
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 362 - 376, The replay branch currently skips updating the block hasher, causing divergence in verify_block_as_replica; update self.hasher with the transaction signature when handling TransactionProcessingMode::Replay (same way as when is_execution is true) so both Replay and Execution contribute to the hash—i.e., in the match arm for TransactionProcessingMode::Replay (and/or by removing the is_execution gating) call self.hasher.update(txn.transaction.signature().as_ref()) so replicated transactions are hashed consistently with verify_block_as_replica.
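The mode-dependent hashing rule can be sketched in isolation; `Mode` is an illustrative stand-in for `TransactionProcessingMode` with payloads omitted:

```rust
// Both replayed and executed transactions must feed the block hasher,
// otherwise a replica's recomputed blockhash diverges from the
// primary's on any non-empty slot; simulations never touch the hash.
enum Mode {
    Replay,
    Simulation,
    Execution,
}

fn should_hash(mode: &Mode) -> bool {
    match mode {
        Mode::Replay | Mode::Execution => true,
        Mode::Simulation => false,
    }
}
```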
405-413: ⚠️ Potential issue | 🟠 Major
Abort the slot transition when `write_block()` fails.
`finalize_block()` only logs ledger errors, and `transition_to_new_slot()` immediately advances sysvars anyway. That leaves the local slot, `Clock`, and `SlotHashes` ahead of persisted history. Return a `Result` here and stop the transition on error, or make the failure fatal.
Also applies to: 477-491
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 405 - 413, transition_to_new_slot must abort on ledger write failures: change its signature to return a Result<u64, E> (or make it fatal) and propagate errors from prepare_block/finalize_block (which calls write_block) instead of always proceeding; stop calling update_sysvars and metrics::set_slot when finalize_block/write_block fails. Locate transition_to_new_slot, prepare_block, finalize_block and any calls to write_block, update_sysvars and metrics::set_slot (also apply the same change to the other transition block around lines 477-491) and ensure errors from finalize_block/write_block are returned up so the slot transition is not applied on persistent write failure.
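The suggested control flow can be sketched with closures standing in for the real ledger write and slot update; names and types here are illustrative, not the actual scheduler API:

```rust
// On the primary, a failed ledger write aborts the slot transition so
// the in-memory slot never runs ahead of persisted history.
fn finalize_block(
    is_primary: bool,
    write_block: impl FnOnce() -> Result<(), String>,
    set_slot: &mut dyn FnMut(),
) -> Result<(), String> {
    if is_primary {
        write_block()?; // return early instead of merely logging
    }
    set_slot(); // reached only if the write succeeded (or not primary)
    Ok(())
}
```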
366-396:⚠️ Potential issue | 🟠 MajorDon't advance the block stream before enqueue succeeds.
Lines 368-375 mutate
self.indexandself.hasherbefore Line 392 attemptstry_send(). If that send fails, the transaction is dropped but the slot index and blockhash stream have already moved. Defer those mutations until after a successful enqueue, and surface the send failure instead of returningNone.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 366 - 396, The code currently increments self.index and updates self.hasher before calling self.executors[...].try_send(...) so on send failure the stream and hash advance incorrectly; fix by computing local slot and index variables without mutating self, build the IndexedTransaction and call try_send first, and only when try_send returns Ok do you mutate self.index (self.index += 1) and call self.hasher.update(...); similarly only call self.send_replication(...) after confirming sent and coordinator.is_primary(); on send failure return or propagate an error instead of silently returning None so the caller can observe the enqueue failure (references: self.index, self.hasher.update, self.executors[..].try_send, IndexedTransaction, send_replication, coordinator.is_primary, Message::Transaction).
438-443: ⚠️ Potential issue | 🟠 Major
Replace this `unwrap()` in the slot-boundary path.
A backward host clock panics the scheduler here. Handle `Err(SystemTimeError)` explicitly or make the failure fatal with context instead of panicking via `unwrap()`.
🛡️ Minimal fix
```diff
-    let unix_timestamp = SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .unwrap()
-        // NOTE: since we can tick very frequently, a lot
-        // of blocks might have identical timestamps
-        .as_secs() as i64;
+    let unix_timestamp = match SystemTime::now().duration_since(UNIX_EPOCH) {
+        Ok(duration) => duration.as_secs() as i64,
+        Err(error) => {
+            error!(%error, "system clock before UNIX_EPOCH");
+            0
+        }
+    };
```
As per coding guidelines
{magicblock-*,programs,storage-proto}/**: Treat any usage of.unwrap()or.expect()in production Rust code as a MAJOR issue.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/mod.rs` around lines 438 - 443, The current call to SystemTime::now().duration_since(UNIX_EPOCH).unwrap() can panic if the host clock is before the epoch; replace the unwrap with explicit error handling: call duration_since(UNIX_EPOCH) and match or map_err the Result, set unix_timestamp from Ok(d.as_secs() as i64) and on Err(SystemTimeError) either return a Result::Err from the enclosing function (propagate the error) or log and convert it into a fatal error with clear context (include the SystemTimeError in the message). Update the function signature to return a Result if you choose propagation, and reference SystemTime::now, duration_since, UNIX_EPOCH, SystemTimeError and the unix_timestamp variable when making the change.
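The non-panicking timestamp calculation is straightforward to sketch with std alone; `unix_timestamp_or_zero` is a hypothetical helper name, and in the real code the error branch would also log a warning:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// A host clock set before the epoch yields the fallback 0 instead of
// panicking the scheduler at a slot boundary.
fn unix_timestamp_or_zero(now: SystemTime) -> i64 {
    match now.duration_since(UNIX_EPOCH) {
        Ok(duration) => duration.as_secs() as i64,
        Err(_) => 0,
    }
}
```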
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-config/src/config/ledger.rs`:
- Around line 16-17: The doc comment for the ledger field superblock_size
contains a typo ("before they accountsdb"); update the documentation string
above the pub superblock_size: u64 field in ledger.rs to read "before the
accountsdb snapshot/checksum is taken" (replace "they" with "the") so the
comment is grammatically correct and clear.
In `@magicblock-config/src/consts.rs`:
- Around line 57-58: Replace the hardcoded "20" with a calculation using
DEFAULT_LEDGER_BLOCK_TIME_MS so DEFAULT_SUPERBLOCK_SIZE stays in sync with
ledger block time; compute slots per second as 1000 /
DEFAULT_LEDGER_BLOCK_TIME_MS (using the same integer type as
DEFAULT_LEDGER_BLOCK_TIME_MS) and set DEFAULT_SUPERBLOCK_SIZE = 3600 *
slots_per_second, keeping the doc comment updated to reflect the derivation;
reference DEFAULT_SUPERBLOCK_SIZE and DEFAULT_LEDGER_BLOCK_TIME_MS when making
the change.
In `@magicblock-ledger/src/lib.rs`:
- Around line 32-35: Update the field documentation for notifier to reflect that
the broadcast channel now sends the full LatestBlockInner payload rather than
merely a signal; mention that subscribers receive LatestBlockInner instances
(not just a notification) and clarify whether callers should still use the load
method or prefer the payload from the channel for the current state, referencing
the notifier field and LatestBlockInner type.
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 511-518: In update_program_cache, avoid using write().unwrap() on
self.program_cache which will panic if the RwLock is poisoned; instead handle
the PoisonError explicitly by matching on self.program_cache.write() (or using
.into_inner() on the Err branch) so you can log or record the poisoning and
still obtain a mutable cache guard to run cache.prune(slot, 0) and set
cache.latest_root_slot = slot, then drop the guard before the syscall lookup;
update references to program_cache, update_program_cache, cache.prune, and
cache.latest_root_slot accordingly and remove the unwrap()/expect() usage.
---
Outside diff comments:
In `@magicblock-api/src/tickers.rs`:
- Around line 43-44: Replace the panic-causing expect call when fetching the
MagicContext account: instead of calling
accountsdb.get_account(&magic_program::MAGIC_CONTEXT_PUBKEY).expect(...), handle
the missing account defensively in the surrounding async loop by checking the
Option returned, logging a clear error via the existing logger, and either
returning an Err from the containing function or skipping/continuing the loop
iteration while preserving state; update code paths that use magic_context_acc
to accept the Option or the Result so the function signature (e.g., the ticker
handler) returns Result<..., Error> or handles the None case gracefully rather
than panicking.
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 224-234: The write() call on writable_sysvar_cache used in
set_sysvars currently calls .unwrap() which is disallowed; replace the .unwrap()
with a non-panicking handling: either change set_sysvars to return a Result and
propagate the PoisonError from writable_sysvar_cache().write(), or (if you can
assert the lock poisoning is an unrecoverable invariant) replace .unwrap() with
.expect("invariant: writable_sysvar_cache RwLock must not be poisoned in
set_sysvars; if poisoned, recovery is undefined") to provide an explicit
justification; update references to set_sysvars/LatestBlockInner accordingly
when changing the signature.
---
Duplicate comments:
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 362-376: The replay branch currently skips updating the block
hasher, causing divergence in verify_block_as_replica; update self.hasher with
the transaction signature when handling TransactionProcessingMode::Replay (same
way as when is_execution is true) so both Replay and Execution contribute to the
hash—i.e., in the match arm for TransactionProcessingMode::Replay (and/or by
removing the is_execution gating) call
self.hasher.update(txn.transaction.signature().as_ref()) so replicated
transactions are hashed consistently with verify_block_as_replica.
- Around line 405-413: transition_to_new_slot must abort on ledger write
failures: change its signature to return a Result<u64, E> (or make it fatal) and
propagate errors from prepare_block/finalize_block (which calls write_block)
instead of always proceeding; stop calling update_sysvars and metrics::set_slot
when finalize_block/write_block fails. Locate transition_to_new_slot,
prepare_block, finalize_block and any calls to write_block, update_sysvars and
metrics::set_slot (also apply the same change to the other transition block
around lines 477-491) and ensure errors from finalize_block/write_block are
returned up so the slot transition is not applied on persistent write failure.
- Around line 366-396: The code currently increments self.index and updates
self.hasher before calling self.executors[...].try_send(...) so on send failure
the stream and hash advance incorrectly; fix by computing local slot and index
variables without mutating self, build the IndexedTransaction and call try_send
first, and only when try_send returns Ok do you mutate self.index (self.index +=
1) and call self.hasher.update(...); similarly only call
self.send_replication(...) after confirming sent and coordinator.is_primary();
on send failure return or propagate an error instead of silently returning None
so the caller can observe the enqueue failure (references: self.index,
self.hasher.update, self.executors[..].try_send, IndexedTransaction,
send_replication, coordinator.is_primary, Message::Transaction).
- Around line 438-443: The current call to
SystemTime::now().duration_since(UNIX_EPOCH).unwrap() can panic if the host
clock is before the epoch; replace the unwrap with explicit error handling: call
duration_since(UNIX_EPOCH) and match or map_err the Result, set unix_timestamp
from Ok(d.as_secs() as i64) and on Err(SystemTimeError) either return a
Result::Err from the enclosing function (propagate the error) or log and convert
it into a fatal error with clear context (include the SystemTimeError in the
message). Update the function signature to return a Result if you choose
propagation, and reference SystemTime::now, duration_since, UNIX_EPOCH,
SystemTimeError and the unix_timestamp variable when making the change.
In `@magicblock-replicator/src/nats/consumer.rs`:
- Around line 88-107: The pending method currently returns Option<u64> and
conflates "cancelled" with "no pending messages"; change its signature and
behaviour to return a distinct result for cancellation/errors (e.g.
Result<Option<u64>, PendingCancelled> or a small enum like
PendingOutcome::{Cancelled, Pending(u64), Error}), update the loop in pending
(referencing pending, self.inner.get_info(), CancellationToken) to return the
appropriate variant when cancel.cancelled() fires versus when get_info() errors
or returns a count, and then update the callers (e.g. the code in standby.rs
that currently calls pending(...).unwrap_or_default()) to match on the new
return type and handle Cancelled distinctly from zero pending so promotion is
not attempted during shutdown.
In `@magicblock-replicator/src/service/standby.rs`:
- Around line 63-76: The expiry event handled in the watcher.wait_for_expiry
branch is currently discarded when has_pending().await is true, so promotion
isn't retried once pending clears; add a flag (e.g., pending_expired_lock or
expiry_pending) in the Standby state that is set when watcher.wait_for_expiry
fires and cleared only after a successful promotion attempt or when can_promote
is false, then modify the loop so that after has_pending().await becomes false
it checks this flag and immediately retries the same promotion logic (calling
ctx.try_acquire_producer().await and ctx.into_primary(...).await with messages)
instead of waiting for the next expiry; ensure the flag is also checked/cleared
on other early exits to avoid stuck state and keep existing logging (info/warn)
paths.
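The latched-expiry idea above can be modeled as a tiny synchronous state machine; `ExpiryLatch` is an illustrative name, and the real loop is async over the watcher and message stream:

```rust
// The expiry signal is remembered across the backlog drain instead of
// being dropped when pending messages exist.
struct ExpiryLatch {
    latched: bool,
}

impl ExpiryLatch {
    fn new() -> Self {
        Self { latched: false }
    }

    // Record that the leader lock expired.
    fn on_expiry(&mut self) {
        self.latched = true;
    }

    // Takeover runs only once the backlog is empty *and* an expiry was
    // observed; the latch is cleared so takeover fires once per expiry.
    fn should_take_over(&mut self, has_pending: bool) -> bool {
        if self.latched && !has_pending {
            self.latched = false;
            true
        } else {
            false
        }
    }
}
```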
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bdb9c6c7-2d9c-49be-9799-40893b584f04
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (37)
- Cargo.toml
- config.example.toml
- magicblock-aperture/src/geyser.rs
- magicblock-aperture/src/processor.rs
- magicblock-aperture/src/state/blocks.rs
- magicblock-aperture/tests/blocks.rs
- magicblock-aperture/tests/mocked.rs
- magicblock-aperture/tests/setup.rs
- magicblock-aperture/tests/transactions.rs
- magicblock-aperture/tests/websocket.rs
- magicblock-api/src/lib.rs
- magicblock-api/src/magic_validator.rs
- magicblock-api/src/slot.rs
- magicblock-api/src/tickers.rs
- magicblock-config/src/config/ledger.rs
- magicblock-config/src/consts.rs
- magicblock-core/src/link.rs
- magicblock-core/src/link/blocks.rs
- magicblock-core/src/link/replication.rs
- magicblock-ledger/src/blockstore_processor/mod.rs
- magicblock-ledger/src/lib.rs
- magicblock-metrics/src/metrics/mod.rs
- magicblock-processor/Cargo.toml
- magicblock-processor/src/executor/mod.rs
- magicblock-processor/src/scheduler/coordinator.rs
- magicblock-processor/src/scheduler/mod.rs
- magicblock-processor/src/scheduler/state.rs
- magicblock-processor/src/scheduler/tests.rs
- magicblock-processor/tests/fees.rs
- magicblock-processor/tests/replay.rs
- magicblock-replicator/src/nats/consumer.rs
- magicblock-replicator/src/nats/mod.rs
- magicblock-replicator/src/nats/producer.rs
- magicblock-replicator/src/service/context.rs
- magicblock-replicator/src/service/primary.rs
- magicblock-replicator/src/service/standby.rs
- test-kit/src/lib.rs
💤 Files with no reviewable changes (3)
- magicblock-api/src/lib.rs
- magicblock-core/src/link.rs
- magicblock-api/src/slot.rs

Summary
Adds streaming blockhash computation to the transaction scheduler, moving it out of slot_ticker. It also introduces the notion of superblocks: every N slots the scheduler stops the world and takes a snapshot of accountsdb along with its checksum.
Compatibility
Blockhash computation has moved entirely to the scheduler, and the hash is now computed differently.
Config changes: new parameter ledger.superblock-size; accountsdb no longer has a snapshot frequency parameter.
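The relationship between the block time and the superblock size can be sketched as derived constants; the 50 ms block time here is an assumption (it yields the 72,000-slot default mentioned in the summary below, i.e. one hour of slots between accountsdb snapshots):

```rust
// Hypothetical constants showing how a default superblock size could be
// computed from the block time instead of being hardcoded, so the two
// values stay in sync if the block time ever changes.
const DEFAULT_LEDGER_BLOCK_TIME_MS: u64 = 50; // assumed block time
const SLOTS_PER_SECOND: u64 = 1000 / DEFAULT_LEDGER_BLOCK_TIME_MS;
const DEFAULT_SUPERBLOCK_SIZE: u64 = 3600 * SLOTS_PER_SECOND; // one hour
```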
Testing
Checklist
Summary by CodeRabbit
New Features
superblock-size, default 72,000).
Bug Fixes
Refactor
Chores
Tests