refactor: extract shared wal file substrate#134
Conversation
Summary by CodeRabbitRelease Notes
WalkthroughA new crate Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/reservation-core/src/wal_file.rs (1)
132-138:⚠️ Potential issue | 🟡 MinorFix test path prefix to use
reservation-coreinstead ofquota-core.The test file path format uses
"quota-core-{name}-{nanos}.wal"which appears to be a copy-paste artifact. This should be"reservation-core-{name}-{nanos}.wal"for clarity and to avoid potential temp file collisions if tests run concurrently across crates.Proposed fix
fn test_path(name: &str) -> PathBuf { let nanos = SystemTime::now() .duration_since(UNIX_EPOCH) .expect("system time should be after epoch") .as_nanos(); - std::env::temp_dir().join(format!("quota-core-{name}-{nanos}.wal")) + std::env::temp_dir().join(format!("reservation-core-{name}-{nanos}.wal")) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/reservation-core/src/wal_file.rs` around lines 132 - 138, In the test helper function test_path, update the temp file name format string from "quota-core-{name}-{nanos}.wal" to "reservation-core-{name}-{nanos}.wal" so the generated temp files use the reservation-core prefix and avoid cross-crate collisions; locate the format call inside fn test_path and modify the string passed to std::env::temp_dir().join(...) accordingly.
🧹 Nitpick comments (5)
crates/quota-core/src/recovery.rs (1)
152-236: Consider adding recovery logging for operational clarity.Unlike
allocdb-core/src/recovery.rswhich logs recovery completion and failures atinfo/errorlevels,quota-corelacks equivalent structured logging. For consistency and debuggability, consider adding similar lifecycle logging around recovery boundaries.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/quota-core/src/recovery.rs` around lines 152 - 236, Add structured info/error logs around recover_quota_with_observer lifecycle to mirror allocdb-core: log start of recovery (including snapshot presence/loaded_snapshot_lsn) right after snapshot load, log an info after wal_file.truncate_to_valid_prefix completes, log progress at RecoveryBoundary::AfterReplayFrame (include lsn, record_type, replay_ordinal/replayable_frame_count), and log a final info on successful completion before returning RecoveryResult; also ensure errors during snapshot load, WAL truncate, or replay are logged with error level where the ? propagates (e.g., wrap or map_err to log before returning) so failures are visible in logs. Use the existing observer/RecoveryBoundary places and db.set_progress/RecoveryResult return point to attach these logs.crates/allocdb-wal-file/src/lib.rs (4)
63-71: Unnecessary seek before dropping the truncation handle.The
seekon line 66 has no effect sincefileis dropped immediately after and the new append handle (line 69) positions at EOF automatically due to append mode.♻️ Suggested simplification
pub fn truncate_to(&mut self, valid_prefix: u64) -> Result<(), std::io::Error> { let mut file = OpenOptions::new().write(true).open(&self.path)?; file.set_len(valid_prefix)?; - file.seek(SeekFrom::Start(valid_prefix))?; file.sync_data()?; drop(file); self.file = open_append_file(&self.path)?; Ok(()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/allocdb-wal-file/src/lib.rs` around lines 63 - 71, In truncate_to remove the redundant file.seek(SeekFrom::Start(valid_prefix)) call: after opening a write handle to set_len and sync_data, drop that handle and reopen the append handle via self.file = open_append_file(&self.path)?; so delete the seek on the temporary `file` in fn truncate_to to simplify behavior (the new append handle returned by open_append_file positions at EOF automatically).
115-126: Consider documenting the Windows durability caveat.The non-Unix
sync_parent_diris a no-op, meaning atomic rename durability guarantees are weaker on Windows. Consider adding a doc comment noting this platform-specific behavior for maintainers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/allocdb-wal-file/src/lib.rs` around lines 115 - 126, Add a doc comment to the non-Unix implementation of sync_parent_dir (the function annotated with #[cfg(not(unix))]) explaining that it is a no-op on Windows and therefore atomic rename/durability guarantees are weaker on that platform; mention that callers relying on fsyncing parent directories to persist renames should be aware this behavior is platform-specific and may require alternate strategies on Windows.
11-94: Consider adding structured logging for lifecycle and state-transition events.Per coding guidelines, logging should be added "where it materially improves debuggability or operational clarity." For a critical durability component, consider logging:
info!on file open (lifecycle event)info!ordebug!on truncation and atomic replacement (significant state transitions)This aids debugging recovery issues and operational observability without cluttering hot-path append operations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/allocdb-wal-file/src/lib.rs` around lines 11 - 94, Add structured lifecycle/state-transition logs to AppendWalFile: in open(...) emit an info! with the file path and whether it was created or opened; in truncate_to(...) emit info! or debug! with the path and the valid_prefix value (and resulting file length after truncate if available); in replace_with_bytes(...) emit info! or debug! with the path and bytes.len() before writing and after successful rename; use the existing function names (AppendWalFile::open, append_bytes, sync, read_all, truncate_to, replace_with_bytes) to locate insertion points and prefer the crate's logging macro (info!/debug!) so logs include path and numeric details for observability without touching hot-path append_bytes/sync.
144-185: Add edge-case and negative-path tests per coding guidelines.The current test suite covers happy paths but lacks edge-case, negative-path, and recovery tests. Per coding guidelines for
**/*.rsfiles, favor these over shallow happy-path coverage.Consider adding tests for:
- Truncating to zero
- Truncating to length greater than file size (documents
set_lenbehavior)- Replacing with empty bytes
- Error propagation when underlying operations fail (e.g., after truncate or replace, verify append still works; or simulate permission errors)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/allocdb-wal-file/src/lib.rs` around lines 144 - 185, Add focused edge-case and negative-path tests for AppendWalFile: create tests named like truncate_to_zero, truncate_beyond_length, replace_with_empty_bytes, and error_propagation_on_failed_ops that exercise AppendWalFile::open, ::append_bytes, ::sync, ::truncate_to and ::replace_with_bytes; specifically assert behavior after truncating to 0, truncating to a length greater than current file size (documenting set_len semantics), replacing with an empty slice produces an empty file and allows future appends, and simulate/verify error propagation when underlying filesystem ops fail (for example by opening a read-only handle or changing permissions) to ensure append_bytes returns an error and state is consistent; follow existing test structure (use test_path, fs::remove_file) and add assertions using ::read_all and unwrap/expect as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/reservation-core/src/wal_file.rs`:
- Around line 132-138: In the test helper function test_path, update the temp
file name format string from "quota-core-{name}-{nanos}.wal" to
"reservation-core-{name}-{nanos}.wal" so the generated temp files use the
reservation-core prefix and avoid cross-crate collisions; locate the format call
inside fn test_path and modify the string passed to
std::env::temp_dir().join(...) accordingly.
---
Nitpick comments:
In `@crates/allocdb-wal-file/src/lib.rs`:
- Around line 63-71: In truncate_to remove the redundant
file.seek(SeekFrom::Start(valid_prefix)) call: after opening a write handle to
set_len and sync_data, drop that handle and reopen the append handle via
self.file = open_append_file(&self.path)?; so delete the seek on the temporary
`file` in fn truncate_to to simplify behavior (the new append handle returned by
open_append_file positions at EOF automatically).
- Around line 115-126: Add a doc comment to the non-Unix implementation of
sync_parent_dir (the function annotated with #[cfg(not(unix))]) explaining that
it is a no-op on Windows and therefore atomic rename/durability guarantees are
weaker on that platform; mention that callers relying on fsyncing parent
directories to persist renames should be aware this behavior is
platform-specific and may require alternate strategies on Windows.
- Around line 11-94: Add structured lifecycle/state-transition logs to
AppendWalFile: in open(...) emit an info! with the file path and whether it was
created or opened; in truncate_to(...) emit info! or debug! with the path and
the valid_prefix value (and resulting file length after truncate if available);
in replace_with_bytes(...) emit info! or debug! with the path and bytes.len()
before writing and after successful rename; use the existing function names
(AppendWalFile::open, append_bytes, sync, read_all, truncate_to,
replace_with_bytes) to locate insertion points and prefer the crate's logging
macro (info!/debug!) so logs include path and numeric details for observability
without touching hot-path append_bytes/sync.
- Around line 144-185: Add focused edge-case and negative-path tests for
AppendWalFile: create tests named like truncate_to_zero, truncate_beyond_length,
replace_with_empty_bytes, and error_propagation_on_failed_ops that exercise
AppendWalFile::open, ::append_bytes, ::sync, ::truncate_to and
::replace_with_bytes; specifically assert behavior after truncating to 0,
truncating to a length greater than current file size (documenting set_len
semantics), replacing with an empty slice produces an empty file and allows
future appends, and simulate/verify error propagation when underlying filesystem
ops fail (for example by opening a read-only handle or changing permissions) to
ensure append_bytes returns an error and state is consistent; follow existing
test structure (use test_path, fs::remove_file) and add assertions using
::read_all and unwrap/expect as appropriate.
In `@crates/quota-core/src/recovery.rs`:
- Around line 152-236: Add structured info/error logs around
recover_quota_with_observer lifecycle to mirror allocdb-core: log start of
recovery (including snapshot presence/loaded_snapshot_lsn) right after snapshot
load, log an info after wal_file.truncate_to_valid_prefix completes, log
progress at RecoveryBoundary::AfterReplayFrame (include lsn, record_type,
replay_ordinal/replayable_frame_count), and log a final info on successful
completion before returning RecoveryResult; also ensure errors during snapshot
load, WAL truncate, or replay are logged with error level where the ? propagates
(e.g., wrap or map_err to log before returning) so failures are visible in logs.
Use the existing observer/RecoveryBoundary places and
db.set_progress/RecoveryResult return point to attach these logs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5c2665df-8dd8-4a35-bcea-1623c63f2b46
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
Cargo.tomlcrates/allocdb-core/Cargo.tomlcrates/allocdb-core/src/recovery.rscrates/allocdb-core/src/recovery_issue_30_tests.rscrates/allocdb-core/src/recovery_issue_31_tests.rscrates/allocdb-core/src/recovery_revoke_tests.rscrates/allocdb-core/src/recovery_tests.rscrates/allocdb-core/src/wal_file.rscrates/allocdb-node/src/engine.rscrates/allocdb-wal-file/Cargo.tomlcrates/allocdb-wal-file/src/lib.rscrates/quota-core/Cargo.tomlcrates/quota-core/src/recovery.rscrates/quota-core/src/wal_file.rscrates/reservation-core/Cargo.tomlcrates/reservation-core/src/wal_file.rsscripts/check_repo.sh
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/allocdb-core/src/recovery_revoke_tests.rscrates/allocdb-node/src/engine.rscrates/allocdb-core/src/recovery_issue_30_tests.rscrates/allocdb-core/src/recovery_issue_31_tests.rscrates/allocdb-core/src/recovery_tests.rscrates/allocdb-core/src/recovery.rscrates/quota-core/src/recovery.rscrates/allocdb-core/src/wal_file.rscrates/quota-core/src/wal_file.rscrates/allocdb-wal-file/src/lib.rscrates/reservation-core/src/wal_file.rs
🔇 Additional comments (34)
scripts/check_repo.sh (1)
38-38: LGTM!The dependency whitelist correctly includes
allocdb-wal-fileto permit the new workspace crate as a dependency ofallocdb-core.Cargo.toml (1)
4-4: LGTM!The new
allocdb-wal-filecrate is correctly added to the workspace members, enabling path dependencies fromallocdb-core,quota-core, andreservation-core.crates/reservation-core/Cargo.toml (1)
9-9: LGTM!The path dependency on
allocdb-wal-fileis correctly added, consistent with the same pattern inquota-core/Cargo.tomlandallocdb-core/Cargo.toml.crates/allocdb-core/src/recovery_issue_31_tests.rs (1)
70-70: LGTM!The call site correctly passes
&mut walto match the updatedrecover_allocdbsignature that now requires mutable WAL access for potential truncation during recovery.crates/allocdb-core/Cargo.toml (1)
9-9: LGTM!The path dependency on
allocdb-wal-fileis correctly added and is now permitted by the updatedscripts/check_repo.shwhitelist.crates/allocdb-wal-file/Cargo.toml (1)
1-8: LGTM!The new crate manifest correctly inherits workspace configuration and has no external dependencies, keeping the shared WAL substrate minimal and self-contained.
crates/allocdb-core/src/recovery_revoke_tests.rs (2)
92-93: LGTM!The call site correctly passes
&mut walto match the updatedrecover_allocdbsignature.
160-161: LGTM!The call site correctly passes
&mut walto match the updatedrecover_allocdbsignature, consistent with the change in the first test.crates/quota-core/Cargo.toml (1)
9-9: LGTM!The path dependency on
allocdb-wal-fileis correctly added, enablingquota-coreto delegate WAL I/O to the sharedAppendWalFileabstraction.crates/allocdb-node/src/engine.rs (1)
420-441: LGTM!The recovery call site is correctly updated to pass
&mut walto match the newrecover_allocdb_with_observersignature. The mutable binding and error mapping chain are properly preserved.crates/allocdb-core/src/recovery_issue_30_tests.rs (1)
82-83: LGTM!Test call sites correctly updated to pass mutable WAL references, matching the new
recover_allocdbsignature.Also applies to: 130-131, 148-148, 177-177
crates/allocdb-core/src/recovery_tests.rs (1)
95-95: LGTM!All recovery test call sites correctly updated to use mutable WAL references. Test coverage for various recovery scenarios is preserved.
Also applies to: 151-151, 196-197, 216-216, 222-222, 266-267, 313-314
crates/allocdb-core/src/recovery.rs (3)
149-164: LGTM!The signature change to
&mut WalFileis correctly justified by the underlyingtruncate_to_valid_prefix(&mut self)call at line 221. The recovery logic and error handling remain intact.
173-201: LGTM!The observer-based recovery variant correctly propagates the mutable borrow requirement. Logging at
infolevel for successful recovery anderrorlevel for failures vialog_recovery_failurefollows the coding guidelines.
203-222: LGTM!Internal implementation correctly receives and uses the mutable WAL reference for
truncate_to_valid_prefix(). The observer callback pattern and replay logic remain unchanged.crates/quota-core/src/recovery.rs (2)
135-150: LGTM!The signature change to
&mut WalFilecorrectly enables thetruncate_to_valid_prefixmutation at line 170.
377-377: LGTM!Test call sites correctly updated to pass mutable WAL references.
Also applies to: 415-415, 465-465, 520-520
crates/allocdb-core/src/wal_file.rs (6)
1-10: LGTM!Clean refactor to delegate WAL operations to
AppendWalFile. The struct simplification removes duplicate path storage sinceAppendWalFileowns the path internally.
37-50: LGTM!Opening and path access correctly delegate to
AppendWalFile. The error conversion viaFrom<std::io::Error>handles I/O failures from the underlying abstraction.
63-77: LGTM!Frame append and sync operations cleanly delegate to
AppendWalFile::append_bytesandAppendWalFile::sync. Payload validation occurs before delegation, preserving the existing contract.
84-115: LGTM!Recovery and truncation correctly use the
recover_filehelper and delegate truncation toAppendWalFile::truncate_to. The fail-closed behavior on mid-log corruption is preserved (returnsWalFileError::Corruptionwithout truncating).
123-134: LGTM!The
replace_with_framesimplementation correctly validates all frames first, then encodes into a contiguous buffer before atomic replacement viareplace_with_bytes. This preserves crash safety (atomic rename + parent dir sync per context snippet 1).
148-155: LGTM!The
recover_filehelper cleanly reads the full WAL viaread_all()and scans frames. Theu64conversion panic is acceptable as WAL files exceedingu64::MAXbytes are unrealistic.crates/quota-core/src/wal_file.rs (4)
1-10: LGTM!Clean delegation to
AppendWalFile, mirroring theallocdb-corepattern for the shared WAL substrate.
37-61: LGTM!Open, path, append, and sync operations correctly delegate to
AppendWalFile.
67-85: LGTM!Truncation logic preserves fail-closed behavior on corruption and correctly delegates torn-tail truncation to
AppendWalFile::truncate_to.
87-119: LGTM!Frame replacement and recovery helper follow the same pattern as
allocdb-core, maintaining consistency across the shared substrate.crates/reservation-core/src/wal_file.rs (3)
1-10: LGTM!Clean delegation to
AppendWalFile, consistent with the other crate-local WAL wrappers.
37-65: LGTM!Core WAL operations correctly delegate to
AppendWalFile.
67-119: LGTM!Truncation, replacement, and recovery helper implementations are consistent with the shared substrate pattern.
crates/allocdb-wal-file/src/lib.rs (4)
1-9: LGTM!Clean struct definition with appropriate fields for the append-only WAL file abstraction.
11-44: LGTM!The
open,path,append_bytes, andsyncmethods are straightforward delegations with correct semantics. Usingsync_data()is appropriate for WAL durability.
46-56: LGTM!Opening a fresh read handle in
read_allavoids position conflicts with the append handle and correctly reads the full on-disk state.
78-94: LGTM on the atomic replacement pattern.The write-to-temp → sync → rename → parent-dir-sync → reopen sequence is correct for crash-safe atomic replacement.
Summary
allocdb-wal-fileTesting
Closes #123
Refs #120