- High Performance: Optimized for concurrent writes and reads
- Topic-based Organization: Separate read/write streams per topic
- Configurable Consistency: Choose between strict and relaxed consistency models
- Memory-mapped I/O: Efficient file operations using memory mapping
- Persistent Read Offsets: Read positions survive process restarts
- Coordination-free Deletion: Atomic file cleanup without blocking operations
- Comprehensive Benchmarking: Built-in performance testing suite
Run quick benchmarks with:
pip install pandas matplotlib  # required to render the result graphs
make bench-and-show-reads
Add Walrus to your Cargo.toml:
[dependencies]
walrus-rust = "0.1.0"
use walrus_rust::{Walrus, ReadConsistency};
// Create a new WAL instance with default settings
let wal = Walrus::new()?;
// Write data to a topic
let data = b"Hello, Walrus!";
wal.append_for_topic("my-topic", data)?;
// Read data from the topic
if let Some(entry) = wal.read_next("my-topic")? {
println!("Read: {:?}", String::from_utf8_lossy(&entry.data));
}
use walrus_rust::{Walrus, ReadConsistency, FsyncSchedule};
// Configure with custom consistency and fsync behavior
let wal = Walrus::with_consistency_and_schedule(
ReadConsistency::AtLeastOnce { persist_every: 1000 },
FsyncSchedule::Milliseconds(500)
)?;
// Write and read operations work the same way
wal.append_for_topic("events", b"event data")?;
Walrus supports two consistency models:
`StrictlyAtOnce` (default):
- Behavior: Read offsets are persisted after every read operation
- Guarantees: No message will be read more than once, even after crashes
- Performance: Higher I/O overhead due to frequent persistence
- Use Case: Critical systems where duplicate processing must be avoided
let wal = Walrus::with_consistency(ReadConsistency::StrictlyAtOnce)?;
`AtLeastOnce`:
- Behavior: Read offsets are persisted every N read operations
- Guarantees: Messages may be re-read after crashes (at-least-once delivery)
- Performance: Better throughput with configurable persistence frequency
- Use Case: High-throughput systems that can handle duplicate processing
let wal = Walrus::with_consistency(
ReadConsistency::AtLeastOnce { persist_every: 5000 }
)?;
Control when data is flushed to disk:
- Behavior: Background thread flushes data every N milliseconds
- Default: 1000ms (1 second)
- Range: Minimum 1ms, recommended 100-5000ms
let wal = Walrus::with_consistency_and_schedule(
ReadConsistency::AtLeastOnce { persist_every: 1000 },
FsyncSchedule::Milliseconds(2000) // Flush every 2 seconds
)?;
`WALRUS_QUIET`: Set to any value to suppress debug output during operations
export WALRUS_QUIET=1 # Suppress debug messages
Walrus organizes data in the following structure:
wal_files/
├── wal_1234567890.log # Log files (10MB blocks, 100 blocks per file)
├── wal_1234567891.log
├── read_offset_idx_index.db # Persistent read offset index
└── read_offset_idx_index.db.tmp # Temporary file for atomic updates
- Block Size: 10MB per block (configurable via `DEFAULT_BLOCK_SIZE`)
- Blocks Per File: 100 blocks per file (1GB total per file)
- Max File Size: 1GB per log file
- Index Persistence: Read offsets stored in separate index files
- `Walrus::new()` - Creates a new WAL instance with default settings (`StrictlyAtOnce` consistency).
- `Walrus::with_consistency(mode: ReadConsistency)` - Creates a WAL with a custom consistency mode and the default fsync schedule (1000ms).
- `Walrus::with_consistency_and_schedule(mode: ReadConsistency, schedule: FsyncSchedule) -> std::io::Result<Self>` - Creates a WAL with full configuration control.
- `append_for_topic` - Appends data to the specified topic. Topics are created automatically on first write.
- `read_next` - Reads the next entry from a topic. Returns `None` if no more data is available.
pub struct Entry {
pub data: Vec<u8>,
}
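Putting these calls together, a minimal sketch that writes a few entries and then drains a topic (topic names and error handling are illustrative):

```rust
use walrus_rust::Walrus;

fn main() -> std::io::Result<()> {
    let wal = Walrus::new()?;

    // Topics are created automatically on first write.
    wal.append_for_topic("orders", b"order-1")?;
    wal.append_for_topic("orders", b"order-2")?;

    // Drain the topic until read_next returns None.
    while let Some(entry) = wal.read_next("orders")? {
        println!(
            "read {} bytes: {}",
            entry.data.len(),
            String::from_utf8_lossy(&entry.data)
        );
    }
    Ok(())
}
```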
Walrus includes a comprehensive benchmarking suite to measure performance across different scenarios.
Write benchmark:
- Duration: 2 minutes
- Threads: 10 concurrent writers
- Data Size: Random entries between 500B and 1KB
- Topics: One topic per thread (`topic_0` through `topic_9`)
- Configuration: `AtLeastOnce { persist_every: 50 }`
- Output: `benchmark_throughput.csv`
Read benchmark:
- Phases:
  - Write Phase: 1 minute (populate data)
  - Read Phase: 2 minutes (consume data)
- Threads: 10 concurrent readers/writers
- Data Size: Random entries between 500B and 1KB
- Configuration: `AtLeastOnce { persist_every: 5000 }`
- Output: `read_benchmark_throughput.csv`
Scaling benchmark:
- Thread Counts: 1 to 10 threads (tested sequentially)
- Duration: 30 seconds per thread count
- Data Size: Random entries between 500B and 1KB
- Configuration: `AtLeastOnce { persist_every: 50 }`
- Output: `scaling_results.csv` and `scaling_results_live.csv`
# Run individual benchmarks
make bench-writes # Write benchmark
make bench-reads # Read benchmark
make bench-scaling # Scaling benchmark
# Show results
make show-writes # Visualize write results
make show-reads # Visualize read results
make show-scaling # Visualize scaling results
# Live monitoring (run in separate terminal)
make live-writes # Live write throughput
make live-scaling # Live scaling progress
# Cleanup
make clean # Remove CSV files
# Write benchmark
cargo test --test multithreaded_benchmark_writes -- --nocapture
# Read benchmark
cargo test --test multithreaded_benchmark_reads -- --nocapture
# Scaling benchmark
cargo test --test scaling_benchmark -- --nocapture
All benchmarks use the following data generation strategy:
// Random entry size between 500B and 1KB
let size = rng.gen_range(500..=1024);
let data = vec![(counter % 256) as u8; size];
This creates realistic variable-sized entries with predictable content for verification.
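For context, a self-contained version of this generation strategy might look like the sketch below (assuming the `rand` crate; the function name is illustrative):

```rust
use rand::Rng;

// Produce `count` entries of 500B-1KB, each filled with a byte derived
// from its index so reads can be verified against predictable content.
fn generate_entries(count: usize) -> Vec<Vec<u8>> {
    let mut rng = rand::thread_rng();
    (0..count)
        .map(|counter| {
            let size = rng.gen_range(500..=1024);
            vec![(counter % 256) as u8; size]
        })
        .collect()
}
```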
The `scripts/` directory contains Python visualization tools:
- `visualize_throughput.py` - Write benchmark graphs
- `show_reads_graph.py` - Read benchmark graphs
- `show_scaling_graph_writes.py` - Scaling results
- `live_scaling_plot.py` - Live scaling monitoring
Requirements: `pandas`, `matplotlib`
pip install pandas matplotlib
- Coordination-free Operations: Writers don't block readers, minimal locking
- Memory-mapped I/O: Efficient file operations with OS-level optimizations
- Topic Isolation: Each topic maintains independent read/write positions
- Persistent State: Read offsets survive process restarts
- Background Maintenance: Async fsync and cleanup operations
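As a rough illustration of topic isolation under concurrency, the sketch below has each thread write to its own topic (this assumes the `Walrus` handle can be shared across threads via `Arc`, as the multithreaded benchmarks suggest; topic names and counts are illustrative):

```rust
use std::sync::Arc;
use std::thread;
use walrus_rust::Walrus;

fn main() -> std::io::Result<()> {
    let wal = Arc::new(Walrus::new()?);

    // Each thread writes to its own topic; topics keep independent
    // read/write positions, so writers do not contend on offsets.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let wal = Arc::clone(&wal);
            thread::spawn(move || {
                let topic = format!("topic_{}", i);
                for n in 0..100 {
                    let msg = format!("message {}", n);
                    wal.append_for_topic(&topic, msg.as_bytes()).unwrap();
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Reading one topic is unaffected by activity on the others.
    let mut count = 0;
    while let Some(_entry) = wal.read_next("topic_0")? {
        count += 1;
    }
    println!("topic_0 contained {} entries", count);
    Ok(())
}
```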
Important: Read offsets are decoupled from write offsets. This means:
- Each topic maintains its own read position independently
- Read positions are persisted to disk based on consistency configuration
- After restart, readers continue from their last persisted position
- Write operations don't affect existing read positions
This design enables multiple readers per topic and supports replay scenarios.
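To make the decoupling concrete, here is a hedged sketch (assuming a fresh `Walrus` instance opened in the same working directory picks up the existing `wal_files/` data and persisted offsets):

```rust
use walrus_rust::Walrus;

fn main() -> std::io::Result<()> {
    {
        let wal = Walrus::new()?;
        wal.append_for_topic("events", b"first")?;
        wal.append_for_topic("events", b"second")?;

        // Consume one entry; its offset is persisted according to the
        // configured consistency mode.
        let entry = wal.read_next("events")?;
        assert_eq!(entry.unwrap().data, b"first".to_vec());
    }

    // Simulate a restart by opening a fresh instance.
    let wal = Walrus::new()?;

    // Reading resumes from the last persisted position, so this yields
    // "second" (or "first" again under AtLeastOnce if the offset had
    // not yet been persisted before the "crash").
    if let Some(entry) = wal.read_next("events")? {
        println!("{}", String::from_utf8_lossy(&entry.data));
    }
    Ok(())
}
```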
Typical configurations for different workloads:

High throughput (relaxed durability):
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 10000 },
    FsyncSchedule::Milliseconds(5000)
)?;

Maximum durability (no duplicate reads, frequent fsync):
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::StrictlyAtOnce,
    FsyncSchedule::Milliseconds(100)
)?;

Balanced throughput and durability:
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 1000 },
    FsyncSchedule::Milliseconds(1000)
)?;
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
git clone https://github.com/your-username/walrus.git
cd walrus
cargo build
cargo test
# Unit tests
cargo test --test unit
# Integration tests
cargo test --test integration
# End-to-end tests
cargo test --test e2e_longrunning
# All tests
cargo test
This project is licensed under the MIT License - see the LICENSE file for details.
- Initial release
- Core WAL functionality
- Topic-based organization
- Configurable consistency modes
- Comprehensive benchmark suite
- Memory-mapped I/O implementation
- Persistent read offset tracking