Improved Memory Usage, Incremental File Level Updates #99
Conversation
Move specialized documentation to on-demand context files:

- .agents/MCP.md - MCP server details
- .agents/NEW_ANALYZER.md - Adding new analyzers (moved from skill)
- .agents/PERFORMANCE.md - Performance optimization
- .agents/PRICING.md - Model pricing updates
- .agents/TESTING.md - Test strategy
- .agents/TUI.md - Real-time monitoring
- .agents/TYPES.md - Core data structures

Reduces always-read context from 158 to 58 lines (63% reduction). Also adds MCP and Configuration sections to README.md.
Remove code blocks and struct definitions that duplicate source code. AI agents will read the actual source files for implementation patterns. Consolidate TESTING.md into NEW_ANALYZER.md. Total reduction: 751 -> 232 lines.
… updates

Replace full-stats caching with lightweight view types for TUI rendering. Messages are converted to pre-computed aggregates immediately after parsing, reducing memory from ~550MB to ~70MB.

Key changes:
- Add view types: SessionAggregate, AnalyzerStatsView, FileContribution
- Add mimalloc v3 as global allocator (replaces glibc malloc arenas)
- Replace message cache with FileContribution cache for O(1) incremental updates
- Use temporary Rayon thread pool for initial load, dropped after parsing

Mimalloc is optional via feature flag (enabled by default). Disable for memory profiling with heaptrack: `cargo build --no-default-features`

Removed 1500+ lines of dead code and 37 tests that only covered removed paths.
Replace PathBuf keys with xxh3 hash-based PathHash newtype to eliminate allocations during incremental cache updates. Cache lookups and updates now use PathHash::new(path) which computes the hash in place without any heap allocation.
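The newtype pattern described above can be sketched as follows. This is a hedged illustration: the real code hashes with xxh3, while std's `DefaultHasher` stands in here so the sketch compiles without external crates, and the `ContributionCache` alias and its value type are hypothetical.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

// Illustrative stand-in for the PathHash newtype. The real implementation
// uses xxh3; DefaultHasher is substituted here to stay dependency-free.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PathHash(u64);

impl PathHash {
    // Hashes the borrowed path directly -- no PathBuf clone, no heap allocation.
    fn new(path: &Path) -> Self {
        let mut h = DefaultHasher::new();
        path.hash(&mut h);
        PathHash(h.finish())
    }
}

// Cache keyed by an 8-byte hash instead of an owned PathBuf.
// (The u64 value type is hypothetical; the real cache stores FileContribution.)
type ContributionCache = HashMap<PathHash, u64>;
```

The key point is that `PathHash::new(&path)` can be called on a borrowed `&Path` during every incremental update without cloning the path into an owned key.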
…short-circuit after finding first file

Add walk_data_dir() helpers to all analyzers that reuse WalkDir config between discover_data_sources() and is_available(). The is_available() implementations now use .any(), which stops iteration after finding the first matching file, avoiding the cost of discovering all files just to check availability.

- Add walk_data_dir() helper to claude_code, codex_cli, gemini_cli, opencode, qwen_code, pi_agent analyzers
- Add walk_data_dirs() helper to copilot analyzer (multiple roots)
- Add walk_vscode_extension_tasks() and vscode_extension_has_sources() shared helpers for cline, kilo_code, roo_code analyzers
- Refactor discover_vscode_extension_sources() to use shared helper
- Add get_stats_with_sources() default trait method to avoid double discovery when loading stats
- Make get_watch_directories() a required trait method (no default)
- Remove duplicated get_stats() and is_available() from all analyzers
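A minimal sketch of the short-circuiting availability check. The real helpers share a `WalkDir` configuration; `std::fs::read_dir` stands in here so the example is self-contained, and the `.jsonl` extension is just one analyzer's rule.

```rust
use std::fs;
use std::path::Path;

// Sketch: availability check that stops at the first matching file.
// `.any()` short-circuits, so a directory with thousands of files
// returns as soon as one candidate is found.
fn is_available(data_dir: &Path) -> bool {
    fs::read_dir(data_dir)
        .map(|entries| {
            entries
                .filter_map(Result::ok)
                .any(|e| e.path().extension().is_some_and(|ext| ext == "jsonl"))
        })
        .unwrap_or(false) // a missing directory means "not available", not an error
}
```

Contrast with the old behavior: collecting every discovered file into a `Vec` and checking `!vec.is_empty()` pays the full discovery cost just to answer a yes/no question.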
Add is_valid_data_path() to the Analyzer trait to filter invalid paths (directories, wrong file types) before attempting to parse. Each analyzer implements validation for its specific data format:

- OpenCode: .json at depth 2 (session_id/message.json)
- Claude Code: .jsonl at depth 2 (project_id/conversation.jsonl)
- VSCode extensions: ui_messages.json files
- Copilot: .json in chatSessions directory
- Gemini/Qwen: .json in chats directory
- Codex CLI: .jsonl files
- Pi Agent: .jsonl at depth 2
- Piebald: app.db file

Fixes 'Is a directory' errors when OpenCode creates session directories.
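A hedged sketch of one such validator, using the Claude Code rule above (.jsonl at depth 2). The function name and signature are illustrative, not the project's exact trait method.

```rust
use std::path::Path;

// Illustrative validator: accept only `<root>/<project_id>/<conversation>.jsonl`.
// Rejecting anything else (directories, wrong extensions, wrong depth) up front
// avoids "Is a directory" errors at parse time.
fn is_valid_claude_code_path(data_root: &Path, candidate: &Path) -> bool {
    let Ok(rel) = candidate.strip_prefix(data_root) else {
        return false; // not under the data root at all
    };
    // Exactly two components below the root: project dir + file name.
    rel.components().count() == 2
        && candidate.extension().is_some_and(|ext| ext == "jsonl")
}
```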
Adds file-based debug logging to help diagnose potential deadlock issues with RwLock contention. Logs lock acquisition/release for both read and write locks, watch channel operations, and TUI lifecycle.

Enable with: SPLITRAIL_DEBUG_LOG=1
Logs to: /tmp/splitrail-debug.log
…mory

- Change SessionAggregate.analyzer_name from String to Arc<str>, saving ~200KB by sharing a single allocation across all sessions from the same analyzer
- Change AnalyzerStatsView.analyzer_name to Arc<str> for consistency
- Fix redundant cloning in aggregate_sessions_from_messages using or_insert_with_key, reducing from 3 clones to 1 per message (~200-300KB savings)
- Remove unused Serialize/Deserialize derives from view-only types
- Fix pre-existing clippy warning in debug_log.rs (collapsible if)

Total estimated idle memory savings: ~400-500KB (8-9%)
The deadlock occurred when:

1. Thread 1 (TUI) held a read lock on a view in draw_ui
2. Thread 2 (Watcher) queued for a write lock on the same view
3. Thread 1 tried to acquire another read in draw_summary_stats
4. parking_lot's fair queuing blocked Thread 1's new read behind Thread 2's write
5. Neither thread could proceed - a classic priority-inversion deadlock

Fix: Scope the read lock in draw_ui to release it before calling draw_summary_stats, which acquires its own independent read locks.
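The fix can be illustrated with a small sketch. It uses `std::sync::RwLock` so it runs standalone (the project uses parking_lot, whose fair queuing is what makes the unscoped version deadlock); the function names mirror the description above, but the bodies are hypothetical.

```rust
use std::sync::RwLock;

// With a fair RwLock, holding a read guard across a call that takes another
// read can deadlock once a writer queues in between. The fix: confine the
// first guard to a block so it drops before the second acquisition.
fn draw_ui(view: &RwLock<Vec<u32>>) -> usize {
    let session_count = {
        let guard = view.read().unwrap(); // released at the end of this block
        guard.len()
    };
    // No guard is held here, so a queued writer can proceed and
    // draw_summary_stats can take a fresh, independent read lock.
    session_count + draw_summary_stats(view)
}

fn draw_summary_stats(view: &RwLock<Vec<u32>>) -> usize {
    view.read().unwrap().iter().sum::<u32>() as usize
}
```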
Replace full Stats (320 bytes) with compact TuiStats (24 bytes) in SessionAggregate and DailyStats, keeping only the 6 fields displayed in the TUI: input/output/reasoning/cached tokens, cost, and tool calls.

- Use cost_cents (u32) instead of cost (f64) to avoid float precision issues
- Make format_number generic via impl Into<u64> to accept both u32 and u64
- Fix MCP get_file_operations to compute from raw messages (not TuiStats)
- Keep full Stats in ConversationMessage for upload serialization

Memory savings: ~710 KB (91% reduction in stats memory) for typical usage.
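A minimal sketch of the compact type, assuming the six fields named above; the exact field names and helper methods are assumptions based on the description, not the project's verbatim code.

```rust
// Six u32 fields = 24 bytes, versus 320 bytes for the full Stats struct.
#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
struct TuiStats {
    input_tokens: u32,
    output_tokens: u32,
    reasoning_tokens: u32,
    cached_tokens: u32,
    cost_cents: u32, // integer cents sidestep f64 accumulation drift
    tool_calls: u32,
}

impl TuiStats {
    // Convert a float dollar cost to integer cents once, at construction.
    fn cents_from_cost(cost: f64) -> u32 {
        (cost * 100.0).round() as u32
    }

    // Recover a display cost only at render time.
    fn cost(&self) -> f64 {
        self.cost_cents as f64 / 100.0
    }
}
```

Storing cents as `u32` also makes the struct `Copy` and `Eq`, which an `f64` field would not allow.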
This reverts commit 0b1a5bd.
- Add lasso crate for model name interning (4-byte Spur vs 24+ byte String)
- Replace day_key String with compact DayKey struct (4 bytes vs 34+ bytes)
  - Stores year (u16) + month (u8) + day (u8) directly, no heap allocation
  - DayKey::from_local() extracts components from DateTime without formatting
- Pre-size models Vec with capacity of 2 (most sessions use 1-2 models)
- Add shrink_to_fit() after building session aggregates

Estimated memory savings: ~1.5MB at idle (~37% reduction)
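The DayKey idea can be sketched as follows; the constructor and formatting method names are illustrative, and the real type's from_local() DateTime extraction is omitted to keep the sketch dependency-free.

```rust
// 4 bytes on the stack, compared to a heap-allocated "YYYY-MM-DD" String.
// Deriving Ord on (year, month, day) field order gives correct date ordering.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
struct DayKey {
    year: u16,
    month: u8,
    day: u8,
}

impl DayKey {
    fn new(year: u16, month: u8, day: u8) -> Self {
        DayKey { year, month, day }
    }

    // Formatting happens only at display time, not for every stored key.
    fn display(&self) -> String {
        format!("{:04}-{:02}-{:02}", self.year, self.month, self.day)
    }
}
```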
- Change session_day_filters from Vec<Option<String>> to Vec<Option<DayKey>>
- Change DailyStats.date from String to DayKey
- Remove PartialEq<str/String> impls from DayKey (no longer needed)
- Add Default derive to DayKey, add Serialize/Deserialize impls
- Simplify update_day_filters using resize/truncate
The rayon thread pool was adding complexity without benefit since analyzers are already parallelized internally. Simplifies to async/await.
Replace the rayon thread pool with tokio's async runtime for parallel analyzer loading. Each analyzer now runs concurrently via join_all while parsing files sequentially within each analyzer. Switch from jwalk to the lighter walkdir crate for directory traversal. Remove the rayon dependency entirely.

- Consolidate parse_sources/parse_sources_parallel into parse_sources
- Inline walk_data_dir() helpers into discover/is_available methods
- Update all analyzers to use walkdir's into_path() API
- Remove redundant deduplication overrides where global_hash suffices
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/mcp/server.rs (1)
248-307: Consider simplifying the message filtering logic. The current implementation filters messages in two separate steps (analyzer, then date), creating intermediate Vec collections. This could be streamlined with iterator chaining.
🔎 Proposed refactor for cleaner filtering
```diff
 let stats = self.load_stats().map_err(|e| e.to_string())?;
-// Collect messages, optionally filtered by analyzer
-let messages: Vec<_> = if let Some(ref analyzer_name) = req.analyzer {
-    stats
-        .analyzer_stats
-        .iter()
-        .filter(|a| a.analyzer_name.eq_ignore_ascii_case(analyzer_name))
-        .flat_map(|a| a.messages.iter())
-        .collect()
-} else {
-    stats
-        .analyzer_stats
-        .iter()
-        .flat_map(|a| a.messages.iter())
-        .collect()
-};
-
-// Filter by date if specified
-let filtered: Vec<_> = if let Some(ref date) = req.date {
-    messages
-        .into_iter()
-        .filter(|m| {
-            m.date
-                .with_timezone(&chrono::Local)
-                .format("%Y-%m-%d")
-                .to_string()
-                == *date
-        })
-        .collect()
-} else {
-    messages
-};
-
-// Sum file operations from raw Stats
+// Filter and sum file operations in one pass
 let mut response = FileOpsResponse::default();
-for msg in filtered {
+
+for msg in stats
+    .analyzer_stats
+    .iter()
+    .filter(|a| {
+        req.analyzer
+            .as_ref()
+            .map(|name| a.analyzer_name.eq_ignore_ascii_case(name))
+            .unwrap_or(true)
+    })
+    .flat_map(|a| a.messages.iter())
+    .filter(|m| {
+        req.date
+            .as_ref()
+            .map(|d| {
+                m.date
+                    .with_timezone(&chrono::Local)
+                    .format("%Y-%m-%d")
+                    .to_string()
+                    == *d
+            })
+            .unwrap_or(true)
+    })
+{
     response.files_read += msg.stats.files_read;
     // ... rest of accumulation
 }
```

src/types.rs (1)
424-434: Fix inconsistent saturation handling in the `From<&Stats>` implementation.

The `From<&Stats>` implementation (lines 424-434) uses plain casts that truncate values above u32::MAX, while the `AddAssign`/`SubAssign` operators and the `accumulate_tui_stats` function all use `saturating_add`/`saturating_sub`. Although large aggregations are handled correctly through the saturating operators used in the actual code flow, the `From` implementation should be consistent:

```rust
TuiStats {
    input_tokens: s.input_tokens.min(u32::MAX as u64) as u32,
    output_tokens: s.output_tokens.min(u32::MAX as u64) as u32,
    reasoning_tokens: s.reasoning_tokens.min(u32::MAX as u64) as u32,
    cached_tokens: s.cached_tokens.min(u32::MAX as u64) as u32,
    cost_cents: (s.cost * 100.0).round() as u32,
    tool_calls: s.tool_calls,
}
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/mcp/server.rs
- src/mcp/types.rs
- src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Follow Rust 2024 edition conventions
Use `anyhow::Result` for error handling in Rust code
Prefer `async/await` over raw futures in Rust code
Use `parking_lot` locks over `std::sync` for performance in Rust code
Files:
- src/mcp/types.rs
- src/types.rs
- src/mcp/server.rs
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Normalize parsed data to common format defined in `src/types.rs` for cross-platform compatibility
Applied to files:
src/mcp/types.rs
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.
Applied to files:
- src/mcp/types.rs
- src/types.rs
- src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Applied to files:
- src/types.rs
- src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
Applied to files:
src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
Applied to files:
src/mcp/server.rs
🧬 Code graph analysis (2)
src/types.rs (3)
- src/tui/logic.rs (1): aggregate_sessions_from_messages (131-182)
- src/utils.rs (1): aggregate_by_date (108-228)
- src/mcp/types.rs (1): new (118-135)
src/mcp/server.rs (2)
- src/mcp/types.rs (1): new (118-135)
- src/types.rs (1): cost (405-407)
🔇 Additional comments (10)
src/types.rs (4)
14-33: LGTM: Model interner is well-designed. The global ThreadedRodeo interner is a solid choice for reducing memory usage of repeated model names. The use of LazyLock ensures thread-safe initialization, and the &'static str return from resolve_model is safe given ThreadedRodeo's lifetime guarantees.
125-143: LGTM: Efficient memory design. Good use of `Arc<str>` for shared analyzer names and `Vec<Spur>` for interned models. This significantly reduces memory overhead compared to storing full Strings in every SessionAggregate.
534-551: LGTM: FileContribution construction is correct. The method properly delegates to aggregation utilities and correctly computes conversation count from unique sessions. Filtering out "unknown" dates is a sensible safeguard.
553-598: LGTM: Incremental addition logic is sound. The method correctly handles both merging existing sessions and appending new ones. Model deduplication and earliest-timestamp tracking are properly implemented, and the final sort maintains consistent ordering.
src/mcp/types.rs (2)
104-136: LGTM: Clean separation of TUI stats and on-demand file operations. The introduction of `DateFileOps` and the `DailySummary::new` constructor properly addresses the previous concern about file operation stats being hardcoded to zero. The design cleanly separates TUI-relevant fields (DailyStats) from less-frequently-needed file operation stats (computed on demand from raw messages).
163-163: LGTM: Default derive is appropriate. Adding `Default` to `FileOpsResponse` is a minor convenience improvement that's used in the refactored `get_file_operations` endpoint.
src/mcp/server.rs (4)
41-83: LGTM: Synchronous stats loading and on-demand file ops computation. The change from async to synchronous `load_stats` aligns with the PR objectives to avoid Tokio in background paths. The `compute_file_ops_by_date` method correctly aggregates file operations from raw messages on demand, avoiding the need to store them in DailyStats. Based on learnings: Prefer blocking I/O on dedicated threads rather than async/await with Tokio for TUI-related code.
119-153: LGTM: Daily stats now correctly include file operations. The endpoint properly integrates on-demand file operation computation via `compute_file_ops_by_date` and passes the results to `DailySummary::new`. This resolves the previous issue where file operation fields were hardcoded to zero.
313-390: LGTM: Consistent updates to use method-based cost access. All endpoints correctly use the `ds.stats.cost()` method instead of direct field access, aligning with the TuiStats API. The change from `.iter()` to `.into_iter()` in `list_analyzers` is a minor efficiency improvement.
436-507: LGTM: Resource handlers updated consistently. Both resource handlers correctly use synchronous `load_stats` and the `cost()` method accessor. The logic remains sound.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/main.rs (1)
3-3: Consider using `parking_lot::Mutex` for consistency with project guidelines.

The import uses `std::sync::Mutex`, which is used for `upload_status` at line 234. While this isn't a performance-critical hot path, the coding guidelines recommend `parking_lot` locks over `std::sync` for better performance characteristics. Consider replacing `std::sync::Mutex` with `parking_lot::Mutex` for consistency across the codebase.

🔎 Suggested change

```diff
-use std::sync::{Arc, Mutex};
+use parking_lot::Mutex;
+use std::sync::Arc;
```

Then at line 234, the usage remains the same:

```rust
let upload_status = Arc::new(Mutex::new(tui::UploadStatus::None));
```

As per coding guidelines, `parking_lot` is preferred for performance in Rust code.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- .agents/PERFORMANCE.md
- Cargo.toml
- src/main.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- Cargo.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Follow Rust 2024 edition conventions
Use `anyhow::Result` for error handling in Rust code
Prefer `async/await` over raw futures in Rust code
Use `parking_lot` locks over `std::sync` for performance in Rust code
Files:
src/main.rs
🧠 Learnings (8)
📓 Common learnings
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.
Applied to files:
src/main.rs
📚 Learning: 2026-01-01T00:15:25.382Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-01T00:15:25.382Z
Learning: Applies to AGENTS.md : Document agent responsibilities and capabilities in AGENTS.md
Applied to files:
.agents/PERFORMANCE.md
📚 Learning: 2026-01-02T01:31:46.893Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.
Applied to files:
.agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Applied to files:
.agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
Applied to files:
.agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
Applied to files:
.agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code
Applied to files:
.agents/PERFORMANCE.md
🔇 Additional comments (7)
.agents/PERFORMANCE.md (1)
1-18: LGTM! Well-structured performance documentation. The documentation accurately reflects the project's performance optimizations, correctly references `walkdir` for directory traversal (as per project learnings), and provides clear, actionable guidelines. The mention of `parking_lot` locks aligns with coding standards, and the techniques listed match the architectural changes in this PR.
src/main.rs (6)
25-27: LGTM! Proper global allocator declaration.The mimalloc global allocator is correctly feature-gated and uses the official crate as recommended in previous reviews.
212-231: LGTM! Excellent scoped threadpool pattern with memory reclamation. The temporary rayon threadpool is created, used via `install()`, and then dropped immediately after initialization. The explicit `release_unused_memory()` call ensures memory is returned to the OS. This pattern aligns perfectly with the PR's memory-management objectives.
246-260: LGTM! Proper memory management for background auto-upload. The separate registry for upload and the call to `release_unused_memory()` after scoped parallel loading ensures the background upload doesn't retain unnecessary memory. This aligns with the incremental upload approach described in the PR objectives.
294-304: LGTM! Consistent scoped threadpool pattern. Identical pattern to `run_default`: create temporary pool, load stats via `install()`, drop pool, release memory. Consistent implementation across the codebase.
392-402: LGTM! Consistent memory management in stats command. Same scoped threadpool and memory-release pattern applied consistently.
444-458: LGTM! Well-documented memory release helper. The implementation uses the official `libmimalloc_sys::mi_collect` (addressing previous review feedback) with a clear SAFETY comment. The feature-gated no-op version ensures compilation without mimalloc. The function signature and documentation are clear.
Replace all std::sync::Mutex usage with parking_lot::Mutex for better performance, as recommended in the coding guidelines. This also simplifies lock patterns since parking_lot::Mutex::lock() never fails. Additionally removes a redundant upload_in_progress check in watcher.rs that duplicated an earlier check in the same function.
Change SessionAggregate.models from Vec<Spur> to BTreeMap<Spur, u32> to correctly handle incremental updates when multiple files contribute the same model to a session. This matches the pattern already used by DailyStats.models. Add tests for model reference counting in both DailyStats and SessionAggregate.
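The reference-counting pattern this describes can be sketched with std types (a plain `u32` stands in for lasso's 4-byte `Spur` key, and the helper names are hypothetical):

```rust
use std::collections::BTreeMap;

// Stand-in for lasso's interned-symbol key (Spur is also 4 bytes).
type ModelKey = u32;

// Each file that contributes a model to a session increments its count.
fn add_model(models: &mut BTreeMap<ModelKey, u32>, model: ModelKey) {
    *models.entry(model).or_insert(0) += 1;
}

// Removing a file's contribution decrements; the model disappears only
// when no remaining file references it. A plain Vec<Spur> cannot express
// this, which is why incremental updates needed the map.
fn remove_model(models: &mut BTreeMap<ModelKey, u32>, model: ModelKey) {
    if let Some(count) = models.get_mut(&model) {
        *count -= 1;
        if *count == 0 {
            models.remove(&model);
        }
    }
}
```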
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/types.rs (1)
348-389: Optional: Consider clamping floating-point cost subtraction.

Line 358 uses direct floating-point subtraction for `cost`, which can produce tiny negative values (e.g., -1e-15) due to precision errors when subtracting nearly-equal costs. While functionally harmless, consider clamping to zero for cleaner output:

```diff
-self.cost -= rhs.cost;
+self.cost = (self.cost - rhs.cost).max(0.0);
```

This is a cosmetic refinement and not critical for functionality.
🧹 Nitpick comments (4)
src/watcher.rs (1)
297-309: Consider replacing tokio::spawn with std::thread::spawn for background uploads.

Line 298 uses `tokio::spawn` for background upload tasks. Per your earlier feedback, TUI-related code should avoid tokio and prefer blocking I/O on a spawned thread or small threadpool.

Consider refactoring to:
- Use `std::thread::spawn` instead of `tokio::spawn`
- Make the upload call blocking (or wrap the async call in a local runtime within the thread)
- This eliminates the tokio dependency from the TUI path

Example approach using std::thread::spawn

```diff
-// Spawn background upload task with only the messages to upload
-tokio::spawn(async move {
-    upload::perform_background_upload_messages(
-        messages,
-        upload_status,
-        None,
-        Some(move || dirty_files.clear()),
-    )
-    .await;
-    // Mark upload as complete
-    *upload_in_progress.lock() = false;
-});
+// Spawn background upload on dedicated thread
+std::thread::spawn(move || {
+    // Run blocking upload or create local tokio runtime if needed
+    let rt = tokio::runtime::Builder::new_current_thread()
+        .enable_all()
+        .build()
+        .unwrap();
+    rt.block_on(async {
+        upload::perform_background_upload_messages(
+            messages,
+            upload_status,
+            None,
+            Some(move || dirty_files.clear()),
+        )
+        .await;
+    });
+    // Mark upload as complete
+    *upload_in_progress.lock() = false;
+});
```

Alternatively, consider making the upload path fully synchronous to avoid tokio entirely.

Based on learnings: In TUI code (src/tui.rs, src/watcher.rs, and related), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool.
src/analyzers/piebald.rs (1)
252-258: Consider logging parse errors in parallel processing.

Line 255 uses `unwrap_or_default()`, which silently ignores parse errors for individual sources. While this provides fault tolerance, it may hide issues from users.

Consider logging errors before returning the default:

```diff
 let all_messages: Vec<ConversationMessage> = sources
     .par_iter()
-    .flat_map(|source| self.parse_source(source).unwrap_or_default())
+    .flat_map(|source| {
+        self.parse_source(source).unwrap_or_else(|e| {
+            eprintln!("Error parsing source {:?}: {}", source.path, e);
+            Vec::new()
+        })
+    })
     .collect();
```

This maintains fault tolerance while providing visibility into parsing issues.
src/utils.rs (1)
264-293: Remove redundant `HashSet` imports.

`HashSet` is already imported at line 1 (`use std::collections::{BTreeMap, HashSet};`). The inner `use std::collections::HashSet;` statements at lines 267 and 280 are unnecessary.

🔎 Suggested cleanup

```diff
 /// Sequential deduplication by global_hash using HashSet.
 /// Used for post-init processing (incremental updates, uploads).
 pub fn deduplicate_by_global_hash(messages: Vec<ConversationMessage>) -> Vec<ConversationMessage> {
-    use std::collections::HashSet;
-
     let mut seen: HashSet<String> = HashSet::with_capacity(messages.len() / 2);
     messages
         .into_iter()
         .filter(|msg| seen.insert(msg.global_hash.clone()))
         .collect()
 }

 /// Sequential deduplication by local_hash using HashSet.
 /// Messages without local_hash are always kept.
 /// Used for post-init processing (incremental updates, uploads).
 pub fn deduplicate_by_local_hash(messages: Vec<ConversationMessage>) -> Vec<ConversationMessage> {
-    use std::collections::HashSet;
-
     let mut seen: HashSet<String> = HashSet::with_capacity(messages.len() / 2);
     messages
         .into_iter()
         .filter(|msg| {
```

src/upload.rs (1)
304-312: Consider logging config load failures for debugging.

The `Config::load().ok().flatten()?` pattern silently discards any config loading errors. For background uploads this is likely intentional, but logging the error (when `upload_debug` is enabled) could help diagnose configuration issues.

🔎 Optional: Log config errors in debug mode

```diff
 let upload_result = async {
-    let mut config = Config::load().ok().flatten()?;
+    let mut config = match Config::load() {
+        Ok(Some(c)) => c,
+        Ok(None) => return None,
+        Err(e) => {
+            if upload_debug_enabled() {
+                upload_debug_log(format!("[splitrail upload] config load error: {e:#}"));
+            }
+            return None;
+        }
+    };
     if !config.is_configured() {
         return None;
     }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- src/analyzers/piebald.rs
- src/main.rs
- src/tui.rs
- src/tui/logic.rs
- src/types.rs
- src/upload.rs
- src/upload/tests.rs
- src/utils.rs
- src/version_check.rs
- src/watcher.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/main.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Follow Rust 2024 edition conventions
Use `anyhow::Result` for error handling in Rust code
Prefer `async/await` over raw futures in Rust code
Use `parking_lot` locks over `std::sync` for performance in Rust code
Files:
- src/utils.rs
- src/tui.rs
- src/tui/logic.rs
- src/watcher.rs
- src/upload/tests.rs
- src/upload.rs
- src/analyzers/piebald.rs
- src/version_check.rs
- src/types.rs
src/analyzers/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
src/analyzers/*.rs: Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Place individual analyzer implementations in the `src/analyzers/` directory with an Analyzer trait implementation
Analyzers should discover data files using platform-specific paths in their implementation
Files:
src/analyzers/piebald.rs
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code
Applied to files:
- src/utils.rs
- src/watcher.rs
- src/upload/tests.rs
- src/upload.rs
- src/version_check.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Use hash-based deduplication with global hash field for message deduplication
Applied to files:
src/utils.rs
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.
Applied to files:
- src/utils.rs
- src/tui.rs
- src/tui/logic.rs
- src/watcher.rs
- src/upload/tests.rs
- src/upload.rs
- src/analyzers/piebald.rs
- src/version_check.rs
- src/types.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Applied to files:
src/tui.rs, src/analyzers/piebald.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Prefer `async/await` over raw futures in Rust code
Applied to files:
src/watcher.rs, src/upload.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
Applied to files:
src/watcher.rs, src/analyzers/piebald.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `anyhow::Result` for error handling in Rust code
Applied to files:
src/version_check.rs
🧬 Code graph analysis (5)
src/utils.rs (1)
  src/types.rs (2): from_local (51-59), from_str (63-65)
src/tui/logic.rs (1)
  src/types.rs (3): intern_model (25-27), from_local (51-59), from (428-437)
src/watcher.rs (2)
  src/config.rs (1): load (82-94)
  src/upload.rs (2): perform_background_upload_messages (291-333), perform_background_upload_messages (368-368)
src/upload.rs (3)
  src/watcher.rs (1): set_upload_status (154-156)
  src/config.rs (1): load (82-94)
  src/utils.rs (1): get_messages_later_than (230-242)
src/types.rs (2)
  src/tui/logic.rs (1): aggregate_sessions_from_messages (131-180)
  src/utils.rs (1): aggregate_by_date (107-227)
🪛 GitHub Actions: checks
src/tui/logic.rs
[error] 1-1: cargo fmt --check failed due to formatting differences: import order changed (intern_model moved after other items).
src/types.rs
[error] 1-1: cargo fmt --check failed due to formatting differences: comment alignment/spacing changed around model reference counts subtraction.
🔇 Additional comments (29)
src/version_check.rs (2)
2-4: LGTM! Parking_lot migration follows guidelines.

The switch from `std::sync::Mutex` to `parking_lot::Mutex` aligns with the coding guidelines and PR objectives for improved performance and non-poisoning semantics. Keeping `Arc` from `std::sync` is correct since `parking_lot` doesn't provide its own `Arc`.

As per coding guidelines: "Use `parking_lot` locks over `std::sync` for performance in Rust code."
95-106: Simplified locking is correct for parking_lot.

The direct assignment pattern `*status_clone.lock() = result;` (line 102) is cleaner and correct for `parking_lot::Mutex`, which returns a guard directly rather than a `Result` and doesn't poison on panic.

Tokio is appropriately used here.

The function uses `tokio::spawn` to run the version check in the background. Since this is called from `main()` (src/main.rs:238) which has the `#[tokio::main]` attribute, the tokio runtime is active. Additionally, this background task performs network I/O (HTTP API calls to GitHub), which is inherently async and an appropriate use case for tokio, unlike file I/O operations in TUI code, which should use blocking threadpools.

src/watcher.rs (2)
4-4: LGTM: Correct use of parking_lot and view types.

The import of `parking_lot::Mutex` aligns with coding guidelines for better lock performance, and `MultiAnalyzerStatsView` supports the new view-based architecture for incremental updates.

Also applies to: 15-15
135-152: LGTM: Synchronous initialization with rayon parallelism.

The constructor correctly uses synchronous parallel I/O via rayon instead of async/tokio, aligning with the TUI's blocking I/O design. The doc comment appropriately notes the rayon context requirement.
Based on learnings: In TUI code, prefer blocking I/O with rayon over tokio async/await.
src/types.rs (5)
14-33: LGTM: Efficient model name interning.

The global thread-safe interner using `lasso::ThreadedRodeo` significantly reduces memory usage by replacing 24-byte Strings with 4-byte Spurs for model names that repeat across thousands of sessions. The inline annotations are appropriate for these hot-path functions.
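The interning pattern praised here can be sketched with a std-only interner. The real code uses `lasso::ThreadedRodeo` behind a `LazyLock` for lock-free concurrent access; this single-threaded `Interner`, its `u32` keys, and its method names are illustrative only:

```rust
use std::collections::HashMap;

/// Minimal single-threaded interner sketch: repeated strings are stored
/// once and referred to by a small integer key thereafter.
struct Interner {
    map: HashMap<String, u32>, // name -> key
    names: Vec<String>,        // key -> name
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new(), names: Vec::new() }
    }

    /// Return the existing key for `name`, or assign a new one.
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&key) = self.map.get(name) {
            return key;
        }
        let key = self.names.len() as u32;
        self.names.push(name.to_owned());
        self.map.insert(name.to_owned(), key);
        key
    }

    /// Resolve a key back to its string.
    fn resolve(&self, key: u32) -> &str {
        &self.names[key as usize]
    }
}
```

Every session that records the same model name then carries only the 4-byte key, which is what makes the per-session savings add up across thousands of sessions.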
35-124: LGTM: Compact date representation.

The 4-byte `CompactDate` representation (year: u16, month: u8, day: u8) is an effective memory optimization. The use of checked arithmetic prevents overflow, and the serialization/ordering implementations are correct.
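A minimal sketch of the 4-byte layout described above, assuming the field order matches the review's description; the parsing here is simplified (checked integer parsing only, no month/day range validation) and stands in for the real implementation:

```rust
/// 4-byte date mirroring the reviewed layout (year: u16, month: u8, day: u8).
/// Declaration order gives correct chronological ordering via derived `Ord`.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CompactDate {
    year: u16,
    month: u8,
    day: u8,
}

impl CompactDate {
    /// Parse "YYYY-MM-DD"; returns None on malformed input instead of panicking.
    fn from_str(s: &str) -> Option<Self> {
        let mut parts = s.split('-');
        let year: u16 = parts.next()?.parse().ok()?;
        let month: u8 = parts.next()?.parse().ok()?;
        let day: u8 = parts.next()?.parse().ok()?;
        Some(CompactDate { year, month, day })
    }
}
```

Because the struct is `Copy` and only 4 bytes (u16 + u8 + u8, alignment 2), it can be passed and compared by value with no cloning overhead.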
125-145: LGTM: Correct model reference counting in SessionAggregate.

The use of `BTreeMap<Spur, u32>` for models correctly implements reference counting, addressing the previous concern about incorrect model removal when multiple files contribute to the same session. The `Arc<str>` for `analyzer_name` is also appropriately shared across sessions.
556-641: LGTM: Correct incremental update implementation with model reference counting.

The `add_contribution` and `subtract_contribution` methods correctly implement reference counting for models using `BTreeMap<Spur, u32>`. This prevents the incorrect model removal issue identified in previous reviews, where models shared across multiple files would be prematurely removed during incremental updates.

The implementation correctly:

- Increments counts on add (line 580)
- Decrements counts on subtract with removal at zero (lines 627-631)
- Maintains the invariant: (add file1) + (add file2) - (subtract file1) = (add file2)
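The invariant can be demonstrated with a std-only sketch. Plain `String` keys and a single hypothetical `tokens` field stand in for the interned `Spur` keys and the full stats fields of the real `SessionAggregate`:

```rust
use std::collections::BTreeMap;

/// Sketch: adding two file contributions and subtracting one must leave
/// exactly the other, including models shared by both files.
#[derive(Default, Debug, PartialEq)]
struct SessionAggregate {
    tokens: u64,
    models: BTreeMap<String, u32>, // model name -> reference count
}

impl SessionAggregate {
    fn add_contribution(&mut self, tokens: u64, models: &[&str]) {
        self.tokens = self.tokens.saturating_add(tokens);
        for m in models {
            *self.models.entry((*m).to_owned()).or_insert(0) += 1;
        }
    }

    fn subtract_contribution(&mut self, tokens: u64, models: &[&str]) {
        self.tokens = self.tokens.saturating_sub(tokens);
        for m in models {
            // Decrement first, then remove only when the count reaches zero,
            // so a model still referenced by another file survives.
            let remove = if let Some(count) = self.models.get_mut(*m) {
                *count = count.saturating_sub(1);
                *count == 0
            } else {
                false
            };
            if remove {
                self.models.remove(*m);
            }
        }
    }
}
```

Without the reference count, subtracting file1 would delete a model that file2 still contributes, which is exactly the bug the review says was fixed.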
1-1: Fix cargo fmt issue.

The pipeline reports formatting differences. Please run `cargo fmt` to resolve the formatting issue around comment alignment/spacing.

⛔ Skipped due to learnings

Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Run post-change verification commands: cargo build, cargo test, cargo clippy, cargo doc, and cargo fmt

src/upload/tests.rs (1)
7-7: LGTM: Correct parking_lot::Mutex usage in tests.

The migration from `std::sync::Mutex` to `parking_lot::Mutex` is correct. Since `parking_lot` locks don't poison, the removal of `.unwrap()` calls on `lock()` is appropriate. This aligns with the project's coding guidelines for using `parking_lot` over `std::sync` for performance.

As per coding guidelines: Use `parking_lot` locks over `std::sync` for performance in Rust code.

Also applies to: 146-146, 153-153, 311-311, 335-335, 379-379
src/tui.rs (4)
6-8: LGTM: View-based TUI architecture.

The function signature and imports correctly reflect the shift to view-based data flow. Using `watch::Receiver<MultiAnalyzerStatsView>` allows the TUI to receive pre-computed aggregates instead of raw messages, reducing memory usage and enabling incremental updates.

Also applies to: 20-20, 98-105
186-220: LGTM: Efficient shared view access with RwLock.

The use of `SharedAnalyzerView` (`Arc<RwLock>`) enables efficient shared access:
- Cloning the Arc (lines 191, 213) is cheap (just pointer increment)
- Read locks (e.g., lines 329, 346, 376) allow concurrent reads
- Locks are properly scoped and released
This pattern is consistent throughout the TUI rendering code and enables lock-free cloning of the view for UI updates.
Also applies to: 329-361, 376-420
91-91: LGTM: Consistent CompactDate usage for filtering.

The `CompactDate` type is used consistently throughout for day filters:

- Stored in `Option<CompactDate>` for optional filtering
- Parsed from strings via `CompactDate::from_str`
- Compared directly via `==` for session filtering
- Displayed via `to_string()`

As a Copy type (4 bytes), it's efficient to pass around without cloning overhead.
Also applies to: 162-162, 467-467, 683-691, 1487-1488, 1899-1900, 1982-1985
710-717: LGTM: Correct view rendering with scoped lock acquisition.

The draw functions correctly:

- Accept view types (`&[SharedAnalyzerView]`, `&AnalyzerStatsView`, `&[SessionAggregate]`)
- Acquire read locks only for the duration needed (e.g., lines 1894-1925)
- Use `resolve_model` to convert interned `Spur` keys back to model names
- Keep lock scopes minimal to avoid holding locks during rendering
This design enables responsive UI updates while safely sharing data between the watcher and TUI threads.
Also applies to: 987-995, 1460-1469, 1878-1925
src/analyzers/piebald.rs (4)
245-250: LGTM: Synchronous per-source parsing.

The `parse_source` implementation correctly performs synchronous parsing of a single data source, opening the SQLite database, querying data, and converting to `ConversationMessage` format. This aligns with the new per-source parsing pattern in the Analyzer trait.

Based on learnings: Implement the `Analyzer` trait for new analyzers in the pluggable architecture system.
260-266: LGTM: Platform-specific watch directory discovery.

The `get_watch_directories` implementation correctly discovers the Piebald data directory using the platform-specific `dirs::data_dir()` and only returns it if the directory exists. This aligns with the coding guideline for analyzers to discover data files using platform-specific paths.

As per coding guidelines: Analyzers should discover data files using platform-specific paths in their implementation.
268-271: LGTM: Correct data path validation.

The `is_valid_data_path` implementation correctly validates that the path is a file named exactly "app.db", which is appropriate for Piebald's SQLite database. The use of `is_some_and` cleanly chains the validation checks.
291-297: LGTM: Test updated for new API.

The test correctly validates the empty sources case using the new `get_stats_with_sources` API and appropriately asserts that the result contains no messages.

src/utils.rs (3)
2-7: Good adoption of `parking_lot::Mutex`.

The switch from `std::sync::Mutex` to `parking_lot::Mutex` aligns with the coding guidelines. The `OnceLock` from `std::sync` is appropriate here since `parking_lot` doesn't provide an equivalent.
106-172: Aggregation logic looks correct.

The use of `CompactDate::from_local` for date initialization and the sequential aggregation approach align with the PR's memory optimization goals. The token accumulation using `saturating_add` with u32 casts is acceptable per the author's analysis in the previous review.
33-35: Nice ergonomic improvement.

Accepting `impl Into<u64>` allows both `u32` and `u64` values without explicit conversion at call sites, which pairs well with the new `TuiStats` fields that use `u32`.

src/tui/logic.rs (4)
8-19: Accumulation logic is correct.

The function cleanly accumulates only the 6 TUI-relevant fields with `saturating_add` for safe overflow handling. The `add_cost` method encapsulates cost accumulation properly.
108-124: Clean helper functions for data availability checks.

The separation between `has_data_view` (direct check) and `has_data_shared` (acquires read lock first) provides a clear API for different access patterns.
126-180: Well-optimized session aggregation.

Good use of memory-efficient patterns:

- `Arc<str>` avoids per-session String allocation
- `or_insert_with_key` minimizes redundant cloning
- `intern_model` for compact model representation
- `shrink_to_fit()` releases excess capacity

The comments explaining the cloning strategy are helpful.
182-210: Tests cover the primary cases.

The tests validate both positive and negative cases for `has_data_view`. Consider adding a test case where `num_conversations == 0` but `daily_stats` has non-zero values to verify the second condition, though the current coverage is reasonable.

src/upload.rs (4)
7-9: Good use of `parking_lot::Mutex`.

The switch to `parking_lot::Mutex` aligns with the coding guidelines. The `OnceLock` from `std::sync` is appropriate since there's no `parking_lot` equivalent.
234-266: Clean helper abstractions.

The `set_upload_status` helper provides atomic status updates, and `make_progress_callback` nicely preserves the `dots` animation state when updating progress counts.
268-286: Status lifecycle management looks correct.

The 3-second delay before hiding success status provides user feedback, and error states remain visible until explicitly cleared. The `None` case correctly handles no-op scenarios.
335-370: Clean delegation to the messages-based upload.

The separation between `perform_background_upload` (startup, filters by timestamp) and `perform_background_upload_messages` (direct messages, supports callback) provides good flexibility. The `::<fn()>` turbofish is necessary to satisfy type inference when passing `None` for the callback.
Optimize memory usage for model reference counting in SessionAggregate: - Add tinyvec dependency for inline storage - Create ModelCounts struct wrapping TinyVec<[(Spur, u32); 3]> - Provides clean API: increment(), decrement(), get(), iter() - Inline storage for up to 3 models (typical case), spills to heap if more - Reduces per-session overhead from BTreeMap allocations to ~32 bytes inline
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/types.rs (1)
405-446: Potential negative cost from floating-point subtraction.

As flagged in a previous review, line 415 uses direct `f64` subtraction for `cost`, which could produce small negative values due to floating-point precision when subtracting nearly-equal costs. The integer fields correctly use `saturating_sub`.

Cargo.toml (1)
58-60: Add root certificate store and consider additional reqwest features.

Following up on the previous review: while the `"rustls"` feature is correct, the configuration is incomplete:

Missing root certificates: Without `rustls-native-certs` or `webpki-roots`, reqwest cannot validate TLS certificates unless `.danger_accept_invalid_certs(true)` is used (which disables security). Add one of these dependencies:

    webpki-roots = "0.26"
    # or
    rustls-native-certs = "0.8"

Missing JSON feature: The codebase makes API calls to splitrail.dev and api.github.com. Consider adding the `"json"` feature for convenient request/response body handling:

    features = ["rustls", "json"]

    #!/bin/bash
    # Description: Verify reqwest usage patterns and check for certificate validation bypass

    # Check for danger_accept_invalid_certs usage
    echo "=== Checking for certificate validation bypass ==="
    rg -n "danger_accept_invalid_certs" --type rust -C 3

    # Check for reqwest Client creation
    echo -e "\n=== Checking reqwest Client initialization ==="
    rg -nP "reqwest::Client" --type rust -C 5

    # Check for JSON body usage
    echo -e "\n=== Checking for JSON request/response usage ==="
    rg -nP "\.json\(|\.body\(.*json" --type rust -C 2 | head -30
🧹 Nitpick comments (1)
src/tui.rs (1)
2049-2055: Redundant truncate after resize.

`Vec::resize` already handles both growing and shrinking. When `*filtered_count < old_len`, the `resize` call on line 2051 will truncate, making the subsequent `truncate` on line 2053 a no-op.

🔎 Proposed simplification

     fn update_day_filters(filters: &mut Vec<Option<CompactDate>>, filtered_count: &usize) {
    -    let old_len = filters.len();
         filters.resize(*filtered_count, None);
    -    if *filtered_count < old_len {
    -        filters.truncate(*filtered_count);
    -    }
     }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
Cargo.toml, src/tui.rs, src/tui/logic.rs, src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Follow Rust 2024 edition conventions
Use `anyhow::Result` for error handling in Rust code
Prefer `async/await` over raw futures in Rust code
Use `parking_lot` locks over `std::sync` for performance in Rust code
Files:
src/tui.rs, src/types.rs, src/tui/logic.rs
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.
Applied to files:
src/tui.rs, src/types.rs, src/tui/logic.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Applied to files:
src/tui.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Follow Rust 2024 edition conventions
Applied to files:
Cargo.toml
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code
Applied to files:
Cargo.toml
🧬 Code graph analysis (2)
src/tui.rs (3)
  src/types.rs (4): resolve_model (32-34), from_str (64-66), from (485-494), cost (465-467)
  src/tui/logic.rs (1): has_data_shared (122-124)
  src/utils.rs (1): format_date_for_display (82-104)
src/types.rs (2)
  src/tui/logic.rs (1): aggregate_sessions_from_messages (131-179)
  src/utils.rs (1): aggregate_by_date (107-227)
🔇 Additional comments (26)
src/tui.rs (6)
56-83: LGTM - Clean zero-allocation iterator abstraction.

The `EitherIter` implementation correctly delegates `Iterator` methods to the underlying variant, providing proper `size_hint` for both branches.
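A minimal std-only sketch of the pattern: an enum that holds one of two concrete iterator types and delegates to whichever variant it contains, avoiding a `Box<dyn Iterator>` allocation. The name mirrors the reviewed type, but the implementation here is illustrative:

```rust
/// Zero-allocation either-iterator: both branches of a conditional can
/// return this one concrete type instead of a boxed trait object.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R, T> Iterator for EitherIter<L, R>
where
    L: Iterator<Item = T>,
    R: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(it) => it.next(),
            EitherIter::Right(it) => it.next(),
        }
    }

    // Forwarding size_hint preserves the inner iterators' bounds,
    // so collect() can pre-allocate correctly.
    fn size_hint(&self) -> (usize, Option<usize>) {
        match self {
            EitherIter::Left(it) => it.size_hint(),
            EitherIter::Right(it) => it.size_hint(),
        }
    }
}
```

Typical use: one branch filters, the other passes everything through, yet both produce the same iterator type.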
98-145: LGTM - Appropriate async/sync boundary for TUI.

The `block_in_place` + `block_on` pattern correctly bridges the async runtime with the blocking TUI event loop. The spawned task for watcher events is appropriate since it handles channel reception rather than heavy I/O. Based on learnings, this approach aligns with the preference for minimal async in TUI paths.
175-182: Good lock hygiene pattern.

Acquiring the lock, extracting what's needed, and releasing immediately prevents contention. This pattern is consistently applied throughout the file.
1062-1089: LGTM - Safe accumulation with appropriate types.

Using `u64` for totals when summing `u32` daily values prevents overflow. The explicit casts (`as u64`) are correct.
1459-1468: LGTM - Session stats table implementation.

The function follows the same safe patterns as daily stats, with proper type widening for accumulation and correct model resolution via `resolve_model`.
1881-1928: LGTM - Summary stats with sequential lock acquisition.

The function correctly acquires read locks one at a time per analyzer view, preventing deadlock. The accumulation pattern is safe with u64 totals.
src/tui/logic.rs (5)
1-6: LGTM - Imports properly ordered.

The imports follow Rust conventions with crate imports followed by std imports. The re-export of `SessionAggregate` provides a clean API surface.
8-19: LGTM - Safe accumulation with saturating arithmetic.

The function correctly uses `saturating_add` to prevent overflow. The `as u32` casts from `u64` are acceptable here since per-message token counts won't exceed `u32::MAX` in practice.
108-124: LGTM - Clean data presence check with minimal lock duration.

`has_data_shared` correctly delegates to `has_data_view` while holding the read lock only for the duration of the check.
126-179: LGTM - Efficient session aggregation.

Good use of `or_insert_with_key` to minimize cloning. The model reference counting via `ModelCounts::increment` correctly tracks model occurrences per session, enabling accurate incremental update subtraction.
181-209: LGTM - Basic test coverage for has_data_view.

The tests verify the core behavior. Consider adding a test case for `num_conversations = 0` with non-empty `daily_stats` containing non-zero values to fully exercise the `any()` branch.

src/types.rs (8)
15-34: LGTM - Efficient model name interning.

`ThreadedRodeo` with `LazyLock` provides thread-safe, lock-free reads after initial interning. The 4-byte `Spur` keys significantly reduce memory compared to repeated `String` allocations for model names.
40-91: LGTM - Compact date representation.

The 4-byte struct with checked parsing arithmetic is efficient. Per past discussion, range validation for month/day is intentionally omitted since input comes from coding tools with well-formed dates.
130-181: LGTM - Compact reference-counted model tracking.

The `TinyVec` with inline capacity for 3 models is well-tuned for the common case. The `decrement` method correctly uses saturating subtraction and removes zero-count entries, fixing the previous model removal bug identified in past reviews.
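The flat (key, count) representation can be sketched with a plain `Vec`; the real type wraps `TinyVec<[(Spur, u32); 3]>` for inline storage of up to 3 models, and the `u32` keys here stand in for interned `Spur` values:

```rust
/// Sketch of a flat (model key, refcount) pair list. For the typical
/// 1-3 models per session, linear search over a tiny slice beats a
/// BTreeMap both in memory and in lookup cost.
#[derive(Default, Debug)]
struct ModelCounts(Vec<(u32, u32)>); // (interned model key, reference count)

impl ModelCounts {
    fn increment(&mut self, key: u32) {
        if let Some(entry) = self.0.iter_mut().find(|(k, _)| *k == key) {
            entry.1 += 1;
        } else {
            self.0.push((key, 1));
        }
    }

    /// Saturating decrement; removes the entry when the count hits zero.
    fn decrement(&mut self, key: u32) {
        if let Some(pos) = self.0.iter().position(|(k, _)| *k == key) {
            self.0[pos].1 = self.0[pos].1.saturating_sub(1);
            if self.0[pos].1 == 0 {
                self.0.swap_remove(pos);
            }
        }
    }

    fn get(&self, key: u32) -> u32 {
        self.0.iter().find(|(k, _)| *k == key).map_or(0, |&(_, c)| c)
    }
}
```

Swapping `Vec` for a `TinyVec` with inline capacity keeps the common case entirely on the stack, which is the memory win the review describes.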
448-517: LGTM - Efficient TUI-specific stats with safe arithmetic.

Storing cost as integer cents (`u32`) avoids floating-point precision issues. All arithmetic uses saturating operations, and the 24-byte struct is significantly smaller than the full `Stats` (320 bytes).
268-295: LGTM - Reference-counted model tracking in DailyStats.

The `SubAssign` implementation correctly decrements model counts and removes entries when count reaches zero, maintaining accurate incremental update semantics.
613-693: LGTM - Correct incremental update implementation.

The `add_contribution` and `subtract_contribution` methods correctly handle model reference counting via `ModelCounts::increment`/`decrement`, addressing the previous review concern about incorrect model removal during partial updates.
594-611: LGTM - Clean FileContribution construction.

The method correctly composes existing aggregation functions and filters out invalid "unknown" dates. Using `session_aggregates.len()` as `conversation_count` is accurate since each session represents one conversation.
824-965: LGTM - Comprehensive model reference counting tests.

The test suite thoroughly verifies the incremental update semantics, including partial subtraction scenarios that would have exposed the previous bug. Good coverage of edge cases like multiple files contributing the same model to the same session.
Cargo.toml (7)
8-12: LGTM! Clear allocator feature configuration.

The feature flag structure correctly makes mimalloc optional, and the comment provides helpful guidance for memory profiling scenarios.
24-24: Good choice for memory optimization.

Adding `lasso` with the `multi-threaded` feature aligns well with the PR's goal of using `Spur` interning for model names. This should reduce memory allocations for repeated strings while remaining thread-safe with rayon.
20-20: Migration from jwalk to walkdir is complete and correctly implemented.

The codebase has been fully migrated with no remaining jwalk references. Parallel processing is appropriately handled via rayon at the file parsing level (as seen in src/watcher.rs, src/main.rs, and src/analyzer.rs), which aligns with the PR's optimization goals. The sequential walkdir traversal is suitable for the flat directory structures used throughout the analyzer data directories.
15-16: No action needed. The versions are compatible: mimalloc 0.1.48 requires libmimalloc-sys `^0.1.44`, which includes 0.1.44, so this pairing is correct.

Likely an incorrect or invalid review comment.
17-17: Remove non-standard metadata tags and verify dependency claims with git context.

The dependencies listed (serde 1.0.228, toml 0.9.10, notify 8.2, phf 0.13.1, serde_bytes 0.11.19, simd-json 0.17.0, tiktoken-rs 0.9.1, parking_lot 0.12, tinyvec 1.8, rusqlite 0.38.0, rmcp 0.12.0, schemars 1.2, clap 4.5.53, tempfile 3.24) exist in Cargo.toml. However, the original comment includes review tool metadata tags that should not appear in final review output. Additionally, specific version transition claims (e.g., "simd-json from 0.15 to 0.17") cannot be verified without git diff context showing actual before/after changes. Include git evidence of the actual version changes made by this PR before finalizing the review assessment.
29-29: Ratatui 0.30.0 is compatible with the codebase.

The TUI code uses standard, stable APIs that remain unchanged in 0.30.0:

- Backend: Uses built-in `CrosstermBackend` (not affected by Backend trait changes)
- Frame rendering: Standard `render_widget()` and `render_stateful_widget()` calls (unchanged)
- Widgets: Block, Cell, Paragraph, Row, Table, TableState, Tabs, Wrap are all compatible (none had breaking changes)
- Style: Basic Color/Modifier/Style usage is unaffected by const style changes
- Tests: No TestBackend usage; thus error type changes don't apply
No compatibility issues detected.
43-43: Clarify bincode 2.0.1 dependency purpose.

`bincode 2.0.1` is listed in Cargo.toml but no usage was found in the codebase. If this dependency is not yet integrated, defer this review until the code consuming bincode is added. If bincode is unused, remove it from the dependency list to reduce maintenance overhead.

Likely an incorrect or invalid review comment.
The previous implementation tried to reconstruct which messages came from which source file using conversation_hash, assuming it equaled hash_text(path). This failed for analyzers that compute conversation_hash differently (Cline, RooCode, KiloCode, OpenCode, Copilot, Piebald). Fix by adding parse_sources_parallel_with_paths which returns messages grouped by source path, preserving the association naturally during parallel parsing. Contribution computation is also parallelized across sources within each analyzer.
I still have a small part to rethink. RAM will go up a bit again, because I want to remain correct/accurate while also being accommodating to other solutions. Drafting.
…dates

Commit 5a3949e fixed file contribution caching but caused ~1GB RAM increase for users with many files (e.g., 90,000 OpenCode message files). The fix correctly cached contributions for all analyzers, but used a heavyweight FileContribution struct (~100+ bytes/file) even for single-message-per-file analyzers.

This implements a three-tier contribution caching system based on analyzer data structure:

1. SingleMessage (~40 bytes): For 1-file-1-message analyzers (OpenCode)
   - Uses CompactMessageStats with u16 cost, u8 tool_calls
   - Expected savings: 90,000 files × (100 - 40) = ~5.4 MB
2. SingleSession (~72 bytes): For 1-file-1-session analyzers (most others)
   - Stores aggregated stats per file without individual messages
   - Expected savings: 90,000 files × (100 - 72) = ~2.5 MB
3. MultiSession (~100+ bytes): For all-in-one-file analyzers (Piebald)
   - Full MultiSessionContribution with Vec<ConversationMessage>

Each analyzer now explicitly declares its strategy via the required contribution_strategy() trait method (no default implementation).
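The three tiers described in the commit message naturally map to an enum with one variant per strategy. This is a hedged sketch: the field layouts, byte sizes, and the `strategy_name` helper are illustrative guesses at the shapes the commit names, not the actual structs:

```rust
/// Sketch of a three-tier cache entry: each variant trades generality
/// for a smaller per-file footprint where the analyzer's data allows it.
enum CachedContribution {
    /// 1 file = 1 message (e.g. OpenCode): packed compact stats only.
    SingleMessage { stats: [u8; 32] },
    /// 1 file = 1 session: per-file aggregate, no individual messages.
    SingleSession { session_hash: u64, tokens: u64, cost_cents: u32 },
    /// One file holds many sessions (e.g. Piebald): full contribution.
    MultiSession { sessions: Vec<(u64, u64)> }, // (session_hash, tokens)
}

/// Hypothetical helper: report which strategy a cached entry uses.
fn strategy_name(c: &CachedContribution) -> &'static str {
    match c {
        CachedContribution::SingleMessage { .. } => "SingleMessage",
        CachedContribution::SingleSession { .. } => "SingleSession",
        CachedContribution::MultiSession { .. } => "MultiSession",
    }
}
```

The design point is that the cheap variants carry no `Vec` of messages at all, so 90,000 single-message files cost tens of bytes each instead of a heap-backed contribution struct.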
Split the monolithic contribution_cache.rs (571 lines) into a proper module structure for better organization and maintainability: - mod.rs: Common types (PathHash, ContributionStrategy, CompactMessageStats), ContributionCache wrapper, RemovedContribution enum, and AnalyzerStatsView extension methods - single_message.rs: SingleMessageContribution (~40 bytes per file) - single_session.rs: SingleSessionContribution (~72 bytes per file) - multi_session.rs: MultiSessionContribution (~100+ bytes per file) No functional changes - pure refactoring with all re-exports preserved.
- Reorganize tests from single tests.rs into tests/ directory - Group tests by contribution strategy: single_message, single_session, multi_session - Separate compact_stats and basic_operations tests - Move misplaced edge case tests to their proper strategy files - Fix misleading docstrings that claimed to simulate reload_file_incremental
Replace raw u64 session_hash fields with a proper SessionHash wrapper type, following the same pattern as PathHash for type safety and clarity.
Move model string interning into dedicated src/cache/ module with a ModelKey newtype wrapper around MiniSpur. This reduces memory per model reference from 4 bytes (Spur) to 2 bytes (MiniSpur), which is sufficient for <65536 unique model names.
Use c2rust-bitfields to pack stats and date into a 176-bit bitfield, achieving cache-aligned 32-byte struct size. Removes CompactMessageStats in favor of PackedStatsDate with compile-time size assertion.
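The bit-packing idea behind this change can be illustrated without the crate by hand-packing a date into a `u32` with shifts and masks. The real `PackedStatsDate` uses `c2rust-bitfields` to pack stats plus date into 176 bits with a compile-time size assertion; these two helpers are a simplified stand-in:

```rust
/// Pack (year, month, day) into the low 24 bits of a u32:
/// bits 16..32 = year, bits 8..16 = month, bits 0..8 = day.
fn pack_date(year: u16, month: u8, day: u8) -> u32 {
    (year as u32) << 16 | (month as u32) << 8 | day as u32
}

/// Reverse of pack_date: shift fields back down and truncate.
fn unpack_date(packed: u32) -> (u16, u8, u8) {
    ((packed >> 16) as u16, (packed >> 8) as u8, packed as u8)
}
```

The same principle, applied to every small counter in the per-file stats, is what lets the struct land on an exact cache-friendly 32-byte size.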
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/watcher.rs (1)
519-519: Tautological assertion provides no coverage.

The assertion `updated.analyzer_stats.is_empty() || !updated.analyzer_stats.is_empty()` is always true and doesn't verify any meaningful behavior.

Suggested fix
Consider asserting something meaningful, or remove the assertion and rely on the "doesn't panic" behavior being sufficient:
- assert!(updated.analyzer_stats.is_empty() || !updated.analyzer_stats.is_empty()); + // Test verifies handling doesn't panic - no specific assertion needed + let _ = updated.analyzer_stats.len();
♻️ Duplicate comments (2)
Cargo.toml (1)
58-61: Reqwest TLS configuration acknowledged.

The `"rustls"` feature name is correct for reqwest 0.13.1 (renamed from `"rustls-tls"` in the breaking change). As noted in previous review discussion, the TLS certificate validation configuration in `src/upload.rs` should be verified separately.

src/types.rs (1)
396-437: Stats subtraction correctly uses saturating arithmetic for integers.

All integer fields use `saturating_sub` to prevent underflow. The `cost` field (line 406) uses direct subtraction which could theoretically produce small negative values due to floating-point precision, but this was discussed in a previous review and deemed acceptable for practical usage.
🧹 Nitpick comments (12)
src/analyzers/gemini_cli.rs (1)
311-347: Consider extracting shared WalkDir logic.

The `discover_data_sources` and `is_available` methods share nearly identical WalkDir traversal logic. While `is_available` correctly uses `.any()` for early exit (which is more efficient), consider extracting the common filtering predicate to reduce duplication.

That said, this is a minor concern; the current implementation is clear and correct.
🔎 Potential refactor to reduce duplication
    fn chat_json_entries() -> impl Iterator<Item = walkdir::DirEntry> {
        Self::data_dir()
            .filter(|d| d.is_dir())
            .into_iter()
            .flat_map(|tmp_dir| WalkDir::new(tmp_dir).min_depth(3).max_depth(3).into_iter())
            .filter_map(|e| e.ok())
            .filter(|e| {
                e.file_type().is_file()
                    && e.path().extension().is_some_and(|ext| ext == "json")
                    && e.path()
                        .parent()
                        .and_then(|p| p.file_name())
                        .is_some_and(|name| name == "chats")
            })
    }

Then use `chat_json_entries().map(...).collect()` in `discover_data_sources` and `chat_json_entries().next().is_some()` in `is_available`.

src/analyzers/qwen_code.rs (1)
346-352: Consider logging parse failures in `parse_sources_parallel`.

The `unwrap_or_default()` at line 349 silently drops parse errors. While graceful degradation is appropriate for robustness, consider adding debug-level logging so users can diagnose issues with malformed files.

🔎 Suggested improvement

     fn parse_sources_parallel(&self, sources: &[DataSource]) -> Vec<ConversationMessage> {
         let all_messages: Vec<ConversationMessage> = sources
             .par_iter()
    -        .flat_map(|source| self.parse_source(source).unwrap_or_default())
    +        .flat_map(|source| {
    +            self.parse_source(source).unwrap_or_else(|e| {
    +                tracing::debug!("Failed to parse {:?}: {}", source.path, e);
    +                Vec::new()
    +            })
    +        })
             .collect();
         crate::utils::deduplicate_by_local_hash(all_messages)
     }

src/main.rs (3)
217-219: Consider propagating threadpool creation errors.

Using `expect()` will panic if threadpool creation fails (e.g., due to system resource exhaustion). Consider propagating the error with `?` and handling it gracefully to provide better error messaging.

🔎 Proposed refactor

    -    let mut stats_manager = {
    -        let pool = rayon::ThreadPoolBuilder::new()
    -            .build()
    -            .expect("Failed to create rayon threadpool");
    +    let mut stats_manager = match (|| -> Result<_> {
    +        let pool = rayon::ThreadPoolBuilder::new().build()?;
    +        pool.install(|| watcher::RealtimeStatsManager::new(registry))
    +    })() {
    +        Ok(manager) => manager,
    +        Err(e) => {
    +            eprintln!("Error loading analyzer stats: {e}");
    +            std::process::exit(1);
    +        }
    +    };
    -
    -        let result = pool.install(|| watcher::RealtimeStatsManager::new(registry));
    -
    -        // Pool is dropped here, releasing threads
    -        match result {
    -            Ok(manager) => manager,
    -            Err(e) => {
    -                eprintln!("Error loading analyzer stats: {e}");
    -                std::process::exit(1);
    -            }
    -        }
    -    };
264-277: Consider simplifying configuration status logic.

The nested conditionals for determining which configuration is missing are verbose. Consider extracting this into a helper function or using a more concise pattern.

🔎 Proposed refactor

```diff
 } else {
     // Auto-upload is enabled but configuration is incomplete
     let mut status = upload_status.lock();
-    if config.is_api_token_missing() && config.is_server_url_missing() {
-        *status = tui::UploadStatus::MissingConfig;
-    } else if config.is_api_token_missing() {
-        *status = tui::UploadStatus::MissingApiToken;
-    } else if config.is_server_url_missing() {
-        *status = tui::UploadStatus::MissingServerUrl;
-    } else {
-        // Shouldn't happen since is_configured() returned false
-        *status = tui::UploadStatus::MissingConfig;
-    }
+    *status = match (config.is_api_token_missing(), config.is_server_url_missing()) {
+        (true, true) => tui::UploadStatus::MissingConfig,
+        (true, false) => tui::UploadStatus::MissingApiToken,
+        (false, true) => tui::UploadStatus::MissingServerUrl,
+        (false, false) => tui::UploadStatus::MissingConfig, // Shouldn't happen
+    };
 }
```
445-460: Expand the SAFETY comment for the FFI call.

The SAFETY comment states that `mi_collect` is a "safe FFI call" but could be more specific. FFI calls are inherently unsafe in Rust, and the comment should clarify what makes this particular call safe (e.g., no preconditions on state, thread-safe, no memory corruption risks).

🔎 Proposed documentation improvement

```diff
 #[cfg(feature = "mimalloc")]
 pub fn release_unused_memory() {
-    // SAFETY: mi_collect is a safe FFI call that triggers garbage collection
-    // and returns unused memory to the OS. The `force` parameter (true) ensures
-    // aggressive collection.
+    // SAFETY: mi_collect is a thread-safe C function with no preconditions on allocator state.
+    // It triggers garbage collection and returns unused memory to the OS without risk of
+    // memory corruption. The `force` parameter (true) ensures aggressive collection.
     unsafe {
         libmimalloc_sys::mi_collect(true);
     }
 }
```

src/analyzers/opencode.rs (2)
480-498: Consider caching context for incremental updates.

For single-file incremental updates, `parse_source` loads the entire projects and sessions context fresh each time. While simple, this may be inefficient for frequent file changes. Consider caching this shared context at the analyzer level to avoid repeated I/O for metadata that rarely changes.

Note: The current approach prioritizes simplicity and correctness. If incremental update performance becomes a concern in practice, a shared context cache could be added.
522-531: Consider logging parse errors in parallel processing.

The `filter_map` silently discards files that fail to parse. For debugging and monitoring, consider logging parse errors (e.g., using `warn_once` or `eprintln!`) to help diagnose data corruption or format changes.

🔎 Proposed improvement

```diff
 sources
     .par_iter()
     .filter_map(|source| {
         let content = fs::read_to_string(&source.path).ok()?;
         let mut bytes = content.into_bytes();
-        let msg = simd_json::from_slice::<OpenCodeMessage>(&mut bytes).ok()?;
+        let msg = match simd_json::from_slice::<OpenCodeMessage>(&mut bytes) {
+            Ok(msg) => msg,
+            Err(e) => {
+                eprintln!("Warning: Failed to parse OpenCode file {:?}: {}", source.path, e);
+                return None;
+            }
+        };
         let conversation_msg = to_conversation_message(msg, &sessions, &projects, &part_root);
         Some((source.path.clone(), vec![conversation_msg]))
     })
     .collect()
```

src/analyzers/pi_agent.rs (1)
459-465: Consider logging parse errors in parallel processing.

Similar to other analyzers, parse errors are silently discarded with `unwrap_or_default()`. Consider logging errors for debugging purposes.

🔎 Proposed improvement

```diff
 fn parse_sources_parallel(&self, sources: &[DataSource]) -> Vec<ConversationMessage> {
     let all_messages: Vec<ConversationMessage> = sources
         .par_iter()
-        .flat_map(|source| self.parse_source(source).unwrap_or_default())
+        .flat_map(|source| {
+            self.parse_source(source).unwrap_or_else(|e| {
+                eprintln!("Warning: Failed to parse Pi Agent file {:?}: {}", source.path, e);
+                Vec::new()
+            })
+        })
         .collect();
     crate::utils::deduplicate_by_local_hash(all_messages)
 }
```

src/watcher.rs (1)
244-310: Note: Tokio still used for upload despite learning to avoid it.

Per retrieved learnings, Tokio should be avoided in TUI background I/O paths in favor of blocking calls on dedicated threads. The `tokio::spawn` at line 298 contradicts this, though the async upload with reqwest makes this a reasonable interim choice.

Consider migrating to `std::thread::spawn` with blocking reqwest if you proceed with removing Tokio from TUI code as mentioned in PR comments.

src/contribution_cache/mod.rs (2)
43-48: Consider renaming `from_str` to avoid trait shadowing.

`SessionHash::from_str` shadows the `std::str::FromStr` trait method name. While not technically wrong, it could cause confusion. Consider `from_session_id` or `hash_str` for clarity.
365-377: Empty sessions not removed after subtraction.

Unlike daily_stats entries, which are removed when empty (lines 356-361), session_aggregates entries with zeroed stats remain in the vector. This is likely intentional to avoid O(n) removal, but over time stale entries could accumulate.

If this becomes a concern, consider periodic compaction or removal during subtract:

```rust
// After subtracting, optionally remove truly empty sessions
self.session_aggregates.retain(|s| !s.stats.is_zero());
```

src/analyzer.rs (1)
933-937: Test relies on rayon's global threadpool.

The test calls `load_all_stats_parallel()` without an explicit rayon pool, relying on rayon's global pool. This works but contradicts the doc comment "Must be called within a rayon threadpool context." Consider clarifying that the global pool is acceptable, or use `load_all_stats_parallel_scoped()` in tests for consistency.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (28)
- Cargo.toml
- src/analyzer.rs
- src/analyzers/claude_code.rs
- src/analyzers/cline.rs
- src/analyzers/codex_cli.rs
- src/analyzers/copilot.rs
- src/analyzers/gemini_cli.rs
- src/analyzers/kilo_code.rs
- src/analyzers/opencode.rs
- src/analyzers/pi_agent.rs
- src/analyzers/piebald.rs
- src/analyzers/qwen_code.rs
- src/analyzers/roo_code.rs
- src/cache/mod.rs
- src/cache/model_intern.rs
- src/contribution_cache/mod.rs
- src/contribution_cache/multi_session.rs
- src/contribution_cache/single_message.rs
- src/contribution_cache/single_session.rs
- src/contribution_cache/tests/basic_operations.rs
- src/contribution_cache/tests/compact_stats.rs
- src/contribution_cache/tests/mod.rs
- src/contribution_cache/tests/multi_session.rs
- src/contribution_cache/tests/single_message.rs
- src/contribution_cache/tests/single_session.rs
- src/main.rs
- src/types.rs
- src/watcher.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/analyzers/cline.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Follow Rust 2024 edition conventions
Use `anyhow::Result` for error handling in Rust code
Prefer `async/await` over raw futures in Rust code
Use `parking_lot` locks over `std::sync` for performance in Rust code
Files:
src/contribution_cache/tests/single_message.rs, src/contribution_cache/tests/basic_operations.rs, src/contribution_cache/single_session.rs, src/contribution_cache/tests/mod.rs, src/contribution_cache/tests/compact_stats.rs, src/cache/mod.rs, src/contribution_cache/tests/multi_session.rs, src/analyzers/copilot.rs, src/contribution_cache/single_message.rs, src/main.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/contribution_cache/tests/single_session.rs, src/analyzers/claude_code.rs, src/contribution_cache/multi_session.rs, src/analyzers/opencode.rs, src/contribution_cache/mod.rs, src/analyzers/piebald.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzer.rs, src/watcher.rs, src/types.rs, src/cache/model_intern.rs, src/analyzers/codex_cli.rs
src/analyzers/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
src/analyzers/*.rs: Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Place individual analyzer implementations in the `src/analyzers/` directory with an Analyzer trait implementation
Analyzers should discover data files using platform-specific paths in their implementation
Files:
src/analyzers/copilot.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/analyzers/claude_code.rs, src/analyzers/opencode.rs, src/analyzers/piebald.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzers/codex_cli.rs
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.
Applied to files:
src/contribution_cache/tests/single_message.rs, src/contribution_cache/tests/basic_operations.rs, src/contribution_cache/single_session.rs, src/contribution_cache/tests/mod.rs, src/contribution_cache/tests/compact_stats.rs, src/cache/mod.rs, src/contribution_cache/tests/multi_session.rs, src/analyzers/copilot.rs, src/contribution_cache/single_message.rs, src/main.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/contribution_cache/tests/single_session.rs, src/analyzers/claude_code.rs, src/contribution_cache/multi_session.rs, src/analyzers/opencode.rs, src/contribution_cache/mod.rs, src/analyzers/piebald.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzer.rs, src/watcher.rs, src/types.rs, src/cache/model_intern.rs, src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
Applied to files:
src/analyzers/copilot.rs, src/main.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/analyzers/claude_code.rs, src/analyzers/opencode.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzer.rs, src/watcher.rs, src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
Applied to files:
src/analyzers/copilot.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/analyzers/claude_code.rs, src/analyzers/opencode.rs, src/analyzers/piebald.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzer.rs, src/watcher.rs, src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Applied to files:
src/analyzers/copilot.rs, src/main.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/kilo_code.rs, src/analyzers/claude_code.rs, src/contribution_cache/multi_session.rs, src/analyzers/opencode.rs, src/contribution_cache/mod.rs, src/analyzers/piebald.rs, src/analyzers/gemini_cli.rs, src/analyzers/roo_code.rs, src/analyzer.rs, src/watcher.rs, src/analyzers/codex_cli.rs
📚 Learning: 2026-01-02T01:31:46.893Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.
Applied to files:
src/analyzers/copilot.rs, src/analyzers/qwen_code.rs, src/analyzers/pi_agent.rs, src/analyzers/claude_code.rs, src/analyzers/opencode.rs, src/analyzers/gemini_cli.rs, src/analyzer.rs, src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code
Applied to files:
src/main.rs, Cargo.toml, src/watcher.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Use hash-based deduplication with global hash field for message deduplication
Applied to files:
src/analyzers/claude_code.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Follow Rust 2024 edition conventions
Applied to files:
Cargo.toml
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Prefer `async/await` over raw futures in Rust code
Applied to files:
src/watcher.rs
🧬 Code graph analysis (21)
src/contribution_cache/tests/single_message.rs (3)
- src/contribution_cache/tests/mod.rs (2): make_message (29-69), make_view_with_session (82-98)
- src/contribution_cache/single_message.rs (3): from_message (155-161), date (165-167), hash_session_id (177-179)
- src/contribution_cache/mod.rs (2): new (34-36), new (94-100)
src/contribution_cache/tests/basic_operations.rs (3)
- src/contribution_cache/tests/mod.rs (1): make_message (29-69)
- src/contribution_cache/mod.rs (5): from_str (46-48), default (87-89), get_single_message (126-128), get_single_session (140-142), get_multi_session (154-156)
- src/contribution_cache/single_message.rs (1): from_message (155-161)
src/contribution_cache/tests/mod.rs (3)
- src/types.rs (2): cost (456-458), from (476-485)
- src/contribution_cache/single_message.rs (1): date (165-167)
- src/contribution_cache/mod.rs (4): default (87-89), new (34-36), new (94-100), from_str (46-48)
src/contribution_cache/tests/compact_stats.rs (2)
- src/contribution_cache/single_message.rs (2): date (165-167), pack (84-105)
- src/types.rs (4): from_parts (44-46), year (50-52), month (56-58), day (62-64)
src/cache/mod.rs (1)
- src/cache/model_intern.rs (2): intern_model (33-35), resolve_model (39-41)
src/contribution_cache/tests/multi_session.rs (1)
- src/contribution_cache/tests/mod.rs (2): make_empty_view (72-79), make_message (29-69)
src/analyzers/copilot.rs (2)
- src/utils.rs (1): hash_text (253-257)
- src/analyzers/cline.rs (4): new (20-22), parse_source (312-314), is_valid_data_path (320-322), contribution_strategy (324-326)
src/contribution_cache/single_message.rs (3)
- src/cache/model_intern.rs (1): intern_model (33-35)
- src/types.rs (6): from_parts (44-46), from_str (68-70), from_local (32-40), year (50-52), month (56-58), day (62-64)
- src/contribution_cache/mod.rs (2): default (87-89), from_str (46-48)
src/main.rs (2)
- src/watcher.rs (2): new (34-69), new (137-152)
- src/upload.rs (1): perform_background_upload (337-370)
src/analyzers/qwen_code.rs (6)
- src/utils.rs (2): hash_text (253-257), deduplicate_by_local_hash (279-293)
- src/analyzers/cline.rs (6): simd_json (165-165), new (20-22), is_available (308-310), parse_source (312-314), is_valid_data_path (320-322), contribution_strategy (324-326)
- src/analyzers/codex_cli.rs (11): simd_json (241-241), simd_json (251-251), simd_json (260-260), simd_json (296-296), simd_json (432-432), data_dir (27-29), new (23-25), is_available (66-75), parse_source (77-80), is_valid_data_path (97-100), contribution_strategy (102-104)
- src/analyzers/opencode.rs (10): simd_json (219-219), simd_json (258-258), simd_json (296-296), simd_json (493-493), data_dir (26-28), new (22-24), is_available (464-478), parse_source (480-498), is_valid_data_path (593-605), contribution_strategy (608-610)
- src/analyzers/gemini_cli.rs (7): data_dir (22-24), new (18-20), is_available (333-347), parse_source (349-351), parse_json_session_file (201-292), is_valid_data_path (368-376), contribution_strategy (378-380)
- src/analyzers/piebald.rs (4): new (21-23), parse_source (246-251), is_valid_data_path (269-272), contribution_strategy (275-277)
src/analyzers/pi_agent.rs (3)
- src/utils.rs (1): deduplicate_by_local_hash (279-293)
- src/analyzers/claude_code.rs (4): data_dir (33-35), parse_source (75-82), parse_jsonl_file (522-685), is_valid_data_path (240-251)
- src/analyzers/qwen_code.rs (3): data_dir (22-24), parse_source (342-344), is_valid_data_path (361-369)
src/analyzers/kilo_code.rs (5)
- src/analyzers/cline.rs (5): is_available (308-310), parse_source (312-314), get_watch_directories (316-318), is_valid_data_path (320-322), contribution_strategy (324-326)
- src/analyzers/codex_cli.rs (5): is_available (66-75), parse_source (77-80), get_watch_directories (90-95), is_valid_data_path (97-100), contribution_strategy (102-104)
- src/analyzers/roo_code.rs (5): is_available (332-334), parse_source (336-338), get_watch_directories (340-342), is_valid_data_path (344-346), contribution_strategy (348-350)
- src/watcher.rs (4): is_available (381-383), parse_source (373-375), get_watch_directories (385-387), contribution_strategy (389-391)
- src/analyzers/piebald.rs (4): parse_source (246-251), get_watch_directories (261-267), is_valid_data_path (269-272), contribution_strategy (275-277)
src/contribution_cache/tests/single_session.rs (5)
- src/contribution_cache/tests/mod.rs (2): make_message (29-69), make_view_with_session (82-98)
- src/contribution_cache/single_session.rs (1): from_messages (29-56)
- src/contribution_cache/single_message.rs (1): date (165-167)
- src/cache/model_intern.rs (1): intern_model (33-35)
- src/types.rs (2): get (168-170), from (476-485)
src/contribution_cache/multi_session.rs (2)
- src/tui/logic.rs (1): aggregate_sessions_from_messages (131-179)
- src/utils.rs (1): aggregate_by_date (107-227)
src/contribution_cache/mod.rs (3)
- src/analyzer.rs (2): new (297-305), default (290-292)
- src/types.rs (2): new (143-145), from_str (68-70)
- src/contribution_cache/single_message.rs (2): date (165-167), hash_session_id (177-179)
src/analyzers/roo_code.rs (5)
- src/analyzer.rs (14): vscode_extension_has_sources (108-118), is_available (231-234), is_available (846-848), is_available (1004-1006), parse_source (173-173), parse_source (821-823), parse_source (991-993), get_watch_directories (219-219), get_watch_directories (850-856), get_watch_directories (1008-1010), is_valid_data_path (224-226), contribution_strategy (240-240), contribution_strategy (858-860), contribution_strategy (1012-1014)
- src/analyzers/cline.rs (5): is_available (308-310), parse_source (312-314), get_watch_directories (316-318), is_valid_data_path (320-322), contribution_strategy (324-326)
- src/analyzers/kilo_code.rs (5): is_available (303-305), parse_source (307-309), get_watch_directories (311-313), is_valid_data_path (315-317), contribution_strategy (319-321)
- src/watcher.rs (4): is_available (381-383), parse_source (373-375), get_watch_directories (385-387), contribution_strategy (389-391)
- src/analyzers/piebald.rs (4): parse_source (246-251), get_watch_directories (261-267), is_valid_data_path (269-272), contribution_strategy (275-277)
src/analyzer.rs (4)
- src/contribution_cache/mod.rs (2): new (34-36), new (94-100)
- src/utils.rs (1): deduplicate_by_global_hash (266-274)
- src/contribution_cache/single_message.rs (2): date (165-167), from_message (155-161)
- src/contribution_cache/single_session.rs (1): from_messages (29-56)
src/watcher.rs (2)
- src/config.rs (1): load (82-94)
- src/upload.rs (2): perform_background_upload_messages (291-333), perform_background_upload_messages (368-368)
src/types.rs (3)
- src/tui/logic.rs (1): aggregate_sessions_from_messages (131-179)
- src/cache/model_intern.rs (2): intern_model (33-35), resolve_model (39-41)
- src/mcp/types.rs (1): new (118-135)
src/cache/model_intern.rs (1)
- src/types.rs (1): new (143-145)
src/analyzers/codex_cli.rs (3)
- src/analyzers/cline.rs (5): new (20-22), is_available (308-310), parse_source (312-314), is_valid_data_path (320-322), contribution_strategy (324-326)
- src/analyzers/copilot.rs (5): new (28-30), is_available (473-486), parse_source (488-490), is_valid_data_path (496-504), contribution_strategy (506-508)
- src/analyzers/piebald.rs (5): new (21-23), parse_source (246-251), parse_sources_parallel (253-259), is_valid_data_path (269-272), contribution_strategy (275-277)
🔇 Additional comments (72)
src/contribution_cache/single_session.rs (1)
27-56: Well-structured contribution aggregation.

The `from_messages` implementation correctly:
- Uses the first message for date and session_hash initialization
- Only counts AI messages (those with a model) for ai_message_count
- Properly interns model names for memory efficiency
- Accumulates stats using the `TuiStats::from(&msg.stats)` conversion

Note: An empty `messages` slice returns a contribution with default values (zeroed stats, default date/hash). This appears intentional for the contribution cache semantics.

src/analyzers/roo_code.rs (1)
332-350: Consistent implementation following the established analyzer pattern.

The new trait methods align with the architectural shift to per-source parsing:

- `is_available` uses the shared `vscode_extension_has_sources` helper (consistent with kilo_code.rs, cline.rs)
- `parse_source` properly delegates to the existing `parse_roo_code_task_directory` function
- `is_valid_data_path` correctly validates `ui_messages.json` files
- `contribution_strategy` returns `SingleSession`, appropriate for task-directory-based analyzers where each directory represents one session

As per coding guidelines, this properly implements the `Analyzer` trait for the pluggable architecture system.

src/analyzers/piebald.rs (2)
246-277: Solid per-source parsing implementation for SQLite-based analyzer.

The implementation correctly:

- Opens the database read-only with proper flags in `parse_source`
- Uses rayon's `par_iter` for parallel source processing
- Applies `deduplicate_by_local_hash` after flattening (Piebald sets `local_hash` from `msg.id`)
- Returns the `MultiSession` contribution strategy, since the SQLite database contains all sessions in one file
- Validates `app.db` specifically in `is_valid_data_path`

The `get_watch_directories` defensively filters to only existing directories, which is appropriate.
297-303: Test updated for new API surface.

The test correctly uses `get_stats_with_sources(Vec::new())` and asserts on `messages.is_empty()`, aligning with the per-source API changes.

src/analyzers/gemini_cli.rs (2)
368-380: Trait implementations look correct.
- `is_valid_data_path` properly validates `.json` files in a `chats` directory
- `contribution_strategy` returns `SingleSession`, appropriate since each JSON file represents one session
353-359: Verify deduplication behavior with `None` local_hash.

The `parse_sources_parallel` method calls `deduplicate_by_local_hash`, but `parse_json_session_file` sets `local_hash: None` for all messages. Confirm whether `deduplicate_by_local_hash` can effectively deduplicate messages when `local_hash` is `None`, or if a different deduplication strategy (such as using a global hash field as suggested in the architecture) should be used instead.

Cargo.toml (1)
8-16: Mimalloc integration for memory optimization.

Good addition of the mimalloc allocator as a default feature with the ability to disable it for profiling (`--no-default-features`). The `libmimalloc-sys` crate with the `"extended"` feature enables `mi_collect` for the `release_unused_memory()` calls mentioned in the PR objectives.

src/types.rs (5)
22-95: Well-designed compact date representation.
`CompactDate` efficiently packs a date into 4 bytes (u16 year + u8 month + u8 day) with:

- Proper `from_local` conversion using chrono's local timezone
- Manual parsing with checked arithmetic to prevent overflow
- Correct `Ord`/`PartialOrd` implementation via tuple comparison
- Proper `Serialize`/`Deserialize` through the string representation

A previous review suggested adding month/day range validation, which was declined since dates come from trusted coding tool sources.
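The 4-byte layout with tuple-style ordering can be sketched as follows. This is a dependency-free illustration of the pattern, not the actual splitrail type: field names, the `from_parts` constructor, and the derived-`Ord` approach are assumptions standing in for the real implementation.

```rust
// Hypothetical sketch of a CompactDate-style type: 4 bytes total,
// with ordering derived from field declaration order, which matches
// chronological order for (year, month, day).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct CompactDate {
    year: u16, // compared first
    month: u8, // then month
    day: u8,   // then day
}

impl CompactDate {
    fn from_parts(year: u16, month: u8, day: u8) -> Self {
        Self { year, month, day }
    }
}

fn main() {
    // The whole date fits in 4 bytes (u16 + u8 + u8, alignment 2).
    assert_eq!(std::mem::size_of::<CompactDate>(), 4);
    // Derived Ord compares year, then month, then day.
    assert!(CompactDate::from_parts(2025, 12, 31) < CompactDate::from_parts(2026, 1, 1));
    assert!(CompactDate::from_parts(2026, 1, 2) > CompactDate::from_parts(2026, 1, 1));
    println!("ok");
}
```

Deriving `Ord` on a struct whose fields are declared in most-significant-first order is what makes the "tuple comparison" correctness argument hold without any hand-written comparison code.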
134-185: Efficient inline model reference counting.
`ModelCounts` using `TinyVec<[(ModelKey, u32); 3]>` is a smart optimization:
- Inline storage for up to 3 models (common case) avoids heap allocation
- Spills to heap only when needed
- `increment`/`decrement` with proper cleanup (retain on non-zero counts)
- `decrement` uses `saturating_sub` to prevent underflow
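The increment/decrement refcount pattern can be sketched as below. To keep the example dependency-free it uses a plain `Vec` instead of `TinyVec`, and stubs `ModelKey` as a `u16`; in the real code the key is an interned model id and the inline-capacity-3 storage avoids heap allocation in the common case.

```rust
// Illustrative refcount map with saturating decrement and removal of
// zeroed entries, mirroring the behavior described in the review.
type ModelKey = u16; // stand-in for the interned key type

#[derive(Default)]
struct ModelCounts(Vec<(ModelKey, u32)>);

impl ModelCounts {
    fn increment(&mut self, key: ModelKey) {
        if let Some(entry) = self.0.iter_mut().find(|(k, _)| *k == key) {
            entry.1 += 1;
        } else {
            self.0.push((key, 1));
        }
    }

    fn decrement(&mut self, key: ModelKey) {
        if let Some(entry) = self.0.iter_mut().find(|(k, _)| *k == key) {
            entry.1 = entry.1.saturating_sub(1); // never underflows
        }
        self.0.retain(|(_, count)| *count > 0); // drop entries at zero
    }

    fn get(&self, key: ModelKey) -> u32 {
        self.0.iter().find(|(k, _)| *k == key).map_or(0, |(_, c)| *c)
    }
}

fn main() {
    let mut counts = ModelCounts::default();
    counts.increment(1);
    counts.increment(1);
    counts.increment(2);
    assert_eq!(counts.get(1), 2);
    counts.decrement(1);
    counts.decrement(1);
    assert_eq!(counts.get(1), 0); // entry removed when count hits zero
    counts.decrement(1); // saturating: a no-op on a missing entry
    assert_eq!(counts.get(2), 1);
    println!("ok");
}
```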
439-508: Lightweight TuiStats for memory efficiency.

Good design choices:

- 24 bytes vs 320 bytes for the full `Stats`
- `cost_cents` as `u32` avoids f32 precision issues while enabling integer arithmetic
- `saturating_add`/`saturating_sub` in the arithmetic implementations
- The `From<&Stats>` conversion rounds cost to cents

The `u32` token fields provide sufficient capacity (~4B tokens) for per-session/per-day aggregates.
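The cents-based cost representation and saturating arithmetic can be sketched as follows. The field set here is an assumption for illustration (the real `TuiStats` carries more fields); the point is that integer cents survive repeated subtract-old/add-new cycles without accumulating float error.

```rust
// Minimal sketch: integer-cents stats with saturating add/subtract.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct TuiStats {
    input_tokens: u32,
    output_tokens: u32,
    cost_cents: u32,
}

impl TuiStats {
    fn saturating_add(self, other: Self) -> Self {
        Self {
            input_tokens: self.input_tokens.saturating_add(other.input_tokens),
            output_tokens: self.output_tokens.saturating_add(other.output_tokens),
            cost_cents: self.cost_cents.saturating_add(other.cost_cents),
        }
    }

    fn saturating_sub(self, other: Self) -> Self {
        Self {
            input_tokens: self.input_tokens.saturating_sub(other.input_tokens),
            output_tokens: self.output_tokens.saturating_sub(other.output_tokens),
            cost_cents: self.cost_cents.saturating_sub(other.cost_cents),
        }
    }
}

// Rounding dollars to integer cents once, at conversion time, keeps all
// later aggregate arithmetic exact.
fn dollars_to_cents(cost: f64) -> u32 {
    (cost * 100.0).round().min(u32::MAX as f64) as u32
}

fn main() {
    assert_eq!(dollars_to_cents(0.125), 13); // rounds half away from zero
    assert_eq!(dollars_to_cents(2.0), 200);
    let a = TuiStats { input_tokens: 100, output_tokens: 50, cost_cents: 13 };
    let b = TuiStats { input_tokens: 40, output_tokens: 60, cost_cents: 5 };
    let sum = a.saturating_add(b);
    // Subtracting a prior contribution recovers the other one exactly.
    assert_eq!(sum.saturating_sub(a), b);
    println!("ok");
}
```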
544-596: View-based architecture enables efficient TUI updates.

The `SharedAnalyzerView` pattern (`Arc<RwLock<AnalyzerStatsView>>`) is well-suited to the incremental update model:

- `Arc` enables cheap cloning for channel communication
- `RwLock` (from `parking_lot` per coding guidelines) allows concurrent reads with exclusive writes
- `into_view` consumes ownership and drops messages, reducing the memory footprint

The `aggregate_sessions_from_messages` call during conversion pre-computes session aggregates for rendering.
718-867: Comprehensive tests for model reference counting.

The tests properly verify:

- `DailyStats` model add/subtract with removal at zero count
- `SessionAggregate` model ref counting through `add_multi_session_contribution`/`subtract_multi_session_contribution`
- Partial subtraction preserving remaining contributions
- Multiple models per session
These tests address the correctness concern raised in previous reviews (now fixed in commit b019197).
src/cache/mod.rs (1)
1-5: Clean module organization for caching utilities.

Good encapsulation: the `model_intern` implementation details remain private while exposing only the necessary public API (`ModelKey`, `intern_model`, `resolve_model`). This allows internal refactoring without breaking the public interface.

src/contribution_cache/tests/basic_operations.rs (6)
15-44: LGTM: Comprehensive basic insert/get test.

The test correctly validates the round-trip behavior of inserting and retrieving a SingleMessageContribution, including stats extraction and session hash verification.

46-66: LGTM: Clean SingleSession insert/get validation.

The test properly exercises the SingleSessionContribution path and validates the retrieved data.

68-85: LGTM: MultiSession insert/get test is sound.

The test correctly validates the MultiSessionContribution flow.

87-143: LGTM: Thorough removal test covering all contribution types.

The test comprehensively validates that `remove_any` correctly identifies and removes each contribution type, returning the appropriate variant.

145-159: LGTM: Clear operation test is correct.

The test properly validates that `clear()` removes all entries from the cache.

165-177: LGTM: PathHash consistency test is essential.

The test correctly validates deterministic hashing behavior, which is critical for cache correctness.
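The PathHash newtype idea — hashing a borrowed path in place so cache keys need no `PathBuf` allocation — can be sketched as below. The real implementation uses xxh3; std's `DefaultHasher` stands in here purely to keep the example dependency-free, and `PathHash::new` as written is an illustrative analogue, not the actual splitrail code.

```rust
// Sketch of an allocation-free, hash-based cache key for file paths.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PathHash(u64);

impl PathHash {
    fn new(path: &Path) -> Self {
        // Hashes the borrowed path directly; no PathBuf is created.
        let mut hasher = DefaultHasher::new();
        path.hash(&mut hasher);
        PathHash(hasher.finish())
    }
}

fn main() {
    let a = PathHash::new(Path::new("/tmp/chats/session1.json"));
    let b = PathHash::new(Path::new("/tmp/chats/session1.json"));
    let c = PathHash::new(Path::new("/tmp/chats/session2.json"));
    // Deterministic: the same path always maps to the same key,
    // which is what the consistency test above is guarding.
    assert_eq!(a, b);
    assert_ne!(a, c);
    println!("ok");
}
```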
src/cache/model_intern.rs (3)
1-8: LGTM: Well-chosen interner with appropriate size constraints.

Using `ThreadedRodeo<MiniSpur>` with 2-byte keys is a sound choice for model names. The assumption of fewer than 65,536 unique models is very reasonable for this use case.

10-29: Good defensive documentation for Default implementation.

The note about `Default` being implemented for trait bounds but having undefined resolution behavior is important. This is good defensive programming.

31-67: LGTM: Clean API with comprehensive test coverage.

The public API is minimal and well-designed. The `#[inline]` annotations are appropriate for these thin wrappers, and the test coverage is thorough.

src/contribution_cache/tests/single_message.rs (5)
12-60: LGTM: Comprehensive SingleMessageContribution construction tests.

The tests thoroughly cover message construction with and without models, stat extraction, and hash consistency. The coverage is excellent.

66-137: LGTM: View add/subtract operations are well-tested.

The tests correctly validate the add/subtract semantics for SingleMessage contributions, including the important edge case of user messages (model=None) not incrementing AI message counts.

143-190: LGTM: File update flow test simulates realistic incremental updates.

The test correctly models the subtract-old/add-new pattern for incremental file updates, which is a key feature of this PR.

193-227: LGTM: File deletion test validates cleanup behavior.

The test properly validates that file deletion removes contributions and cleans up empty daily stats entries.

230-280: LGTM: Multi-file aggregation test is crucial for validation.

The test correctly validates that multiple files can contribute to the same session and that partial deletion maintains correct aggregated stats.
src/contribution_cache/tests/single_session.rs (4)
12-100: LGTM: Comprehensive SingleSessionContribution construction tests.

The tests thoroughly cover message aggregation scenarios, including empty input, a single model, and multiple models with correct count tracking.

106-220: LGTM: View operations tests validate add/subtract semantics thoroughly.

The tests correctly validate daily stats, session stats, and model count tracking through add/subtract operations, including cleanup when counts reach zero.

227-353: LGTM: File lifecycle tests validate incremental update flows.

The tests properly simulate file update and deletion flows for single-session contributions, which are essential for the incremental processing feature.

356-388: LGTM: Date boundary test validates strategy-specific behavior.

The test correctly validates that SingleSession contributions use the first message's date for daily stats aggregation, which is the appropriate behavior for this contribution strategy.
src/contribution_cache/multi_session.rs (2)
1-24: LGTM: MultiSessionContribution structure is well-designed.

The struct appropriately represents aggregated data for multi-session files, with clear documentation and sensible field types.

26-42: LGTM: from_messages implementation is clean and efficient.

The implementation correctly delegates to existing aggregation utilities and makes good optimization choices:

- Uses `Arc<str>` to avoid per-session allocations
- Filters out "unknown" dates to maintain data quality
- Reuses proven aggregation logic from other modules
src/contribution_cache/tests/compact_stats.rs (1)
1-160: Well-structured test coverage for PackedStatsDate bitfield operations.

The tests comprehensively cover:
- Normal value packing/unpacking (lines 6-32)
- TUI stats conversion including cost_cents (lines 34-56)
- Maximum bit-limit values (lines 58-85)
- Saturation behavior for overflow values (lines 87-111)
- Real-world observed values (lines 113-140)
- Zero/default values (lines 142-160)
The edge case coverage is thorough and validates the bitfield packing implementation correctly.
src/contribution_cache/tests/multi_session.rs (1)
1-304: Comprehensive test coverage for multi-session contribution strategy.

The tests validate the complete incremental update lifecycle:
- Creation from messages with session aggregation (lines 13-63)
- Empty message handling (lines 65-74)
- View add/subtract operations (lines 80-152)
- Session merging with combined stats (lines 154-190)
- File update flow with subtract-old/add-new pattern (lines 196-257)
- File deletion cleanup (lines 259-304)
The pattern matching on `RemovedContribution::MultiSession` at line 296 correctly handles type-safe cache removal.

src/contribution_cache/single_message.rs (3)
45-59: Well-designed compact bitfield representation.

The 176-bit packed layout is clearly documented, and the `#[repr(C, align(1))]` ensures minimal padding. The bitfield ranges are correctly specified for each field.
81-105: Saturation logic correctly prevents bit overflow. The `pack()` method properly saturates all values to their bit limits:
- Token fields use appropriate masks (`0x7FF_FFFF` for 27-bit, `0x3FF_FFFF` for 26-bit)
- Cost conversion handles f64 overflow with `.min(u16::MAX as f64)`
- Year offset uses both `saturating_sub` and `.min(63)` for safe bounds
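The saturation pattern described above can be sketched as follows. This is a hypothetical illustration, not the crate's actual `pack()`; the field widths and the `pack_tokens` helper are assumptions taken from the masks mentioned in the review.

```rust
// Hypothetical sketch of saturating bitfield packing, mirroring the pattern
// the review describes: mask for token fields, .min() for cost, and
// saturating_sub + .min() for the year offset. Field widths are assumed.
const TOKEN_MASK_27: u64 = 0x7FF_FFFF; // 27 bits

fn pack_tokens(input_tokens: u64, year: u16, base_year: u16, cost: f64) -> (u64, u8, u16) {
    // Saturate the token count to its 27-bit limit instead of overflowing.
    let tokens = input_tokens.min(TOKEN_MASK_27);
    // Year stored as a 6-bit offset from a base year, clamped to 0..=63.
    let year_off = year.saturating_sub(base_year).min(63) as u8;
    // Cost in integer cents, saturating at u16::MAX to survive f64 overflow.
    let cost_cents = (cost * 100.0).min(u16::MAX as f64) as u16;
    (tokens, year_off, cost_cents)
}

fn main() {
    // A value above the 27-bit limit saturates rather than wrapping.
    let (t, y, c) = pack_tokens(u64::MAX, 2025, 2020, 1e12);
    assert_eq!(t, TOKEN_MASK_27);
    assert_eq!(y, 5);
    assert_eq!(c, u16::MAX);
    println!("tokens={t} year_off={y} cost_cents={c}");
}
```

Saturation here trades exactness at extreme values for a fixed memory layout, which is the appropriate choice for display-oriented stats.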
138-150: Compile-time size assertion ensures memory layout stability. The `const _: () = assert!(...)` pattern at line 150 guarantees the 32-byte layout at compile time, catching any future field changes that would break the cache alignment optimization.
src/analyzers/kilo_code.rs (1)
303-321: Correctly implements per-source parsing pattern. The new trait methods follow the established pattern from other analyzers:
- `is_available()` uses the synchronous `vscode_extension_has_sources` (line 304)
- `parse_source()` delegates to the existing `parse_kilo_code_task_directory` helper
- `is_valid_data_path()` validates `ui_messages.json` files
- `contribution_strategy()` returns `SingleSession`, matching similar VSCode extension analyzers

Based on learnings, this correctly avoids Tokio in the TUI-related paths.
src/analyzers/qwen_code.rs (2)
304-324: Data source discovery correctly uses WalkDir for flat directory traversal. The `min_depth(3).max_depth(3)` constraint efficiently targets the `~/.qwen/tmp/*/chats/*.json` pattern without unnecessary recursion. Based on learnings, this aligns with the project's preference for `walkdir` over `jwalk` for flat structures.
361-373: Per-source validation and contribution strategy correctly implemented. The `is_valid_data_path` validation matches the discovery filter logic, ensuring consistency. `SingleSession` is the appropriate strategy for Qwen Code's 1-file-1-session data model.
src/contribution_cache/tests/mod.rs (2)
28-69: Well-designed test helper for creating configurable messages. The `make_message` helper provides good defaults while allowing customization of key fields. The date parsing with fallback ensures tests don't fail on date format issues, and the role assignment based on model presence matches real-world semantics.
71-98: Test view helpers enable consistent test setup. `make_empty_view` and `make_view_with_session` provide clean starting points for testing view mutations. The use of `Arc::from` for analyzer names aligns with the production code's memory optimization approach.
src/analyzers/copilot.rs (3)
452-470: Data source discovery correctly migrated to WalkDir. The `min_depth(3).max_depth(3)` constraint efficiently targets `workspaceStorage/*/chatSessions/*.json` files. This aligns with the project's preference for `walkdir` for flat directory structures.
473-508: Per-source parsing pattern correctly implemented. The new trait methods follow the established pattern:
- `is_available()` uses the same filter logic as discovery
- `parse_source()` delegates to `parse_copilot_session_file`
- `get_watch_directories()` returns the workspace storage paths
- `is_valid_data_path()` validates chatSessions JSON files
- `contribution_strategy()` returns `SingleSession`

Based on learnings, this correctly avoids Tokio in the background I/O paths.
32-48: Platform coverage limited to macOS. The `workspace_storage_dirs()` method only returns macOS paths (Library/Application Support). Linux (`~/.config/{fork}`) and Windows (`%APPDATA%`) paths are not included. If cross-platform support is intended, consider extending this method.

```bash
#!/bin/bash
# Check if other analyzers have cross-platform workspace storage paths
rg -n "workspace_storage" src/analyzers/ -A 10 | head -50
# Check if Linux/Windows paths exist elsewhere for Copilot
rg -n "workspaceStorage" src/analyzers/copilot.rs
```
src/main.rs (1)
28-30: LGTM: Global allocator correctly configured. The mimalloc global allocator declaration is properly feature-gated and follows Rust best practices.
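A feature-gated global allocator declaration of the kind described would look roughly like this. This is a sketch: the feature name `"mimalloc"` is assumed from the PR's `--no-default-features` instructions, not confirmed against the source.

```rust
// Sketch of a feature-gated mimalloc global allocator, as the PR describes.
// The feature name "mimalloc" is an assumption for illustration; building
// with --no-default-features would compile this file without the allocator.
#[cfg(feature = "mimalloc")]
use mimalloc::MiMalloc;

#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Allocations below go through mimalloc when the feature is enabled,
    // and through the default system allocator otherwise.
    let v: Vec<u32> = (0..4).collect();
    assert_eq!(v.iter().sum::<u32>(), 6);
}
```

Keeping the allocator behind a feature flag matters for tools like heaptrack, which need the default allocator to intercept allocations.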
src/analyzers/opencode.rs (2)
442-461: LGTM: Data source discovery correctly implemented. The WalkDir traversal with `min_depth(2)` and `max_depth(2)` correctly matches OpenCode's directory structure for message files. The switch from jwalk to walkdir aligns with the PR's performance strategy. Based on learnings: Splitrail uses walkdir (not jwalk) for flat directory structures where parallel directory walking provides no benefit.
593-605: LGTM: Path validation correctly enforces OpenCode structure. The validation logic correctly checks for `.json` files at depth 2 from the data directory, matching the structure enforced in `discover_data_sources`.
src/analyzers/claude_code.rs (3)
55-73: LGTM: Data source discovery correctly implemented. The WalkDir traversal with appropriate depth constraints correctly matches Claude Code's directory structure.
Based on learnings: Consistent with walkdir usage for flat directory structures where parallel walking provides no benefit.
85-231: LGTM: Deduplication logic correctly handles Claude Code's complex cross-file semantics. The sequential deduplication approach with fingerprint tracking and insertion order preservation correctly handles Claude Code's duplicate/split message scenarios. The trade-off of sequential processing for correctness is appropriate given the complexity of cross-file deduplication requirements.
240-251: LGTM: Path validation correct. The validation correctly enforces Claude Code's directory structure with `.jsonl` files at depth 2.
415-433: LGTM: Data source discovery correctly implemented. The WalkDir traversal correctly matches Pi Agent's directory structure with appropriate depth constraints.
474-485: LGTM: Path validation correct. The validation correctly enforces Pi Agent's directory structure.
src/analyzers/codex_cli.rs (3)
49-64: LGTM: Data source discovery correctly handles Codex CLI's flexible structure. Unlike other analyzers, Codex CLI sessions can be arbitrarily nested under `~/.codex/sessions/`, so the lack of depth constraints is appropriate. The WalkDir traversal will recursively find all `.jsonl` files regardless of nesting level.
82-88: LGTM: No deduplication needed for Codex CLI sessions. The comment correctly notes that Codex CLI sessions are separate, so deduplication is unnecessary. This differs from analyzers like Claude Code where messages can appear across multiple files.
97-100: LGTM: Path validation appropriate for Codex CLI's flexible structure. The validation correctly allows any `.jsonl` file under the sessions directory, regardless of depth, matching Codex CLI's flexible session organization.
4-16: Good adoption of `parking_lot::Mutex`. The switch from `std::sync::Mutex` to `parking_lot::Mutex` aligns with the coding guidelines for improved performance. The imports are well-organized with the new view-based types.
135-152: Constructor simplification looks good. The sync constructor with parallel initial load is cleaner. The documentation correctly notes the rayon threadpool requirement.
167-194: Incremental update logic is well-structured. The branching between cached incremental updates and full reload fallback is correct. The file deletion path properly removes from cache before broadcasting.
src/contribution_cache/mod.rs (4)
1-14: Well-documented module structure. The module-level documentation clearly explains the memory footprint of each contribution type, which helps maintainers understand the design tradeoffs.
92-174: Clean cache design with type-safe operations. The three-map design provides both type safety and memory efficiency. The `#[inline]` hints on frequently-called accessors are appropriate.
208-216: Silent no-op when session not found during contribution add. When adding a single-message contribution, if the session isn't found, the contribution is partially applied (daily stats updated) but session stats are skipped. The comment notes this is intentional, but this asymmetry could cause stats drift if sessions are created out-of-order.
Consider whether this scenario can occur in practice and if a warning log would help debugging.
345-347: O(n log n) sort on every multi-session add. Re-sorting all sessions after each add is correct for maintaining order invariants, but could be optimized with insertion sort for the common case of adding pre-sorted sessions. Given that MultiSession is used for "all-in-one-file" analyzers where incremental adds are rare, this is acceptable.
src/analyzer.rs (8)
4-16: Imports properly organized for new functionality. The new imports for `rayon`, `DashMap`, `parking_lot`, and contribution cache types are well-organized and necessary for the parallel/incremental architecture.
100-151: Good refactoring of VSCode extension discovery. Extracting `walk_vscode_extension_tasks` reduces duplication, and `vscode_extension_has_sources` properly short-circuits on first match for efficiency.
182-215: Well-designed parallel parsing with path preservation. `parse_sources_parallel_with_paths` correctly preserves the path-to-messages association needed for contribution caching. The default `parse_sources_parallel` properly deduplicates using `global_hash`.
271-305: Registry structure supports incremental updates well. The `analyzer_order` field correctly addresses the DashMap iteration order instability mentioned in tests. Using `parking_lot::RwLock` aligns with coding guidelines.
395-543: Parallel loading with contribution caching is well-implemented. The strategy-based contribution caching correctly uses `PathHash::new(&path)` from the source paths returned by `parse_sources_parallel_with_paths`, avoiding the conversation_hash mismatch issue flagged in past reviews. The sequential cache insertion after parallel computation is correct.
545-658: Incremental file reload implements correct subtract/add pattern. The implementation correctly:
- Validates path before processing
- Marks file dirty for upload tracking
- Gets old contribution from cache (if exists)
- Parses new file content
- Computes new contribution
- Updates cache
- Subtracts old contribution from view
- Adds new contribution to view
This achieves the O(1) incremental update goal.
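The subtract-old/add-new pattern above can be sketched with a toy view type. These names (`View`, `Contribution`) are simplified stand-ins; the PR's real `FileContribution` and view types carry far more fields.

```rust
// Toy sketch of the subtract-old/add-new incremental update pattern.
// `View` and `Contribution` are simplified stand-ins for the PR's types.
#[derive(Debug, Default, PartialEq)]
struct View {
    total_tokens: u64,
    total_cost_cents: u64,
}

#[derive(Clone)]
struct Contribution {
    tokens: u64,
    cost_cents: u64,
}

impl View {
    fn add_contribution(&mut self, c: &Contribution) {
        self.total_tokens += c.tokens;
        self.total_cost_cents += c.cost_cents;
    }
    fn subtract_contribution(&mut self, c: &Contribution) {
        // Saturating, to stay safe if a stale contribution is subtracted twice.
        self.total_tokens = self.total_tokens.saturating_sub(c.tokens);
        self.total_cost_cents = self.total_cost_cents.saturating_sub(c.cost_cents);
    }
}

fn main() {
    let mut view = View::default();
    let old = Contribution { tokens: 100, cost_cents: 5 };
    view.add_contribution(&old);

    // File changed: re-parse produced a new contribution. O(1) update:
    let new = Contribution { tokens: 140, cost_cents: 7 };
    view.subtract_contribution(&old);
    view.add_contribution(&new);

    assert_eq!(view, View { total_tokens: 140, total_cost_cents: 7 });
}
```

Because only the changed file's old and new contributions touch the view, the cost of an update is independent of how many files exist.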
750-781: Incremental upload loading is efficient. Sequential parsing for dirty files (typically 1-2) is appropriate. The deduplication and timestamp filtering are correctly ordered.
1096-1162: Excellent regression test for stable analyzer ordering. This test catches the DashMap iteration order issue that could cause tabs to jump. Testing order stability after updates and removals is thorough.
|
Any chance of getting this merged? OpenCode will likely move to sqlite backend, and I want to avoid conflicts when the code will inevitably require adjustment. |
|
Hi @Sewer56! I'm so sorry about that! We got busy and forgot about this one. Please next time ping us sooner! The 600 to 25 MB of memory is amazing; great job! I noticed that there's Otherwise, the code looks great! The model string interning that you're doing is clever. |
|
@Sewer56 Thank you again! |
|
👏 |

Improved Memory Usage, Incremental File Level Updates
Summary
Major performance and memory optimisation overhaul for Splitrail. Idle memory usage is improved by around a factor of 32.
After:

Before:

This is a small overhaul of how we store data under the hood. Summary of changes below.
The big one is the allocator change.
glibc's malloc has some well-known issues with multithreaded workloads. You can find details in various articles if you search for `MALLOC_ARENA_MAX`, e.g. https://blog.cloudflare.com/the-effect-of-switching-to-tcmalloc-on-rocksdb-memory-use/

Previously, we kept rayon thread pools alive due to uses of `par_iter` and `jwalk` scattered throughout the codebase. That meant glibc would accumulate arenas and hold onto memory indefinitely.

I've switched to:
- `mimalloc` as the allocator: it handles thread-local caching more gracefully and returns memory to the OS more aggressively
- Scoped `rayon` pools: we now spawn pools, do the work, and tear them down, rather than keeping them alive

The `rayon` pools are not kept alive once the TUI starts because there is no reason for us to do so.

There is another major improvement here: incremental file-level updates. Instead of re-scanning everything after a file changes, we only handle removed files, changed files, and incoming files. Since these come one at a time, thread pools are not needed; there is only one item to process.
Everything here is mostly focused on the TUI. Non-TUI code is largely untouched.
Key Changes
Memory Optimizations
| Before | After | Notes |
|---|---|---|
| `String` (18B + heap) | `MiniSpur` via `lasso` interner (2B) | `lasso::ThreadedRodeo` stores each unique model name once, referenced by 4-byte keys |
| | `CompactDate` (4B packed) | `year: u16 + month: u8 + day: u8` struct (4 bytes total) |
| `String` per session | `Arc<str>` shared | |
| `Stats` (320B) | `TuiStats` (24B) | `u32` and integer cents |
| `PathBuf` | `PathHash` (8B xxh3) | |

Some types were changed to more compact or more suitable representations. Data not needed by the TUI was removed from the structs used to present data within the TUI.
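The `CompactDate` idea from the table above can be sketched as a 4-byte struct. Field names are assumed from the `year: u16 + month: u8 + day: u8` note; the real type may pack differently.

```rust
// Sketch of a 4-byte packed date, per the year:u16 + month:u8 + day:u8 note.
// Field names are assumptions; the point is the size and Copy-ability.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CompactDate {
    year: u16,
    month: u8,
    day: u8,
}

// Compile-time guarantee that the layout stays at 4 bytes.
const _: () = assert!(std::mem::size_of::<CompactDate>() == 4);

fn main() {
    let d = CompactDate { year: 2025, month: 6, day: 1 };
    assert_eq!(std::mem::size_of_val(&d), 4);
    println!("{d:?}");
}
```

Compared with a heap-allocated date `String` (24 bytes of header on 64-bit targets plus the heap buffer), this is a plain `Copy` value with no allocation at all.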
Scoped Rayon Threadpools && mimalloc
Mentioned above: during startup, we set up a threadpool, do the work, then tear the threadpool down.
View-Based TUI Architecture
Structs added based on requirements of the TUI.
Previously we stored all messages (expensive), and computed things on the fly as we navigated through the TUI.
Now everything is computed ahead of time, storing only the details the TUI displays.
Incremental updates mutate this precomputed state directly. Speaking of which...
O(1) Incremental Updates
Per-file contribution cache enables true incremental updates:
- A `FileContribution` is cached per file
- On change: `view.subtract_contribution(&old)` + `view.add_contribution(&new)`
- The view is guarded by an `RwLock`

Previously, CPU briefly spiked to around 70% on my machine with OpenCode.
Updated Packages
`reqwest_simd_json` extension traits for HTTP client integration
Architecture Overview