Conversation


@Sewer56 Sewer56 commented Jan 2, 2026

Improved Memory Usage, Incremental File Level Updates

Summary

Major performance and memory optimisation overhaul for Splitrail. Idle memory usage is improved by around a factor of 32.

After:
image

Before:
image

This is a small overhaul of how we store data under the hood. Summary of changes below.
The big one is the allocator change.

glibc's malloc has some well-known issues with multithreaded workloads:

  • It eagerly creates per-thread arenas (up to 8× CPU cores by default) to reduce lock contention
  • Memory freed in one arena can't be used by another, leading to fragmentation
  • It's reluctant to return memory to the OS unless there's contiguous free space at the top of an arena

You can find details in various articles if you search for MALLOC_ARENA_MAX, e.g. https://blog.cloudflare.com/the-effect-of-switching-to-tcmalloc-on-rocksdb-memory-use/

Previously, we kept rayon thread pools alive because uses of par_iter and jwalk were scattered throughout the codebase. As a result, glibc would accumulate arenas and hold onto memory indefinitely.

I've switched to:

  • mimalloc as the allocator - it handles thread-local caching more gracefully and returns memory to the OS more aggressively
  • Temporary rayon pools: we now spawn pools, do the work, and tear them down, rather than keeping them alive.
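The allocator switch itself is a single declaration; a sketch of the pattern, assuming the default feature name (the actual flag in Cargo.toml may differ):

```rust
// Sketch: feature-gated mimalloc as the global allocator.
// The feature name "mimalloc" is an assumption; the repo's flag may differ.
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```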

The rayon pools are not kept alive once the TUI starts because there is no reason to do so. The other major improvement here is incremental file-level updates.

Instead of re-scanning everything after a file changes, we only handle removed, changed, and incoming files. Since these arrive one at a time, thread pools are not needed: there is only one item to process.

Everything here is mostly focused on the TUI. Non-TUI code is largely untouched.


Key Changes

Memory Optimizations

| Technique | Before | After | Savings |
|---|---|---|---|
| Model names | String (18B + heap) | MiniSpur via lasso interner (2B) | ~14-16 bytes/occurrence |
| Dates | String/DateTime | CompactDate (4B packed) | ~8-24 bytes/date |
| Analyzer names | String per session | Arc&lt;str&gt; shared | ~16-24 bytes/session |
| TUI stats | Stats (320B) | TuiStats (24B) | ~296 bytes/session |
| Cache keys | PathBuf | PathHash (8B xxh3) | Eliminates cloning |
  • String Interning: Global lasso::ThreadedRodeo stores each unique model name once, referenced by 4-byte keys
  • CompactDate: Packed year:u16 + month:u8 + day:u8 struct (4 bytes total)
  • TuiStats: Lightweight projection with only TUI-visible fields, uses u32 and integer cents
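A minimal sketch of the packed date from the bullet above (field layout per the description; the real type's methods are omitted). Field order year, month, day also makes the derived ordering chronological:

```rust
// Sketch of the 4-byte packed date described above (methods omitted).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CompactDate {
    year: u16,
    month: u8,
    day: u8,
}

fn main() {
    // u16 + u8 + u8 with 2-byte alignment packs into exactly 4 bytes.
    assert_eq!(std::mem::size_of::<CompactDate>(), 4);
    let d = CompactDate { year: 2026, month: 1, day: 2 };
    println!("{}-{:02}-{:02}", d.year, d.month, d.day); // 2026-01-02
}
```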

Some types were changed to more compact or suitable representations. Data not needed for the TUI was removed from the structs used to present data within the TUI.
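The interning idea can be sketched with a minimal single-threaded interner using only the standard library (in the PR, lasso's ThreadedRodeo plays this role, thread-safe and global):

```rust
// Minimal stdlib interner sketch: each unique string is stored once and
// referenced by a small integer key.
use std::collections::HashMap;

#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn get_or_intern(&mut self, s: &str) -> u32 {
        if let Some(&k) = self.map.get(s) {
            return k;
        }
        let k = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), k);
        k
    }

    fn resolve(&self, k: u32) -> &str {
        &self.strings[k as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.get_or_intern("claude-sonnet-4");
    let b = interner.get_or_intern("claude-sonnet-4");
    assert_eq!(a, b); // stored once, both occurrences share one key
    println!("{}", interner.resolve(a));
}
```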

Scoped Rayon Thread Pools & mimalloc

As mentioned above: during startup, set up a thread pool, do the work, then tear the thread pool down.

let stats = {
    let pool = rayon::ThreadPoolBuilder::new().build()?;
    pool.install(|| registry.load_all_stats_parallel())?
    // Pool dropped here, threads released
};
release_unused_memory();  // mi_collect(true): tell mimalloc to clean up.

View-Based TUI Architecture

New structs were added based on the requirements of the TUI.

pub struct AnalyzerStatsView {
    pub daily_stats: BTreeMap<String, DailyStats>,
    pub session_aggregates: Vec<SessionAggregate>,  // Pre-computed
    pub analyzer_name: Arc<str>,                    // Shared
}

pub type SharedAnalyzerView = Arc<RwLock<AnalyzerStatsView>>; // Shared with UI.

Previously we stored all messages (expensive) and computed things on the fly as we navigated through the TUI.
Now everything is computed ahead of time, keeping only the details displayed by the TUI.

Incremental updates mutate this view in place. Speaking of which...

O(1) Incremental Updates

Per-file contribution cache enables true incremental updates:

  1. When file changes -> lookup old FileContribution
  2. Parse only the changed file
  3. view.subtract_contribution(&old) + view.add_contribution(&new)
    • i.e. Remove 0.20 cost from last state of file, now add 0.24 after new prompt added.
  4. In-place mutation via RwLock
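The subtract/add step above can be sketched with simplified stand-in types (field names here are hypothetical; the real FileContribution and view carry many more fields):

```rust
// Sketch of the subtract-then-add update in step 3.
// Saturating subtraction guards against transient underflow.
#[derive(Default, Clone, Copy, Debug, PartialEq)]
struct Contribution {
    cost_cents: u32,
    tool_calls: u32,
}

#[derive(Default, Debug)]
struct View {
    totals: Contribution,
}

impl View {
    fn subtract_contribution(&mut self, c: &Contribution) {
        self.totals.cost_cents = self.totals.cost_cents.saturating_sub(c.cost_cents);
        self.totals.tool_calls = self.totals.tool_calls.saturating_sub(c.tool_calls);
    }

    fn add_contribution(&mut self, c: &Contribution) {
        self.totals.cost_cents += c.cost_cents;
        self.totals.tool_calls += c.tool_calls;
    }
}

fn main() {
    let mut view = View::default();
    view.add_contribution(&Contribution { cost_cents: 20, tool_calls: 3 });
    // File changed: remove its old contribution (0.20), add the new one (0.24).
    view.subtract_contribution(&Contribution { cost_cents: 20, tool_calls: 3 });
    view.add_contribution(&Contribution { cost_cents: 24, tool_calls: 4 });
    println!("{}", view.totals.cost_cents); // 24
}
```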

Previously, CPU usage briefly spiked to around 70% on my machine with OpenCode.

Updated Packages

  • Upgraded simd-json 0.15 -> 0.17
  • Added reqwest_simd_json extension traits for HTTP client integration

Architecture Overview

File Change Event
       |
       v
reload_file_incremental()
  |-- Parse single file (sequential)
  |-- FileContribution::from_messages()
  |-- file_contribution_cache.insert(path_hash, new)
  +-- shared_view.write()
        |-- subtract_contribution(&old)
        +-- add_contribution(&new)
       |
       v
TUI receives Arc clone (pointer copy)
       |
       v
draw_ui() acquires read locks as needed

Summary by CodeRabbit

  • New Features

    • Real‑time incremental TUI with faster day/session views, session aggregation, and lower memory use.
    • Incremental background uploads with progress and selective (changed‑only) message uploads.
    • On‑demand file‑operation summaries included in daily stats.
  • Documentation

    • New docs for MCP server, analyzer integration, performance, pricing, TUI, and core types.
  • Performance

    • Much faster parallel stats loading, improved caching, and explicit memory reclamation.
  • Chores

    • Dependency upgrades and optional memory allocator; model name interning for lower memory.


Move specialized documentation to on-demand context files:
- .agents/MCP.md - MCP server details
- .agents/NEW_ANALYZER.md - Adding new analyzers (moved from skill)
- .agents/PERFORMANCE.md - Performance optimization
- .agents/PRICING.md - Model pricing updates
- .agents/TESTING.md - Test strategy
- .agents/TUI.md - Real-time monitoring
- .agents/TYPES.md - Core data structures

Reduces always-read context from 158 to 58 lines (63% reduction).
Also adds MCP and Configuration sections to README.md.
Remove code blocks and struct definitions that duplicate source code.
AI agents will read the actual source files for implementation patterns.
Consolidate TESTING.md into NEW_ANALYZER.md. Total reduction: 751 -> 232 lines.
… updates

Replace full-stats caching with lightweight view types for TUI rendering.
Messages are converted to pre-computed aggregates immediately after parsing,
reducing memory from ~550MB to ~70MB.

Key changes:
- Add view types: SessionAggregate, AnalyzerStatsView, FileContribution
- Add mimalloc v3 as global allocator (replaces glibc malloc arenas)
- Replace message cache with FileContribution cache for O(1) incremental updates
- Use temporary Rayon thread pool for initial load, dropped after parsing

Mimalloc is optional via feature flag (enabled by default). Disable for
memory profiling with heaptrack: cargo build --no-default-features

Removed 1500+ lines of dead code and 37 tests that only covered removed paths.
Replace PathBuf keys with xxh3 hash-based PathHash newtype to eliminate
allocations during incremental cache updates. Cache lookups and updates
now use PathHash::new(path) which computes the hash in place without
any heap allocation.
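A sketch of the hash-keyed cache idea; the standard library's DefaultHasher stands in here for xxh3, which is what the PR actually uses:

```rust
// Sketch: an 8-byte hash newtype replaces PathBuf as the cache key, so
// lookups compute the key in place instead of cloning the path.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::Path;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PathHash(u64);

impl PathHash {
    fn new(path: &Path) -> Self {
        // DefaultHasher stands in for xxh3 in this sketch.
        let mut h = DefaultHasher::new();
        path.hash(&mut h);
        PathHash(h.finish())
    }
}

fn main() {
    let mut cache: HashMap<PathHash, u32> = HashMap::new();
    cache.insert(PathHash::new(Path::new("/tmp/session/message.json")), 1);
    // Lookup recomputes the 8-byte key in place; no heap allocation needed.
    assert!(cache.contains_key(&PathHash::new(Path::new("/tmp/session/message.json"))));
    println!("ok");
}
```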
…short-circuit after finding first file

Add walk_data_dir() helpers to all analyzers that reuse WalkDir config
between discover_data_sources() and is_available(). The is_available()
implementations now use .any() which stops iteration after finding the
first matching file, avoiding the cost of discovering all files just to
check availability.

- Add walk_data_dir() helper to claude_code, codex_cli, gemini_cli,
  opencode, qwen_code, pi_agent analyzers
- Add walk_data_dirs() helper to copilot analyzer (multiple roots)
- Add walk_vscode_extension_tasks() and vscode_extension_has_sources()
  shared helpers for cline, kilo_code, roo_code analyzers
- Refactor discover_vscode_extension_sources() to use shared helper
- Add get_stats_with_sources() default trait method to avoid double
  discovery when loading stats
- Make get_watch_directories() a required trait method (no default)
- Remove duplicated get_stats() and is_available() from all analyzers
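The short-circuiting availability check can be sketched like this; plain std::fs::read_dir stands in for the walkdir crate (non-recursive, unlike the real helpers):

```rust
// Sketch: .any() stops iterating at the first matching file, so checking
// availability does not pay the cost of discovering every file.
use std::fs;
use std::path::Path;

fn is_available(dir: &Path) -> bool {
    fs::read_dir(dir)
        .map(|entries| {
            entries
                .filter_map(Result::ok)
                .any(|e| e.path().extension().is_some_and(|ext| ext == "json"))
        })
        .unwrap_or(false)
}

fn main() {
    let dir = std::env::temp_dir().join("splitrail_any_demo");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("a.json"), "{}").unwrap();
    println!("{}", is_available(&dir));
}
```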
Add is_valid_data_path() to Analyzer trait to filter invalid paths
(directories, wrong file types) before attempting to parse. Each
analyzer implements validation for its specific data format:

- OpenCode: .json at depth 2 (session_id/message.json)
- Claude Code: .jsonl at depth 2 (project_id/conversation.jsonl)
- VSCode extensions: ui_messages.json files
- Copilot: .json in chatSessions directory
- Gemini/Qwen: .json in chats directory
- Codex CLI: .jsonl files
- Pi Agent: .jsonl at depth 2
- Piebald: app.db file

Fixes 'Is a directory' errors when OpenCode creates session directories.
Adds file-based debug logging to help diagnose potential deadlock
issues with RwLock contention. Logs lock acquisition/release for
both read and write locks, watch channel operations, and TUI lifecycle.

Enable with: SPLITRAIL_DEBUG_LOG=1
Logs to: /tmp/splitrail-debug.log
…mory

- Change SessionAggregate.analyzer_name from String to Arc<str>, saving ~200KB
  by sharing a single allocation across all sessions from the same analyzer
- Change AnalyzerStatsView.analyzer_name to Arc<str> for consistency
- Fix redundant cloning in aggregate_sessions_from_messages using or_insert_with_key,
  reducing from 3 clones to 1 per message (~200-300KB savings)
- Remove unused Serialize/Deserialize derives from view-only types
- Fix pre-existing clippy warning in debug_log.rs (collapsible if)

Total estimated idle memory savings: ~400-500KB (8-9%)
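The Arc&lt;str&gt; sharing described above can be sketched with a stripped-down session type (the real SessionAggregate has many more fields):

```rust
// Sketch: every session from the same analyzer shares one Arc<str>
// allocation instead of owning its own String copy of the name.
use std::sync::Arc;

struct SessionAggregate {
    analyzer_name: Arc<str>,
}

fn main() {
    let name: Arc<str> = Arc::from("opencode");
    let sessions: Vec<SessionAggregate> = (0..1000)
        .map(|_| SessionAggregate { analyzer_name: Arc::clone(&name) })
        .collect();
    // 1000 sessions plus the original handle point at one allocation.
    println!("{}", Arc::strong_count(&sessions[0].analyzer_name)); // 1001
}
```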
The deadlock occurred when:
1. Thread 1 (TUI) held read lock on a view in draw_ui
2. Thread 2 (Watcher) queued for write lock on same view
3. Thread 1 tried to acquire another read in draw_summary_stats
4. parking_lot's fair queuing blocked Thread 1's new read behind Thread 2's write
5. Neither thread could proceed - classic priority inversion deadlock

Fix: Scope the read lock in draw_ui to release it before calling
draw_summary_stats, which acquires its own independent read locks.
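The fix can be sketched as follows; std's RwLock stands in for parking_lot here, and the function bodies are placeholders. The key point is scoping the first read guard so it drops before any further lock acquisition:

```rust
// Sketch of the deadlock fix: release the first read lock before calling
// into code that takes its own read locks, so a queued writer can never
// wedge between two reads held by the same thread.
use std::sync::RwLock;

fn draw_ui(view: &RwLock<Vec<u32>>) -> usize {
    let len = {
        let guard = view.read().unwrap();
        guard.len()
    }; // read lock released here, before draw_summary_stats locks again

    draw_summary_stats(view) + len
}

fn draw_summary_stats(view: &RwLock<Vec<u32>>) -> usize {
    view.read().unwrap().iter().sum::<u32>() as usize
}

fn main() {
    let view = RwLock::new(vec![1, 2, 3]);
    println!("{}", draw_ui(&view)); // 9
}
```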
Replace full Stats (320 bytes) with compact TuiStats (24 bytes) in
SessionAggregate and DailyStats, keeping only the 6 fields displayed
in the TUI: input/output/reasoning/cached tokens, cost, and tool calls.

- Use cost_cents (u32) instead of cost (f64) to avoid float precision issues
- Make format_number generic via impl Into<u64> to accept both u32 and u64
- Fix MCP get_file_operations to compute from raw messages (not TuiStats)
- Keep full Stats in ConversationMessage for upload serialization

Memory savings: ~710 KB (91% reduction in stats memory) for typical usage.
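The integer-cents choice avoids accumulating f64 rounding error across many sessions; a small illustration:

```rust
// Sketch: summing costs as u32 cents vs. raw f64 dollars.
fn main() {
    let costs = [0.10_f64, 0.20, 0.30];
    // Naive f64 accumulation drifts (0.1 + 0.2 + 0.3 != 0.6 exactly).
    let float_sum: f64 = costs.iter().sum();
    // Integer cents stay exact.
    let cents: u32 = costs.iter().map(|c| (c * 100.0).round() as u32).sum();
    println!("float: {float_sum}, cents: {cents}");
}
```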
- Add lasso crate for model name interning (4-byte Spur vs 24+ byte String)
- Replace day_key String with compact DayKey struct (4 bytes vs 34+ bytes)
  - Stores year(u16) + month(u8) + day(u8) directly, no heap allocation
  - DayKey::from_local() extracts components from DateTime without formatting
- Pre-size models Vec with capacity of 2 (most sessions use 1-2 models)
- Add shrink_to_fit() after building session aggregates

Estimated memory savings: ~1.5MB at idle (~37% reduction)
- Change session_day_filters from Vec<Option<String>> to Vec<Option<DayKey>>
- Change DailyStats.date from String to DayKey
- Remove PartialEq<str/String> impls from DayKey (no longer needed)
- Add Default derive to DayKey, add Serialize/Deserialize impls
- Simplify update_day_filters using resize/truncate
The rayon thread pool was adding complexity without benefit since
analyzers are already parallelized internally. Simplifies to async/await.
Replace rayon thread pool with tokio's async runtime for parallel
analyzer loading. Each analyzer now runs concurrently via join_all
while parsing files sequentially within each analyzer.

Switch from jwalk to the lighter walkdir crate for directory
traversal. Remove rayon dependency entirely.

- Consolidate parse_sources/parse_sources_parallel into parse_sources
- Inline walk_data_dir() helpers into discover/is_available methods
- Update all analyzers to use walkdir's into_path() API
- Remove redundant deduplication overrides where global_hash suffices

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/mcp/server.rs (1)

248-307: Consider simplifying the message filtering logic.

The current implementation filters messages in two separate steps (analyzer, then date), creating intermediate Vec collections. This could be streamlined with iterator chaining.

🔎 Proposed refactor for cleaner filtering
     let stats = self.load_stats().map_err(|e| e.to_string())?;

-    // Collect messages, optionally filtered by analyzer
-    let messages: Vec<_> = if let Some(ref analyzer_name) = req.analyzer {
-        stats
-            .analyzer_stats
-            .iter()
-            .filter(|a| a.analyzer_name.eq_ignore_ascii_case(analyzer_name))
-            .flat_map(|a| a.messages.iter())
-            .collect()
-    } else {
-        stats
-            .analyzer_stats
-            .iter()
-            .flat_map(|a| a.messages.iter())
-            .collect()
-    };
-
-    // Filter by date if specified
-    let filtered: Vec<_> = if let Some(ref date) = req.date {
-        messages
-            .into_iter()
-            .filter(|m| {
-                m.date
-                    .with_timezone(&chrono::Local)
-                    .format("%Y-%m-%d")
-                    .to_string()
-                    == *date
-            })
-            .collect()
-    } else {
-        messages
-    };
-
-    // Sum file operations from raw Stats
+    // Filter and sum file operations in one pass
     let mut response = FileOpsResponse::default();
-    for msg in filtered {
+    
+    for msg in stats
+        .analyzer_stats
+        .iter()
+        .filter(|a| {
+            req.analyzer
+                .as_ref()
+                .map(|name| a.analyzer_name.eq_ignore_ascii_case(name))
+                .unwrap_or(true)
+        })
+        .flat_map(|a| a.messages.iter())
+        .filter(|m| {
+            req.date
+                .as_ref()
+                .map(|d| {
+                    m.date
+                        .with_timezone(&chrono::Local)
+                        .format("%Y-%m-%d")
+                        .to_string()
+                        == *d
+                })
+                .unwrap_or(true)
+        })
+    {
         response.files_read += msg.stats.files_read;
         // ... rest of accumulation
     }
src/types.rs (1)

424-434: Fix inconsistent saturation handling in From<&Stats> implementation.

The From<&Stats> implementation (lines 424-434) uses plain casts that truncate values above u32::MAX, while the AddAssign, SubAssign operators and accumulate_tui_stats function all use saturating_add/saturating_sub. Although large aggregations are handled correctly through the saturating operators used in the actual code flow, the From implementation should be consistent:

TuiStats {
    input_tokens: s.input_tokens.min(u32::MAX as u64) as u32,
    output_tokens: s.output_tokens.min(u32::MAX as u64) as u32,
    reasoning_tokens: s.reasoning_tokens.min(u32::MAX as u64) as u32,
    cached_tokens: s.cached_tokens.min(u32::MAX as u64) as u32,
    cost_cents: (s.cost * 100.0).round() as u32,
    tool_calls: s.tool_calls,
}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64eaaf8 and 92aa7e5.

📒 Files selected for processing (3)
  • src/mcp/server.rs
  • src/mcp/types.rs
  • src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Follow Rust 2024 edition conventions
Use anyhow::Result for error handling in Rust code
Prefer async/await over raw futures in Rust code
Use parking_lot locks over std::sync for performance in Rust code

Files:

  • src/mcp/types.rs
  • src/types.rs
  • src/mcp/server.rs
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Normalize parsed data to common format defined in `src/types.rs` for cross-platform compatibility

Applied to files:

  • src/mcp/types.rs
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.

Applied to files:

  • src/mcp/types.rs
  • src/types.rs
  • src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system

Applied to files:

  • src/types.rs
  • src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation

Applied to files:

  • src/mcp/server.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation

Applied to files:

  • src/mcp/server.rs
🧬 Code graph analysis (2)
src/types.rs (3)
src/tui/logic.rs (1)
  • aggregate_sessions_from_messages (131-182)
src/utils.rs (1)
  • aggregate_by_date (108-228)
src/mcp/types.rs (1)
  • new (118-135)
src/mcp/server.rs (2)
src/mcp/types.rs (1)
  • new (118-135)
src/types.rs (1)
  • cost (405-407)
🔇 Additional comments (10)
src/types.rs (4)

14-33: LGTM: Model interner is well-designed.

The global ThreadedRodeo interner is a solid choice for reducing memory usage of repeated model names. The use of LazyLock ensures thread-safe initialization, and the &'static str return from resolve_model is safe given ThreadedRodeo's lifetime guarantees.


125-143: LGTM: Efficient memory design.

Good use of Arc<str> for shared analyzer names and Vec<Spur> for interned models. This significantly reduces memory overhead compared to storing full Strings in every SessionAggregate.


534-551: LGTM: FileContribution construction is correct.

The method properly delegates to aggregation utilities and correctly computes conversation count from unique sessions. Filtering out "unknown" dates is a sensible safeguard.


553-598: LGTM: Incremental addition logic is sound.

The method correctly handles both merging existing sessions and appending new ones. Model deduplication and earliest-timestamp tracking are properly implemented, and the final sort maintains consistent ordering.

src/mcp/types.rs (2)

104-136: LGTM: Clean separation of TUI stats and on-demand file operations.

The introduction of DateFileOps and the DailySummary::new constructor properly addresses the previous concern about file operation stats being hardcoded to zero. The design cleanly separates TUI-relevant fields (DailyStats) from less-frequently-needed file operation stats (computed on-demand from raw messages).


163-163: LGTM: Default derive is appropriate.

Adding Default to FileOpsResponse is a minor convenience improvement that's used in the refactored get_file_operations endpoint.

src/mcp/server.rs (4)

41-83: LGTM: Synchronous stats loading and on-demand file ops computation.

The change from async to synchronous load_stats aligns with the PR objectives to avoid Tokio in background paths. The compute_file_ops_by_date method correctly aggregates file operations from raw messages on-demand, avoiding the need to store them in DailyStats.

Based on learnings: Prefer blocking I/O on dedicated threads rather than async/await with Tokio for TUI-related code.


119-153: LGTM: Daily stats now correctly include file operations.

The endpoint properly integrates on-demand file operation computation via compute_file_ops_by_date and passes the results to DailySummary::new. This resolves the previous issue where file operation fields were hardcoded to zero.


313-390: LGTM: Consistent updates to use method-based cost access.

All endpoints correctly use ds.stats.cost() method instead of direct field access, aligning with the TuiStats API. The change from .iter() to .into_iter() in list_analyzers is a minor efficiency improvement.


436-507: LGTM: Resource handlers updated consistently.

Both resource handlers correctly use synchronous load_stats and the cost() method accessor. The logic remains sound.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/main.rs (1)

3-3: Consider using parking_lot::Mutex for consistency with project guidelines.

The import uses std::sync::Mutex, which is used for upload_status at line 234. While this isn't a performance-critical hot path, the coding guidelines recommend parking_lot locks over std::sync for better performance characteristics. Consider replacing std::sync::Mutex with parking_lot::Mutex for consistency across the codebase.

🔎 Suggested change
-use std::sync::{Arc, Mutex};
+use parking_lot::Mutex;
+use std::sync::Arc;

Then at line 234, the usage remains the same:

let upload_status = Arc::new(Mutex::new(tui::UploadStatus::None));

As per coding guidelines, parking_lot is preferred for performance in Rust code.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 92aa7e5 and 01fb5af.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • .agents/PERFORMANCE.md
  • Cargo.toml
  • src/main.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • Cargo.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Follow Rust 2024 edition conventions
Use anyhow::Result for error handling in Rust code
Prefer async/await over raw futures in Rust code
Use parking_lot locks over std::sync for performance in Rust code

Files:

  • src/main.rs
🧠 Learnings (8)
📓 Common learnings
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.

Applied to files:

  • src/main.rs
📚 Learning: 2026-01-01T00:15:25.382Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-01T00:15:25.382Z
Learning: Applies to AGENTS.md : Document agent responsibilities and capabilities in AGENTS.md

Applied to files:

  • .agents/PERFORMANCE.md
📚 Learning: 2026-01-02T01:31:46.893Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.

Applied to files:

  • .agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system

Applied to files:

  • .agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation

Applied to files:

  • .agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation

Applied to files:

  • .agents/PERFORMANCE.md
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code

Applied to files:

  • .agents/PERFORMANCE.md
🔇 Additional comments (7)
.agents/PERFORMANCE.md (1)

1-18: LGTM! Well-structured performance documentation.

The documentation accurately reflects the project's performance optimizations, correctly references walkdir for directory traversal (as per project learnings), and provides clear, actionable guidelines. The mention of parking_lot locks aligns with coding standards, and the techniques listed match the architectural changes in this PR.

src/main.rs (6)

25-27: LGTM! Proper global allocator declaration.

The mimalloc global allocator is correctly feature-gated and uses the official crate as recommended in previous reviews.


212-231: LGTM! Excellent scoped threadpool pattern with memory reclamation.

The temporary rayon threadpool is created, used via install(), and then dropped immediately after initialization. The explicit release_unused_memory() call ensures memory is returned to the OS. This pattern aligns perfectly with the PR's memory-management objectives.


246-260: LGTM! Proper memory management for background auto-upload.

The separate registry for upload and the call to release_unused_memory() after scoped parallel loading ensures the background upload doesn't retain unnecessary memory. This aligns with the incremental upload approach described in the PR objectives.


294-304: LGTM! Consistent scoped threadpool pattern.

Identical pattern to run_default: create temporary pool, load stats via install(), drop pool, release memory. Consistent implementation across the codebase.


392-402: LGTM! Consistent memory management in stats command.

Same scoped threadpool and memory-release pattern applied consistently.


444-458: LGTM! Well-documented memory release helper.

The implementation uses the official libmimalloc_sys::mi_collect (addressing previous review feedback) with a clear SAFETY comment. The feature-gated no-op version ensures compilation without mimalloc. The function signature and documentation are clear.

Replace all std::sync::Mutex usage with parking_lot::Mutex for better
performance, as recommended in the coding guidelines. This also simplifies
lock patterns since parking_lot::Mutex::lock() never fails.

Additionally removes a redundant upload_in_progress check in watcher.rs
that duplicated an earlier check in the same function.
Change SessionAggregate.models from Vec<Spur> to BTreeMap<Spur, u32>
to correctly handle incremental updates when multiple files contribute
the same model to a session. This matches the pattern already used by
DailyStats.models.

Add tests for model reference counting in both DailyStats and
SessionAggregate.
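The reference-counting pattern from this commit can be sketched with &'static str keys standing in for interned Spur keys:

```rust
// Sketch: count how many files contribute each model; a model is only
// dropped from the map when its last contributing file is removed.
use std::collections::BTreeMap;

fn add_model(models: &mut BTreeMap<&'static str, u32>, m: &'static str) {
    *models.entry(m).or_insert(0) += 1;
}

fn remove_model(models: &mut BTreeMap<&'static str, u32>, m: &'static str) {
    if let Some(count) = models.get_mut(m) {
        *count -= 1;
        if *count == 0 {
            models.remove(m);
        }
    }
}

fn main() {
    let mut models = BTreeMap::new();
    // Two files contribute the same model to one session...
    add_model(&mut models, "gpt-5");
    add_model(&mut models, "gpt-5");
    // ...so removing one file's contribution must not drop the model.
    remove_model(&mut models, "gpt-5");
    println!("{}", models.contains_key("gpt-5")); // true
}
```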

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/types.rs (1)

348-389: Optional: Consider clamping floating-point cost subtraction.

Line 358 uses direct floating-point subtraction for cost, which can produce tiny negative values (e.g., -1e-15) due to precision errors when subtracting nearly-equal costs. While functionally harmless, consider clamping to zero for cleaner output:

-        self.cost -= rhs.cost;
+        self.cost = (self.cost - rhs.cost).max(0.0);

This is a cosmetic refinement and not critical for functionality.

🧹 Nitpick comments (4)
src/watcher.rs (1)

297-309: Consider replacing tokio::spawn with std::thread::spawn for background uploads.

Line 298 uses tokio::spawn for background upload tasks. Per your earlier feedback, TUI-related code should avoid tokio and prefer blocking I/O on a spawned thread or small threadpool.

Consider refactoring to:

  • Use std::thread::spawn instead of tokio::spawn
  • Make the upload call blocking (or wrap async call in a local runtime within the thread)
  • This eliminates tokio dependency from the TUI path
Example approach using std::thread::spawn
-        // Spawn background upload task with only the messages to upload
-        tokio::spawn(async move {
-            upload::perform_background_upload_messages(
-                messages,
-                upload_status,
-                None,
-                Some(move || dirty_files.clear()),
-            )
-            .await;
-
-            // Mark upload as complete
-            *upload_in_progress.lock() = false;
+        // Spawn background upload on a dedicated thread
+        std::thread::spawn(move || {
+            // Run the async upload on a small current-thread runtime
+            let rt = tokio::runtime::Builder::new_current_thread()
+                .enable_all()
+                .build()
+                .unwrap();
+            rt.block_on(upload::perform_background_upload_messages(
+                messages,
+                upload_status,
+                None,
+                Some(move || dirty_files.clear()),
+            ));
+
+            // Mark upload as complete
+            *upload_in_progress.lock() = false;
         });

Alternatively, consider making the upload path fully synchronous to avoid tokio entirely.

Based on learnings: In TUI code (src/tui.rs, src/watcher.rs, and related), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool.

src/analyzers/piebald.rs (1)

252-258: Consider logging parse errors in parallel processing.

Line 255 uses unwrap_or_default() which silently ignores parse errors for individual sources. While this provides fault tolerance, it may hide issues from users.

Consider logging errors before returning the default:

         let all_messages: Vec<ConversationMessage> = sources
             .par_iter()
-            .flat_map(|source| self.parse_source(source).unwrap_or_default())
+            .flat_map(|source| {
+                self.parse_source(source).unwrap_or_else(|e| {
+                    eprintln!("Error parsing source {:?}: {}", source.path, e);
+                    Vec::new()
+                })
+            })
             .collect();

This maintains fault tolerance while providing visibility into parsing issues.

src/utils.rs (1)

264-293: Remove redundant HashSet imports.

HashSet is already imported at line 1 (use std::collections::{BTreeMap, HashSet};). The inner use std::collections::HashSet; statements at lines 267 and 280 are unnecessary.

🔎 Suggested cleanup
 /// Sequential deduplication by global_hash using HashSet.
 /// Used for post-init processing (incremental updates, uploads).
 pub fn deduplicate_by_global_hash(messages: Vec<ConversationMessage>) -> Vec<ConversationMessage> {
-    use std::collections::HashSet;
-
     let mut seen: HashSet<String> = HashSet::with_capacity(messages.len() / 2);
     messages
         .into_iter()
         .filter(|msg| seen.insert(msg.global_hash.clone()))
         .collect()
 }

 /// Sequential deduplication by local_hash using HashSet.
 /// Messages without local_hash are always kept.
 /// Used for post-init processing (incremental updates, uploads).
 pub fn deduplicate_by_local_hash(messages: Vec<ConversationMessage>) -> Vec<ConversationMessage> {
-    use std::collections::HashSet;
-
     let mut seen: HashSet<String> = HashSet::with_capacity(messages.len() / 2);
     messages
         .into_iter()
         .filter(|msg| {
src/upload.rs (1)

304-312: Consider logging config load failures for debugging.

The Config::load().ok().flatten()? pattern silently discards any config loading errors. For background uploads this is likely intentional, but logging the error (when upload_debug is enabled) could help diagnose configuration issues.

🔎 Optional: Log config errors in debug mode
     let upload_result = async {
-        let mut config = Config::load().ok().flatten()?;
+        let mut config = match Config::load() {
+            Ok(Some(c)) => c,
+            Ok(None) => return None,
+            Err(e) => {
+                if upload_debug_enabled() {
+                    upload_debug_log(format!("[splitrail upload] config load error: {e:#}"));
+                }
+                return None;
+            }
+        };
         if !config.is_configured() {
             return None;
         }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01fb5af and b019197.

📒 Files selected for processing (10)
  • src/analyzers/piebald.rs
  • src/main.rs
  • src/tui.rs
  • src/tui/logic.rs
  • src/types.rs
  • src/upload.rs
  • src/upload/tests.rs
  • src/utils.rs
  • src/version_check.rs
  • src/watcher.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/main.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Follow Rust 2024 edition conventions
Use anyhow::Result for error handling in Rust code
Prefer async/await over raw futures in Rust code
Use parking_lot locks over std::sync for performance in Rust code

Files:

  • src/utils.rs
  • src/tui.rs
  • src/tui/logic.rs
  • src/watcher.rs
  • src/upload/tests.rs
  • src/upload.rs
  • src/analyzers/piebald.rs
  • src/version_check.rs
  • src/types.rs
src/analyzers/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/analyzers/*.rs: Implement the Analyzer trait for new analyzers in the pluggable architecture system
Place individual analyzer implementations in src/analyzers/ directory with Analyzer trait implementation
Analyzers should discover data files using platform-specific paths in their implementation

Files:

  • src/analyzers/piebald.rs
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code

Applied to files:

  • src/utils.rs
  • src/watcher.rs
  • src/upload/tests.rs
  • src/upload.rs
  • src/version_check.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Use hash-based deduplication with global hash field for message deduplication

Applied to files:

  • src/utils.rs
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.

Applied to files:

  • src/utils.rs
  • src/tui.rs
  • src/tui/logic.rs
  • src/watcher.rs
  • src/upload/tests.rs
  • src/upload.rs
  • src/analyzers/piebald.rs
  • src/version_check.rs
  • src/types.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system

Applied to files:

  • src/tui.rs
  • src/analyzers/piebald.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Prefer `async/await` over raw futures in Rust code

Applied to files:

  • src/watcher.rs
  • src/upload.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation

Applied to files:

  • src/watcher.rs
  • src/analyzers/piebald.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `anyhow::Result` for error handling in Rust code

Applied to files:

  • src/version_check.rs
🧬 Code graph analysis (5)
src/utils.rs (1)
src/types.rs (2)
  • from_local (51-59)
  • from_str (63-65)
src/tui/logic.rs (1)
src/types.rs (3)
  • intern_model (25-27)
  • from_local (51-59)
  • from (428-437)
src/watcher.rs (2)
src/config.rs (1)
  • load (82-94)
src/upload.rs (2)
  • perform_background_upload_messages (291-333)
  • perform_background_upload_messages (368-368)
src/upload.rs (3)
src/watcher.rs (1)
  • set_upload_status (154-156)
src/config.rs (1)
  • load (82-94)
src/utils.rs (1)
  • get_messages_later_than (230-242)
src/types.rs (2)
src/tui/logic.rs (1)
  • aggregate_sessions_from_messages (131-180)
src/utils.rs (1)
  • aggregate_by_date (107-227)
🪛 GitHub Actions: checks
src/tui/logic.rs

[error] 1-1: cargo fmt --check failed due to formatting differences: import order changed (intern_model moved after other items).

src/types.rs

[error] 1-1: cargo fmt --check failed due to formatting differences: comment alignment/spacing changed around model reference counts subtraction.

🔇 Additional comments (29)
src/version_check.rs (2)

2-4: LGTM! Parking_lot migration follows guidelines.

The switch from std::sync::Mutex to parking_lot::Mutex aligns with the coding guidelines and PR objectives for improved performance and non-poisoning semantics. Keeping Arc from std::sync is correct since parking_lot doesn't provide its own Arc.

As per coding guidelines: "Use parking_lot locks over std::sync for performance in Rust code."


95-106: Simplified locking is correct for parking_lot.

The direct assignment pattern *status_clone.lock() = result; (line 102) is cleaner and correct for parking_lot::Mutex, which returns a guard directly rather than a Result and doesn't poison on panic.

Tokio is appropriately used here.

The function uses tokio::spawn to run the version check in the background. Since this is called from main() (src/main.rs:238) which has the #[tokio::main] attribute, the tokio runtime is active. Additionally, this background task performs network I/O (HTTP API calls to GitHub), which is inherently async and an appropriate use case for tokio—unlike file I/O operations in TUI code, which should use blocking threadpools.

src/watcher.rs (2)

4-4: LGTM: Correct use of parking_lot and view types.

The import of parking_lot::Mutex aligns with coding guidelines for better lock performance, and MultiAnalyzerStatsView supports the new view-based architecture for incremental updates.

Also applies to: 15-15


135-152: LGTM: Synchronous initialization with rayon parallelism.

The constructor correctly uses synchronous parallel I/O via rayon instead of async/tokio, aligning with the TUI's blocking I/O design. The doc comment appropriately notes the rayon context requirement.

Based on learnings: In TUI code, prefer blocking I/O with rayon over tokio async/await.

src/types.rs (5)

14-33: LGTM: Efficient model name interning.

The global thread-safe interner using lasso::ThreadedRodeo significantly reduces memory usage by replacing 24-byte Strings with 4-byte Spurs for model names that repeat across thousands of sessions. The inline annotations are appropriate for these hot-path functions.
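To illustrate what the interner buys, here is a minimal single-threaded sketch of the same idea using only std. The `Interner` type below is hypothetical; lasso's `ThreadedRodeo` provides the thread-safe version with compact fixed-size keys.

```rust
use std::collections::HashMap;

/// Minimal illustration of string interning: each distinct model name is
/// stored once, and callers hold a small integer key instead of a String.
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new(), strings: Vec::new() }
    }

    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&key) = self.map.get(s) {
            return key; // already interned: reuse the existing key
        }
        let key = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), key);
        key
    }

    fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("claude-sonnet-4");
    let b = interner.intern("claude-sonnet-4"); // same key, no new allocation
    assert_eq!(a, b);
    assert_eq!(interner.resolve(a), "claude-sonnet-4");
}
```

A model name that appears in thousands of sessions is then stored once, with each occurrence paying only the size of the key.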


35-124: LGTM: Compact date representation.

The 4-byte CompactDate representation (year: u16, month: u8, day: u8) is an effective memory optimization. The use of checked arithmetic prevents overflow, and the serialization/ordering implementations are correct.
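A std-only sketch of the layout described above (field names match the review text; the parsing helper is an assumption for illustration):

```rust
/// Illustrative 4-byte date. Deriving Ord on (year, month, day) in this
/// field order yields chronological comparison for free.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CompactDate {
    year: u16,
    month: u8,
    day: u8,
}

impl CompactDate {
    /// Parse "YYYY-MM-DD"; returns None on malformed input.
    fn from_str(s: &str) -> Option<Self> {
        let mut parts = s.split('-');
        let year: u16 = parts.next()?.parse().ok()?;
        let month: u8 = parts.next()?.parse().ok()?;
        let day: u8 = parts.next()?.parse().ok()?;
        Some(CompactDate { year, month, day })
    }
}

fn main() {
    // 4 bytes total, versus 24+ bytes for a heap-backed date String
    assert_eq!(std::mem::size_of::<CompactDate>(), 4);
    let d = CompactDate::from_str("2026-01-02").unwrap();
    assert!(d > CompactDate::from_str("2025-12-31").unwrap());
}
```

Being `Copy`, the type can be passed by value in day filters without any clone overhead.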


125-145: LGTM: Correct model reference counting in SessionAggregate.

The use of BTreeMap<Spur, u32> for models correctly implements reference counting, addressing the previous concern about incorrect model removal when multiple files contribute to the same session. The Arc<str> for analyzer_name is also appropriately shared across sessions.


556-641: LGTM: Correct incremental update implementation with model reference counting.

The add_contribution and subtract_contribution methods correctly implement reference counting for models using BTreeMap<Spur, u32>. This prevents the incorrect model removal issue identified in previous reviews, where models shared across multiple files would be prematurely removed during incremental updates.

The implementation correctly:

  • Increments counts on add (line 580)
  • Decrements counts on subtract with removal at zero (lines 627-631)
  • Maintains invariant: (add file1) + (add file2) - (subtract file1) = (add file2)
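The invariant above can be exercised with a minimal sketch. This uses a plain `BTreeMap<String, u32>` standing in for the real `BTreeMap<Spur, u32>`; the helper names mirror the methods discussed but are illustrative only.

```rust
use std::collections::BTreeMap;

/// Add one file's model contributions: bump each model's reference count.
fn add_contribution(models: &mut BTreeMap<String, u32>, contributed: &[&str]) {
    for m in contributed {
        *models.entry(m.to_string()).or_insert(0) += 1;
    }
}

/// Subtract a file's contribution: decrement, and drop entries that reach
/// zero so models still referenced by other files survive.
fn subtract_contribution(models: &mut BTreeMap<String, u32>, contributed: &[&str]) {
    for m in contributed {
        let now_zero = if let Some(count) = models.get_mut(*m) {
            *count = count.saturating_sub(1);
            *count == 0
        } else {
            false
        };
        if now_zero {
            models.remove(*m);
        }
    }
}

fn main() {
    let mut models = BTreeMap::new();
    add_contribution(&mut models, &["sonnet"]);      // file1
    add_contribution(&mut models, &["sonnet"]);      // file2 shares the model
    subtract_contribution(&mut models, &["sonnet"]); // remove file1
    // "sonnet" survives because file2 still references it
    assert_eq!(models.get("sonnet"), Some(&1));
}
```

With a plain set instead of counts, removing file1 would have incorrectly dropped the model even though file2 still contributed it.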

1-1: Fix cargo fmt issue.

The pipeline reports formatting differences. Please run cargo fmt to resolve the formatting issue around comment alignment/spacing.

⛔ Skipped due to learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Run post-change verification commands: cargo build, cargo test, cargo clippy, cargo doc, and cargo fmt
src/upload/tests.rs (1)

7-7: LGTM: Correct parking_lot::Mutex usage in tests.

The migration from std::sync::Mutex to parking_lot::Mutex is correct. Since parking_lot locks don't poison, the removal of .unwrap() calls on lock() is appropriate. This aligns with the project's coding guidelines for using parking_lot over std::sync for performance.

As per coding guidelines: Use parking_lot locks over std::sync for performance in Rust code.

Also applies to: 146-146, 153-153, 311-311, 335-335, 379-379

src/tui.rs (4)

6-8: LGTM: View-based TUI architecture.

The function signature and imports correctly reflect the shift to view-based data flow. Using watch::Receiver<MultiAnalyzerStatsView> allows the TUI to receive pre-computed aggregates instead of raw messages, reducing memory usage and enabling incremental updates.

Also applies to: 20-20, 98-105


186-220: LGTM: Efficient shared view access with RwLock.

The use of SharedAnalyzerView (Arc<RwLock>) enables efficient shared access:

  • Cloning the Arc (lines 191, 213) is cheap (just pointer increment)
  • Read locks (e.g., lines 329, 346, 376) allow concurrent reads
  • Locks are properly scoped and released

This pattern is consistent throughout the TUI rendering code and enables lock-free cloning of the view for UI updates.

Also applies to: 329-361, 376-420


91-91: LGTM: Consistent CompactDate usage for filtering.

The CompactDate type is used consistently throughout for day filters:

  • Stored in Option<CompactDate> for optional filtering
  • Parsed from strings via CompactDate::from_str
  • Compared directly via == for session filtering
  • Displayed via to_string()

As a Copy type (4 bytes), it's efficient to pass around without cloning overhead.

Also applies to: 162-162, 467-467, 683-691, 1487-1488, 1899-1900, 1982-1985


710-717: LGTM: Correct view rendering with scoped lock acquisition.

The draw functions correctly:

  • Accept view types (&[SharedAnalyzerView], &AnalyzerStatsView, &[SessionAggregate])
  • Acquire read locks only for the duration needed (e.g., lines 1894-1925)
  • Use resolve_model to convert interned Spur keys back to model names
  • Keep lock scopes minimal to avoid holding locks during rendering

This design enables responsive UI updates while safely sharing data between the watcher and TUI threads.

Also applies to: 987-995, 1460-1469, 1878-1925

src/analyzers/piebald.rs (4)

245-250: LGTM: Synchronous per-source parsing.

The parse_source implementation correctly performs synchronous parsing of a single data source, opening the SQLite database, querying data, and converting to ConversationMessage format. This aligns with the new per-source parsing pattern in the Analyzer trait.

Based on learnings: Implement the Analyzer trait for new analyzers in the pluggable architecture system.


260-266: LGTM: Platform-specific watch directory discovery.

The get_watch_directories implementation correctly discovers the Piebald data directory using the platform-specific dirs::data_dir() and only returns it if the directory exists. This aligns with the coding guideline for analyzers to discover data files using platform-specific paths.

As per coding guidelines: Analyzers should discover data files using platform-specific paths in their implementation.


268-271: LGTM: Correct data path validation.

The is_valid_data_path implementation correctly validates that the path is a file named exactly "app.db", which is appropriate for Piebald's SQLite database. The use of is_some_and cleanly chains the validation checks.


291-297: LGTM: Test updated for new API.

The test correctly validates the empty sources case using the new get_stats_with_sources API and appropriately asserts that the result contains no messages.

src/utils.rs (3)

2-7: Good adoption of parking_lot::Mutex.

The switch from std::sync::Mutex to parking_lot::Mutex aligns with the coding guidelines. The OnceLock from std::sync is appropriate here since parking_lot doesn't provide an equivalent.


106-172: Aggregation logic looks correct.

The use of CompactDate::from_local for date initialization and the sequential aggregation approach align with the PR's memory optimization goals. The token accumulation using saturating_add with u32 casts is acceptable per the author's analysis in the previous review.


33-35: Nice ergonomic improvement.

Accepting impl Into<u64> allows both u32 and u64 values without explicit conversion at call sites, which pairs well with the new TuiStats fields that use u32.
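A tiny sketch of the ergonomics this enables (the `format_tokens` helper is hypothetical, standing in for the real formatting function):

```rust
/// One signature accepts u32 or u64 token counts without call-site casts.
fn format_tokens(n: impl Into<u64>) -> String {
    let n: u64 = n.into();
    format!("{n}")
}

fn main() {
    let small: u32 = 1_500;     // e.g. a TuiStats field
    let big: u64 = 3_000_000;   // e.g. an accumulated total
    assert_eq!(format_tokens(small), "1500"); // no `as u64` needed
    assert_eq!(format_tokens(big), "3000000");
}
```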

src/tui/logic.rs (4)

8-19: Accumulation logic is correct.

The function cleanly accumulates only the 6 TUI-relevant fields with saturating_add for safe overflow handling. The add_cost method encapsulates cost accumulation properly.


108-124: Clean helper functions for data availability checks.

The separation between has_data_view (direct check) and has_data_shared (acquires read lock first) provides a clear API for different access patterns.


126-180: Well-optimized session aggregation.

Good use of memory-efficient patterns:

  • Arc<str> avoids per-session String allocation
  • or_insert_with_key minimizes redundant cloning
  • intern_model for compact model representation
  • shrink_to_fit() releases excess capacity

The comments explaining the cloning strategy are helpful.


182-210: Tests cover the primary cases.

The tests validate both positive and negative cases for has_data_view. Consider adding a test case where num_conversations == 0 but daily_stats has non-zero values to verify the second condition, though the current coverage is reasonable.

src/upload.rs (4)

7-9: Good use of parking_lot::Mutex.

The switch to parking_lot::Mutex aligns with the coding guidelines. The OnceLock from std::sync is appropriate since there's no parking_lot equivalent.


234-266: Clean helper abstractions.

The set_upload_status helper provides atomic status updates, and make_progress_callback nicely preserves the dots animation state when updating progress counts.


268-286: Status lifecycle management looks correct.

The 3-second delay before hiding success status provides user feedback, and error states remain visible until explicitly cleared. The None case correctly handles no-op scenarios.


335-370: Clean delegation to the messages-based upload.

The separation between perform_background_upload (startup, filters by timestamp) and perform_background_upload_messages (direct messages, supports callback) provides good flexibility. The ::<fn()> turbofish is necessary to satisfy type inference when passing None for the callback.

Optimize memory usage for model reference counting in SessionAggregate:
- Add tinyvec dependency for inline storage
- Create ModelCounts struct wrapping TinyVec<[(Spur, u32); 3]>
- Provides clean API: increment(), decrement(), get(), iter()
- Inline storage for up to 3 models (typical case), spills to heap if more
- Reduces per-session overhead from BTreeMap allocations to ~32 bytes inline

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
src/types.rs (1)

405-446: Potential negative cost from floating-point subtraction.

As flagged in a previous review, line 415 uses direct f64 subtraction for cost, which could produce small negative values due to floating-point precision when subtracting nearly-equal costs. The integer fields correctly use saturating_sub.

Cargo.toml (1)

58-60: Add root certificate store and consider additional reqwest features.

Following up on the previous review: while the "rustls" feature is correct, the configuration is incomplete:

  1. Missing root certificates: Without rustls-native-certs or webpki-roots, reqwest cannot validate TLS certificates unless .danger_accept_invalid_certs(true) is used (which disables security). Add one of these dependencies:

    webpki-roots = "0.26"  # or rustls-native-certs = "0.8"
  2. Missing JSON feature: The codebase makes API calls to splitrail.dev and api.github.com. Consider adding the "json" feature for convenient request/response body handling:

    features = ["rustls", "json"]
#!/bin/bash
# Description: Verify reqwest usage patterns and check for certificate validation bypass

# Check for danger_accept_invalid_certs usage
echo "=== Checking for certificate validation bypass ==="
rg -n "danger_accept_invalid_certs" --type rust -C 3

# Check for reqwest Client creation
echo -e "\n=== Checking reqwest Client initialization ==="
rg -nP "reqwest::Client" --type rust -C 5

# Check for JSON body usage
echo -e "\n=== Checking for JSON request/response usage ==="
rg -nP "\.json\(|\.body\(.*json" --type rust -C 2 | head -30
🧹 Nitpick comments (1)
src/tui.rs (1)

2049-2055: Redundant truncate after resize.

Vec::resize already handles both growing and shrinking. When *filtered_count < old_len, the resize call on line 2051 will truncate, making the subsequent truncate on line 2053 a no-op.

🔎 Proposed simplification
 fn update_day_filters(filters: &mut Vec<Option<CompactDate>>, filtered_count: &usize) {
-    let old_len = filters.len();
     filters.resize(*filtered_count, None);
-    if *filtered_count < old_len {
-        filters.truncate(*filtered_count);
-    }
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b019197 and b9b4ff0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml
  • src/tui.rs
  • src/tui/logic.rs
  • src/types.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Follow Rust 2024 edition conventions
Use anyhow::Result for error handling in Rust code
Prefer async/await over raw futures in Rust code
Use parking_lot locks over std::sync for performance in Rust code

Files:

  • src/tui.rs
  • src/types.rs
  • src/tui/logic.rs
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.

Applied to files:

  • src/tui.rs
  • src/types.rs
  • src/tui/logic.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system

Applied to files:

  • src/tui.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Follow Rust 2024 edition conventions

Applied to files:

  • Cargo.toml
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code

Applied to files:

  • Cargo.toml
🧬 Code graph analysis (2)
src/tui.rs (3)
src/types.rs (4)
  • resolve_model (32-34)
  • from_str (64-66)
  • from (485-494)
  • cost (465-467)
src/tui/logic.rs (1)
  • has_data_shared (122-124)
src/utils.rs (1)
  • format_date_for_display (82-104)
src/types.rs (2)
src/tui/logic.rs (1)
  • aggregate_sessions_from_messages (131-179)
src/utils.rs (1)
  • aggregate_by_date (107-227)
🔇 Additional comments (26)
src/tui.rs (6)

56-83: LGTM - Clean zero-allocation iterator abstraction.

The EitherIter implementation correctly delegates Iterator methods to the underlying variant, providing proper size_hint for both branches.
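A self-contained sketch of the pattern (the real `EitherIter` in src/tui.rs may differ in detail): an enum dispatches between two concrete iterator types so an `if`/`else` can return either branch without a `Box<dyn Iterator>` allocation.

```rust
/// Either of two concrete iterator types, unified under one Iterator impl.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R, T> Iterator for EitherIter<L, R>
where
    L: Iterator<Item = T>,
    R: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }

    // Delegate size_hint so adapters like collect() can pre-allocate.
    fn size_hint(&self) -> (usize, Option<usize>) {
        match self {
            EitherIter::Left(l) => l.size_hint(),
            EitherIter::Right(r) => r.size_hint(),
        }
    }
}

fn main() {
    let filtered = true;
    // Both arms yield i32 but have different concrete iterator types.
    let iter = if filtered {
        EitherIter::Left((0..10).filter(|n| n % 2 == 0))
    } else {
        EitherIter::Right(0..10)
    };
    assert_eq!(iter.count(), 5);
}
```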


98-145: LGTM - Appropriate async/sync boundary for TUI.

The block_in_place + block_on pattern correctly bridges the async runtime with the blocking TUI event loop. The spawned task for watcher events is appropriate since it handles channel reception rather than heavy I/O. Based on learnings, this approach aligns with the preference for minimal async in TUI paths.


175-182: Good lock hygiene pattern.

Acquiring the lock, extracting what's needed, and releasing immediately prevents contention. This pattern is consistently applied throughout the file.


1062-1089: LGTM - Safe accumulation with appropriate types.

Using u64 for totals when summing u32 daily values prevents overflow. The explicit casts (as u64) are correct.


1459-1468: LGTM - Session stats table implementation.

The function follows the same safe patterns as daily stats, with proper type widening for accumulation and correct model resolution via resolve_model.


1881-1928: LGTM - Summary stats with sequential lock acquisition.

The function correctly acquires read locks one at a time per analyzer view, preventing deadlock. The accumulation pattern is safe with u64 totals.

src/tui/logic.rs (5)

1-6: LGTM - Imports properly ordered.

The imports follow Rust conventions with crate imports followed by std imports. The re-export of SessionAggregate provides a clean API surface.


8-19: LGTM - Safe accumulation with saturating arithmetic.

The function correctly uses saturating_add to prevent overflow. The as u32 casts from u64 are acceptable here since per-message token counts won't exceed u32::MAX in practice.


108-124: LGTM - Clean data presence check with minimal lock duration.

has_data_shared correctly delegates to has_data_view while holding the read lock only for the duration of the check.


126-179: LGTM - Efficient session aggregation.

Good use of or_insert_with_key to minimize cloning. The model reference counting via ModelCounts::increment correctly tracks model occurrences per session, enabling accurate incremental update subtraction.


181-209: LGTM - Basic test coverage for has_data_view.

The tests verify the core behavior. Consider adding a test case for num_conversations = 0 with non-empty daily_stats containing non-zero values to fully exercise the any() branch.

src/types.rs (8)

15-34: LGTM - Efficient model name interning.

ThreadedRodeo with LazyLock provides thread-safe, lock-free reads after initial interning. The 4-byte Spur keys significantly reduce memory compared to repeated String allocations for model names.


40-91: LGTM - Compact date representation.

The 4-byte struct with checked parsing arithmetic is efficient. Per past discussion, range validation for month/day is intentionally omitted since input comes from coding tools with well-formed dates.


130-181: LGTM - Compact reference-counted model tracking.

The TinyVec with inline capacity for 3 models is well-tuned for the common case. The decrement method correctly uses saturating subtraction and removes zero-count entries, fixing the previous model removal bug identified in past reviews.


448-517: LGTM - Efficient TUI-specific stats with safe arithmetic.

Storing cost as integer cents (u32) avoids floating-point precision issues. All arithmetic uses saturating operations, and the 24-byte struct is significantly smaller than the full Stats (320 bytes).
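A minimal sketch of the cents-based approach (the `TuiCost` type and method names are assumptions for illustration, not the actual API):

```rust
/// Cost stored as integer cents (u32) instead of f64. Saturating
/// arithmetic keeps add/subtract safe; conversion to dollars happens
/// only at display time.
#[derive(Default, Copy, Clone)]
struct TuiCost {
    cents: u32,
}

impl TuiCost {
    fn add_cost(&mut self, dollars: f64) {
        // Round to the nearest cent when ingesting a floating-point cost.
        let cents = (dollars * 100.0).round() as u32;
        self.cents = self.cents.saturating_add(cents);
    }

    fn subtract(&mut self, other: TuiCost) {
        // saturating_sub clamps at zero, avoiding the tiny negative
        // artifacts that plain f64 subtraction can produce.
        self.cents = self.cents.saturating_sub(other.cents);
    }

    fn dollars(&self) -> f64 {
        self.cents as f64 / 100.0
    }
}

fn main() {
    let mut total = TuiCost::default();
    total.add_cost(0.1);
    total.add_cost(0.2); // exact 30 cents, no binary-float drift
    assert_eq!(total.cents, 30);
    let mut other = TuiCost::default();
    other.add_cost(0.3);
    total.subtract(other);
    assert_eq!(total.cents, 0); // clamped, never negative
}
```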


268-295: LGTM - Reference-counted model tracking in DailyStats.

The SubAssign implementation correctly decrements model counts and removes entries when count reaches zero, maintaining accurate incremental update semantics.


613-693: LGTM - Correct incremental update implementation.

The add_contribution and subtract_contribution methods correctly handle model reference counting via ModelCounts::increment/decrement, addressing the previous review concern about incorrect model removal during partial updates.


594-611: LGTM - Clean FileContribution construction.

The method correctly composes existing aggregation functions and filters out invalid "unknown" dates. Using session_aggregates.len() as conversation_count is accurate since each session represents one conversation.


824-965: LGTM - Comprehensive model reference counting tests.

The test suite thoroughly verifies the incremental update semantics, including partial subtraction scenarios that would have exposed the previous bug. Good coverage of edge cases like multiple files contributing the same model to the same session.

Cargo.toml (7)

8-12: LGTM! Clear allocator feature configuration.

The feature flag structure correctly makes mimalloc optional, and the comment provides helpful guidance for memory profiling scenarios.


24-24: Good choice for memory optimization.

Adding lasso with the multi-threaded feature aligns well with the PR's goal of using Spur interning for model names. This should reduce memory allocations for repeated strings while remaining thread-safe with rayon.


20-20: Migration from jwalk to walkdir is complete and correctly implemented.

The codebase has been fully migrated with no remaining jwalk references. Parallel processing is appropriately handled via rayon at the file parsing level (as seen in src/watcher.rs, src/main.rs, and src/analyzer.rs), which aligns with the PR's optimization goals. The sequential walkdir traversal is suitable for the flat directory structures used throughout the analyzer data directories.


15-16: No action needed. The versions are compatible—mimalloc 0.1.48 requires libmimalloc-sys ^0.1.44, which includes 0.1.44, so this pairing is correct.

Likely an incorrect or invalid review comment.


17-17: Remove non-standard metadata tags and verify dependency claims with git context.

The dependencies listed (serde 1.0.228, toml 0.9.10, notify 8.2, phf 0.13.1, serde_bytes 0.11.19, simd-json 0.17.0, tiktoken-rs 0.9.1, parking_lot 0.12, tinyvec 1.8, rusqlite 0.38.0, rmcp 0.12.0, schemars 1.2, clap 4.5.53, tempfile 3.24) exist in Cargo.toml. However, the original comment includes review tool metadata tags (, ) that should not appear in final review output. Additionally, specific version transition claims (e.g., "simd-json from 0.15 to 0.17") cannot be verified without git diff context showing actual before/after changes. Include git evidence of the actual version changes made by this PR before finalizing the review assessment.


29-29: Ratatui 0.30.0 is compatible with the codebase.

The TUI code uses standard, stable APIs that remain unchanged in 0.30.0:

  • Backend: Uses built-in CrosstermBackend (not affected by Backend trait changes)
  • Frame rendering: Standard render_widget() and render_stateful_widget() calls (unchanged)
  • Widgets: Block, Cell, Paragraph, Row, Table, TableState, Tabs, Wrap are all compatible (none had breaking changes)
  • Style: Basic Color/Modifier/Style usage is unaffected by const style changes
  • Tests: No TestBackend usage; thus error type changes don't apply

No compatibility issues detected.


43-43: Clarify bincode 2.0.1 dependency purpose.

bincode 2.0.1 is listed in Cargo.toml but no usage was found in the codebase. If this dependency is not yet integrated, defer this review until the code consuming bincode is added. If bincode is unused, remove it from the dependency list to reduce maintenance overhead.

Likely an incorrect or invalid review comment.

The previous implementation tried to reconstruct which messages came from
which source file using conversation_hash, assuming it equaled hash_text(path).
This failed for analyzers that compute conversation_hash differently (Cline,
RooCode, KiloCode, OpenCode, Copilot, Piebald).

Fix by adding parse_sources_parallel_with_paths which returns messages grouped
by source path, preserving the association naturally during parallel parsing.
Contribution computation is also parallelized across sources within each analyzer.
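The grouping idea described above can be sketched with the standard library alone. This is a minimal illustration, not the actual implementation: the real `parse_sources_parallel_with_paths` maps over sources with rayon's `par_iter`, and `Message` here is a hypothetical stand-in for `ConversationMessage`.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical stand-in for ConversationMessage; the real type has many more fields.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    text: String,
}

// Sequential sketch: parse each source and keep the (path -> messages)
// association instead of flattening everything into one Vec, so later
// incremental updates know which file each message came from.
fn parse_sources_with_paths(
    sources: &[PathBuf],
    parse: impl Fn(&PathBuf) -> Vec<Message>,
) -> HashMap<PathBuf, Vec<Message>> {
    sources
        .iter()
        .map(|path| (path.clone(), parse(path)))
        .collect()
}

fn main() {
    let sources = vec![PathBuf::from("a.json"), PathBuf::from("b.json")];
    let grouped = parse_sources_with_paths(&sources, |p| {
        vec![Message { text: p.display().to_string() }]
    });
    assert_eq!(grouped.len(), 2);
    assert_eq!(grouped[&PathBuf::from("a.json")][0].text, "a.json");
    println!("grouped {} sources", grouped.len());
}
```

Preserving the association during parsing avoids the fragile reverse lookup via `conversation_hash` that the previous implementation relied on.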
@Sewer56 Sewer56 marked this pull request as draft January 2, 2026 05:57

Sewer56 commented Jan 2, 2026

I still have a small part to rethink, per #99 (comment).

RAM usage will go up a bit again, because I want to stay correct/accurate while also accommodating other solutions.

Drafting.

…dates

Commit 5a3949e fixed file contribution caching but caused ~1GB RAM increase
for users with many files (e.g., 90,000 OpenCode message files). The fix
correctly cached contributions for all analyzers, but used a heavyweight
FileContribution struct (~100+ bytes/file) even for single-message-per-file
analyzers.

This implements a three-tier contribution caching system based on analyzer
data structure:

1. SingleMessage (~40 bytes): For 1-file-1-message analyzers (OpenCode)
   - Uses CompactMessageStats with u16 cost, u8 tool_calls
   - Expected savings: 90,000 files × (100 - 40) = ~5.4 MB

2. SingleSession (~72 bytes): For 1-file-1-session analyzers (most others)
   - Stores aggregated stats per file without individual messages
   - Expected savings: 90,000 files × (100 - 72) = ~2.5 MB

3. MultiSession (~100+ bytes): For all-in-one-file analyzers (Piebald)
   - Full MultiSessionContribution with Vec<ConversationMessage>

Each analyzer now explicitly declares its strategy via the required
contribution_strategy() trait method (no default implementation).
Split the monolithic contribution_cache.rs (571 lines) into a proper
module structure for better organization and maintainability:

- mod.rs: Common types (PathHash, ContributionStrategy, CompactMessageStats),
  ContributionCache wrapper, RemovedContribution enum, and AnalyzerStatsView
  extension methods
- single_message.rs: SingleMessageContribution (~40 bytes per file)
- single_session.rs: SingleSessionContribution (~72 bytes per file)
- multi_session.rs: MultiSessionContribution (~100+ bytes per file)

No functional changes - pure refactoring with all re-exports preserved.
- Reorganize tests from single tests.rs into tests/ directory
- Group tests by contribution strategy: single_message, single_session, multi_session
- Separate compact_stats and basic_operations tests
- Move misplaced edge case tests to their proper strategy files
- Fix misleading docstrings that claimed to simulate reload_file_incremental
Replace raw u64 session_hash fields with a proper SessionHash wrapper type,
following the same pattern as PathHash for type safety and clarity.
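The newtype pattern mentioned in this commit can be sketched as follows. The hasher and constructor name here are illustrative assumptions; the real `SessionHash`/`PathHash` wrappers may hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Newtype wrapper around the raw u64 hash. Deriving Eq/Hash lets it be used
// as a map key, while the distinct type prevents accidentally mixing session
// hashes with path hashes or plain integers at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SessionHash(u64);

impl SessionHash {
    // Hypothetical constructor name; hashes a session id string to a stable key.
    fn from_session_id(id: &str) -> Self {
        let mut h = DefaultHasher::new();
        id.hash(&mut h);
        SessionHash(h.finish())
    }
}

fn main() {
    let a = SessionHash::from_session_id("sess-1");
    let b = SessionHash::from_session_id("sess-1");
    let c = SessionHash::from_session_id("sess-2");
    assert_eq!(a, b); // same id, same hash
    assert_ne!(a, c);
    println!("ok");
}
```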
Move model string interning into dedicated src/cache/ module with a
ModelKey newtype wrapper around MiniSpur. This reduces memory per
model reference from 4 bytes (Spur) to 2 bytes (MiniSpur), which is
sufficient for <65536 unique model names.
Use c2rust-bitfields to pack stats and date into a 176-bit bitfield,
achieving cache-aligned 32-byte struct size. Removes CompactMessageStats
in favor of PackedStatsDate with compile-time size assertion.
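The packing idea can be illustrated with manual shifts and masks. This is a sketch only: the real `PackedStatsDate` uses the c2rust-bitfields crate and packs many more fields into 176 bits; the widths here (year:12, month:4, day:5, input tokens:27) are illustrative, with the 27-bit token field chosen to match the ~134M ceiling mentioned later in the thread.

```rust
// Pack a date plus an input-token count into a single u64.
fn pack(year: u16, month: u8, day: u8, input_tokens: u32) -> u64 {
    debug_assert!(year < (1 << 12) && month < (1 << 4) && day < (1 << 5));
    debug_assert!(input_tokens < (1 << 27)); // 2^27 = ~134M token ceiling
    (year as u64) << 36 | (month as u64) << 32 | (day as u64) << 27 | input_tokens as u64
}

fn unpack(packed: u64) -> (u16, u8, u8, u32) {
    (
        ((packed >> 36) & 0xFFF) as u16,  // 12-bit year
        ((packed >> 32) & 0xF) as u8,     // 4-bit month
        ((packed >> 27) & 0x1F) as u8,    // 5-bit day
        (packed & 0x7FF_FFFF) as u32,     // 27-bit token count
    )
}

fn main() {
    let p = pack(2026, 1, 3, 91_000);
    assert_eq!(unpack(p), (2026, 1, 3, 91_000));
    println!("packed into {} bytes", std::mem::size_of::<u64>());
}
```

A `const` size assertion (e.g. `const _: () = assert!(size_of::<T>() == 32);`) then guarantees the packed struct never silently grows past its cache-friendly size.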
@Sewer56 Sewer56 marked this pull request as ready for review January 3, 2026 05:31

Sewer56 commented Jan 3, 2026

Final memory usage on my end is around 25MB, from 580-600MB.
Dataset of 91000 messages.

OpenCode is pretty much the worst-case scenario, as 1 message == 1 file.
It's cheap for incremental updates, but uses more memory.

I didn't optimise the rest as heavily as OpenCode, where I went as far as bitpacking. Memory usage for cases where 1 file == 1 session or 1 file == everything should be negligible in comparison.

image

I noted the bitpacked structure in code.
Needless to say, I've been generous; there's no chance a single message will ever hit 134M input tokens.

Duration is a placeholder. I may show tokens/s on supported analyzers in future.

This needs some testing on analyzers other than OpenCode. The tests pass, and there are tests covering this, but I don't have a large enough dataset to get good numbers.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/watcher.rs (1)

519-519: Tautological assertion provides no coverage.

The assertion updated.analyzer_stats.is_empty() || !updated.analyzer_stats.is_empty() is always true and doesn't verify any meaningful behavior.

Suggested fix

Consider asserting something meaningful, or remove the assertion and rely on the "doesn't panic" behavior being sufficient:

-        assert!(updated.analyzer_stats.is_empty() || !updated.analyzer_stats.is_empty());
+        // Test verifies handling doesn't panic - no specific assertion needed
+        let _ = updated.analyzer_stats.len();
♻️ Duplicate comments (2)
Cargo.toml (1)

58-61: Reqwest TLS configuration acknowledged.

The "rustls" feature name is correct for reqwest 0.13.1 (renamed from "rustls-tls" in the breaking change). As noted in previous review discussion, the TLS certificate validation configuration in src/upload.rs should be verified separately.

src/types.rs (1)

396-437: Stats subtraction correctly uses saturating arithmetic for integers.

All integer fields use saturating_sub to prevent underflow. The cost field (line 406) uses direct subtraction which could theoretically produce small negative values due to floating-point precision, but this was discussed in a previous review and deemed acceptable for practical usage.
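The pattern under review can be sketched as follows. Field names are illustrative, not the exact stats layout: integer counters use `saturating_sub` so removing a file's contribution can never underflow, while the `f64` cost is subtracted directly.

```rust
#[derive(Debug, PartialEq)]
struct Stats {
    input_tokens: u64,
    output_tokens: u64,
    cost: f64,
}

impl Stats {
    // Subtract a removed file's contribution from the running totals.
    fn subtract(&mut self, other: &Stats) {
        // saturating_sub clamps at zero instead of wrapping on underflow.
        self.input_tokens = self.input_tokens.saturating_sub(other.input_tokens);
        self.output_tokens = self.output_tokens.saturating_sub(other.output_tokens);
        // Direct f64 subtraction: may carry tiny floating-point error,
        // accepted as a practical trade-off per the review discussion.
        self.cost -= other.cost;
    }
}

fn main() {
    let mut total = Stats { input_tokens: 100, output_tokens: 10, cost: 0.30 };
    let removed = Stats { input_tokens: 120, output_tokens: 4, cost: 0.10 };
    total.subtract(&removed);
    assert_eq!(total.input_tokens, 0); // saturates instead of underflowing
    assert_eq!(total.output_tokens, 6);
    assert!((total.cost - 0.20).abs() < 1e-9);
    println!("{total:?}");
}
```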

🧹 Nitpick comments (12)
src/analyzers/gemini_cli.rs (1)

311-347: Consider extracting shared WalkDir logic.

The discover_data_sources and is_available methods share nearly identical WalkDir traversal logic. While is_available correctly uses .any() for early exit (which is more efficient), consider extracting the common filtering predicate to reduce duplication.

That said, this is a minor concern—the current implementation is clear and correct.

🔎 Potential refactor to reduce duplication
fn chat_json_entries() -> impl Iterator<Item = walkdir::DirEntry> {
    Self::data_dir()
        .filter(|d| d.is_dir())
        .into_iter()
        .flat_map(|tmp_dir| WalkDir::new(tmp_dir).min_depth(3).max_depth(3).into_iter())
        .filter_map(|e| e.ok())
        .filter(|e| {
            e.file_type().is_file()
                && e.path().extension().is_some_and(|ext| ext == "json")
                && e.path()
                    .parent()
                    .and_then(|p| p.file_name())
                    .is_some_and(|name| name == "chats")
        })
}

Then use chat_json_entries().map(...).collect() in discover_data_sources and chat_json_entries().next().is_some() in is_available.

src/analyzers/qwen_code.rs (1)

346-352: Consider logging parse failures in parse_sources_parallel.

The unwrap_or_default() at line 349 silently drops parse errors. While graceful degradation is appropriate for robustness, consider adding debug-level logging so users can diagnose issues with malformed files.

🔎 Suggested improvement
     fn parse_sources_parallel(&self, sources: &[DataSource]) -> Vec<ConversationMessage> {
         let all_messages: Vec<ConversationMessage> = sources
             .par_iter()
-            .flat_map(|source| self.parse_source(source).unwrap_or_default())
+            .flat_map(|source| {
+                self.parse_source(source).unwrap_or_else(|e| {
+                    tracing::debug!("Failed to parse {:?}: {}", source.path, e);
+                    Vec::new()
+                })
+            })
             .collect();
         crate::utils::deduplicate_by_local_hash(all_messages)
     }
src/main.rs (3)

217-219: Consider propagating threadpool creation errors.

Using expect() will panic if threadpool creation fails (e.g., due to system resource exhaustion). Consider propagating the error with ? and handling it gracefully to provide better error messaging.

🔎 Proposed refactor
-    let mut stats_manager = {
-        let pool = rayon::ThreadPoolBuilder::new()
-            .build()
-            .expect("Failed to create rayon threadpool");
+    let mut stats_manager = match (|| -> Result<_> {
+        let pool = rayon::ThreadPoolBuilder::new().build()?;
+        pool.install(|| watcher::RealtimeStatsManager::new(registry))
+    })() {
+        Ok(manager) => manager,
+        Err(e) => {
+            eprintln!("Error loading analyzer stats: {e}");
+            std::process::exit(1);
+        }
+    };
-
-        let result = pool.install(|| watcher::RealtimeStatsManager::new(registry));
-
-        // Pool is dropped here, releasing threads
-        match result {
-            Ok(manager) => manager,
-            Err(e) => {
-                eprintln!("Error loading analyzer stats: {e}");
-                std::process::exit(1);
-            }
-        }
-    };

264-277: Consider simplifying configuration status logic.

The nested conditionals for determining which configuration is missing are verbose. Consider extracting this into a helper function or using a more concise pattern.

🔎 Proposed refactor
        } else {
            // Auto-upload is enabled but configuration is incomplete
            let mut status = upload_status.lock();
-            if config.is_api_token_missing() && config.is_server_url_missing() {
-                *status = tui::UploadStatus::MissingConfig;
-            } else if config.is_api_token_missing() {
-                *status = tui::UploadStatus::MissingApiToken;
-            } else if config.is_server_url_missing() {
-                *status = tui::UploadStatus::MissingServerUrl;
-            } else {
-                // Shouldn't happen since is_configured() returned false
-                *status = tui::UploadStatus::MissingConfig;
-            }
+            *status = match (config.is_api_token_missing(), config.is_server_url_missing()) {
+                (true, true) => tui::UploadStatus::MissingConfig,
+                (true, false) => tui::UploadStatus::MissingApiToken,
+                (false, true) => tui::UploadStatus::MissingServerUrl,
+                (false, false) => tui::UploadStatus::MissingConfig, // Shouldn't happen
+            };
        }

445-460: Expand the SAFETY comment for the FFI call.

The SAFETY comment states that mi_collect is a "safe FFI call" but could be more specific. FFI calls are inherently unsafe in Rust, and the comment should clarify what makes this particular call safe (e.g., no preconditions on state, thread-safe, no memory corruption risks).

🔎 Proposed documentation improvement
 #[cfg(feature = "mimalloc")]
 pub fn release_unused_memory() {
-    // SAFETY: mi_collect is a safe FFI call that triggers garbage collection
-    // and returns unused memory to the OS. The `force` parameter (true) ensures
-    // aggressive collection.
+    // SAFETY: mi_collect is a thread-safe C function with no preconditions on allocator state.
+    // It triggers garbage collection and returns unused memory to the OS without risk of
+    // memory corruption. The `force` parameter (true) ensures aggressive collection.
     unsafe {
         libmimalloc_sys::mi_collect(true);
     }
 }
src/analyzers/opencode.rs (2)

480-498: Consider caching context for incremental updates.

For single-file incremental updates, parse_source loads the entire projects and sessions context fresh each time. While simple, this may be inefficient for frequent file changes. Consider caching this shared context at the analyzer level to avoid repeated I/O for metadata that rarely changes.

Note: The current approach prioritizes simplicity and correctness. If incremental update performance becomes a concern in practice, a shared context cache could be added.


522-531: Consider logging parse errors in parallel processing.

The filter_map silently discards files that fail to parse. For debugging and monitoring, consider logging parse errors (e.g., using warn_once or eprintln!) to help diagnose data corruption or format changes.

🔎 Proposed improvement
         sources
             .par_iter()
             .filter_map(|source| {
                 let content = fs::read_to_string(&source.path).ok()?;
                 let mut bytes = content.into_bytes();
-                let msg = simd_json::from_slice::<OpenCodeMessage>(&mut bytes).ok()?;
+                let msg = match simd_json::from_slice::<OpenCodeMessage>(&mut bytes) {
+                    Ok(msg) => msg,
+                    Err(e) => {
+                        eprintln!("Warning: Failed to parse OpenCode file {:?}: {}", source.path, e);
+                        return None;
+                    }
+                };
                 let conversation_msg =
                     to_conversation_message(msg, &sessions, &projects, &part_root);
                 Some((source.path.clone(), vec![conversation_msg]))
             })
             .collect()
src/analyzers/pi_agent.rs (1)

459-465: Consider logging parse errors in parallel processing.

Similar to other analyzers, parse errors are silently discarded with unwrap_or_default(). Consider logging errors for debugging purposes.

🔎 Proposed improvement
     fn parse_sources_parallel(&self, sources: &[DataSource]) -> Vec<ConversationMessage> {
         let all_messages: Vec<ConversationMessage> = sources
             .par_iter()
-            .flat_map(|source| self.parse_source(source).unwrap_or_default())
+            .flat_map(|source| {
+                self.parse_source(source).unwrap_or_else(|e| {
+                    eprintln!("Warning: Failed to parse Pi Agent file {:?}: {}", source.path, e);
+                    Vec::new()
+                })
+            })
             .collect();
         crate::utils::deduplicate_by_local_hash(all_messages)
     }
src/watcher.rs (1)

244-310: Note: Tokio still used for upload despite learning to avoid it.

Per retrieved learnings, Tokio should be avoided in TUI background I/O paths in favor of blocking calls on dedicated threads. The tokio::spawn at line 298 contradicts this, though the async upload with reqwest makes this a reasonable interim choice.

Consider migrating to std::thread::spawn with blocking reqwest if you proceed with removing Tokio from TUI code as mentioned in PR comments.

src/contribution_cache/mod.rs (2)

43-48: Consider renaming from_str to avoid trait shadowing.

SessionHash::from_str shadows the std::str::FromStr trait method name. While not technically wrong, it could cause confusion. Consider from_session_id or hash_str for clarity.


365-377: Empty sessions not removed after subtraction.

Unlike daily_stats which are removed when empty (lines 356-361), session_aggregates entries with zeroed stats remain in the vector. This is likely intentional to avoid O(n) removal, but over time could accumulate stale entries.

If this becomes a concern, consider periodic compaction or removal during subtract:

// After subtracting, optionally remove truly empty sessions
self.session_aggregates.retain(|s| !s.stats.is_zero());
src/analyzer.rs (1)

933-937: Test relies on rayon's global threadpool.

The test calls load_all_stats_parallel() without an explicit rayon pool, relying on rayon's global pool. This works but contradicts the doc comment "Must be called within a rayon threadpool context." Consider clarifying that the global pool is acceptable, or use load_all_stats_parallel_scoped() in tests for consistency.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9b4ff0 and 67e4294.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • Cargo.toml
  • src/analyzer.rs
  • src/analyzers/claude_code.rs
  • src/analyzers/cline.rs
  • src/analyzers/codex_cli.rs
  • src/analyzers/copilot.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/kilo_code.rs
  • src/analyzers/opencode.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/piebald.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/roo_code.rs
  • src/cache/mod.rs
  • src/cache/model_intern.rs
  • src/contribution_cache/mod.rs
  • src/contribution_cache/multi_session.rs
  • src/contribution_cache/single_message.rs
  • src/contribution_cache/single_session.rs
  • src/contribution_cache/tests/basic_operations.rs
  • src/contribution_cache/tests/compact_stats.rs
  • src/contribution_cache/tests/mod.rs
  • src/contribution_cache/tests/multi_session.rs
  • src/contribution_cache/tests/single_message.rs
  • src/contribution_cache/tests/single_session.rs
  • src/main.rs
  • src/types.rs
  • src/watcher.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/analyzers/cline.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Follow Rust 2024 edition conventions
Use anyhow::Result for error handling in Rust code
Prefer async/await over raw futures in Rust code
Use parking_lot locks over std::sync for performance in Rust code

Files:

  • src/contribution_cache/tests/single_message.rs
  • src/contribution_cache/tests/basic_operations.rs
  • src/contribution_cache/single_session.rs
  • src/contribution_cache/tests/mod.rs
  • src/contribution_cache/tests/compact_stats.rs
  • src/cache/mod.rs
  • src/contribution_cache/tests/multi_session.rs
  • src/analyzers/copilot.rs
  • src/contribution_cache/single_message.rs
  • src/main.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/contribution_cache/tests/single_session.rs
  • src/analyzers/claude_code.rs
  • src/contribution_cache/multi_session.rs
  • src/analyzers/opencode.rs
  • src/contribution_cache/mod.rs
  • src/analyzers/piebald.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzer.rs
  • src/watcher.rs
  • src/types.rs
  • src/cache/model_intern.rs
  • src/analyzers/codex_cli.rs
src/analyzers/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/analyzers/*.rs: Implement the Analyzer trait for new analyzers in the pluggable architecture system
Place individual analyzer implementations in src/analyzers/ directory with Analyzer trait implementation
Analyzers should discover data files using platform-specific paths in their implementation

Files:

  • src/analyzers/copilot.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/analyzers/claude_code.rs
  • src/analyzers/opencode.rs
  • src/analyzers/piebald.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzers/codex_cli.rs
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation
📚 Learning: 2026-01-02T01:09:18.967Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: src/watcher.rs:0-0
Timestamp: 2026-01-02T01:09:18.967Z
Learning: In Rust code related to the TUI (e.g., src/tui.rs, src/watcher.rs, and related modules), avoid using Tokio in background I/O paths. Prefer blocking I/O on a dedicated thread or a small thread pool spawned specifically for that work, rather than performing async/await with a Tokio runtime. Use std::thread::spawn or a local threadpool (e.g., threadpool, rayon) for tasks like server uploads; ensure non-blocking UI by offloading blocking work and minimize await points on the main UI thread.

Applied to files:

  • src/contribution_cache/tests/single_message.rs
  • src/contribution_cache/tests/basic_operations.rs
  • src/contribution_cache/single_session.rs
  • src/contribution_cache/tests/mod.rs
  • src/contribution_cache/tests/compact_stats.rs
  • src/cache/mod.rs
  • src/contribution_cache/tests/multi_session.rs
  • src/analyzers/copilot.rs
  • src/contribution_cache/single_message.rs
  • src/main.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/contribution_cache/tests/single_session.rs
  • src/analyzers/claude_code.rs
  • src/contribution_cache/multi_session.rs
  • src/analyzers/opencode.rs
  • src/contribution_cache/mod.rs
  • src/analyzers/piebald.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzer.rs
  • src/watcher.rs
  • src/types.rs
  • src/cache/model_intern.rs
  • src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Place individual analyzer implementations in `src/analyzers/` directory with Analyzer trait implementation

Applied to files:

  • src/analyzers/copilot.rs
  • src/main.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/analyzers/claude_code.rs
  • src/analyzers/opencode.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzer.rs
  • src/watcher.rs
  • src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Analyzers should discover data files using platform-specific paths in their implementation

Applied to files:

  • src/analyzers/copilot.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/analyzers/claude_code.rs
  • src/analyzers/opencode.rs
  • src/analyzers/piebald.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzer.rs
  • src/watcher.rs
  • src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to src/analyzers/*.rs : Implement the `Analyzer` trait for new analyzers in the pluggable architecture system

Applied to files:

  • src/analyzers/copilot.rs
  • src/main.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/kilo_code.rs
  • src/analyzers/claude_code.rs
  • src/contribution_cache/multi_session.rs
  • src/analyzers/opencode.rs
  • src/contribution_cache/mod.rs
  • src/analyzers/piebald.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzers/roo_code.rs
  • src/analyzer.rs
  • src/watcher.rs
  • src/analyzers/codex_cli.rs
📚 Learning: 2026-01-02T01:31:46.893Z
Learnt from: Sewer56
Repo: Piebald-AI/splitrail PR: 99
File: .agents/PERFORMANCE.md:0-0
Timestamp: 2026-01-02T01:31:46.893Z
Learning: Splitrail uses `walkdir` (not `jwalk`) for directory traversal because most analyzer data directories have flat structures where parallel directory walking provides no benefit. Parallel processing is applied at the file parsing level via rayon instead, where the actual performance gains are realized.

Applied to files:

  • src/analyzers/copilot.rs
  • src/analyzers/qwen_code.rs
  • src/analyzers/pi_agent.rs
  • src/analyzers/claude_code.rs
  • src/analyzers/opencode.rs
  • src/analyzers/gemini_cli.rs
  • src/analyzer.rs
  • src/analyzers/codex_cli.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Use `parking_lot` locks over `std::sync` for performance in Rust code

Applied to files:

  • src/main.rs
  • Cargo.toml
  • src/watcher.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Use hash-based deduplication with global hash field for message deduplication

Applied to files:

  • src/analyzers/claude_code.rs
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Follow Rust 2024 edition conventions

Applied to files:

  • Cargo.toml
📚 Learning: 2026-01-01T00:15:32.862Z
Learnt from: CR
Repo: Piebald-AI/splitrail PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-01T00:15:32.862Z
Learning: Applies to **/*.rs : Prefer `async/await` over raw futures in Rust code

Applied to files:

  • src/watcher.rs
🧬 Code graph analysis (21)
src/contribution_cache/tests/single_message.rs (3)
src/contribution_cache/tests/mod.rs (2)
  • make_message (29-69)
  • make_view_with_session (82-98)
src/contribution_cache/single_message.rs (3)
  • from_message (155-161)
  • date (165-167)
  • hash_session_id (177-179)
src/contribution_cache/mod.rs (2)
  • new (34-36)
  • new (94-100)
src/contribution_cache/tests/basic_operations.rs (3)
src/contribution_cache/tests/mod.rs (1)
  • make_message (29-69)
src/contribution_cache/mod.rs (5)
  • from_str (46-48)
  • default (87-89)
  • get_single_message (126-128)
  • get_single_session (140-142)
  • get_multi_session (154-156)
src/contribution_cache/single_message.rs (1)
  • from_message (155-161)
src/contribution_cache/tests/mod.rs (3)
src/types.rs (2)
  • cost (456-458)
  • from (476-485)
src/contribution_cache/single_message.rs (1)
  • date (165-167)
src/contribution_cache/mod.rs (4)
  • default (87-89)
  • new (34-36)
  • new (94-100)
  • from_str (46-48)
src/contribution_cache/tests/compact_stats.rs (2)
src/contribution_cache/single_message.rs (2)
  • date (165-167)
  • pack (84-105)
src/types.rs (4)
  • from_parts (44-46)
  • year (50-52)
  • month (56-58)
  • day (62-64)
src/cache/mod.rs (1)
src/cache/model_intern.rs (2)
  • intern_model (33-35)
  • resolve_model (39-41)
src/contribution_cache/tests/multi_session.rs (1)
src/contribution_cache/tests/mod.rs (2)
  • make_empty_view (72-79)
  • make_message (29-69)
src/analyzers/copilot.rs (2)
src/utils.rs (1)
  • hash_text (253-257)
src/analyzers/cline.rs (4)
  • new (20-22)
  • parse_source (312-314)
  • is_valid_data_path (320-322)
  • contribution_strategy (324-326)
src/contribution_cache/single_message.rs (3)
src/cache/model_intern.rs (1)
  • intern_model (33-35)
src/types.rs (6)
  • from_parts (44-46)
  • from_str (68-70)
  • from_local (32-40)
  • year (50-52)
  • month (56-58)
  • day (62-64)
src/contribution_cache/mod.rs (2)
  • default (87-89)
  • from_str (46-48)
src/main.rs (2)
src/watcher.rs (2)
  • new (34-69)
  • new (137-152)
src/upload.rs (1)
  • perform_background_upload (337-370)
src/analyzers/qwen_code.rs (6)
src/utils.rs (2)
  • hash_text (253-257)
  • deduplicate_by_local_hash (279-293)
src/analyzers/cline.rs (6)
  • simd_json (165-165)
  • new (20-22)
  • is_available (308-310)
  • parse_source (312-314)
  • is_valid_data_path (320-322)
  • contribution_strategy (324-326)
src/analyzers/codex_cli.rs (11)
  • simd_json (241-241)
  • simd_json (251-251)
  • simd_json (260-260)
  • simd_json (296-296)
  • simd_json (432-432)
  • data_dir (27-29)
  • new (23-25)
  • is_available (66-75)
  • parse_source (77-80)
  • is_valid_data_path (97-100)
  • contribution_strategy (102-104)
src/analyzers/opencode.rs (10)
  • simd_json (219-219)
  • simd_json (258-258)
  • simd_json (296-296)
  • simd_json (493-493)
  • data_dir (26-28)
  • new (22-24)
  • is_available (464-478)
  • parse_source (480-498)
  • is_valid_data_path (593-605)
  • contribution_strategy (608-610)
src/analyzers/gemini_cli.rs (7)
  • data_dir (22-24)
  • new (18-20)
  • is_available (333-347)
  • parse_source (349-351)
  • parse_json_session_file (201-292)
  • is_valid_data_path (368-376)
  • contribution_strategy (378-380)
src/analyzers/piebald.rs (4)
  • new (21-23)
  • parse_source (246-251)
  • is_valid_data_path (269-272)
  • contribution_strategy (275-277)
src/analyzers/pi_agent.rs (3)
src/utils.rs (1)
  • deduplicate_by_local_hash (279-293)
src/analyzers/claude_code.rs (4)
  • data_dir (33-35)
  • parse_source (75-82)
  • parse_jsonl_file (522-685)
  • is_valid_data_path (240-251)
src/analyzers/qwen_code.rs (3)
  • data_dir (22-24)
  • parse_source (342-344)
  • is_valid_data_path (361-369)
src/analyzers/kilo_code.rs (5)
src/analyzers/cline.rs (5)
  • is_available (308-310)
  • parse_source (312-314)
  • get_watch_directories (316-318)
  • is_valid_data_path (320-322)
  • contribution_strategy (324-326)
src/analyzers/codex_cli.rs (5)
  • is_available (66-75)
  • parse_source (77-80)
  • get_watch_directories (90-95)
  • is_valid_data_path (97-100)
  • contribution_strategy (102-104)
src/analyzers/roo_code.rs (5)
  • is_available (332-334)
  • parse_source (336-338)
  • get_watch_directories (340-342)
  • is_valid_data_path (344-346)
  • contribution_strategy (348-350)
src/watcher.rs (4)
  • is_available (381-383)
  • parse_source (373-375)
  • get_watch_directories (385-387)
  • contribution_strategy (389-391)
src/analyzers/piebald.rs (4)
  • parse_source (246-251)
  • get_watch_directories (261-267)
  • is_valid_data_path (269-272)
  • contribution_strategy (275-277)
src/contribution_cache/tests/single_session.rs (5)
src/contribution_cache/tests/mod.rs (2)
  • make_message (29-69)
  • make_view_with_session (82-98)
src/contribution_cache/single_session.rs (1)
  • from_messages (29-56)
src/contribution_cache/single_message.rs (1)
  • date (165-167)
src/cache/model_intern.rs (1)
  • intern_model (33-35)
src/types.rs (2)
  • get (168-170)
  • from (476-485)
src/contribution_cache/multi_session.rs (2)
src/tui/logic.rs (1)
  • aggregate_sessions_from_messages (131-179)
src/utils.rs (1)
  • aggregate_by_date (107-227)
src/contribution_cache/mod.rs (3)
src/analyzer.rs (2)
  • new (297-305)
  • default (290-292)
src/types.rs (2)
  • new (143-145)
  • from_str (68-70)
src/contribution_cache/single_message.rs (2)
  • date (165-167)
  • hash_session_id (177-179)
src/analyzers/roo_code.rs (5)
src/analyzer.rs (14)
  • vscode_extension_has_sources (108-118)
  • is_available (231-234)
  • is_available (846-848)
  • is_available (1004-1006)
  • parse_source (173-173)
  • parse_source (821-823)
  • parse_source (991-993)
  • get_watch_directories (219-219)
  • get_watch_directories (850-856)
  • get_watch_directories (1008-1010)
  • is_valid_data_path (224-226)
  • contribution_strategy (240-240)
  • contribution_strategy (858-860)
  • contribution_strategy (1012-1014)
src/analyzers/cline.rs (5)
  • is_available (308-310)
  • parse_source (312-314)
  • get_watch_directories (316-318)
  • is_valid_data_path (320-322)
  • contribution_strategy (324-326)
src/analyzers/kilo_code.rs (5)
  • is_available (303-305)
  • parse_source (307-309)
  • get_watch_directories (311-313)
  • is_valid_data_path (315-317)
  • contribution_strategy (319-321)
src/watcher.rs (4)
  • is_available (381-383)
  • parse_source (373-375)
  • get_watch_directories (385-387)
  • contribution_strategy (389-391)
src/analyzers/piebald.rs (4)
  • parse_source (246-251)
  • get_watch_directories (261-267)
  • is_valid_data_path (269-272)
  • contribution_strategy (275-277)
src/analyzer.rs (4)
src/contribution_cache/mod.rs (2)
  • new (34-36)
  • new (94-100)
src/utils.rs (1)
  • deduplicate_by_global_hash (266-274)
src/contribution_cache/single_message.rs (2)
  • date (165-167)
  • from_message (155-161)
src/contribution_cache/single_session.rs (1)
  • from_messages (29-56)
src/watcher.rs (2)
src/config.rs (1)
  • load (82-94)
src/upload.rs (2)
  • perform_background_upload_messages (291-333)
  • perform_background_upload_messages (368-368)
src/types.rs (3)
src/tui/logic.rs (1)
  • aggregate_sessions_from_messages (131-179)
src/cache/model_intern.rs (2)
  • intern_model (33-35)
  • resolve_model (39-41)
src/mcp/types.rs (1)
  • new (118-135)
src/cache/model_intern.rs (1)
src/types.rs (1)
  • new (143-145)
src/analyzers/codex_cli.rs (3)
src/analyzers/cline.rs (5)
  • new (20-22)
  • is_available (308-310)
  • parse_source (312-314)
  • is_valid_data_path (320-322)
  • contribution_strategy (324-326)
src/analyzers/copilot.rs (5)
  • new (28-30)
  • is_available (473-486)
  • parse_source (488-490)
  • is_valid_data_path (496-504)
  • contribution_strategy (506-508)
src/analyzers/piebald.rs (5)
  • new (21-23)
  • parse_source (246-251)
  • parse_sources_parallel (253-259)
  • is_valid_data_path (269-272)
  • contribution_strategy (275-277)
🔇 Additional comments (72)
src/contribution_cache/single_session.rs (1)

27-56: Well-structured contribution aggregation.

The from_messages implementation correctly:

  • Uses the first message for date and session_hash initialization
  • Only counts AI messages (those with a model) for ai_message_count
  • Properly interns model names for memory efficiency
  • Accumulates stats using the TuiStats::from(&msg.stats) conversion

Note: An empty messages slice returns a contribution with default values (zeroed stats, default date/hash). This appears intentional for the contribution cache semantics.
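
The aggregation shape described above can be sketched in std-only Rust; `Msg`, `Contribution`, and the field names here are hypothetical stand-ins for the actual Splitrail types, not the real `from_messages` implementation.

```rust
// Illustrative sketch of the from_messages aggregation pattern: the first
// message seeds date/session_hash, only messages with a model count as AI
// messages, and stats accumulate with saturating arithmetic.

struct Msg {
    date: u32, // stand-in for CompactDate
    session_hash: u64,
    model: Option<&'static str>,
    tokens: u64,
}

#[derive(Default, Debug, PartialEq)]
struct Contribution {
    date: u32,
    session_hash: u64,
    ai_message_count: u32,
    tokens: u64,
}

fn from_messages(messages: &[Msg]) -> Contribution {
    let mut c = Contribution::default();
    if let Some(first) = messages.first() {
        // Date and session hash come from the first message.
        c.date = first.date;
        c.session_hash = first.session_hash;
    }
    for m in messages {
        // Only messages with a model are AI messages.
        if m.model.is_some() {
            c.ai_message_count += 1;
        }
        c.tokens = c.tokens.saturating_add(m.tokens);
    }
    c
}

fn main() {
    let msgs = [
        Msg { date: 20260102, session_hash: 7, model: None, tokens: 10 },
        Msg { date: 20260102, session_hash: 7, model: Some("gpt"), tokens: 90 },
    ];
    let c = from_messages(&msgs);
    assert_eq!(c.ai_message_count, 1);
    assert_eq!(c.tokens, 100);
    // Empty input yields the zeroed default, as noted above.
    assert_eq!(from_messages(&[]), Contribution::default());
    println!("ok");
}
```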

src/analyzers/roo_code.rs (1)

332-350: Consistent implementation following the established analyzer pattern.

The new trait methods align with the architectural shift to per-source parsing:

  • is_available uses the shared vscode_extension_has_sources helper (consistent with kilo_code.rs, cline.rs)
  • parse_source properly delegates to the existing parse_roo_code_task_directory function
  • is_valid_data_path correctly validates ui_messages.json files
  • contribution_strategy returns SingleSession, appropriate for task-directory-based analyzers where each directory represents one session

As per coding guidelines, this properly implements the Analyzer trait for the pluggable architecture system.

src/analyzers/piebald.rs (2)

246-277: Solid per-source parsing implementation for SQLite-based analyzer.

The implementation correctly:

  • Opens the database read-only with proper flags in parse_source
  • Uses rayon's par_iter for parallel source processing
  • Applies deduplicate_by_local_hash after flattening (Piebald sets local_hash from msg.id)
  • Returns MultiSession contribution strategy since the SQLite database contains all sessions in one file
  • Validates app.db specifically in is_valid_data_path

The get_watch_directories defensively filters to only existing directories, which is appropriate.


297-303: Test updated for new API surface.

The test correctly uses get_stats_with_sources(Vec::new()) and asserts on messages.is_empty(), aligning with the per-source API changes.

src/analyzers/gemini_cli.rs (2)

368-380: Trait implementations look correct.

  • is_valid_data_path properly validates .json files in a chats directory
  • contribution_strategy returns SingleSession, appropriate since each JSON file represents one session

353-359: Verify deduplication behavior with None local_hash.

The parse_sources_parallel method calls deduplicate_by_local_hash, but parse_json_session_file sets local_hash: None for all messages. Confirm whether deduplicate_by_local_hash can effectively deduplicate messages when local_hash is None, or if a different deduplication strategy (such as using a global hash field as suggested in the architecture) should be used instead.
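
To illustrate the concern: if deduplication keys on an `Option<u64>` local hash, `None` must be treated as "no key" and always kept, or every un-hashed message would collapse into one. This is a hypothetical std-only sketch of the safe behavior, not the actual `deduplicate_by_local_hash` implementation.

```rust
use std::collections::HashSet;

// Keep every message without a local hash; drop repeats of any hash seen.
fn dedup_by_local_hash(msgs: Vec<(Option<u64>, &str)>) -> Vec<(Option<u64>, &str)> {
    let mut seen = HashSet::new();
    msgs.into_iter()
        .filter(|(hash, _)| match hash {
            Some(h) => seen.insert(*h), // false (drop) on a repeated hash
            None => true,               // no hash: always keep
        })
        .collect()
}

fn main() {
    let msgs = vec![(None, "a"), (None, "b"), (Some(1), "c"), (Some(1), "d")];
    let out = dedup_by_local_hash(msgs);
    // Both None messages survive; the duplicate Some(1) is dropped.
    assert_eq!(out.len(), 3);
    println!("ok");
}
```

Under this behavior, messages with `local_hash: None` are simply never deduplicated, which matches the reviewer's question about whether a global hash would be needed instead.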

Cargo.toml (1)

8-16: Mimalloc integration for memory optimization.

Good addition of the mimalloc allocator as a default feature with the ability to disable for profiling (--no-default-features). The libmimalloc-sys with "extended" feature enables mi_collect for the release_unused_memory() calls mentioned in the PR objectives.

src/types.rs (5)

22-95: Well-designed compact date representation.

CompactDate efficiently packs date into 4 bytes (u16 year + u8 month + u8 day) with:

  • Proper from_local conversion using chrono's local timezone
  • Manual parsing with checked arithmetic to prevent overflow
  • Correct Ord/PartialOrd implementation via tuple comparison
  • Proper Serialize/Deserialize through string representation

Previous review suggested adding month/day range validation, which was declined since dates come from trusted coding tool sources.
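
The 4-byte packing and tuple-style ordering can be sketched with derived traits; this is a minimal illustration, not the actual `CompactDate` (the real type also carries `from_local` and serde support).

```rust
// Field order matters: derived Ord compares (year, month, day)
// lexicographically, which is exactly the tuple-comparison ordering.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CompactDate {
    year: u16,
    month: u8,
    day: u8,
}

impl CompactDate {
    // Checked parsing of "YYYY-MM-DD": any overflow or junk yields None.
    fn parse(s: &str) -> Option<Self> {
        let mut it = s.splitn(3, '-');
        let year: u16 = it.next()?.parse().ok()?;
        let month: u8 = it.next()?.parse().ok()?;
        let day: u8 = it.next()?.parse().ok()?;
        Some(CompactDate { year, month, day })
    }
}

fn main() {
    // u16 + u8 + u8 packs into 4 bytes with no padding.
    assert_eq!(std::mem::size_of::<CompactDate>(), 4);
    let a = CompactDate::parse("2026-01-02").unwrap();
    let b = CompactDate::parse("2026-01-15").unwrap();
    assert!(a < b);
    println!("ok");
}
```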


134-185: Efficient inline model reference counting.

ModelCounts using TinyVec<[(ModelKey, u32); 3]> is a smart optimization:

  • Inline storage for up to 3 models (common case) avoids heap allocation
  • Spills to heap only when needed
  • increment/decrement with proper cleanup (retain on non-zero counts)
  • decrement uses saturating_sub to prevent underflow
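
The increment/decrement/retain pattern looks roughly like this sketch, using a plain `Vec` in place of `TinyVec` (the inline-storage optimization is elided); `ModelKey` is a stand-in 2-byte key.

```rust
type ModelKey = u16;

#[derive(Default)]
struct ModelCounts(Vec<(ModelKey, u32)>);

impl ModelCounts {
    fn increment(&mut self, key: ModelKey) {
        if let Some(entry) = self.0.iter_mut().find(|(k, _)| *k == key) {
            entry.1 += 1;
        } else {
            self.0.push((key, 1));
        }
    }

    fn decrement(&mut self, key: ModelKey) {
        if let Some(entry) = self.0.iter_mut().find(|(k, _)| *k == key) {
            entry.1 = entry.1.saturating_sub(1); // never underflows
        }
        self.0.retain(|(_, count)| *count > 0); // drop zeroed models
    }
}

fn main() {
    let mut counts = ModelCounts::default();
    counts.increment(1);
    counts.increment(1);
    counts.decrement(1);
    assert_eq!(counts.0, vec![(1, 1)]);
    counts.decrement(1);
    assert!(counts.0.is_empty()); // entry removed at zero
    println!("ok");
}
```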

439-508: Lightweight TuiStats for memory efficiency.

Good design choices:

  • 24 bytes vs 320 bytes for full Stats
  • cost_cents as u32 avoids f32 precision issues while enabling integer arithmetic
  • saturating_add/saturating_sub in arithmetic implementations
  • From<&Stats> conversion rounds cost to cents

The u32 token fields provide sufficient capacity (~4B tokens) for per-session/per-day aggregates.
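
The cents conversion can be sketched as follows; `cost_to_cents` is a hypothetical helper, not the actual `From<&Stats>` impl, but it shows why integer cents accumulate exactly where repeated float addition would drift.

```rust
// Round dollars to the nearest cent, clamped into u32.
fn cost_to_cents(cost_dollars: f64) -> u32 {
    (cost_dollars * 100.0).round().min(u32::MAX as f64).max(0.0) as u32
}

fn main() {
    assert_eq!(cost_to_cents(1.234), 123);
    // Integer cents add exactly, unlike repeated f32/f64 addition.
    assert_eq!(cost_to_cents(0.1) + cost_to_cents(0.2), 30);
    println!("ok");
}
```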


544-596: View-based architecture enables efficient TUI updates.

The SharedAnalyzerView pattern (Arc<RwLock<AnalyzerStatsView>>) is well-suited for the incremental update model:

  • Arc enables cheap cloning for channel communication
  • RwLock (from parking_lot per coding guidelines) allows concurrent reads with exclusive writes
  • into_view consumes ownership and drops messages, reducing memory footprint

The aggregate_sessions_from_messages call during conversion pre-computes session aggregates for rendering.


718-867: Comprehensive tests for model reference counting.

The tests properly verify:

  • DailyStats model add/subtract with removal at zero count
  • SessionAggregate model ref counting through add_multi_session_contribution/subtract_multi_session_contribution
  • Partial subtraction preserving remaining contributions
  • Multiple models per session

These tests address the correctness concern raised in previous reviews (now fixed in commit b019197).

src/cache/mod.rs (1)

1-5: Clean module organization for caching utilities.

Good encapsulation: the model_intern implementation details remain private while exposing only the necessary public API (ModelKey, intern_model, resolve_model). This allows internal refactoring without breaking the public interface.

src/contribution_cache/tests/basic_operations.rs (6)

15-44: LGTM: Comprehensive basic insert/get test.

The test correctly validates the round-trip behavior of inserting and retrieving a SingleMessageContribution, including stats extraction and session hash verification.


46-66: LGTM: Clean SingleSession insert/get validation.

The test properly exercises the SingleSessionContribution path and validates the retrieved data.


68-85: LGTM: MultiSession insert/get test is sound.

The test correctly validates the MultiSessionContribution flow.


87-143: LGTM: Thorough removal test covering all contribution types.

The test comprehensively validates that remove_any correctly identifies and removes each contribution type, returning the appropriate variant.


145-159: LGTM: Clear operation test is correct.

The test properly validates that clear() removes all entries from the cache.


165-177: LGTM: PathHash consistency test is essential.

The test correctly validates deterministic hashing behavior, which is critical for cache correctness.

src/cache/model_intern.rs (3)

1-8: LGTM: Well-chosen interner with appropriate size constraints.

Using ThreadedRodeo<MiniSpur> with 2-byte keys is a sound choice for model names. The assumption of < 65,536 unique models is very reasonable for this use case.
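
The interning idea reduces to the following std-only sketch (lasso's lock-free `ThreadedRodeo` is elided; this hypothetical version is single-threaded): each unique string is stored once, and every occurrence holds only a 2-byte key.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Interner {
    map: HashMap<String, u16>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u16 {
        if let Some(&key) = self.map.get(s) {
            return key;
        }
        // Panics past 65,535 unique strings -- fine for model names.
        let key = u16::try_from(self.strings.len()).expect("interner full");
        self.map.insert(s.to_owned(), key);
        self.strings.push(s.to_owned());
        key
    }

    fn resolve(&self, key: u16) -> &str {
        &self.strings[key as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("claude-sonnet-4");
    let b = interner.intern("claude-sonnet-4");
    assert_eq!(a, b); // repeated names share one 2-byte key
    assert_eq!(interner.resolve(a), "claude-sonnet-4");
    println!("ok");
}
```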


10-29: Good defensive documentation for Default implementation.

The note about Default being implemented for trait bounds but having undefined resolution behavior is important. This is good defensive programming.


31-67: LGTM: Clean API with comprehensive test coverage.

The public API is minimal and well-designed. The #[inline] annotations are appropriate for these thin wrappers, and the test coverage is thorough.

src/contribution_cache/tests/single_message.rs (5)

12-60: LGTM: Comprehensive SingleMessageContribution construction tests.

The tests thoroughly cover message construction with and without models, stat extraction, and hash consistency. The coverage is excellent.


66-137: LGTM: View add/subtract operations are well-tested.

The tests correctly validate the add/subtract semantics for SingleMessage contributions, including the important edge case of user messages (model=None) not incrementing AI message counts.


143-190: LGTM: File update flow test simulates realistic incremental updates.

The test correctly models the subtract-old/add-new pattern for incremental file updates, which is a key feature of this PR.


193-227: LGTM: File deletion test validates cleanup behavior.

The test properly validates that file deletion removes contributions and cleans up empty daily stats entries.


230-280: LGTM: Multi-file aggregation test is crucial for validation.

The test correctly validates that multiple files can contribute to the same session and that partial deletion maintains correct aggregated stats.

src/contribution_cache/tests/single_session.rs (4)

12-100: LGTM: Comprehensive SingleSessionContribution construction tests.

The tests thoroughly cover message aggregation scenarios, including empty input, single model, and multiple models with correct count tracking.


106-220: LGTM: View operations tests validate add/subtract semantics thoroughly.

The tests correctly validate daily stats, session stats, and model count tracking through add/subtract operations, including cleanup when counts reach zero.


227-353: LGTM: File lifecycle tests validate incremental update flows.

The tests properly simulate file update and deletion flows for single-session contributions, which are essential for the incremental processing feature.


356-388: LGTM: Date boundary test validates strategy-specific behavior.

The test correctly validates that SingleSession contributions use the first message's date for daily stats aggregation, which is the appropriate behavior for this contribution strategy.

src/contribution_cache/multi_session.rs (2)

1-24: LGTM: MultiSessionContribution structure is well-designed.

The struct appropriately represents aggregated data for multi-session files, with clear documentation and sensible field types.


26-42: LGTM: from_messages implementation is clean and efficient.

The implementation correctly delegates to existing aggregation utilities and makes good optimization choices:

  • Uses Arc<str> to avoid per-session allocations
  • Filters out "unknown" dates to maintain data quality
  • Reuses proven aggregation logic from other modules
src/contribution_cache/tests/compact_stats.rs (1)

1-160: Well-structured test coverage for PackedStatsDate bitfield operations.

The tests comprehensively cover:

  • Normal value packing/unpacking (lines 6-32)
  • TUI stats conversion including cost_cents (lines 34-56)
  • Maximum bit-limit values (lines 58-85)
  • Saturation behavior for overflow values (lines 87-111)
  • Real-world observed values (lines 113-140)
  • Zero/default values (lines 142-160)

The edge case coverage is thorough and validates the bitfield packing implementation correctly.

src/contribution_cache/tests/multi_session.rs (1)

1-304: Comprehensive test coverage for multi-session contribution strategy.

The tests validate the complete incremental update lifecycle:

  • Creation from messages with session aggregation (lines 13-63)
  • Empty message handling (lines 65-74)
  • View add/subtract operations (lines 80-152)
  • Session merging with combined stats (lines 154-190)
  • File update flow with subtract-old/add-new pattern (lines 196-257)
  • File deletion cleanup (lines 259-304)

The pattern matching on RemovedContribution::MultiSession at line 296 correctly handles type-safe cache removal.

src/contribution_cache/single_message.rs (3)

45-59: Well-designed compact bitfield representation.

The 176-bit packed layout is clearly documented and the #[repr(C, align(1))] ensures minimal padding. The bitfield ranges are correctly specified for each field.


81-105: Saturation logic correctly prevents bit overflow.

The pack() method properly saturates all values to their bit limits:

  • Token fields use appropriate masks (0x7FF_FFFF for 27-bit, 0x3FF_FFFF for 26-bit)
  • Cost conversion handles f64 overflow with .min(u16::MAX as f64)
  • Year offset uses both saturating_sub and .min(63) for safe bounds
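
The saturation pattern generalizes to any packed field: clamp to the field's maximum before shifting, so an overflow can never corrupt a neighboring field. A minimal sketch with a 27-bit token field and 16-bit cents field (the field widths mirror the review; the layout itself is hypothetical):

```rust
const TOKENS_MASK: u64 = 0x7FF_FFFF; // 27 bits

fn pack(tokens: u64, cost: f64) -> u64 {
    let tokens = tokens.min(TOKENS_MASK); // saturate to 27 bits
    let cents = (cost * 100.0).round().min(u16::MAX as f64) as u64; // 16 bits
    (cents << 27) | tokens
}

fn unpack(packed: u64) -> (u64, u64) {
    (packed & TOKENS_MASK, (packed >> 27) & 0xFFFF)
}

fn main() {
    // In-range values round-trip exactly.
    assert_eq!(unpack(pack(1_000_000, 1.50)), (1_000_000, 150));
    // Overflow saturates instead of bleeding into the cents field.
    assert_eq!(unpack(pack(u64::MAX, 1e12)), (TOKENS_MASK, u16::MAX as u64));
    println!("ok");
}
```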

138-150: Compile-time size assertion ensures memory layout stability.

The const _: () = assert!(...) pattern at line 150 guarantees the 32-byte layout at compile time, catching any future field changes that would break the cache alignment optimization.
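
The guard pattern itself is simple: a `const` assertion is evaluated at compile time, so a layout change becomes a build error rather than a runtime surprise. `PackedEntry` below is a hypothetical stand-in for the cached struct.

```rust
#[repr(C, align(1))]
struct PackedEntry {
    bits: [u8; 22],   // 176-bit packed payload
    date: [u8; 4],
    counts: [u8; 6],
}

// Fails to compile if any future field change alters the 32-byte layout.
const _: () = assert!(std::mem::size_of::<PackedEntry>() == 32);

fn main() {
    assert_eq!(std::mem::size_of::<PackedEntry>(), 32);
    println!("ok");
}
```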

src/analyzers/kilo_code.rs (1)

303-321: Correctly implements per-source parsing pattern.

The new trait methods follow the established pattern from other analyzers:

  • is_available() uses synchronous vscode_extension_has_sources (line 304)
  • parse_source() delegates to the existing parse_kilo_code_task_directory helper
  • is_valid_data_path() validates ui_messages.json files
  • contribution_strategy() returns SingleSession matching similar VSCode extension analyzers

Based on learnings, this correctly avoids Tokio in the TUI-related paths.

src/analyzers/qwen_code.rs (2)

304-324: Data source discovery correctly uses WalkDir for flat directory traversal.

The min_depth(3).max_depth(3) constraint efficiently targets the ~/.qwen/tmp/*/chats/*.json pattern without unnecessary recursion. Based on learnings, this aligns with the project's preference for walkdir over jwalk for flat structures.


361-373: Per-source validation and contribution strategy correctly implemented.

The is_valid_data_path validation matches the discovery filter logic, ensuring consistency. SingleSession is the appropriate strategy for Qwen Code's 1-file-1-session data model.

src/contribution_cache/tests/mod.rs (2)

28-69: Well-designed test helper for creating configurable messages.

The make_message helper provides good defaults while allowing customization of key fields. The date parsing with fallback ensures tests don't fail on date format issues, and the role assignment based on model presence matches real-world semantics.


71-98: Test view helpers enable consistent test setup.

make_empty_view and make_view_with_session provide clean starting points for testing view mutations. The use of Arc::from for analyzer names aligns with the production code's memory optimization approach.

src/analyzers/copilot.rs (3)

452-470: Data source discovery correctly migrated to WalkDir.

The min_depth(3).max_depth(3) constraint efficiently targets workspaceStorage/*/chatSessions/*.json files. This aligns with the project's preference for walkdir for flat directory structures.


473-508: Per-source parsing pattern correctly implemented.

The new trait methods follow the established pattern:

  • is_available() uses the same filter logic as discovery
  • parse_source() delegates to parse_copilot_session_file
  • get_watch_directories() returns the workspace storage paths
  • is_valid_data_path() validates chatSessions JSON files
  • contribution_strategy() returns SingleSession

Based on learnings, this correctly avoids Tokio in the background I/O paths.


32-48: Platform coverage limited to macOS.

The workspace_storage_dirs() method only returns macOS paths (Library/Application Support). Linux (~/.config/{fork}) and Windows (%APPDATA%) paths are not included. If cross-platform support is intended, consider extending this method.

#!/bin/bash
# Check if other analyzers have cross-platform workspace storage paths
rg -n "workspace_storage" src/analyzers/ -A 10 | head -50

# Check if Linux/Windows paths exist elsewhere for Copilot
rg -n "workspaceStorage" src/analyzers/copilot.rs
src/main.rs (1)

28-30: LGTM: Global allocator correctly configured.

The mimalloc global allocator declaration is properly feature-gated and follows Rust best practices.

src/analyzers/opencode.rs (2)

442-461: LGTM: Data source discovery correctly implemented.

The WalkDir traversal with min_depth(2) and max_depth(2) correctly matches OpenCode's directory structure for message files. The switch from jwalk to walkdir aligns with the PR's performance strategy.

Based on learnings: Splitrail uses walkdir (not jwalk) for flat directory structures where parallel directory walking provides no benefit.


593-605: LGTM: Path validation correctly enforces OpenCode structure.

The validation logic correctly checks for .json files at depth 2 from the data directory, matching the structure enforced in discover_data_sources.

src/analyzers/claude_code.rs (3)

55-73: LGTM: Data source discovery correctly implemented.

The WalkDir traversal with appropriate depth constraints correctly matches Claude Code's directory structure.

Based on learnings: Consistent with walkdir usage for flat directory structures where parallel walking provides no benefit.


85-231: LGTM: Deduplication logic correctly handles Claude Code's complex cross-file semantics.

The sequential deduplication approach with fingerprint tracking and insertion order preservation correctly handles Claude Code's duplicate/split message scenarios. The trade-off of sequential processing for correctness is appropriate given the complexity of cross-file deduplication requirements.


240-251: LGTM: Path validation correct.

The validation correctly enforces Claude Code's directory structure with .jsonl files at depth 2.

src/analyzers/pi_agent.rs (2)

415-433: LGTM: Data source discovery correctly implemented.

The WalkDir traversal correctly matches Pi Agent's directory structure with appropriate depth constraints.


474-485: LGTM: Path validation correct.

The validation correctly enforces Pi Agent's directory structure.

src/analyzers/codex_cli.rs (3)

49-64: LGTM: Data source discovery correctly handles Codex CLI's flexible structure.

Unlike other analyzers, Codex CLI sessions can be arbitrarily nested under ~/.codex/sessions/, so the lack of depth constraints is appropriate. The WalkDir traversal will recursively find all .jsonl files regardless of nesting level.


82-88: LGTM: No deduplication needed for Codex CLI sessions.

The comment correctly notes that Codex CLI sessions are separate, so deduplication is unnecessary. This differs from analyzers like Claude Code where messages can appear across multiple files.


97-100: LGTM: Path validation appropriate for Codex CLI's flexible structure.

The validation correctly allows any .jsonl file under the sessions directory, regardless of depth, matching Codex CLI's flexible session organization.

src/watcher.rs (3)

4-16: Good adoption of parking_lot::Mutex.

The switch from std::sync::Mutex to parking_lot::Mutex aligns with the coding guidelines for improved performance. The imports are well-organized with the new view-based types.


135-152: Constructor simplification looks good.

The sync constructor with parallel initial load is cleaner. The documentation correctly notes the rayon threadpool requirement.


167-194: Incremental update logic is well-structured.

The branching between cached incremental updates and full reload fallback is correct. The file deletion path properly removes from cache before broadcasting.

src/contribution_cache/mod.rs (4)

1-14: Well-documented module structure.

The module-level documentation clearly explains the memory footprint of each contribution type, which helps maintainers understand the design tradeoffs.


92-174: Clean cache design with type-safe operations.

The three-map design provides both type safety and memory efficiency. The #[inline] hints on frequently-called accessors are appropriate.


208-216: Silent no-op when session not found during contribution add.

When adding a single-message contribution, if the session isn't found, the contribution is partially applied (daily stats updated) but session stats are skipped. The comment notes this is intentional, but this asymmetry could cause stats drift if sessions are created out-of-order.

Consider whether this scenario can occur in practice and if a warning log would help debugging.


345-347: O(n log n) sort on every multi-session add.

Re-sorting all sessions after each add is correct for maintaining order invariants, but could be optimized with insertion sort for the common case of adding pre-sorted sessions. Given that MultiSession is used for "all-in-one-file" analyzers where incremental adds are rare, this is acceptable.

src/analyzer.rs (8)

4-16: Imports properly organized for new functionality.

The new imports for rayon, DashMap, parking_lot, and contribution cache types are well-organized and necessary for the parallel/incremental architecture.


100-151: Good refactoring of VSCode extension discovery.

Extracting walk_vscode_extension_tasks reduces duplication and vscode_extension_has_sources properly short-circuits on first match for efficiency.


182-215: Well-designed parallel parsing with path preservation.

parse_sources_parallel_with_paths correctly preserves the path-to-messages association needed for contribution caching. The default parse_sources_parallel properly deduplicates using global_hash.


271-305: Registry structure supports incremental updates well.

The analyzer_order field correctly addresses the DashMap iteration order instability mentioned in tests. Using parking_lot::RwLock aligns with coding guidelines.


395-543: Parallel loading with contribution caching is well-implemented.

The strategy-based contribution caching correctly uses PathHash::new(&path) from the source paths returned by parse_sources_parallel_with_paths, avoiding the conversation_hash mismatch issue flagged in past reviews. The sequential cache insertion after parallel computation is correct.


545-658: Incremental file reload implements correct subtract/add pattern.

The implementation correctly:

  1. Validates path before processing
  2. Marks file dirty for upload tracking
  3. Gets old contribution from cache (if exists)
  4. Parses new file content
  5. Computes new contribution
  6. Updates cache
  7. Subtracts old contribution from view
  8. Adds new contribution to view

This achieves the O(1) incremental update goal.
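
The subtract/add core of the flow can be sketched with a path-hash keyed cache; all types here are illustrative stand-ins (the real view tracks daily stats, sessions, and model counts, not a single counter).

```rust
use std::collections::HashMap;

type PathHash = u64;

#[derive(Default)]
struct View {
    total_tokens: u64,
}

impl View {
    fn subtract(&mut self, c: u64) {
        self.total_tokens = self.total_tokens.saturating_sub(c);
    }
    fn add(&mut self, c: u64) {
        self.total_tokens += c;
    }
}

// Subtract the file's old contribution (if seen before), store the new
// one, then add it -- O(1) in the number of tracked files.
fn reload_file(
    cache: &mut HashMap<PathHash, u64>,
    view: &mut View,
    path: PathHash,
    new_contribution: u64,
) {
    if let Some(old) = cache.insert(path, new_contribution) {
        view.subtract(old);
    }
    view.add(new_contribution);
}

fn main() {
    let mut cache = HashMap::new();
    let mut view = View::default();
    reload_file(&mut cache, &mut view, 42, 100); // first scan
    reload_file(&mut cache, &mut view, 42, 120); // file grew
    assert_eq!(view.total_tokens, 120); // not 220: old value was subtracted
    println!("ok");
}
```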


750-781: Incremental upload loading is efficient.

Sequential parsing for dirty files (typically 1-2) is appropriate. The deduplication and timestamp filtering are correctly ordered.


1096-1162: Excellent regression test for stable analyzer ordering.

This test catches the DashMap iteration order issue that could cause tabs to jump. Testing order stability after updates and removals is thorough.

@Sewer56
Copy link
Contributor Author

Sewer56 commented Jan 15, 2026

Any chance of getting this merged?

OpenCode will likely move to sqlite backend, and I want to avoid conflicts when the code will inevitably require adjustment.

https://x.com/i/status/2011631185399857465

@bl-ue
Copy link
Member

bl-ue commented Jan 15, 2026

Hi @Sewer56! I'm so sorry about that! We got busy and forgot about this one. Please ping us sooner next time!

The 600 to 25 MB of memory is amazing; great job!

I noticed that there's .agents/*.md instead of the skills setup you did in #94—did you maybe base this branch on an older version of #94's branch?

Otherwise, the code looks great! The model string interning that you're doing is clever.

@Sewer56
Copy link
Contributor Author

Sewer56 commented Jan 15, 2026

Yeah, I started the branch before #94 was finalised @bl-ue (while feedback was being done)
I think merging main into this branch should remove that diff.

@bl-ue bl-ue merged commit 4a41ed6 into Piebald-AI:main Jan 15, 2026
6 checks passed
@bl-ue
Copy link
Member

bl-ue commented Jan 15, 2026

@Sewer56 Thank you again!

@Sewer56
Copy link
Contributor Author

Sewer56 commented Jan 15, 2026

👏
