Skip to content

Implement SQLite-backed mtime state store#3186

Merged
senamakel merged 3 commits into
tinyhumansai:mainfrom
manucian-official:patch-7
Jun 4, 2026
Merged

Implement SQLite-backed mtime state store#3186
senamakel merged 3 commits into
tinyhumansai:mainfrom
manucian-official:patch-7

Conversation

@manucian-official
Copy link
Copy Markdown
Contributor

@manucian-official manucian-official commented Jun 2, 2026

for my issue

Summary by CodeRabbit

  • New Features
    • Enhanced file state tracking and persistence for vault monitoring, enabling accurate detection and management of file modifications and deletions across application sessions.

This module implements a SQLite-backed state store for tracking file modification times, allowing the vault file watcher to detect changes efficiently. It includes functions for recording seen files, marking files as deleted, retrieving last modification times, and loading all non-deleted states.
@manucian-official manucian-official requested a review from a team June 2, 2026 06:59
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a new Rust module that implements an SQLite-backed persistent state store for tracking vault file modification times and deletion status. The store provides methods to record file mtimes, mark files as deleted, query last-seen mtimes, and bulk-load all tracked file states, with comprehensive unit tests validating the complete behavior.

Changes

Vault file watcher state persistence

Layer / File(s) Summary
Data model and store contract
src/openhuman/workspace/state.rs
Public FileState struct and WatcherStateStore type definitions, along with module documentation describing the SQLite-backed mtime persistence store.
SQLite store implementation with tests
src/openhuman/workspace/state.rs
Store opens/creates SQLite database, configures pragmas, and lazily creates vault_watcher_state table. Methods upsert seen mtimes, mark paths as deleted (soft-delete), fetch last non-deleted mtimes returning None when missing or deleted, and bulk-load all tracked states. Unit tests validate mtime recording/retrieval, unknown-path behavior, deletion semantics, upsert idempotency, and load_all row reflection including deleted flag.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A watcher takes notes of each file it sees,
Recording mtimes in SQLite with ease,
With soft-deletion marks and states to retrieve,
The vault now remembers what files to believe! 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement SQLite-backed mtime state store' directly and clearly describes the main change: adding an SQLite-backed persistent state store for tracking file modification times, which matches the core purpose of the new module.
Docstring Coverage ✅ Passed Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/openhuman/workspace/state.rs (2)

32-104: ⚡ Quick win

Add debug/trace instrumentation around the new state-store operations.

This new persistence boundary has no diagnostics, so watcher/state drift on startup will be hard to triage. Please add stable, grep-friendly debug/trace events around open, record_seen, record_deleted, last_mtime, and load_all, with correlation fields and redacted path data rather than raw file paths.

As per coding guidelines, "In Rust, default to verbose diagnostics on new/changed flows using log/tracing at debug/trace levels with stable grep-friendly prefixes and correlation fields" and "Never log secrets or full PII — redact sensitive information in logs".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/workspace/state.rs` around lines 32 - 104, Add verbose,
grep-friendly tracing/logging around the WatcherStateStore persistence methods
(open, record_seen, record_deleted, last_mtime, load_all): emit a debug/trace
event at method entry and on notable outcomes (success/failure) using a stable
prefix like "watcher::state::" and include a correlation field (e.g., request_id
or op_id) and redacted path (e.g., show only file name or hashed/obfuscated
path) rather than raw paths; ensure errors returned from Connection::open and
SQL operations are logged with their error details (but not full path), and use
tracing::debug!/trace! (or log::debug!) consistently so callers can grep
"watcher::state::" to find events.

115-169: ⚡ Quick win

Add a reopen test for the actual persistence contract.

All current tests exercise a single open connection. The PR’s core behavior is persistence across restarts, so this needs one case that writes state, drops the store, reopens the same SQLite file, and re-asserts last_mtime/load_all.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/workspace/state.rs` around lines 115 - 169, Add a test that
verifies persistence across restarts: create a NamedTempFile, open a
WatcherStateStore with WatcherStateStore::open(temp.path()), call record_seen
(and optionally record_deleted), drop or let the store go out of scope, then
reopen a new WatcherStateStore with the same temp.path() and assert last_mtime
and load_all return the previously written values; keep the NamedTempFile alive
for the duration of the test so the SQLite file is not removed. Reference:
create NamedTempFile, WatcherStateStore::open, record_seen, record_deleted,
last_mtime, load_all, and name the test e.g. reopen_persists_state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/workspace/state.rs`:
- Around line 86-103: The load_all function currently selects all rows from
vault_watcher_state and returns tombstones despite its contract to seed only
active (non-deleted) state; change the SQL in load_all (in function load_all) to
filter out deleted rows (e.g. add WHERE deleted = 0/false) so only non-deleted
FileState entries are returned, keep the existing row mapping (path, mtime_secs,
deleted) but you can drop reading the deleted column if you prefer since it will
always be false after the filter; ensure the returned mtime conversion and types
remain unchanged.

---

Nitpick comments:
In `@src/openhuman/workspace/state.rs`:
- Around line 32-104: Add verbose, grep-friendly tracing/logging around the
WatcherStateStore persistence methods (open, record_seen, record_deleted,
last_mtime, load_all): emit a debug/trace event at method entry and on notable
outcomes (success/failure) using a stable prefix like "watcher::state::" and
include a correlation field (e.g., request_id or op_id) and redacted path (e.g.,
show only file name or hashed/obfuscated path) rather than raw paths; ensure
errors returned from Connection::open and SQL operations are logged with their
error details (but not full path), and use tracing::debug!/trace! (or
log::debug!) consistently so callers can grep "watcher::state::" to find events.
- Around line 115-169: Add a test that verifies persistence across restarts:
create a NamedTempFile, open a WatcherStateStore with
WatcherStateStore::open(temp.path()), call record_seen (and optionally
record_deleted), drop or let the store go out of scope, then reopen a new
WatcherStateStore with the same temp.path() and assert last_mtime and load_all
return the previously written values; keep the NamedTempFile alive for the
duration of the test so the SQLite file is not removed. Reference: create
NamedTempFile, WatcherStateStore::open, record_seen, record_deleted, last_mtime,
load_all, and name the test e.g. reopen_persists_state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 21b0755e-d85b-4ab6-b8e4-08e4997e5251

📥 Commits

Reviewing files that changed from the base of the PR and between 1e03dd2 and 526e40f.

📒 Files selected for processing (1)
  • src/openhuman/workspace/state.rs

Comment thread src/openhuman/workspace/state.rs
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manucian-official heads up — CI is failing on this PR (Rust Tauri Coverage is red), so i'll hold off on a full approve until that's sorted out. i did spot a couple things while going through the diff though:

The bigger one: state.rs is added to src/openhuman/workspace/ but workspace/mod.rs has no mod state; or pub mod state; declaration. The Rust compiler won't process this file at all — the module and its unit tests are effectively dead code. Nothing in the crate can reach WatcherStateStore until you add the declaration. This needs to be wired in before the PR does anything useful.

Smaller one inline below.

Fix the module declaration and the CI failure, and i'll come back for a full pass.

Comment thread src/openhuman/workspace/state.rs
@senamakel
Copy link
Copy Markdown
Member

@coderabbitai can u pls summarize the changes in this PR? pros/cons?

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat src/openhuman/workspace/state.rs

Length of output: 5683



@senamakel Sure! Here's a breakdown of PR #3186:


What It Does

This PR introduces a new Rust module (src/openhuman/workspace/state.rs) that implements a SQLite-backed persistent store for tracking file modification times (mtime) in the vault file watcher. Instead of re-scanning all files from scratch on every restart, the watcher can now persist what it's already seen and only process genuine changes.

Schema

CREATE TABLE IF NOT EXISTS vault_watcher_state (
    path       TEXT    PRIMARY KEY,
    mtime_secs INTEGER NOT NULL,
    deleted    INTEGER NOT NULL DEFAULT 0
);

Key API

Method Purpose
WatcherStateStore::open(db_path) Opens/creates the SQLite DB with WAL mode
record_seen(path, mtime_secs) Upserts a file's last-seen mtime
record_deleted(path) Soft-deletes a path (keeps row, sets deleted=1)
last_mtime(path) Returns stored mtime, or None if missing/deleted
load_all() Bulk-loads all rows for startup seeding

✅ Pros

  1. Persistence across restarts — the vault watcher won't reprocess every file on startup, reducing unnecessary I/O and ingestion overhead.
  2. WAL journal mode + NORMAL sync — good performance/durability tradeoff for a local state store.
  3. Soft deletes — deleted paths are kept in the DB so the watcher doesn't accidentally re-process them on reconnect.
  4. Lazy table creation — the table is created on first connection; no migration tooling needed initially.
  5. Upsert semanticsINSERT ... ON CONFLICT DO UPDATE is clean and race-safe within a single connection.
  6. Good test coverage — 5 unit tests cover record/retrieve, unknown path, deletion, upsert, and bulk load.
  7. prepare_cached used in last_mtime for the hot path, avoiding repeated statement compilation.

⚠️ Cons / Potential Issues

  1. u64i64 silent overflowmtime_secs is cast with mtime as i64 with no overflow check. Values beyond ~year 2554 would silently wrap to negative. A u64::try_from or a TryInto<i64> with a proper error would be safer.

  2. Lossy path encodingpath.to_string_lossy() silently replaces non-UTF-8 bytes with U+FFFD. On Linux, file paths can be arbitrary bytes, so two distinct paths could theoretically map to the same string key, causing data corruption.

  3. load_all() returns deleted rows too — despite the doc comment saying "non-deleted states", the SQL query fetches all rows (WHERE clause is missing). Callers must filter deleted == true themselves, which is error-prone.

  4. record_deleted inserts mtime_secs = 0 — when inserting a never-seen deleted path, it stores mtime_secs=0, which is semantically odd (epoch time). A NULL-able column would be cleaner.

  5. No row cleanup / vacuuming — soft-deleted rows accumulate indefinitely. Over time, especially in large vaults with many deleted files, the DB will grow unboundedly with no way to prune stale entries.

  6. No bulk insert batchingrecord_seen issues one execute() per file. Scanning a large vault at startup could trigger thousands of individual writes; a transaction wrapper would significantly improve performance.

  7. No module wiring visible — there's no mod state; declaration or usage of WatcherStateStore in the rest of the codebase shown in the PR, so it's unclear how this integrates with the actual file watcher.


Overall it's a solid foundation, but the path encoding issue (#2) and the load_all filter bug (#3) are the most likely to cause real bugs in production. The u64→i64 cast and the lack of batching are worth addressing before merging as well.

@senamakel senamakel self-assigned this Jun 4, 2026
…gging + tests

- Add `pub mod state;` to workspace/mod.rs so the module is reachable
  (addresses @graycyrus blocker — file was dead code without it)
- Filter deleted rows in load_all() with WHERE deleted = 0 to match
  its documented "non-deleted states" contract (addresses @coderabbitai)
- Move `use rusqlite::OptionalExtension` to top-level imports
  (addresses @graycyrus inline comment)
- Add debug/trace logging with [watcher::state] prefix on open,
  record_seen, record_deleted, load_all (addresses @coderabbitai nitpick,
  CLAUDE.md diagnostic logging requirement)
- Add reopen_persists_state test verifying persistence across restarts
  (addresses @coderabbitai nitpick)
- Update load_all_excludes_deleted test to assert filtered behavior
@senamakel senamakel merged commit ea7ada7 into tinyhumansai:main Jun 4, 2026
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants