
Conversation

@neonwatty

@neonwatty neonwatty commented Dec 3, 2025

A follow-up to the conversation in #46: as a first step, this migrates the JSON cache to SQLite.

What this PR does

Changes

  • SQLite schema: 8 tables (projects, sessions, cached_files, cached_entries, etc.) + indexes
  • Thread-safe connections: Uses threading.local() with WAL mode
  • Lazy migration: Auto-migrates existing JSON caches on first access, then deletes them
  • Test updates: All 346 tests pass with isolated temp databases
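The thread-local connection pattern mentioned above can be sketched roughly as follows (the class and property names here are illustrative, not the PR's actual API):

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Illustrative: one SQLite connection per thread, opened in WAL mode."""

    def __init__(self, db_path: str):
        self.db_path = db_path
        self._local = threading.local()  # each thread sees its own attributes

    @property
    def connection(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self.db_path)
            # WAL lets readers proceed while a writer appends to the log
            conn.execute("PRAGMA journal_mode=WAL")
            self._local.conn = conn
        return conn
```

Because `sqlite3` connections are not safe to share across threads by default, each thread lazily opens its own; WAL mode then allows concurrent readers alongside a single writer.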

Migration Strategy

When CacheManager is initialized for a project:

  1. If JSON cache exists → migrate to SQLite, delete JSON
  2. Otherwise → use SQLite directly

Users don't need to do anything—migration is automatic and transparent.
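In code, that decision might look roughly like this (paths and the helper name are illustrative; the real implementation lives in `CacheManager`):

```python
import shutil
from pathlib import Path


def migrate_json_cache_if_needed(project_dir: Path, migrate_fn) -> bool:
    """Illustrative: migrate a legacy JSON cache directory, then delete it.

    Returns True if a migration ran, False if SQLite is used directly.
    """
    json_cache_dir = project_dir / "cache"
    index_file = json_cache_dir / "index.json"
    if index_file.exists():
        migrate_fn(index_file)         # copy legacy data into SQLite
        shutil.rmtree(json_cache_dir)  # legacy cache is no longer needed
        return True
    return False
```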

Test plan

  • All existing tests pass (346 passed, 8 skipped)
  • Manual testing: processed 24 projects, 245 sessions (I personally had a 550 MB cache of project data)
  • Cache clear/rebuild verified
  • JSON migration verified (legacy caches auto-migrate and delete)
  • New tests added: schema verification, JSON migration, thread safety

Open Questions

  1. DB location: Used ~/.claude/cache.db. Should it be ~/.claude/projects/cache.db instead?
  2. Feature flag: Worth adding CLAUDE_CODE_LOG_SQLITE_CACHE=0 env var for rollback?
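If the feature-flag route is taken, the check could be as simple as the following (the env var name comes from the question above; the default-on behavior is an assumption):

```python
import os


def sqlite_cache_enabled() -> bool:
    """Illustrative: opt out of the SQLite cache via an environment variable."""
    return os.environ.get("CLAUDE_CODE_LOG_SQLITE_CACHE", "1") != "0"
```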

Summary by CodeRabbit

  • New Features

    • Automatic migration of legacy cache format to new storage system
    • Thread-safe cache operations for safe concurrent access
  • Bug Fixes

    • Enhanced error handling for cache operations with dedicated exception types


Migrates the per-project JSON cache to a single SQLite database at ~/.claude/cache.db.
This provides a foundation for future tags/bookmarks feature.

Changes:
- SQLite schema with 8 tables (projects, sessions, cached_files, etc.) + indexes
- Thread-safe connections using threading.local() with WAL mode
- Lazy migration: auto-migrates existing JSON caches on first access
- Preserves existing CacheManager public API (no breaking changes)

Test updates:
- Added temp_sqlite_db fixture for test isolation
- Added 6 new tests: schema verification, JSON migration, thread safety
- All 346 tests pass
@coderabbitai

coderabbitai bot commented Dec 3, 2025

Walkthrough

Replaces JSON-based caching with a SQLite-backed system, introducing a complete database schema, migration logic, thread-safe connection management, and a new exception hierarchy. Cache operations now persist to and query from SQLite instead of filesystem JSON files.

Changes

  • Core SQLite Cache Implementation (claude_code_log/cache.py): Introduces SQLite schema (8 tables with indexes), exception hierarchy (CacheError, CacheDatabaseError, CacheMigrationError), and thread-local database connections with WAL mode. Adds migration logic from legacy JSON cache, class-level configuration via set_db_path, and SQL-based cache persistence and retrieval. Updates all cache operations to use SQLite INSERT/UPDATE/DELETE.
  • Test Infrastructure (test/conftest.py): Adds autouse fixture temp_sqlite_db that initializes a temporary SQLite database per test, configures CacheManager with a per-test db_path, and cleans up connections and initialization state after each test.
  • Cache Unit Tests (test/test_cache.py): Replaces filesystem-based cache validation with SQLite checks; updates assertions to verify data in the database using schema queries and methods like is_file_cached and _get_cached_file_id. Adds tests for JSON-to-SQLite migration, concurrent writes with thread-local connections, corrupted cache data handling, and schema/index validation.
  • Integration Tests (test/test_cache_integration.py, test/test_integration_realistic.py): Updates test signatures to include the temp_sqlite_db fixture; replaces filesystem cache directory checks with CacheManager queries and get_cached_project_data() results. Removes assertions on per-project cache file presence; validates cache state through the SQLite-backed CacheManager instead.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • claude_code_log/cache.py: New SQLite schema with 8 tables, foreign keys, and indexes; migration logic with file cleanup; thread-local connection management with initialization guards; refactored cache operations across multiple methods with conflict handling and exception raising.
  • Test files: Multiple files updated with similar patterns (filesystem → SQLite checks), but consistency and correctness of migrations, thread-safety assertions, and edge cases require careful verification.

Areas requiring extra attention:

  • Migration logic in _migrate_json_cache_if_needed: correctness of data transfer, file cleanup timing, and edge cases (partial migrations, concurrent access)
  • Thread-local connection lifecycle and cleanup, particularly in close_all_connections and fixture teardown
  • Exception handling transitions from silent/print-based to raising CacheDatabaseError and CacheMigrationError
  • Consistency of test fixtures across 4 test files and correctness of SQLite state assertions

Poem

🐰 From JSON files that scattered wide,
To SQLite where data hides,
With schemas neat and threads that bind,
A cache transformed, refined, redesigned!
WAL mode whispers, indexes glow,
Migration magic: watch it flow. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: migrate cache from JSON to SQLite (#46)' is clear, concise, and accurately summarizes the main change (a migration from JSON-based caching to SQLite-backed caching). It directly reflects the primary objective of the pull request.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (10)
claude_code_log/cache.py (5)

239-260: Thread-local connection lifecycle is per-thread only; document / guard for non-test usage

_connection and close_all_connections correctly provide a thread-local sqlite3.Connection per thread, but close_all_connections only closes the connection for the calling thread’s threading.local state. Connections created in worker threads are left to GC, which is fine for tests but can surprise in long-lived processes.

Two follow-ups to consider:

  • Clarify in close_all_connections docstring that it only affects the current thread.
  • For robustness, add a comment near set_db_path noting it should be called before any connections are created, or be paired with a per-thread cleanup strategy if used in production code.

No immediate bug given current usage (tests set DB path before first CacheManager instance), but worth tightening expectations.


261-299: Database initialization guard is sound; consider handling partial schema_version states

_ensure_database’s use of _init_lock and _db_initialized looks correct for single-process, multi-threaded use and avoids re-running schema creation.

If you later introduce real migrations (CURRENT_SCHEMA_VERSION > 1), you may want to:

  • Explicitly wrap any DDL/DML upgrade steps in a transaction before writing a new schema_version row.
  • Fail fast with a clear CacheDatabaseError when encountering a higher schema_version than supported by this code, instead of silently treating it as “current”.

Nothing to fix now with CURRENT_SCHEMA_VERSION = 1, just a note for future schema evolution.
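A fail-fast check along those lines could look like this (the constant and exception names follow the PR's naming, but the exact code is an illustrative sketch):

```python
import sqlite3

CURRENT_SCHEMA_VERSION = 1


class CacheDatabaseError(Exception):
    """Mirrors the PR's exception hierarchy (illustrative)."""


def check_schema_version(conn: sqlite3.Connection) -> None:
    """Refuse to open a cache DB written by a newer schema version."""
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    version = row[0] if row and row[0] is not None else 0
    if version > CURRENT_SCHEMA_VERSION:
        raise CacheDatabaseError(
            f"Cache schema version {version} is newer than supported "
            f"version {CURRENT_SCHEMA_VERSION}; refusing to open."
        )
```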


568-603: Date filtering semantics correct but can be simplified and extended

The load_cached_entries_filtered implementation:

  • Correctly uses dateparser.parse and parse_timestamp, and normalizes message timestamps to naive datetimes for comparison.
  • Treats “today”, “yesterday”, and “X days ago” as day ranges, which satisfies the natural-language requirements.

Two small improvements:

  • The to_date branch sets end-of-day in both the if and else clauses; you can collapse this into a single to_dt = to_dt.replace(...) to reduce noise.
  • If you want more intuitive ranges for phrases like "last week" or "last month", you might want to normalize from_dt to start-of-day similarly when dateparser returns a midnight time.

Not blockers, but would make the logic clearer and more consistent.
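The collapsed to_dt normalization suggested above might be extracted as a single helper (stdlib-only sketch; the real code parses the date with dateparser first):

```python
from datetime import datetime


def end_of_day(dt: datetime) -> datetime:
    """Illustrative: clamp a parsed 'to' date to the last instant of its day."""
    return dt.replace(hour=23, minute=59, second=59, microsecond=999999)
```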


528-547: Consider deterministic ordering when loading cached entries

load_cached_entries currently selects from cached_entries without an ORDER BY, and then flattens all rows:

cursor = self._connection.execute(
    "SELECT entries_json FROM cached_entries WHERE cached_file_id = ?",
    (cached_file_id,),
)

SQLite does not guarantee row order without ORDER BY, so the resulting entry sequence can differ from the original chronological ordering that was implied by timestamp keys.

If downstream consumers assume chronological order (rather than re-sorting by timestamp), you may see subtle behavior changes vs the legacy JSON cache. To make ordering explicit:

-            cursor = self._connection.execute(
-                "SELECT entries_json FROM cached_entries WHERE cached_file_id = ?",
-                (cached_file_id,),
-            )
+            cursor = self._connection.execute(
+                "SELECT entries_json FROM cached_entries "
+                "WHERE cached_file_id = ? "
+                "ORDER BY timestamp_key",
+                (cached_file_id,),
+            )

Similar ordering could be added to load_cached_entries_filtered as well.


988-1007: clear_cache behavior matches intent; consider reusing _remove_json_cache()

clear_cache correctly:

  • Deletes the project row (cascading to related tables).
  • Resets _project_id so a new project record is created on next use.
  • Cleans up any legacy JSON cache directory.

Given you already have _remove_json_cache() with error handling and logging, you could slightly simplify and centralize behavior by delegating the final block:

-            # Also clean up any legacy JSON cache if it exists
-            if self.cache_dir.exists():
-                shutil.rmtree(self.cache_dir)
+            # Also clean up any legacy JSON cache if it exists
+            self._remove_json_cache()

This keeps JSON cache deletion logic in a single place.

test/conftest.py (1)

11-31: SQLite test fixture design is solid; thread cleanup relies on GC

The temp_sqlite_db autouse fixture cleanly:

  • Points CacheManager at a per-test DB file under tmp_path.
  • Ensures schema is (re)initialized by resetting _db_initialized after each test.
  • Calls close_all_connections() to drop the main-thread connection between tests.

Given CacheManager uses threading.local for connections, note that close_all_connections() only closes the current thread’s connection; any connections created inside worker threads (e.g. in thread-safety tests) are left to Python’s GC. That’s fine for this suite, but if you ever see DB files held open unexpectedly, this is where you’d need stronger cleanup.

Otherwise, the fixture is well-structured.

test/test_integration_realistic.py (2)

234-253: Good addition verifying caches are actually created in SQLite

The added block in test_clear_cache_with_projects_dir that walks projects and asserts at least one has cached_files in SQLite via CacheManager(...).get_cached_project_data() is a nice end-to-end sanity check that --all-projects really populates the new backend, not just HTML.

If you want consistency with other tests, you could pass get_library_version() instead of a hard-coded "1.0.0", but it’s not required here since you only assert that some cache exists.


419-433: SQLite-backed cache verification for real projects looks correct

In TestCacheWithRealData.test_cache_creation_all_projects, the new CacheManager(project_dir, "1.0.0") + get_cached_project_data() assertions provide a solid integration check that processing real projects populates the SQLite cache (not just HTML files).

Using cached_data.version / .sessions presence as smoke tests is appropriate; deeper structure checks are already covered elsewhere.

test/test_cache.py (1)

42-47: Fixture still patches get_library_version though CacheManager doesn’t use it

The cache_manager fixture wraps CacheManager(...) in:

with patch("claude_code_log.cache.get_library_version", return_value=mock_version):
    return CacheManager(temp_project_dir, mock_version)

Since CacheManager.__init__ only uses the explicitly passed library_version argument and never calls get_library_version, this patch is effectively a no-op now.

It’s harmless but could be removed (or relocated into tests that exercise code paths which actually call get_library_version, such as CLI integration tests) to reduce confusion.

test/test_cache_integration.py (1)

415-436: Version-upgrade integration scenario exercises cache reuse path

test_cache_version_upgrade_scenario now:

  • Seeds a project with aggregates under an “old” version via CacheManager(project_dir, "1.0.0").
  • Runs convert_jsonl_to_html under a patched get_library_version returning "2.0.0".

Even though _is_cache_version_compatible currently treats all versions as compatible, this acts as a regression test that a version bump doesn’t break conversion with existing SQLite cache present.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62dd09d and 45e3701.

📒 Files selected for processing (5)
  • claude_code_log/cache.py (10 hunks)
  • test/conftest.py (1 hunks)
  • test/test_cache.py (8 hunks)
  • test/test_cache_integration.py (9 hunks)
  • test/test_integration_realistic.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

  • test/conftest.py
  • test/test_integration_realistic.py
  • test/test_cache_integration.py
  • test/test_cache.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

  • test/conftest.py
  • test/test_integration_realistic.py
  • test/test_cache_integration.py
  • claude_code_log/cache.py
  • test/test_cache.py
claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

  • claude_code_log/cache.py
🧬 Code graph analysis (2)
test/conftest.py (1)
claude_code_log/cache.py (2)
  • set_db_path (234-237)
  • close_all_connections (255-259)
test/test_cache_integration.py (3)
test/conftest.py (1)
  • temp_sqlite_db (12-30)
test/test_cache.py (1)
  • cache_manager (43-46)
claude_code_log/cache.py (3)
  • CacheManager (200-1093)
  • get_cached_project_data (899-986)
  • update_project_aggregates (811-854)
🔇 Additional comments (13)
claude_code_log/cache.py (1)

1055-1093: get_cache_stats is robust; consider exposing more metrics if needed

get_cache_stats safely handles missing project rows and returns a simple, dictionary-based view of stats. This is a good fit for reporting/CLI.

If you later need richer introspection (e.g. per-project DB size, average messages per session), consider:

  • Either extending this method with optional fields, or
  • Adding a separate, more detailed stats method to avoid overloading the basic API.

No changes required now; current implementation is clean and defensive.

test/test_integration_realistic.py (1)

465-478: Version stored in SQLite is verified appropriately

The test_cache_version_stored changes correctly assert that:

  • A cache exists after processing.
  • The version field in ProjectCache is populated (and, implicitly, sourced from the DB).

This aligns with the new schema’s projects.library_version column and provides a simple regression guard for version persistence.

test/test_cache.py (6)

96-104: Initialization expectations align with SQLite-backed implementation

test_initialization’s new assertion that _project_id is non-None ensures a project row is always present after constructing CacheManager, which matches the SQLite design. This is a useful guard against regressions where projects might not get created correctly.


128-151: Direct schema inspection test is useful and well-structured

test_timestamp_based_cache_structure querying cached_entries.timestamp_key directly is a good low-level check that:

  • Entries are bucketed by exact timestamps.
  • Summary entries are stored under _no_timestamp.

This complements higher-level load tests and gives early signal if the storage format changes.


246-265: clear_cache behavior is thoroughly validated

The updated test_clear_cache correctly checks that:

  • After saving entries, is_file_cached is True.
  • Calling clear_cache() resets _project_id to None.
  • A new CacheManager instance for the same project sees no cached files.

This closely matches the new SQLite semantics where deleting the project row cascades and a subsequent initialization creates a fresh project record.


568-603: Corrupted entry handling test matches load_cached_entries behavior

test_corrupted_cache_data manually inserts a row into cached_files and a bogus entries_json value into cached_entries, then asserts that load_cached_entries returns None.

This matches the production behavior where JSON decoding errors are caught, a warning is printed, and None is returned. The test provides good regression coverage for that error path.


639-671: SQLite schema creation tests cover core tables and indexes well

TestSQLiteSchema’s two tests verify:

  • All expected tables (schema_version, projects, working_directories, cached_files, file_sessions, sessions, cached_entries, tags) are present.
  • All key indexes (esp. on cached_entries, sessions, cached_files, working_directories) exist.

This is excellent defensive coverage for future migrations or refactors that might accidentally drop or rename objects.


827-881: Thread-safety tests provide strong coverage of concurrent writes

TestThreadSafety.test_concurrent_cache_writes and test_thread_local_connection_isolation do a good job of:

  • Stressing concurrent update_session_cache calls across multiple threads, asserting no exceptions and that all sessions are persisted.
  • Verifying that each thread receives a distinct SQLite connection object (via different id(manager._connection) values).

These tests nicely validate the design choice around threading.local and WAL mode.

test/test_cache_integration.py (5)

89-113: CLI --no-cache behavior test correctly updated for SQLite backend

The updated test_cli_no_cache_flag now:

  • Verifies that a normal CLI run creates SQLite-backed cache via CacheManager.get_cached_project_data().
  • Acknowledges that --no-cache skips using the cache during that run, but does not prevent the underlying database file from existing.

This matches the new design and avoids brittle assertions about file presence.


114-140: --clear-cache integration test now validates SQLite state, not filesystem

test_cli_clear_cache_flag’s new logic is good:

  • After an initial run, it checks that get_cached_project_data() is non-None.
  • After invoking --clear-cache, a new CacheManager instance sees a project with zero cached_files.

That’s a clear, end-to-end verification that the CLI is correctly delegating to CacheManager.clear_cache() and that the SQLite-backed cache is reset.


141-173: All-projects caching assertion correctly targets SQLite-backed cache

The updated test_cli_all_projects_caching now asserts, for each synthetic project, that:

  • A CacheManager instance can reconstruct ProjectCache.
  • Each project has at least one cached file in cached_files.

This is a good regression test that --all-projects exercises the new backend across multiple projects.


195-208: Converter integration test verifies cache population without over-specifying internals

test_convert_jsonl_to_html_with_cache now checks:

  • First run with use_cache=True creates HTML.
  • SQLite cache exists and contains at least one cached_files entry for the project.

Second run simply asserts successful conversion, which is sufficient; detailed “cache hit” behavior is better left to unit tests. This strikes a good balance for integration scope.


225-254: Project hierarchy processing test correctly checks per-project caches

test_process_projects_hierarchy_with_cache was updated to:

  • Run process_projects_hierarchy(..., use_cache=True).
  • Verify for each project that a CacheManager can reconstruct non-None ProjectCache.

This ensures the multi-project processing path wires up the new cache backend for every project, not just the HTML outputs.

Comment on lines +459 to +477
# Migrate cached entries from separate JSON file
entry_file = self.cache_dir / f"{Path(file_name).stem}.json"
if entry_file.exists():
    try:
        with open(entry_file, "r", encoding="utf-8") as f:
            entries_by_timestamp = json.load(f)
        for timestamp_key, entries in entries_by_timestamp.items():
            self._connection.execute(
                """
                INSERT INTO cached_entries (cached_file_id, timestamp_key, entries_json)
                VALUES (?, ?, ?)
                """,
                (cached_file_id, timestamp_key, json.dumps(entries)),
            )
    except Exception as e:
        print(
            f"Warning: Failed to migrate entries from {entry_file}: {e}"
        )

⚠️ Potential issue | 🟠 Major

Per-file JSON cache migration likely misses legacy *.jsonl.json files

In _migrate_json_cache_if_needed, per-file cache is looked up as:

entry_file = self.cache_dir / f"{Path(file_name).stem}.json"

For a JSON index entry like "test.jsonl", this resolves to cache/test.json, but your new test data (test_json_to_sqlite_migration) writes the legacy per-file cache as cache/test.jsonl.json. As a result, existing *.jsonl.json files will be silently skipped during migration and their timestamp-keyed entries never populate cached_entries, even though cached_files metadata is migrated.

That can leave SQLite with file metadata but no actual entries for previously cached projects, and load_cached_entries will return an empty list rather than falling back to regenerating from the JSONL file.

Consider supporting both legacy naming patterns so old caches are fully migrated:

-                # Migrate cached entries from separate JSON file
-                entry_file = self.cache_dir / f"{Path(file_name).stem}.json"
-                if entry_file.exists():
+                # Migrate cached entries from separate JSON file
+                legacy_entry_files = [
+                    # Legacy pattern: "<file_name>.json" -> e.g. "test.jsonl.json"
+                    self.cache_dir / f"{file_name}.json",
+                    # Alternative pattern: "<stem>.json" -> e.g. "test.json"
+                    self.cache_dir / f"{Path(file_name).stem}.json",
+                ]
+                entry_file = next(
+                    (p for p in legacy_entry_files if p.exists()),
+                    None,
+                )
+
+                if entry_file is not None:
                     try:
                         with open(entry_file, "r", encoding="utf-8") as f:
                             entries_by_timestamp = json.load(f)

This keeps migration robust across both historical and any future per-file cache naming schemes.

Comment on lines +704 to +783
def test_json_to_sqlite_migration(self, temp_project_dir, temp_sqlite_db):
    """Test migration from legacy JSON cache to SQLite."""
    # 1. Create legacy JSON cache structure
    cache_dir = temp_project_dir / "cache"
    cache_dir.mkdir()

    # Create index.json with project data
    index_data = {
        "version": "1.0.0",
        "cache_created": "2023-01-01T10:00:00Z",
        "last_updated": "2023-01-01T11:00:00Z",
        "total_message_count": 50,
        "total_input_tokens": 500,
        "total_output_tokens": 1000,
        "total_cache_creation_tokens": 25,
        "total_cache_read_tokens": 10,
        "earliest_timestamp": "2023-01-01T10:00:00Z",
        "latest_timestamp": "2023-01-01T11:00:00Z",
        "sessions": {
            "session1": {
                "session_id": "session1",
                "summary": "Test session",
                "first_timestamp": "2023-01-01T10:00:00Z",
                "last_timestamp": "2023-01-01T11:00:00Z",
                "message_count": 5,
                "first_user_message": "Hello",
                "total_input_tokens": 100,
                "total_output_tokens": 200,
            }
        },
        "cached_files": {
            "test.jsonl": {
                "file_path": str(temp_project_dir / "test.jsonl"),
                "source_mtime": 1672574400.0,
                "cached_mtime": 1672574500.0,
                "message_count": 5,
                "session_ids": ["session1"],
            }
        },
        "working_directories": ["/test/dir"],
    }
    (cache_dir / "index.json").write_text(json.dumps(index_data), encoding="utf-8")

    # Create per-file cache
    file_cache_data = {
        "2023-01-01T10:00:00Z": [
            {
                "type": "user",
                "uuid": "user1",
                "timestamp": "2023-01-01T10:00:00Z",
                "sessionId": "session1",
                "version": "1.0.0",
                "parentUuid": None,
                "isSidechain": False,
                "userType": "user",
                "cwd": "/test",
                "message": {"role": "user", "content": "Hello"},
            }
        ]
    }
    (cache_dir / "test.jsonl.json").write_text(
        json.dumps(file_cache_data), encoding="utf-8"
    )

    # 2. Initialize CacheManager (triggers migration)
    cache_manager = CacheManager(temp_project_dir, "1.0.0")

    # 3. Verify data in SQLite matches original JSON
    cached_data = cache_manager.get_cached_project_data()
    assert cached_data is not None
    assert cached_data.total_message_count == 50
    assert cached_data.total_input_tokens == 500
    assert "session1" in cached_data.sessions
    assert cached_data.sessions["session1"].summary == "Test session"

    # 4. Verify JSON directory deleted
    assert not cache_dir.exists(), (
        "JSON cache directory should be deleted after migration"
    )


⚠️ Potential issue | 🟠 Major

JSON→SQLite migration tests are good but assume specific per-file naming

The JSON migration tests are valuable:

  • test_json_to_sqlite_migration validates that project aggregates, sessions, cached_files, and working_directories are migrated and that the legacy JSON cache directory is removed.
  • test_migration_skips_if_already_in_sqlite ensures existing SQLite data is not overwritten by stale JSON, while still deleting the JSON cache.

One subtle but important point: test_json_to_sqlite_migration writes per-file cache as:

(cache_dir / "test.jsonl.json").write_text(...)

This assumes legacy per-file caches were named "<jsonl_filename>.json", e.g. test.jsonl.json. As noted in the cache module review, _migrate_json_cache_if_needed currently only looks for Path(file_name).stem + ".json" (cache/test.json), so these test fixtures won’t actually be picked up by migration, and cached_entries will remain empty.

Once you adjust the migration code to consider both f"{file_name}.json" and f"{Path(file_name).stem}.json", these tests will more accurately reflect real-world legacy naming and ensure entries are migrated, not just metadata.

🤖 Prompt for AI Agents
In test/test_cache.py around lines 704-783, the test writes a per-file legacy
cache as "test.jsonl.json" but the migration implementation only looks for cache
files named "{file_name}.json" (e.g. "test.jsonl.json" vs "test.json"), so
entries are not discovered; update the migration logic in
_migrate_json_cache_if_needed to check both f"{file_name}.json" and
f"{Path(file_name).stem}.json" (i.e., consider both the full filename + ".json"
and the stem + ".json") when locating per-file JSON caches, load whichever
exists, and proceed with migration; ensure both patterns are searched in the
same order and that deletion of the JSON cache directory still occurs after
successful migration.

@daaain
Owner

daaain commented Dec 3, 2025

Oh wow, incredible timing, I implemented the same thing yesterday night and just pushed to: #59

They also seem to be pretty similar from a high level, but I've added a bit more tests and replaced a few more bits that depend on cache. Please have a look, let's figure out together what would be the best approach!

@neonwatty
Author

> Oh wow, incredible timing, I implemented the same thing yesterday night and just pushed to: #59
>
> They also seem to be pretty similar from a high level, but I've added a bit more tests and replaced a few more bits that depend on cache. Please have a look, let's figure out together what would be the best approach!

Yours looks better imo. For example, I didn't include a separate migration (I just added it into the cache file).

Closing this one. Looking forward to yours merging; then I'd be happy to work on tags / TOC / etc. from #46.

Think a web server / app would be great too.

@neonwatty neonwatty closed this Dec 3, 2025