
Conversation

@daaain
Owner

@daaain daaain commented Dec 3, 2025

Pretty comprehensive cache rewrite: all the bits using the cache have been retooled to use SQLite.

The schema is created as a migration, to make it easier to update when Anthropic adds new log types.

Also added lots of tests, including integrity tests to make sure DB constraints are working.

Haven't measured timings yet, but it feels much faster on my 123K+ message CC projects dir, which yielded an 848MB DB 😅

Summary by CodeRabbit

  • New Features

    • Paginated combined transcripts with page navigation and a new --page-size CLI option.
    • Archive/restore flows in the TUI; archived projects/sessions are labeled, visible, and restorable.
  • Improvements

    • Centralized SQLite-backed cache with per-project, per-page and per-session caching and staleness checks for faster incremental HTML.
    • Library-version detection and renderer support for page context and page-level stats.
  • Bug Fixes

    • Consistent UTF-8 I/O and more deterministic rendering readiness.
  • Documentation

    • Added guide for restoring archived sessions.


@coderabbitai

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Centralizes caching into a SQLite-backed CacheManager with migrations, adds HTML/page-level caching and pagination, makes CLI/TUI archive-aware (archive/restore/delete flows), threads page_size through generation, and migrates tests to the DB-backed cache.

Changes

Cohort / File(s) Summary
Core Cache & Models
claude_code_log/cache.py
Replace index.json with SQLite-backed CacheManager; add models (HtmlCacheEntry, PageCacheData, SessionCacheData, ProjectCache), per-entry serialization, SQL CRUD, HTML/page staleness checks, aggregates, and public APIs (is_file_cached, get_cached_project_data, get_all_cached_projects, get_cache_db_path, get_library_version, etc.).
Migrations & Runner
claude_code_log/migrations/*
.../runner.py, .../001_initial_schema.sql, .../002_html_cache.sql, .../003_html_pagination.sql, .../__init__.py
Add migration runner (discover/apply/verify migrations, checksum tracking) and migrations to create projects, cached_files, sessions, messages, html_cache, html_pages, page_sessions, indexes and FKs.
CLI & Project Discovery
claude_code_log/cli.py
Add _discover_projects, archive-aware _launch_tui_with_cache_check, --page-size plumbing, DB-based cache clearing, and use cache-provided working directories for matching.
Converter & Pagination
claude_code_log/converter.py
Add GenerationStats, pagination helpers (_get_page_html_path, _assign_sessions_to_pages, _generate_paginated_html), has_cache_changes, per-page/session generation, and propagate page_size through conversion APIs.
HTML Renderer & Templates
claude_code_log/html/renderer.py, claude_code_log/html/templates/transcript.html, claude_code_log/html/templates/components/page_nav_styles.css
Thread page_info/page_stats into rendering API, add is_html_outdated helper, inject page navigation UI/CSS, and archive styling.
TUI & Archive UI
claude_code_log/tui.py
Make TUI archive-aware: mark archived projects/sessions, merge displays with indicators, add archive/restore/delete flows and confirmation screens, load archived sessions from DB, and update selector/browser signatures.
Utilities
claude_code_log/utils.py
Broaden extract_working_directories to accept mixed entry types, add _is_temp_path to filter temp dirs, sort by timestamp, and avoid circular imports.
Tests — Cache, Pagination & Integrity
test/* (notable: test/test_cache.py, test/test_cache_integration.py, test/test_cache_sqlite_integrity.py, test/test_pagination.py, test/test_html_regeneration.py, test/test_tui.py, ...)
Migrate tests to centralized SQLite cache DB with fixtures, add integrity/pagination/migration tests, enforce UTF‑8 I/O, and cover archive/restore and HTML regeneration.
Snapshots & Styling
test/__snapshots__/*, claude_code_log/html/templates/components/project_card_styles.css, claude_code_log/html/templates/index.html
Update snapshots and templates for pagination and archived styling; add archived badge and conditional archived rendering.
Docs
dev-docs/restoring-archived-sessions.md
Add documentation describing archived sessions, TUI restore/archive/delete workflows, and manual SQL restore guidance.
Type Stubs & Pygments
stubs/pygments/*
Add minimal pygments stubs and tighten Pygments typing/usage across HTML tooling.
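The staleness check behind `is_file_cached` (comparing recorded file metadata in `cached_files` against what's on disk) can be sketched with a toy stand-in. The real `CacheManager` signature and column names aren't shown in this summary, so those below are assumptions:

```python
import sqlite3
from pathlib import Path


def is_file_cached(conn: sqlite3.Connection, path: Path) -> bool:
    """A file counts as cached when its recorded mtime and size still match disk."""
    row = conn.execute(
        "SELECT mtime, size FROM cached_files WHERE file_path = ?",
        (str(path),),
    ).fetchone()
    if row is None:
        return False  # never seen this file
    stat = path.stat()
    return row[0] == stat.st_mtime and row[1] == stat.st_size
```

Storing mtime as REAL (as migration 001 does) makes this a straight float comparison, with no JSON parsing on the hot path.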

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI / main
    participant Cache as CacheManager
    participant DB as SQLite (cache.db)
    participant Conv as Converter
    participant Rend as HtmlRenderer

    CLI->>Cache: init(project_dir, db_path)
    Cache->>DB: open connection, enable WAL, run_migrations()
    Cache->>DB: ensure/load project row & aggregates

    CLI->>Cache: has_cache_changes(project_dir, page_size)
    Cache->>DB: query messages, cached_files, html_pages
    DB-->>Cache: cache metadata (timestamps, counts, page info)

    alt up-to-date & page_size unchanged
        CLI->>Conv: skip regeneration
    else regeneration required or page_size changed
        CLI->>Conv: convert_jsonl_to_html(..., page_size)
        Conv->>Cache: load_cached_entries(file/session)
        Cache->>DB: SELECT messages WHERE file_id/session_id
        DB-->>Cache: messages list
        Cache-->>Conv: session_data

        Conv->>Conv: _assign_sessions_to_pages(session_data, page_size)
        loop per page
            Conv->>Rend: generate(messages_subset, page_info, page_stats)
            Rend-->>Conv: HTML output
            Conv->>Cache: update_page_cache(page_meta)
            Cache->>DB: INSERT/UPDATE html_pages, page_sessions, html_cache
        end
        Conv->>Cache: update project aggregates
        Cache->>DB: UPDATE projects totals
    end

    CLI->>Cache: get_cache_stats()
    Cache->>DB: SELECT aggregates
    DB-->>Cache: stats
    Cache-->>CLI: GenerationStats summary
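The per-page loop in the diagram depends on `_assign_sessions_to_pages`; its packing rule isn't shown in the summary, so here is a plausible greedy sketch that keeps each session whole on a single page (the name's signature and behaviour are assumptions):

```python
def assign_sessions_to_pages(
    sessions: list[tuple[str, int]], page_size: int
) -> list[list[str]]:
    """Greedily pack (session_id, message_count) pairs into pages holding at
    most page_size messages, never splitting a session across pages."""
    pages: list[list[str]] = [[]]
    count = 0
    for session_id, n_messages in sessions:
        if pages[-1] and count + n_messages > page_size:
            pages.append([])  # start a new page rather than splitting a session
            count = 0
        pages[-1].append(session_id)
        count += n_messages
    return pages
```

A session larger than `page_size` still lands on its own page, which matches the intuition that pagination bounds page weight without ever breaking session continuity.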

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~180 minutes

Suggested reviewers

  • cboos

Poem

🐇 I swapped files for cozy rows in DB,

Pages stitched with nav so readers roam with glee.
Archived sessions tucked with badges bright,
cache.db hums softly through day and night.
A rabbit hops — paginated delights!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'SQLite cache' directly and clearly describes the main change in the changeset: replacing the in-module index.json cache with a SQLite-backed CacheManager implementation across the entire codebase.
Docstring Coverage ✅ Passed Docstring coverage is 88.87% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
test/test_cache_integration.py (1)

395-411: Test references obsolete per-project cache directory structure.

This test still references project / "cache" and cache_dir.glob("*.json"), which no longer exist with the SQLite migration. The cache is now stored in a single cache.db file at the parent level.

This test should be updated to test SQLite database corruption recovery, or removed if no longer applicable.

Consider updating the test to verify SQLite corruption recovery:

     def test_cache_corruption_recovery(self, setup_test_project):
         """Test recovery from corrupted cache files."""
         project_dir = setup_test_project

         # Create initial cache
         convert_jsonl_to_html(input_path=project_dir, use_cache=True)

-        # Corrupt cache file
-        cache_dir = project_dir / "cache"
-        cache_files = list(cache_dir.glob("*.json"))
-        if cache_files:
-            cache_file = [f for f in cache_files if f.name != "index.json"][0]
-            cache_file.write_text("corrupted json data", encoding="utf-8")
+        # Corrupt SQLite database
+        cache_db = project_dir.parent / "cache.db"
+        if cache_db.exists():
+            cache_db.write_bytes(b"corrupted data")

         # Should recover gracefully
         output = convert_jsonl_to_html(input_path=project_dir, use_cache=True)
         assert output.exists()
test/test_integration_realistic.py (7)

561-583: Test references obsolete per-project cache structure.

test_uses_existing_cache checks for project / "cache" / "index.json" and verifies its mtime, but with SQLite migration, the cache is now stored in cache.db at the parent level.

This test should be updated to work with SQLite or marked as testing legacy behavior that no longer applies.


645-667: Test references obsolete cache/index.json structure.

test_cache_version_mismatch_triggers_rebuild manipulates project / "cache" / "index.json" which no longer exists with SQLite. This test should be updated to test SQLite cache version handling.


669-698: Test references obsolete per-file cache structure.

test_missing_cache_files_regenerated manipulates individual JSON cache files in the cache directory, but SQLite uses a single database file. This test logic no longer applies.


722-753: Test references obsolete cache/index.json for incremental updates.

test_adding_lines_triggers_cache_update checks mtime of project / "cache" / "index.json" to verify cache was updated. With SQLite, should check cache.db or use last_updated from project data.
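One way the updated assertion could look: compare `cache.db`'s mtime before and after the incremental re-run, backdating the file with `os.utime` so coarse filesystem mtime resolution can't mask the rewrite. Paths and flow here are illustrative stand-ins, not the actual test:

```python
import os
import tempfile
from pathlib import Path

projects_dir = Path(tempfile.mkdtemp())
cache_db = projects_dir / "cache.db"
cache_db.write_bytes(b"")  # stand-in for the first conversion run

# Backdate the db so any rewrite is observable even with 1s mtime resolution.
mtime = cache_db.stat().st_mtime - 10
os.utime(cache_db, (mtime, mtime))
before = cache_db.stat().st_mtime

cache_db.write_bytes(b"updated")  # stand-in for the incremental re-run
assert cache_db.stat().st_mtime > before
```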


819-844: Test references obsolete cache structure for new file detection.

test_new_file_detected_and_processed reads and compares cache/index.json content. Should use CacheManager.get_cached_project_data() to verify session count instead.


1162-1180: Test references obsolete cache/index.json for corruption handling.

test_corrupted_cache_index_handled corrupts and checks cache/index.json which no longer exists. Should test SQLite database corruption recovery.


1182-1199: Test references obsolete cache directory structure.

test_missing_cache_directory_handled deletes and checks for project / "cache" directory recreation. With SQLite at the parent level, this test logic no longer applies.

🧹 Nitpick comments (7)
claude_code_log/cli.py (1)

279-284: Consider moving import shutil to the top of the file.

The shutil import is repeated inside conditional blocks. Moving it to the module-level imports would be cleaner and avoid potential repeated import overhead during iteration.

 import logging
 import os
+import shutil
 import sys
 from pathlib import Path

Then remove the inline imports at lines 281 and 299.

Also applies to: 299-302

test/test_cache_sqlite_integrity.py (2)

617-659: Concurrent readers test could be flaky under high load.

The 5-second timeout per thread may not be sufficient in CI environments under load. Consider increasing the timeout or using a more robust synchronization mechanism.

         # Start multiple reader threads
         threads = [threading.Thread(target=read_data) for _ in range(5)]
         for t in threads:
             t.start()
         for t in threads:
-            t.join(timeout=5)
+            t.join(timeout=10)
 
         assert len(errors) == 0, f"Errors occurred: {errors}"
         assert all(results), "Not all reads succeeded"

862-869: time.sleep(1.1) may cause test flakiness.

The 1.1 second sleep assumes mtime resolution of 1 second. On some filesystems (particularly HFS+ on older macOS), mtime resolution is 1 second, but on others it can be finer. The test should still work, but be aware this adds ~1 second to test runtime.

Consider using a mock or explicitly setting the mtime instead of sleeping:

import os
import time

# After writing the file
future_time = time.time() + 2
os.utime(jsonl_file, (future_time, future_time))
claude_code_log/cache_sqlite.py (4)

93-96: Consider logging instead of printing for cache invalidation messages.

Using print() for status messages may pollute stdout in library usage scenarios. Consider using the logging module for consistency with the rest of the codebase.

+import logging
+
+logger = logging.getLogger(__name__)
+
 # In _ensure_project_exists:
-                    print(
-                        f"Cache version incompatible: {cached_version} -> {self.library_version}, invalidating cache"
-                    )
+                    logger.info(
+                        "Cache version incompatible: %s -> %s, invalidating cache",
+                        cached_version,
+                        self.library_version,
+                    )

266-267: Move dateparser import to module level.

The dateparser import is done inside the method. While this may have been intentional for lazy loading, it's imported at the top level in other files per the coding guidelines, and keeping imports consistent aids maintainability.

+import dateparser
 import json
 import sqlite3

Then remove the import at line 267.


279-289: Redundant else branch with identical logic.

Both branches of the if/else for to_dt processing result in the same time adjustment. The else branch can be removed.

         if to_date:
             to_dt = dateparser.parse(to_date)
             if to_dt:
                 if to_date in ["today", "yesterday"] or "days ago" in to_date:
                     to_dt = to_dt.replace(
                         hour=23, minute=59, second=59, microsecond=999999
                     )
-                else:
-                    to_dt = to_dt.replace(
-                        hour=23, minute=59, second=59, microsecond=999999
-                    )

The time is set to end-of-day regardless of the date format, so the condition check is redundant here.


349-369: Consider using executemany for batch inserts.

Inserting entries one at a time in a loop is less efficient than using executemany. For large files with many messages, this could significantly impact performance.

-            # Insert all entries in a batch
-            for entry in entries:
-                serialized = self._serialize_entry(entry, file_id)
-                conn.execute(
-                    """
-                    INSERT INTO messages (
-                        project_id, file_id, type, timestamp, session_id,
-                        ...
-                    )
-                    """,
-                    serialized,
-                )
+            # Insert all entries in a batch
+            serialized_entries = [self._serialize_entry(entry, file_id) for entry in entries]
+            conn.executemany(
+                """
+                INSERT INTO messages (
+                    project_id, file_id, type, timestamp, session_id,
+                    _uuid, _parent_uuid, _is_sidechain, _user_type, _cwd, _version,
+                    _is_meta, _agent_id, _request_id,
+                    input_tokens, output_tokens, cache_creation_tokens, cache_read_tokens,
+                    _leaf_uuid, _level, _operation, content
+                ) VALUES (
+                    :project_id, :file_id, :type, :timestamp, :session_id,
+                    :_uuid, :_parent_uuid, :_is_sidechain, :_user_type, :_cwd, :_version,
+                    :_is_meta, :_agent_id, :_request_id,
+                    :input_tokens, :output_tokens, :cache_creation_tokens, :cache_read_tokens,
+                    :_leaf_uuid, :_level, :_operation, :content
+                )
+                """,
+                serialized_entries,
+            )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adb8d72 and d077277.

📒 Files selected for processing (13)
  • claude_code_log/cache.py (2 hunks)
  • claude_code_log/cache_sqlite.py (1 hunks)
  • claude_code_log/cli.py (3 hunks)
  • claude_code_log/converter.py (2 hunks)
  • claude_code_log/migrations/001_initial_schema.sql (1 hunks)
  • claude_code_log/migrations/__init__.py (1 hunks)
  • claude_code_log/migrations/runner.py (1 hunks)
  • claude_code_log/tui.py (1 hunks)
  • test/test_cache.py (6 hunks)
  • test/test_cache_integration.py (8 hunks)
  • test/test_cache_sqlite_integrity.py (1 hunks)
  • test/test_integration_realistic.py (4 hunks)
  • test/test_project_matching.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

  • test/test_project_matching.py
  • test/test_cache_sqlite_integrity.py
  • test/test_cache_integration.py
  • test/test_integration_realistic.py
  • test/test_cache.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

  • test/test_project_matching.py
  • claude_code_log/tui.py
  • claude_code_log/migrations/__init__.py
  • claude_code_log/cache_sqlite.py
  • test/test_cache_sqlite_integrity.py
  • test/test_cache_integration.py
  • test/test_integration_realistic.py
  • claude_code_log/migrations/runner.py
  • claude_code_log/cli.py
  • test/test_cache.py
  • claude_code_log/cache.py
  • claude_code_log/converter.py
claude_code_log/tui.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Textual for implementing the interactive Terminal User Interface (TUI) in Python

Files:

  • claude_code_log/tui.py
claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

  • claude_code_log/tui.py
  • claude_code_log/cache_sqlite.py
  • claude_code_log/cli.py
  • claude_code_log/cache.py
  • claude_code_log/converter.py
claude_code_log/cli.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Click for CLI interface and argument parsing in Python CLI files

Files:

  • claude_code_log/cli.py
🧠 Learnings (2)
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Structure the project with separate modules for parser.py (data extraction), renderer.py (HTML generation), converter.py (high-level orchestration), cli.py (CLI interface), models.py (Pydantic data structures), tui.py (Textual TUI), and cache.py (cache management)

Applied to files:

  • test/test_cache_integration.py
  • test/test_integration_realistic.py
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

  • claude_code_log/cache.py
🧬 Code graph analysis (9)
test/test_project_matching.py (1)
claude_code_log/cache_sqlite.py (1)
  • get_working_directories (451-465)
claude_code_log/tui.py (3)
test/test_cache.py (1)
  • cache_manager (39-42)
test/test_cache_sqlite_integrity.py (1)
  • cache_manager (37-39)
claude_code_log/cache_sqlite.py (1)
  • get_working_directories (451-465)
claude_code_log/migrations/__init__.py (1)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
test/test_cache_sqlite_integrity.py (2)
claude_code_log/cache.py (1)
  • SessionCacheData (24-37)
test/test_cache.py (2)
  • temp_project_dir (26-29)
  • cache_manager (39-42)
test/test_cache_integration.py (3)
test/test_cache.py (1)
  • cache_manager (39-42)
test/test_cache_sqlite_integrity.py (1)
  • cache_manager (37-39)
claude_code_log/cache_sqlite.py (1)
  • get_cached_project_data (475-547)
test/test_integration_realistic.py (2)
claude_code_log/cache.py (1)
  • get_library_version (69-112)
claude_code_log/cache_sqlite.py (1)
  • get_cached_project_data (475-547)
claude_code_log/cli.py (4)
test/test_cache.py (1)
  • cache_manager (39-42)
test/test_cache_sqlite_integrity.py (1)
  • cache_manager (37-39)
claude_code_log/cache_sqlite.py (2)
  • get_working_directories (451-465)
  • clear_cache (549-557)
claude_code_log/converter.py (1)
  • convert_jsonl_to_html (38-128)
test/test_cache.py (2)
test/test_cache_sqlite_integrity.py (2)
  • cache_manager (37-39)
  • temp_project_dir (28-33)
claude_code_log/cache_sqlite.py (4)
  • is_file_cached (211-232)
  • save_cached_entries (311-372)
  • get_cached_project_data (475-547)
  • clear_cache (549-557)
claude_code_log/converter.py (1)
claude_code_log/cache_sqlite.py (1)
  • get_working_directories (451-465)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.11)
🔇 Additional comments (43)
claude_code_log/migrations/runner.py (6)

1-27: LGTM! Clean utility functions for migration infrastructure.

The helper functions are well-designed: checksum computation uses SHA256, migration number parsing with regex is correct, and the migrations directory discovery is reliable using __file__.


29-56: Consider implications of dropping old _schema_version table.

Dropping the _schema_version table when upgrading from old format will cause all migrations to be reapplied. If data tables already exist, the CREATE TABLE IF NOT EXISTS statements will succeed, but this could cause issues if future migrations have different semantics (e.g., INSERT statements that would duplicate data).

For the current initial schema this is safe since it only uses IF NOT EXISTS. Ensure future migrations account for potential re-application.


135-156: LGTM! Solid migration execution with proper cleanup.

Good use of try/finally for connection cleanup, 30-second timeout for concurrent access, and sorted application order. The PRAGMA foreign_keys = ON is correctly placed before migrations.


59-90: LGTM! Well-structured migration state functions.

Clean separation between applied, available, and pending migrations. The set-based filtering in get_pending_migrations is efficient.


113-133: LGTM! Checksum verification provides good integrity checking.

The verification function properly compares stored vs current checksums and returns actionable warnings.


159-163: LGTM! Handles empty schema version correctly.

Properly returns 0 when no migrations have been applied (row[0] is None).

claude_code_log/migrations/__init__.py (1)

1-5: LGTM! Clean public API for the migrations package.

Properly exposes only run_migrations via __all__, keeping internal functions private.

claude_code_log/migrations/001_initial_schema.sql (4)

5-22: LGTM! Projects table well-designed with appropriate aggregates.

Good use of UNIQUE constraint on project_path and pre-computed token aggregates for efficient querying.


23-37: LGTM! File tracking table properly designed for invalidation.

Using REAL for mtime values is correct for Unix timestamps, and the composite unique constraint prevents duplicate entries per project.


39-61: LGTM! Sessions table supports efficient querying.

The index on cwd (line 61) directly supports the get_working_directories() query shown in the relevant snippets.


63-114: LGTM! Messages table balances normalization with query simplicity.

The flattened schema with prefixed columns for different entry types avoids complex joins while maintaining clarity. The indexes support common access patterns (by timestamp, session, file).

test/test_project_matching.py (3)

29-58: LGTM! Test correctly updated for new get_working_directories() API.

The mock now returns a list directly from get_working_directories(), matching the new SQLite cache interface.


70-89: LGTM! Subdirectory matching test properly updated.

Correctly tests that subdirectory paths match parent project working directories.


114-125: Remove unnecessary CacheManager mock from test.

The test patches CacheManager and mocks get_cached_project_data.return_value = None, but the find_projects_by_cwd implementation doesn't use the cache—it relies solely on path-based matching via convert_project_path_to_claude_dir. The mock is unnecessary and misleading; the test comment about "fallback to project name matching when no cache data available" doesn't reflect the actual implementation. Remove the mock patch to simplify the test and accurately reflect what's being tested.

Likely an incorrect or invalid review comment.

claude_code_log/tui.py (1)

388-393: LGTM! Clean migration to new get_working_directories() API.

The change correctly calls get_working_directories() directly from the cache manager, with proper exception handling preserved.

claude_code_log/converter.py (2)

469-470: LGTM! Correctly uses the new centralized cache API.

The change to retrieve working directories via cache_manager.get_working_directories() aligns with the SQLite cache migration. This method queries distinct cwd values from the sessions table, which is the correct source for this data.


611-611: Consistent with the new cache API pattern.

Same pattern as in _generate_individual_session_files - correctly sources working directories from the cache manager rather than the deprecated cached_project_data.working_directories.

test/test_cache_integration.py (7)

98-113: LGTM! Tests correctly updated for SQLite cache location.

The assertions properly verify that:

  1. The SQLite cache.db is created at the parent (projects) level
  2. Cache data is accessed via CacheManager.get_cached_project_data()
  3. Cache state is validated through the new API

115-139: Test correctly verifies cache clearing via CacheManager API.

The updated docstring and assertions align with the SQLite-based cache behavior where clearing removes data from the database rather than deleting files.


163-173: LGTM! All-projects caching test updated for SQLite.

Correctly verifies the shared cache.db at the projects level and uses CacheManager to validate per-project cache data.


204-212: LGTM! Converter cache test updated correctly.


226-230: LGTM! No-cache test updated for SQLite semantics.

The test correctly verifies that with use_cache=False, the project data remains empty (total_message_count == 0) rather than checking for file absence.


255-265: LGTM! Hierarchy processing test correctly updated.


427-437: LGTM! Version upgrade test updated for SQLite.

Correctly uses get_cached_project_data() to verify project data and version storage.

test/test_integration_realistic.py (3)

242-261: LGTM! Cache clearing test correctly updated for SQLite.

Properly verifies that:

  1. cache.db exists after processing
  2. cache.db is deleted after --clear-cache

415-427: LGTM! Cache creation test correctly migrated to SQLite.

Uses CacheManager.get_cached_project_data() to verify cache contents instead of reading index.json.


465-469: LGTM! Version storage test updated correctly.

Uses the new CacheManager API to retrieve and verify cached version.

claude_code_log/cache.py (2)

2-11: LGTM! Clean module reorganization.

The docstring update clearly explains that this module now provides data models while the actual cache implementation uses SQLite in cache_sqlite.py. The simplified imports are appropriate.


113-125: LGTM! Clean backwards-compatible re-export pattern.

The re-export of SQLiteCacheManager as CacheManager maintains API compatibility for existing consumers. The noqa: E402 comment correctly explains why the import is placed at the bottom (to avoid circular imports between cache.py and cache_sqlite.py).

The __all__ explicitly defines the public API surface, which is good practice.

test/test_cache.py (6)

98-107: LGTM! Initialization tests correctly verify SQLite database location.

The tests properly verify that:

  1. db_path points to parent / "cache.db"
  2. The database file is created on initialization

119-120: LGTM! Uses new is_file_cached() API.

Replaces filesystem-based cache verification with the proper API method.


132-161: LGTM! Direct SQLite verification for internal structure.

Appropriately uses direct database queries to verify the internal storage structure. The query correctly filters by project_id since the database is shared, and properly orders by timestamp with NULLS LAST for summary entries that lack timestamps.
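For reference, the "NULLS LAST" ordering this test verifies can be expressed portably with `timestamp IS NULL` as the first sort key, since native `ORDER BY ... NULLS LAST` needs SQLite ≥ 3.30 (the exact query in the test isn't quoted here, so this is a sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (session_id TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("s1", "2025-12-01T10:00:00Z"), ("s1", None), ("s1", "2025-12-01T09:00:00Z")],
)
# 'timestamp IS NULL' evaluates to 0 for real timestamps and 1 for NULLs,
# pushing summary rows (which lack timestamps) to the end of the ordering.
rows = conn.execute(
    "SELECT timestamp FROM messages WHERE session_id = ? "
    "ORDER BY timestamp IS NULL, timestamp",
    ("s1",),
).fetchall()
```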


188-201: LGTM! Version compatibility test updated for SQLite.

Uses get_cached_project_data() to verify version storage and compatibility behavior.


258-273: LGTM! Clear cache test properly verifies SQLite state.

The test correctly:

  1. Verifies data exists before clearing
  2. Uses is_file_cached() and get_cached_project_data() to verify cleared state
  3. Checks both cached_files and sessions are empty after clearing

577-587: LGTM! Missing cache entry test updated.

Correctly uses is_file_cached() to check cache state and verifies load_cached_entries() returns None for uncached files.

claude_code_log/cli.py (2)

190-210: LGTM! Clean migration to the new cache API.

The transition from project_cache.working_directories to cache_manager.get_working_directories() is well-implemented. The fallback logic correctly rebuilds the cache when no working directories are found and re-fetches them.


259-268: The review comment is incorrect. The cache.db deletion path is already correct.

When all_projects=True, input_path equals ~/.claude/projects/, which is the projects directory. The SQLiteCacheManager stores the cache at project_path.parent / "cache.db" where project_path is an individual project directory like ~/.claude/projects/myproject/. Therefore, project_path.parent = ~/.claude/projects/, placing the cache at ~/.claude/projects/cache.db.

The deletion code cache_db = input_path / "cache.db" correctly resolves to ~/.claude/projects/cache.db. The suggested diff applying input_path.parent / "cache.db" would incorrectly look for the cache at ~/.claude/cache.db, which is not where it's created.

Likely an incorrect or invalid review comment.

test/test_cache_sqlite_integrity.py (4)

27-39: LGTM! Clean fixture setup.

The fixtures properly create isolated temporary directories and cache managers for testing. The temp_project_dir fixture creates a subdirectory to match expected usage patterns.


89-175: Solid cascade delete test with proper verification.

The test correctly verifies that deleting a project cascades to all related records (cached_files, messages, sessions). The use of raw SQL to verify counts is appropriate for this integrity test.


692-701: Performance assertion threshold looks reasonable.

The 2-second threshold for querying 1000 entries with date filtering is generous enough to avoid flakiness while still catching major performance regressions.


244-284: Well-designed foreign key constraint tests.

These tests properly verify that SQLite foreign key enforcement is working by attempting invalid inserts and expecting IntegrityError. Good coverage of both file_id and project_id constraints.

claude_code_log/cache_sqlite.py (2)

60-70: Connection management is well-implemented.

The use of @contextmanager with proper try/finally ensures connections are always closed. The 30-second timeout, WAL mode, and foreign key enforcement are appropriate for concurrent access patterns.
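A minimal sketch of the pattern the reviewer describes (context manager, 30-second timeout, WAL journaling, enforced foreign keys); this is not the project's exact code, just the shape of it:

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def connect(db_path: Path):
    """Open a connection with a busy timeout, WAL mode, and FK enforcement;
    the try/finally guarantees the connection is closed even on error."""
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA foreign_keys=ON")
        yield conn
        conn.commit()
    finally:
        conn.close()


db_path = Path(tempfile.mkdtemp()) / "cache.db"
with connect(db_path) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO kv VALUES ('a', '1')")

# Reconnect to confirm the write was committed on context exit:
with connect(db_path) as conn:
    row = conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()
print(row[0])  # -> 1
```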


559-585: Version compatibility logic is sound but breaking_changes dict is empty.

The version compatibility checking infrastructure is well-designed with support for both exact versions and wildcard patterns. The empty breaking_changes dict means all versions are currently compatible, which is correct for an initial implementation.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
claude_code_log/cache_sqlite.py (1)

35-37: CURRENT_SCHEMA_VERSION is unused and can drift from actual schema state

CURRENT_SCHEMA_VERSION is defined but not referenced anywhere; the code relies on the migrations system (run_migrations) for real schema/version tracking. This constant risks going stale and confusing future changes. Consider either wiring it into the migration/version logic or removing it to avoid dead configuration.

claude_code_log/renderer.py (1)

83-92: Project display-name logic aligns with cache-backed working_directories

The updated get_project_display_name:

  • Filters out temp paths via _is_tmp_path before choosing a “best” working directory.
  • Picks the least-nested path (fewest parts), breaking ties by original order.
  • Falls back to transforming project_dir_name (strip leading -, replace - with /) when no usable dirs remain.

This matches how TemplateProject and the new tests expect it to behave, and keeps UI display names stable when running under pytest/macOS temp dirs.

One minor nit: _is_tmp_path currently treats any path containing "pytest-" as temporary. If you ever have real projects whose paths include that substring, you might want to tighten the check (e.g., only when the basename starts with pytest- or when the path is under known temp roots like /tmp/ or /private/var/folders/).

Also applies to: 94-123
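The selection rule described above can be sketched in a few lines; the function and helper names here are stand-ins for get_project_display_name / _is_tmp_path, not the actual renderer code:

```python
def pick_display_name(working_dirs: list[str], project_dir_name: str) -> str:
    # Simplified stand-in for _is_tmp_path (patterns are illustrative).
    def is_tmp(p: str) -> bool:
        return "/tmp/" in p or "/var/folders/" in p or "pytest-" in p

    candidates = [d for d in working_dirs if not is_tmp(d)]
    if candidates:
        # Least-nested path wins; min() is stable, so ties keep original order.
        return min(candidates, key=lambda d: len(d.strip("/").split("/")))
    # Fallback: strip the leading '-', then map '-' to '/'.
    return project_dir_name.lstrip("-").replace("-", "/")


print(pick_display_name(["/Users/me/dev/app/sub", "/Users/me/dev/app"], "-x"))
# -> /Users/me/dev/app
print(pick_display_name(["/tmp/pytest-123/proj"], "-Users-me-dev-app"))
# -> Users/me/dev/app
```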

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d077277 and 9027134.

📒 Files selected for processing (7)
  • claude_code_log/cache_sqlite.py (1 hunks)
  • claude_code_log/converter.py (5 hunks)
  • claude_code_log/renderer.py (2 hunks)
  • claude_code_log/tui.py (2 hunks)
  • claude_code_log/utils.py (1 hunks)
  • test/__snapshots__/test_snapshot_html.ambr (2 hunks)
  • test/test_project_display_name.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • claude_code_log/tui.py
🧰 Additional context used
📓 Path-based instructions (4)
claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

  • claude_code_log/cache_sqlite.py
  • claude_code_log/utils.py
  • claude_code_log/renderer.py
  • claude_code_log/converter.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

  • claude_code_log/cache_sqlite.py
  • claude_code_log/utils.py
  • test/test_project_display_name.py
  • claude_code_log/renderer.py
  • claude_code_log/converter.py
test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

  • test/test_project_display_name.py
claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

  • claude_code_log/renderer.py
🧠 Learnings (4)
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

  • claude_code_log/utils.py
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Applies to test/test_snapshot_html.py : Use syrupy for HTML snapshot regression testing to detect unintended changes in HTML output

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Applies to claude_code_log/templates/**/*.html : Use Jinja2 templates for HTML generation, including session navigation with table of contents, message rendering with different content types, and token display for individual messages and session totals

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
📚 Learning: 2025-11-30T17:16:32.473Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.473Z
Learning: Applies to claude_code_log/renderer_timings.py : Implement performance timing instrumentation via the CLAUDE_CODE_LOG_DEBUG_TIMING environment variable to identify performance bottlenecks in rendering phases, with detailed timing for initialization, deduplication, session summary processing, main message loop, and template rendering

Applied to files:

  • claude_code_log/renderer.py
🧬 Code graph analysis (4)
claude_code_log/cache_sqlite.py (3)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (6)
  • AssistantTranscriptEntry (210-213)
  • QueueOperationTranscriptEntry (231-248)
  • SummaryTranscriptEntry (216-220)
  • SystemTranscriptEntry (223-228)
  • UserTranscriptEntry (203-207)
  • parse_transcript_entry (350-433)
claude_code_log/cache.py (4)
  • CachedFileInfo (14-21)
  • ProjectCache (40-66)
  • SessionCacheData (24-37)
  • get_library_version (69-112)
claude_code_log/utils.py (1)
claude_code_log/models.py (1)
  • TextContent (60-62)
test/test_project_display_name.py (1)
claude_code_log/renderer.py (1)
  • get_project_display_name (94-122)
claude_code_log/converter.py (2)
test/test_cache.py (1)
  • cache_manager (39-42)
claude_code_log/cache_sqlite.py (2)
  • get_working_directories (451-465)
  • get_cached_project_data (475-547)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
🔇 Additional comments (5)
test/__snapshots__/test_snapshot_html.ambr (1)

11371-11371: LGTM: Snapshot update reflects test fixture naming.

The snapshot correctly captures the updated page title using a descriptive test identifier (test_multi_session_html0) instead of the generic tmp. This change is consistent across both the <title> tag and the <h1> element, and appears to be an expected side effect of the SQLite cache refactoring.

Based on learnings, syrupy is used for HTML snapshot regression testing to detect unintended changes in HTML output.

Also applies to: 13739-13739

claude_code_log/utils.py (1)

6-6: TextContent-aware length calculation looks correct

Importing TextContent and restricting extract_text_content_length() to sum only TextContent items (or stripped string input) matches how other parts of the code treat ContentItem unions and avoids over-counting tool/thinking/image payloads. No issues from a typing or behavior standpoint.

Also applies to: 103-115

test/test_project_display_name.py (1)

118-140: New tests accurately capture tmp‑path filtering and fallback behavior

The added tests exercise both key branches of get_project_display_name:

  • Prefer non‑tmp working directories when present (pytest/macOS temp paths filtered out).
  • Fall back to project_dir_name transformation when all working_directories look temporary.

This provides good coverage for the new _is_tmp_path logic and the fallback path conversion.

claude_code_log/converter.py (1)

81-87: Working-directory handling now consistently sourced from the cache

Using cache_manager.get_working_directories() in:

  • convert_jsonl_to_html for the main project title,
  • _generate_individual_session_files for per‑session titles, and
  • process_projects_hierarchy for project summaries (both cached and fallback paths),

and ensuring _update_cache_with_session_data(cache_manager, messages) runs in the fallback path, keeps working_directories derived from the same SQLite‑backed session cwd data everywhere. This makes display names and index summaries consistent with the new cache design.

Also applies to: 465-472, 612-613, 641-642, 714-716

claude_code_log/cache_sqlite.py (1)

243-249: SQLite ORDER BY NULLS LAST is supported – no fix required

SQLite has supported NULLS LAST and NULLS FIRST in ORDER BY clauses since SQLite 3.30.0 (October 2019). The project requires Python 3.10+, which bundles SQLite 3.36.0 or newer by default. The code at lines 243–249 and 291–305 uses valid, supported syntax and will not cause runtime SQL errors.

The ORDER BY timestamp NULLS LAST queries are correct and portable to the SQLite versions available in this project's target environment.

Likely an incorrect or invalid review comment.
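The claim is easy to confirm against the SQLite bundled with the interpreter; this is a standalone check, not project code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "2025-01-02"), (2, None), (3, "2025-01-01")],
)
# NULLS LAST requires SQLite >= 3.30.0 (2019); Python 3.10+ bundles newer.
rows = conn.execute(
    "SELECT id FROM messages ORDER BY timestamp NULLS LAST"
).fetchall()
print([r[0] for r in rows])  # -> [3, 1, 2]: the NULL-timestamp row sorts last
```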

Comment on lines 569 to 585
cache_ver = version.parse(cache_version)
current_ver = version.parse(self.library_version)

for breaking_version_pattern, min_required in breaking_changes.items():
    min_required_ver = version.parse(min_required)

    if current_ver >= min_required_ver:
        if breaking_version_pattern.endswith(".x"):
            major_minor = breaking_version_pattern[:-2]
            if str(cache_ver).startswith(major_minor):
                return False
        else:
            breaking_ver = version.parse(breaking_version_pattern)
            if cache_ver <= breaking_ver:
                return False

return True

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

packaging.version.parse non-PEP440 InvalidVersion exception behavior

💡 Result:

Short answer: Historically packaging.version.parse returned a LegacyVersion object for many non‑PEP 440 strings; more recently the library moved to strict PEP 440 behavior and now raises packaging.version.InvalidVersion for strings that do not conform. [1][2][3]

Details and sources:

  • Current packaging docs show parse(...) is a thin wrapper that constructs a Version and that Version raises InvalidVersion when the string does not match PEP 440. [1]
  • The LegacyVersion behavior (returning a LegacyVersion for arbitrary strings) used to exist and is discussed in community posts; that behavior has been deprecated and removed in favor of raising InvalidVersion. See the pip/packaging issue discussion about deprecating legacy versions. [2][3]

If you have a specific version string that's being parsed, paste it and I can tell you whether current packaging will return a Version or raise InvalidVersion.

Sources:
[1] packaging — Version Handling documentation. (packaging.pypa.io)
[2] pip/pip / Deprecate legacy versions discussion. (github.com)
[3] StackOverflow and historical examples of LegacyVersion behavior. (stackoverflow.com)


🏁 Script executed:

#!/bin/bash
# Find the get_library_version method and see what it returns
rg -A 10 "def get_library_version" claude_code_log/

Repository: daaain/claude-code-log

Length of output: 736


🏁 Script executed:

#!/bin/bash
# Look for existing exception handling around version.parse calls
rg -B 3 -A 3 "version\.parse" claude_code_log/

Repository: daaain/claude-code-log

Length of output: 1397


🏁 Script executed:

#!/bin/bash
# Search for InvalidVersion exception handling patterns
rg "InvalidVersion" claude_code_log/

Repository: daaain/claude-code-log

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Get full get_library_version() implementation
rg -A 30 "def get_library_version" claude_code_log/cache.py

Repository: daaain/claude-code-log

Length of output: 1275


🏁 Script executed:

#!/bin/bash
# Look at the _is_cache_version_compatible method to understand cache_version source
rg -B 5 "_is_cache_version_compatible" claude_code_log/cache_sqlite.py | head -50

Repository: daaain/claude-code-log

Length of output: 547


🏁 Script executed:

#!/bin/bash
# Find where _is_cache_version_compatible is called
rg -B 2 -A 2 "_is_cache_version_compatible\(" claude_code_log/

Repository: daaain/claude-code-log

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Get the complete get_library_version() implementation including final fallback
rg -A 50 "def get_library_version" claude_code_log/cache.py | head -80

Repository: daaain/claude-code-log

Length of output: 1918


Harden cache-version compatibility against malformed version strings

_is_cache_version_compatible() calls version.parse(cache_version) and version.parse(self.library_version) directly. Since get_library_version() explicitly returns "unknown" as a fallback (when package metadata is unavailable), and cache_version comes from the database and could be any string from previous installations, packaging.version.parse() will raise InvalidVersion for non-PEP 440 strings, breaking cache initialization instead of gracefully invalidating the cache.

Wrap the parse calls in a try-except block:

-        cache_ver = version.parse(cache_version)
-        current_ver = version.parse(self.library_version)
+        try:
+            cache_ver = version.parse(cache_version)
+            current_ver = version.parse(self.library_version)
+        except Exception:
+            # On invalid version strings, treat cache as incompatible and rebuild.
+            return False

This ensures cache initialization is robust against malformed version strings from older databases or failed installations.

🤖 Prompt for AI Agents
In claude_code_log/cache_sqlite.py around lines 569 to 585, version.parse is
called directly on cache_version and self.library_version which can be
non-PEP440 (e.g. "unknown" or corrupted DB values) and will raise
InvalidVersion; wrap the parsing in a try/except around both parse calls, catch
packaging.version.InvalidVersion (and Exception as a fallback), and treat any
parse error as an incompatible cache by returning False (i.e., invalidate the
cache) so initialization continues robustly.

@cboos
Collaborator

cboos commented Dec 6, 2025

While in general I think it's a good move, I have one concern with the messages table.

This currently "freezes" the model we have, and if we make that evolve, the DB schema will have to evolve as well, with possibly non-trivial changes needed. I'd like to continue (and eventually finish!) some major refactoring in this area, with a clean separation between the parsed content data model and the rendered content data model. And I'm not exactly sure yet where this is going to lead me... So, either we wait a bit for my changes and freeze the messages after that (if you agree on the changes, that is, as I think they will be pretty extensive), or I'll continue my branch in parallel and at the end tell Claude to adapt the cache to the new data model (which actually I trust he'll be able to do anyway). So, both routes are fine to me.

@daaain
Owner Author

daaain commented Dec 6, 2025

I'm still working on this as I'm not yet happy with the performance, so it's totally fine to hold it back until you've done the refactoring.

But also, I set up the migrations to make it possible to evolve the schema. We can even write SQL statements that transform existing data so it wouldn't even be necessary to regenerate the cache in most cases.

@daaain
Owner Author

daaain commented Dec 7, 2025

I managed to make unchanged projects evaluate super quickly by adding a table that holds HTML generation metadata, but now there's a big question highlighted by Claude:

The timing reveals the bottleneck:

Combined transcript (165MB, 171 sessions): 12.8s
- Message processing (markdown + pygments): ~11s
- Template rendering: 1.3s

Single session (5.8MB, 1 session): 0.77s

The problem: When 1 file changes, we regenerate the ENTIRE 165MB combined
HTML, which requires:

  1. Loading all 28K messages from SQLite
  2. Re-rendering markdown + syntax highlighting for ALL messages
  3. Writing 165MB of HTML

The single session file only takes 0.77s - that's the actual work needed.

Potential optimizations:

  1. Skip combined regeneration for small changes - If only 1 session
    changed, maybe don't regenerate combined HTML (or make it optional)
  2. Cache rendered message HTML - Store pre-rendered HTML in the database,
    only re-render messages whose content changed
  3. Lazy combined HTML - Only regenerate combined when explicitly requested,
    or on a separate schedule

This massive combined session HTML is quite laggy in the browser even on my beefy M2 Max Mac, so I'm thinking about introducing pagination.

That would make it impossible to search all sessions ever (at least in the static page itself, we could introduce a server building on the SQLite db), but I'm guessing people would mostly want to look in a certain time interval anyway (like I would know if something happened in the last few months or more than a year ago for example).

So I'm thinking about paginating every 5K(?) messages on session boundaries (so create a new page each time a session goes over the message number limit). Maybe have 5K as the default but make it configurable.
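Pagination on session boundaries boils down to a single greedy pass; a sketch of the idea (the name mirrors the `_assign_sessions_to_pages` helper mentioned elsewhere in this PR, but the shape here is an assumption):

```python
def assign_sessions_to_pages(
    session_counts: list[tuple[str, int]], page_size: int = 2000
) -> list[list[str]]:
    """Greedy pagination: sessions are never split across pages; a new page
    starts when adding the next session would exceed page_size. A single
    oversized session still gets a page of its own."""
    pages: list[list[str]] = []
    current: list[str] = []
    count = 0
    for session_id, n in session_counts:
        if current and count + n > page_size:
            pages.append(current)
            current, count = [], 0
        current.append(session_id)
        count += n
    if current:
        pages.append(current)
    return pages


print(assign_sessions_to_pages([("a", 1500), ("b", 1000), ("c", 400)]))
# -> [['a'], ['b', 'c']]
```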

@daaain daaain mentioned this pull request Dec 7, 2025
@daaain
Owner Author

daaain commented Dec 7, 2025

I did pagination and it's so much smoother and faster, actually decreased the default page size to 2K messages as 5K was still lots of sessions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (6)
claude_code_log/templates/components/page_nav_styles.css (1)

68-74: Remove unnecessary pseudo-element definitions.

The .page-nav-link.prev::before and .page-nav-link.next::after rules define pseudo-elements with empty content but add no styling. The arrow glyphs come from HTML text ("← Previous", "Next →"), not CSS. These rules appear to be unused boilerplate.

Apply this diff to remove them:

-.page-nav-link.prev::before {
-    content: '';
-}
-
-.page-nav-link.next::after {
-    content: '';
-}
test/test_pagination.py (1)

74-76: Assertion may not verify ordering.

Using sorted(pages[0]) to compare means this test won't catch if the function returns sessions in wrong order. Since test_sessions_sorted_chronologically explicitly tests ordering, this may be intentional for this test case, but consider whether order matters here.

If order doesn't matter for this test, the current assertion is fine. If it should preserve insertion order, use:

-        assert sorted(pages[0]) == ["s1", "s2", "s3"]
+        assert pages[0] == ["s1", "s2", "s3"]
claude_code_log/cli.py (2)

436-441: Consider adding page_size validation.

The --page-size option accepts any integer, including zero or negative values, which could cause issues downstream.

Add a callback or use Click's built-in validation:

 @click.option(
     "--page-size",
     type=int,
     default=2000,
+    callback=lambda ctx, param, value: value if value > 0 else ctx.fail("--page-size must be positive"),
     help="Maximum messages per page for combined transcript (default: 2000). Sessions are never split across pages.",
 )

Alternatively, use click.IntRange(min=1) as the type.


259-268: Direct cache.db deletion may fail if database is in use.

If another process has the SQLite database open, unlink() may fail on some platforms (especially Windows). The current exception handling logs a warning, which is acceptable, but consider using a more graceful approach.

The current warning-based approach is reasonable for a CLI tool. If more robustness is needed, you could add retry logic or instruct users to ensure no other processes are accessing the cache.

claude_code_log/renderer.py (1)

83-92: Temp‑path filtering in get_project_display_name is Unix‑centric.

_is_tmp_path only matches Unix/macOS patterns (/tmp/, /var/folders/, pytest-, etc.). On Windows, typical temp dirs (e.g. %TEMP%, C:\\Users\\...\\AppData\\Local\\Temp) won’t be filtered, so “noisy” working directories may still leak into display names.

If Windows support matters, consider extending _is_tmp_path to:

  • Normalize paths with Path(path).as_posix().lower(); and
  • Also match common Windows temp patterns (e.g. \\AppData\\Local\\Temp\\, \\Temp\\).

Otherwise the logic looks good and should produce much nicer project names when pytest/tmp dirs are present.

Please validate behavior on any non‑Unix environments you care about.

Also applies to: 94-123
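One way to extend the check along those lines; the marker patterns are illustrative assumptions, not the current _is_tmp_path implementation:

```python
def is_tmp_path(path: str) -> bool:
    # Normalize separators so Windows-style paths match too.
    p = path.replace("\\", "/").lower()
    markers = (
        "/tmp/", "/var/folders/",                  # Unix / macOS
        "/appdata/local/temp/", "/windows/temp/",  # common Windows temp dirs
    )
    if any(m in p for m in markers):
        return True
    # Tightened pytest check: only components that start with "pytest-".
    return any(part.startswith("pytest-") for part in p.split("/"))


print(is_tmp_path(r"C:\Users\me\AppData\Local\Temp\proj"))  # -> True
print(is_tmp_path("/home/me/projects/app"))                 # -> False
```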

claude_code_log/converter.py (1)

867-885: Good incremental session HTML regeneration; consider wiring regen count into stats.

_generate_individual_session_files now:

  • Uses CacheManager.is_html_stale(session_html_name, session_id) plus is_html_outdated to avoid unnecessary per‑session regeneration.
  • Updates html_cache with the pre‑dedup message count (from cache) to keep is_html_stale consistent.
  • Returns regenerated_count.

In process_projects_hierarchy, GenerationStats.sessions_regenerated is currently set from len(stale_sessions) computed before calling convert_jsonl_to_html, not from the actual regenerated_count. If you care about accurate reporting, you might want to plumb the returned count back into stats.

Functionally this is fine; it’s just a stats accuracy improvement.

Also applies to: 934-985

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9027134 and eef048d.

📒 Files selected for processing (11)
  • claude_code_log/cache.py (4 hunks)
  • claude_code_log/cli.py (7 hunks)
  • claude_code_log/converter.py (17 hunks)
  • claude_code_log/migrations/002_html_cache.sql (1 hunks)
  • claude_code_log/migrations/003_html_pagination.sql (1 hunks)
  • claude_code_log/renderer.py (10 hunks)
  • claude_code_log/templates/components/page_nav_styles.css (1 hunks)
  • claude_code_log/templates/transcript.html (1 hunks)
  • test/__snapshots__/test_snapshot_html.ambr (5 hunks)
  • test/test_html_regeneration.py (4 hunks)
  • test/test_pagination.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
claude_code_log/templates/**/*.html

📄 CodeRabbit inference engine (CLAUDE.md)

Use Jinja2 templates for HTML generation, including session navigation with table of contents, message rendering with different content types, and token display for individual messages and session totals

Files:

  • claude_code_log/templates/transcript.html
claude_code_log/cli.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Click for CLI interface and argument parsing in Python CLI files

Files:

  • claude_code_log/cli.py
claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

  • claude_code_log/cli.py
  • claude_code_log/cache.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

  • claude_code_log/cli.py
  • test/test_html_regeneration.py
  • test/test_pagination.py
  • claude_code_log/cache.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

  • test/test_html_regeneration.py
  • test/test_pagination.py
claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

  • claude_code_log/renderer.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Generate cache-aware HTML output with automatic cache validation and leverage existing cache system for fast loading
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/templates/**/*.html : Use Jinja2 templates for HTML generation, including session navigation with table of contents, message rendering with different content types, and token display for individual messages and session totals

Applied to files:

  • claude_code_log/templates/transcript.html
  • test/__snapshots__/test_snapshot_html.ambr
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/cli.py : Use Click for CLI interface and argument parsing in Python CLI files

Applied to files:

  • claude_code_log/cli.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/test_snapshot_html.py : Use syrupy for HTML snapshot regression testing to detect unintended changes in HTML output

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
  • test/test_html_regeneration.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Group messages chronologically by timestamp across all sessions and process entire project hierarchies with linked index pages

Applied to files:

  • claude_code_log/migrations/003_html_pagination.sql
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

  • claude_code_log/cache.py
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Structure the project with separate modules for parser.py (data extraction), renderer.py (HTML generation), converter.py (high-level orchestration), cli.py (CLI interface), models.py (Pydantic data structures), tui.py (Textual TUI), and cache.py (cache management)

Applied to files:

  • claude_code_log/converter.py
  • claude_code_log/renderer.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/renderer.py : Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Applied to files:

  • claude_code_log/renderer.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/renderer_timings.py : Implement performance timing instrumentation via the CLAUDE_CODE_LOG_DEBUG_TIMING environment variable to identify performance bottlenecks in rendering phases, with detailed timing for initialization, deduplication, session summary processing, main message loop, and template rendering

Applied to files:

  • claude_code_log/renderer.py
🧬 Code graph analysis (2)
test/test_pagination.py (2)
claude_code_log/cache.py (10)
  • CacheManager (168-1184)
  • SessionCacheData (39-52)
  • get_page_count (1173-1184)
  • get_page_size_config (900-914)
  • update_page_cache (1010-1091)
  • get_page_data (916-958)
  • is_page_stale (1093-1144)
  • invalidate_all_pages (1146-1171)
  • get_all_pages (960-1008)
  • get_library_version (119-162)
claude_code_log/converter.py (3)
  • _get_page_html_path (106-113)
  • _assign_sessions_to_pages (116-150)
  • convert_jsonl_to_html (338-513)
claude_code_log/cache.py (4)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (4)
  • AssistantTranscriptEntry (210-213)
  • QueueOperationTranscriptEntry (231-248)
  • SystemTranscriptEntry (223-228)
  • UserTranscriptEntry (203-207)
claude_code_log/converter.py (1)
  • summary (68-103)
claude_code_log/renderer.py (1)
  • is_html_outdated (154-165)
🔇 Additional comments (19)
claude_code_log/templates/transcript.html (1)

20-20: Page navigation UI implementation looks solid.

The conditional rendering approach is clean, and the placement before message content is sensible. The template properly guards against missing page_info/page_stats.

Please verify that the calling code in renderer.py and converter.py always populates page_info and page_stats with the expected structure:

  • page_info (dict): page_number, prev_link, next_link
  • page_stats (dict): message_count, date_range, token_summary (optional)

This guards against KeyError if the Python code is refactored later.

Also applies to: 27-51

claude_code_log/templates/components/page_nav_styles.css (1)

3-66: CSS styling is clean and well-structured.

The flexbox layout, color scheme via CSS variables, and hover transitions all look good. Proper spacing and visual hierarchy throughout.

Ensure that the five CSS variables referenced are defined in global_styles.css or another globally-loaded stylesheet:

  • --system-warning-color
  • --text-muted
  • --border-light
  • --bg-hover
  • --session-bg-dimmed
test/__snapshots__/test_snapshot_html.ambr (5)

4219-4292: Snapshot styles updated for page navigation (Test Session HTML).

The added .page-navigation block and related classes are coherent and self-contained; including them in this snapshot correctly reflects the new paginated transcript UI without affecting existing content.


8974-9047: Consistent page navigation styles in Edge Cases snapshot.

This hunk mirrors the same page-navigation CSS used elsewhere, which keeps the Edge Cases snapshot in sync with the shared pagination styling while remaining non-invasive for the page structure.


11523-11523: More descriptive multi-session <title> text.

Updating the <title> to Claude Transcripts - test_multi_session_html0 makes the snapshot more specific and consistent with the underlying test case naming.


13887-13960: Multi-session snapshot now includes nav styles and matching H1.

The page-navigation CSS matches other snapshots, and the <h1 id="title">Claude Transcripts - test_multi_session_html0</h1> correctly aligns the heading with the updated <title>, keeping the syrupy HTML snapshot aligned with the Jinja templates. Based on learnings, this matches the intended snapshot behavior.

Also applies to: 13965-13967


18780-18853: Page navigation styles added to Test Transcript snapshot.

Adding the shared .page-navigation styles here keeps this snapshot consistent with the global layout and pagination styling, with no impact on existing markup or semantics.

claude_code_log/migrations/003_html_pagination.sql (1)

1-39: Well-structured pagination schema.

The migration correctly implements the pagination tables with appropriate constraints:

  • Foreign keys with CASCADE deletion ensure referential integrity
  • UNIQUE constraints prevent duplicate page numbers and session mappings
  • Indexes cover the expected query patterns (project lookup, page-session joins, session lookup)

The nullable first_timestamp/last_timestamp fields handle edge cases appropriately.
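The cascade behavior described above can be exercised with a small standalone sqlite3 snippet; the table and column names below are simplified stand-ins for the migration's actual schema, and note that SQLite requires the foreign_keys pragma to be enabled per connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE html_pages (
           id INTEGER PRIMARY KEY,
           project_id INTEGER NOT NULL
               REFERENCES projects(id) ON DELETE CASCADE,
           page_number INTEGER NOT NULL,
           UNIQUE (project_id, page_number)
       )"""
)
conn.execute("INSERT INTO projects (id) VALUES (1)")
conn.execute("INSERT INTO html_pages (project_id, page_number) VALUES (1, 1)")
conn.execute("DELETE FROM projects WHERE id = 1")
# The dependent page row is removed automatically by the cascade
remaining = conn.execute("SELECT COUNT(*) FROM html_pages").fetchone()[0]
```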

claude_code_log/migrations/002_html_cache.sql (1)

1-18: Clean HTML cache schema for incremental regeneration.

The table design appropriately tracks HTML generation metadata with:

  • Nullable source_session_id distinguishing combined vs session-specific HTML
  • message_count for cache invalidation sanity checks
  • library_version enabling version-based regeneration
test/test_html_regeneration.py (2)

51-57: Good backward-compatible assertion.

The flexible check for either the legacy per-file skip message or the new early-exit message ensures the test works correctly as the implementation evolves.


319-492: Comprehensive test coverage for incremental HTML cache.

The TestIncrementalHtmlCache class provides thorough coverage:

  • test_html_cache_tracks_generation: Validates cache entries for both combined and session HTML
  • test_incremental_regeneration_only_updates_changed_sessions: Confirms only modified sessions are regenerated
  • test_html_cache_detects_library_version_change: Verifies version-based staleness detection
  • test_html_cache_message_count_validation: Tests session staleness when message counts change
test/test_pagination.py (2)

351-394: Well-designed test helper for session message generation.

The _create_session_messages helper creates realistic alternating user/assistant messages with proper timestamps, session IDs, and usage data. This provides good test data for integration tests.


397-544: Thorough integration tests for pagination.

The integration tests cover key scenarios:

  • Small projects without pagination
  • Large projects creating multiple pages
  • Page size change triggering regeneration
  • Very small page sizes respecting session boundaries
  • Navigation links and stats in paginated output
claude_code_log/cli.py (2)

190-210: Good migration to new cache API.

The relative matching logic correctly uses cache_manager.get_working_directories() and handles the case where cache needs to be built by calling convert_jsonl_to_html before re-fetching directories.


586-593: Correct propagation of page_size to process_projects_hierarchy.

The page_size parameter is properly passed through to the hierarchy processing function.

claude_code_log/renderer.py (1)

200-248: Pygments/Markdown caching changes look correct and should reduce render overhead.

The introduction of _get_cached_lexer, _get_markdown_formatter, _get_cached_formatter, and _get_markdown_renderer removes repeated lexer/formatter construction and repeated mistune initialization:

  • _create_pygments_plugin now calls the cached helpers instead of re‑creating lexers/formatters.
  • _highlight_code_with_pygments uses the cached formatter and your precomputed lexer map.

Given the existing timing hooks (_markdown_timings, _pygments_timings), this is a solid and idiomatic optimization for the renderer’s hot path.

Also applies to: 281-307, 690-778

claude_code_log/converter.py (2)

1051-1120: Exception handler can crash when no previous project exists.

In process_projects_hierarchy’s except block:

print(
    f"Warning: Failed to process {project_dir}: {e}\n"
    f"Previous (in alphabetical order) file before error: {project_summaries[-1]}"
    f"\n{traceback.format_exc()}"
)

If the first project in the sorted list fails, project_summaries is still empty and project_summaries[-1] will raise IndexError, masking the original exception and breaking error reporting.

Safer pattern:

-        print(
-            f"Warning: Failed to process {project_dir}: {e}\n"
-            f"Previous (in alphabetical order) file before error: {project_summaries[-1]}"
-            f"\n{traceback.format_exc()}"
-        )
+        previous = project_summaries[-1] if project_summaries else "<none>"
+        print(
+            f"Warning: Failed to process {project_dir}: {e}\n"
+            f"Previous (in alphabetical order) project summary: {previous}"
+            f"\n{traceback.format_exc()}"
+        )

This preserves the diagnostic context without risking a second failure.

Also applies to: 1263-1271

⛔ Skipped due to learnings
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Structure the project with separate modules for parser.py (data extraction), renderer.py (HTML generation), converter.py (high-level orchestration), cli.py (CLI interface), models.py (Pydantic data structures), tui.py (Textual TUI), and cache.py (cache management)

381-402: The review comment references code and methods that do not exist in the repository.

The mentioned file claude_code_log/converter.py, the code snippet shown (lines 381-402 and 435-487), and the referenced methods (_generate_paginated_html, is_page_stale, use_pagination, cache_manager.is_html_stale(), cache_manager.get_stale_sessions()) cannot be found in the codebase. The claude_code_log directory itself does not exist in the repository. The concern about pagination logic and page-level cache staleness checks is therefore not applicable to the current codebase.

Likely an incorrect or invalid review comment.

claude_code_log/cache.py (1)

372-378: Remove incorrect SQLite syntax concern—NULLS LAST is supported.

SQLite has supported the NULLS LAST modifier in ORDER BY clauses since version 3.30.0 (released October 2019). The syntax used at lines 374 and 433 is valid and will execute correctly on modern SQLite versions.

Since the project targets Python 3.10+ and does not specify a legacy SQLite version requirement, the code is compatible with contemporary environments where SQLite 3.30.0+ is standard.
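For readers on older environments, a quick standalone check of the modifier (illustrative table and data; requires the bundled SQLite to be 3.30.0 or newer, as noted above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (session_id TEXT, last_timestamp TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [("a", None), ("b", "2025-12-03T10:00:00Z"), ("c", "2025-12-01T09:00:00Z")],
)
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT session_id FROM s ORDER BY last_timestamp DESC NULLS LAST"
    )
]
# Newest first, and the session without a timestamp sorts last
```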

Comment on lines 268 to 681
# Build page_info for navigation
has_prev = page_num > 1
# Pre-enable next link if this page exceeds threshold (anticipating future pages)
# or if there are more pages
page_exceeds_threshold = page_message_count > page_size
has_next = page_num < len(pages) or page_exceeds_threshold

page_info = {
    "page_number": page_num,
    "prev_link": _get_page_html_path(page_num - 1) if has_prev else None,
    "next_link": _get_page_html_path(page_num + 1) if has_next else None,
}

# Build page_stats
date_range = ""
if first_timestamp and last_timestamp:
    first_fmt = format_timestamp(first_timestamp)
    last_fmt = format_timestamp(last_timestamp)
    if first_fmt == last_fmt:
        date_range = first_fmt
    else:
        date_range = f"{first_fmt} - {last_fmt}"
elif first_timestamp:
    date_range = format_timestamp(first_timestamp)

token_parts: List[str] = []
if total_input_tokens:
    token_parts.append(f"Input: {total_input_tokens:,}")
if total_output_tokens:
    token_parts.append(f"Output: {total_output_tokens:,}")
if total_cache_creation_tokens:
    token_parts.append(f"Cache Create: {total_cache_creation_tokens:,}")
if total_cache_read_tokens:
    token_parts.append(f"Cache Read: {total_cache_read_tokens:,}")
token_summary = " | ".join(token_parts) if token_parts else None

@coderabbitai coderabbitai bot Dec 7, 2025

⚠️ Potential issue | 🟡 Minor

has_next can point to a non‑existent next page.

In _generate_paginated_html:

page_exceeds_threshold = page_message_count > page_size
has_next = page_num < len(pages) or page_exceeds_threshold
...
"next_link": _get_page_html_path(page_num + 1) if has_next else None,

Because _assign_sessions_to_pages closes a page once current_count > page_size, it’s possible for the last page to have page_message_count > page_size. In that case:

  • page_num == len(pages)
  • page_exceeds_threshold is True
  • has_next becomes True, but page page_num + 1 is never generated

This yields a “Next” link to a missing HTML file.

Unless you explicitly want a pre‑wired link to a future page (which static HTML can’t satisfy yet), it’s safer to base has_next solely on len(pages):

-        page_exceeds_threshold = page_message_count > page_size
-        has_next = page_num < len(pages) or page_exceeds_threshold
+        has_next = page_num < len(pages)

or, if you want to retain the threshold concept, guard it so it never creates a link beyond len(pages).

🤖 Prompt for AI Agents
In claude_code_log/converter.py around lines 268 to 303, has_next can be True when page_num == len(pages) because page_exceeds_threshold may be True, producing a next_link to a non-existent page. Change the logic to only create a next link when the next page actually exists (e.g., set has_next = page_num < len(pages); or, if keeping the threshold concept, also check page_num + 1 <= len(pages) before assigning next_link) so next_link never points beyond the last generated page.

Owner Author

Yes, this is an optimisation to not have to regenerate the last page. Do you have a better idea how to handle this?


@daaain I understand the optimization goal—avoiding regeneration of the last page when new content arrives. However, creating a "next" link to a non-existent page causes a 404 error for users, which is a significant UX issue.

Here are some better approaches:

1. Accept one-page regeneration (recommended)

has_next = page_num < len(pages)

When new messages arrive and trigger a new page, simply regenerate the previous last page to add the next link. The performance cost is minimal (one page), and you get reliable navigation without broken links.

2. Generate a placeholder page
When page_exceeds_threshold is true for the last page, generate an empty page N+1:

if page_num == len(pages) and page_exceeds_threshold:
    # Generate placeholder page with "No content yet" message
    # This keeps the link valid but shows users it's empty

3. Use a conditional indicator
Instead of a clickable link, show different UI when threshold is exceeded on the last page:

if page_num < len(pages):
    # Normal next link
    ...
elif page_exceeds_threshold:
    # Show "More content coming soon" text (non-clickable)
    ...

My recommendation: Go with option 1. The cost of regenerating one page when new content arrives is negligible compared to the confusion and broken experience of 404 links. Static HTML pages should have reliable navigation—users won't understand why a "Next" link doesn't work.

The current threshold check is useful for detecting when a page is full, but it shouldn't create navigation to pages that don't exist yet. The regeneration on content update is the right trade-off here.


🧠 Learnings used
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Group messages chronologically by timestamp across all sessions and process entire project hierarchies with linked index pages

Collaborator

Well, instead of regenerating the page in full, we can maybe simply edit that previous file in place, to update its next page link (like, remove a "last-page" class that would hide it when present).

Besides, instead of a threshold in the number of messages (not necessarily a good indicator of the size), what do you think about the idea of using the "compaction messages" as delimiters? They're sort of guaranteed to occur regularly in long sessions, and after a relatively constant amount of tokens (unless there were manual /compact or /clear commands, well, those could also be used for delimiting pages?).

The only downside I see is that it's less obvious to detect at the SQL level (and potentially brittle), but I think the idea is worth exploring.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Owner Author

I like the idea of a class that can be removed in a targeted way with a simple string search or regex!
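A minimal sketch of that targeted in-place edit, assuming the template marks the hidden link with a last-page class that CSS hides; the class name, markup, and helper name are illustrative, not the project's actual template:

```python
import tempfile
from pathlib import Path

def reveal_next_link(page_path: Path) -> bool:
    """Strip the last-page marker so the pre-wired Next link becomes visible."""
    html = page_path.read_text(encoding="utf-8")
    patched = html.replace(
        ' class="page-nav-next last-page"', ' class="page-nav-next"'
    )
    if patched == html:
        return False  # marker absent: fall back to full regeneration
    page_path.write_text(patched, encoding="utf-8")
    return True

# Demo on a throwaway file
demo = Path(tempfile.mkdtemp()) / "page-1.html"
demo.write_text(
    '<a class="page-nav-next last-page" href="page-2.html">Next</a>',
    encoding="utf-8",
)
patched_once = reveal_next_link(demo)
patched_twice = reveal_next_link(demo)  # second call finds nothing to change
```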

Not sure about the compaction messages, we can look at them and have them as an optional page breaking point if that creates a better page size, but I rarely let sessions get to that point, prefer to let Claude generate a handover and paste it into a new context.

With the number of messages, I was thinking if we're taking 1000s on each page, the lengths will average out? But we could count characters too, that would probably be more precise?
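The character-budget idea could be sketched like this, keeping the existing rule that a session is never split across pages; the function and its inputs are hypothetical, not code from this PR:

```python
from typing import List, Tuple

def assign_sessions_by_chars(
    sessions: List[Tuple[str, int]],  # (session_id, total_chars), in order
    char_budget: int = 2_000_000,
) -> List[List[str]]:
    pages: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for session_id, chars in sessions:
        # Never split a session across pages; close the page once adding
        # the next session would exceed the budget.
        if current and current_chars + chars > char_budget:
            pages.append(current)
            current, current_chars = [], 0
        current.append(session_id)
        current_chars += chars
    if current:
        pages.append(current)
    return pages
```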

Owner Author

I think /clear creates a new session?

Collaborator

Yes, /clear creates a new session (and I should probably use that more often :P).

I looked back at some of the long sessions I have, and using the compaction message as a separator is not that great an idea: it happens too often.

Comment on lines 1000 to 1457
    silent: bool = True,
    page_size: int = 2000,
) -> Path:
    """Process the entire ~/.claude/projects/ hierarchy and create linked HTML files.

    Args:
        projects_path: Path to the projects directory
        from_date: Optional date filter start
        to_date: Optional date filter end
        use_cache: Whether to use SQLite cache
        generate_individual_sessions: Whether to generate per-session HTML files
        silent: If True, suppress verbose per-file logging (show summary only)
        page_size: Maximum messages per page for combined transcript pagination
    """
@coderabbitai coderabbitai bot Dec 7, 2025

⚠️ Potential issue | 🟡 Minor

silent flag doesn’t fully suppress per‑project logging.

process_projects_hierarchy documents:

silent: If True, suppress verbose per-file logging (show summary only)

But it unconditionally prints per‑project lines:

print(f"  {project_dir.name}: cached ({stats.total_time:.1f}s)")
...
print(f"  {project_dir.name}: {detail} ({stats.total_time:.1f}s)")

and the final summary is also always printed.

If you want silent=True to show only the final summary (or truly nothing), wrap the per‑project prints in if not silent: and decide whether the global summary should also respect silent. For example:

-                print(f"  {project_dir.name}: cached ({stats.total_time:.1f}s)")
+                if not silent:
+                    print(f"  {project_dir.name}: cached ({stats.total_time:.1f}s)")
...
-                print(f"  {project_dir.name}: {detail} ({stats.total_time:.1f}s)")
+                if not silent:
+                    print(f"  {project_dir.name}: {detail} ({stats.total_time:.1f}s)")

This will align behavior with the parameter’s documentation.

Also applies to: 1086-1120, 1287-1304

🤖 Prompt for AI Agents
In claude_code_log/converter.py around lines 1000-1013 (and also address similar prints at 1086-1120 and 1287-1304): the function documents that silent=True should suppress verbose per-file logging, but there are unconditional print() calls for each project and for the final summary. Wrap the per-project print statements in an if not silent: block so they only run when verbose output is desired, and decide whether the final summary should respect silent (wrap it in if not silent or add a separate summary_only flag); ensure all similar print occurrences in the other indicated line ranges are handled the same way to match the documented behavior.

Owner Author

The per-project lines are the summaries so this is as intended


@daaain, thank you for the clarification! That makes sense—the per-project lines serve as the summary output, while silent=True suppresses the more granular per-file logging that happens inside convert_jsonl_to_html. The current behavior aligns with the documented intent. I appreciate you explaining the design decision!


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@daaain daaain force-pushed the feat/sqlite-cache branch from eef048d to 094489c on December 9, 2025 00:18
@daaain
Owner Author

daaain commented Dec 9, 2025

This is now shaping up really nicely, did a rebase and fixed some bits that broke, now it takes less than 2 seconds to rescan all my projects which fills the SQLite db to almost 900MB 😅

(claude-code-log) ➜  claude-code-log git:(feat/sqlite-cache) ✗ j cli
uv run claude-code-log 
Processing all projects in /Users/dain/.claude/projects...
  -Users-dain: cached (0.0s)
  -Users-dain-Desktop: cached (0.0s)
  -Users-dain-Obsidian: cached (0.0s)
  -Users-dain-truenas-backup: cached (0.0s)
  -Users-dain-workspace-JSSoundRecorder: cached (0.0s)
  -Users-dain-workspace-browser-llm-script-interaction-extension: cached (0.1s)
  -Users-dain-workspace-claude-code-log: cached (0.3s)
  -Users-dain-workspace-coderabbit-review-helper: cached (0.0s)
  -Users-dain-workspace-danieldemmel-me-next: cached (0.0s)
  ...
Processed 24 projects in 1.8s
Successfully processed 24 projects and created index at /Users/dain/.claude/projects/index.html

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
test/test_cache_integration.py (1)

395-411: Test uses obsolete JSON cache pattern.

This test still references the old per-project JSON cache (cache_dir.glob("*.json")), but the codebase has migrated to SQLite. The test will:

  1. Not find any JSON cache files (since SQLite is used)
  2. Skip the corruption logic entirely via the if cache_files: guard
  3. Pass without actually testing corruption recovery

Update this test to corrupt the SQLite database instead:

     def test_cache_corruption_recovery(self, setup_test_project):
         """Test recovery from corrupted cache files."""
         project_dir = setup_test_project

         # Create initial cache
         convert_jsonl_to_html(input_path=project_dir, use_cache=True)

-        # Corrupt cache file
-        cache_dir = project_dir / "cache"
-        cache_files = list(cache_dir.glob("*.json"))
-        if cache_files:
-            cache_file = [f for f in cache_files if f.name != "index.json"][0]
-            cache_file.write_text("corrupted json data", encoding="utf-8")
+        # Corrupt SQLite cache database
+        cache_db = project_dir.parent / "cache.db"
+        if cache_db.exists():
+            cache_db.write_bytes(b"corrupted database content")

         # Should recover gracefully
         output = convert_jsonl_to_html(input_path=project_dir, use_cache=True)
         assert output.exists()
test/test_integration_realistic.py (4)

561-698: Tests still use obsolete JSON cache pattern.

TestCacheHTMLPersistence references the old cache/index.json pattern throughout:

  • Line 567: cache_file = project / "cache" / "index.json"
  • Line 653: cache_index = project / "cache" / "index.json"
  • Lines 675-698: References cache_dir and per-file JSON caches

These tests will likely fail or be skipped since the SQLite migration removed JSON cache files.

Update these tests to use CacheManager for cache validation:

# Instead of:
cache_index = project / "cache" / "index.json"
cache_data = json.loads(cache_index.read_text())
assert cache_data["version"] == get_library_version()

# Use:
cache_manager = CacheManager(project, get_library_version())
cached_data = cache_manager.get_cached_project_data()
assert cached_data is not None
assert cached_data.version == get_library_version()

719-753: Test uses obsolete JSON cache pattern.

test_adding_lines_triggers_cache_update references cache_index = project / "cache" / "index.json" which no longer exists with SQLite caching. The test will be skipped due to the if not cache_index.exists() guard.

Update to use CacheManager:

-        cache_index = project / "cache" / "index.json"
-        if not cache_index.exists():
-            pytest.skip("Cache not generated by fixture")
-
-        original_cache_mtime = cache_index.stat().st_mtime
+        cache_manager = CacheManager(project, get_library_version())
+        original_data = cache_manager.get_cached_project_data()
+        if not original_data:
+            pytest.skip("Cache not generated by fixture")
+
+        original_message_count = original_data.total_message_count
         # ... after modification ...
-        assert cache_index.stat().st_mtime > original_cache_mtime
+        new_data = cache_manager.get_cached_project_data()
+        assert new_data.total_message_count > original_message_count

819-844: Test uses obsolete JSON cache pattern.

test_new_file_detected_and_processed references cache_index = project / "cache" / "index.json" and reads JSON directly. This needs updating for SQLite.

Similar fix needed - use CacheManager to validate session counts:

-        cache_index = project / "cache" / "index.json"
-        if not cache_index.exists():
-            pytest.skip("Cache not generated by fixture")
-
-        original_cache = json.loads(cache_index.read_text())
-        original_session_count = len(original_cache.get("sessions", {}))
+        cache_manager = CacheManager(project, get_library_version())
+        original_data = cache_manager.get_cached_project_data()
+        if not original_data:
+            pytest.skip("Cache not generated by fixture")
+
+        original_session_count = len(original_data.sessions)
         # ... after adding new file ...
-        new_cache = json.loads(cache_index.read_text())
-        assert len(new_cache.get("sessions", {})) > original_session_count
+        # Re-fetch to get updated data
+        cache_manager = CacheManager(project, get_library_version())
+        new_data = cache_manager.get_cached_project_data()
+        assert len(new_data.sessions) > original_session_count

1162-1199: Tests reference obsolete cache directory structure.

TestErrorHandlingRealistic tests still reference cache/index.json and cache directory structure that no longer exists:

  • Line 1168: cache_index = project / "cache" / "index.json"
  • Line 1188: cache_dir = project / "cache"
  • Line 1198-1199: Assertions on cache_dir and index.json

These tests need updating to validate SQLite cache corruption/recovery rather than JSON files.

♻️ Duplicate comments (3)
claude_code_log/converter.py (2)

1023-1340: silent flag doesn't suppress per-project and summary output.

The docstring on line 1034 states silent=True should "suppress verbose per-file logging (show summary only)", but lines 1119 and 1152 unconditionally print per-project progress, and lines 1334–1340 unconditionally print the global summary. Users passing silent=True still see all this output.

This is the same issue flagged in a previous review and remains unfixed.

Wrap per-project prints in if not silent: guards:

-                print(f"  {project_dir.name}: cached ({stats.total_time:.1f}s)")
+                if not silent:
+                    print(f"  {project_dir.name}: cached ({stats.total_time:.1f}s)")

Apply the same guard to line 1152 and decide whether lines 1334–1340 (global summary) should also respect silent or remain unconditional. Based on the docstring, you may want a separate flag (e.g., show_summary) if you always want the final summary regardless of silent.

Also applies to: 1152-1152, 1334-1340


269-280: has_next logic can point to non-existent pages.

Lines 273–274 set has_next = page_num < len(pages) or page_exceeds_threshold, which means if the last page (page_num == len(pages)) happens to exceed page_size, has_next becomes True and line 279 generates next_link to a page that was never created.

The comment on lines 271–272 about "anticipating future pages" doesn't apply to static HTML—you can't link to pages that don't exist.

This is the same issue flagged in a previous review and remains unfixed.

Fix by removing the threshold check from has_next:

-        # Pre-enable next link if this page exceeds threshold (anticipating future pages)
-        # or if there are more pages
-        page_exceeds_threshold = page_message_count > page_size
-        has_next = page_num < len(pages) or page_exceeds_threshold
+        # Only show next link if there's actually a next page
+        has_next = page_num < len(pages)
claude_code_log/cache.py (1)

1093-1144: Page staleness check still doesn't compare session message counts.

Lines 1131–1139 query message_count from the sessions table but never compare it against page_data.message_count or per-session counts. The TODO comment on lines 1141–1142 confirms this is incomplete. This allows paginated pages to remain marked as "up_to_date" even when sessions have gained or lost messages, resulting in stale HTML.

This is the same issue flagged in a previous review and remains unfixed.

Implement the missing comparison:

         for session_id in page_data.session_ids:
             row = conn.execute(
                 """SELECT message_count FROM sessions
                    WHERE project_id = ? AND session_id = ?""",
                 (self._project_id, session_id),
             ).fetchone()

             if not row:
                 return True, "session_missing"

-            # We need to check if session content changed
-            # For now, just check if session exists
+            # Compare session message count to detect changes
+            # Note: page_data doesn't store per-session counts, so we need a workaround
+            # For now, return stale if any session count changed from when page was created
+            # (This requires storing per-session counts in page cache or re-aggregating here)

Alternatively, store per-session message counts in the page_sessions table and compare them here, or invalidate all pages whenever total_message_count changes at the project level.
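A sketch of that alternative, recording each session's message count at page-generation time and comparing it against the live sessions table; the table and column names here are illustrative, not the PR's actual schema:

```python
import sqlite3

def page_is_stale(conn: sqlite3.Connection, project_id: int, page_number: int) -> bool:
    rows = conn.execute(
        """SELECT ps.message_count_at_generation, s.message_count
           FROM page_sessions ps
           LEFT JOIN sessions s
             ON s.project_id = ps.project_id AND s.session_id = ps.session_id
           WHERE ps.project_id = ? AND ps.page_number = ?""",
        (project_id, page_number),
    ).fetchall()
    # Stale if any backing session vanished or its message count changed
    return any(now is None or now != then for then, now in rows)

# Illustrative schema and data for the demo
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (project_id INTEGER, session_id TEXT, message_count INTEGER)"
)
conn.execute(
    "CREATE TABLE page_sessions (project_id INTEGER, page_number INTEGER,"
    " session_id TEXT, message_count_at_generation INTEGER)"
)
conn.execute("INSERT INTO sessions VALUES (1, 's1', 5)")
conn.execute("INSERT INTO page_sessions VALUES (1, 1, 's1', 5)")
```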

🧹 Nitpick comments (6)
claude_code_log/cli.py (1)

436-441: Consider adding validation for --page-size.

The page_size option accepts any integer, including zero or negative values, which could cause issues downstream (e.g., infinite loops or division by zero).

Add a callback to validate the minimum page size:

 @click.option(
     "--page-size",
     type=int,
     default=2000,
+    callback=lambda ctx, param, value: value if value >= 100 else 
+        ctx.fail("--page-size must be at least 100"),
     help="Maximum messages per page for combined transcript (default: 2000). Sessions are never split across pages.",
 )

Alternatively, use click.IntRange(min=100) for cleaner validation.

claude_code_log/renderer.py (3)

83-122: Project-display-name logic and tmp-path filtering look good, with minor optional refinements

Filtering working_directories via _is_tmp_path before picking the least‑nested path is a sensible way to avoid pytest/macOS temp paths influencing display names. The fallback that reverses the encoded project dir name is also reasonable.

If you find index pages end up showing full paths in the fallback case, consider additionally taking Path(display_name).name there so you still get a short repo name, but that’s optional and can be adjusted based on how project_dir_name is actually populated.


200-248: Lexer/formatter caching is a solid performance improvement

Introducing _lexer_cache, _formatter_cache, and _get_markdown_formatter to reuse Pygments lexers and HtmlFormatter instances should significantly reduce per‑message overhead, especially for large transcripts with many code blocks. The cache keys (lang:stripall and (linenos, linenostart)) are straightforward and align with how these objects are configured.

One optional refinement: since _get_cached_lexer already handles ClassNotFound, the except ClassNotFound branch in _highlight_code_with_pygments is now effectively dead and could be removed in a follow‑up to simplify the control flow.


699-773: Optimized _highlight_code_with_pygments matches SQLite/perf goals and integrates with new caches

Switching from get_lexer_for_filename to a one‑time get_all_lexers() pattern/extension map avoids filesystem I/O and antivirus hits, and then relying on _get_cached_lexer plus _get_cached_formatter reuses heavy Pygments objects. The extension‑first lookup and fnmatch fallback preserve the previous behavior while being more efficient.

As noted above, the outer try/except ClassNotFound is now redundant because _get_cached_lexer already falls back to TextLexer, but keeping it doesn’t break anything.

claude_code_log/cache.py (2)

119-163: Broad exception handling hides errors in version detection.

All three version-detection strategies use except Exception: pass, which silently swallows import errors, file I/O issues, and parsing failures. This makes debugging difficult if, for example, the toml library is missing or pyproject.toml is malformed.

Consider either catching more specific exceptions or logging suppressed errors to aid troubleshooting.

     except Exception:
-        pass
+        # Package not installed, will try file-based methods
+        pass

396-396: Move dateparser import to module level.

The import dateparser on line 396 is inside the method. Module-level imports improve readability, enable type-checking tools to detect missing dependencies earlier, and align with the coding guideline that recommends using dateparser for date parsing throughout the project.

Move to the top of the file:

 from pathlib import Path
 from typing import Any, Dict, Generator, List, Optional
+import dateparser

Then remove the import from line 396.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eef048d and 094489c.

📒 Files selected for processing (22)
  • claude_code_log/cache.py (4 hunks)
  • claude_code_log/cli.py (7 hunks)
  • claude_code_log/converter.py (17 hunks)
  • claude_code_log/migrations/001_initial_schema.sql (1 hunks)
  • claude_code_log/migrations/002_html_cache.sql (1 hunks)
  • claude_code_log/migrations/003_html_pagination.sql (1 hunks)
  • claude_code_log/migrations/__init__.py (1 hunks)
  • claude_code_log/migrations/runner.py (1 hunks)
  • claude_code_log/renderer.py (10 hunks)
  • claude_code_log/templates/components/page_nav_styles.css (1 hunks)
  • claude_code_log/templates/transcript.html (1 hunks)
  • claude_code_log/tui.py (2 hunks)
  • claude_code_log/utils.py (0 hunks)
  • test/__snapshots__/test_snapshot_html.ambr (5 hunks)
  • test/test_cache.py (7 hunks)
  • test/test_cache_integration.py (8 hunks)
  • test/test_cache_sqlite_integrity.py (1 hunks)
  • test/test_html_regeneration.py (4 hunks)
  • test/test_integration_realistic.py (4 hunks)
  • test/test_pagination.py (1 hunks)
  • test/test_project_display_name.py (1 hunks)
  • test/test_project_matching.py (2 hunks)
💤 Files with no reviewable changes (1)
  • claude_code_log/utils.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • claude_code_log/migrations/__init__.py
  • claude_code_log/templates/transcript.html
  • test/test_project_matching.py
  • claude_code_log/migrations/001_initial_schema.sql
  • claude_code_log/templates/components/page_nav_styles.css
  • claude_code_log/migrations/002_html_cache.sql
  • claude_code_log/migrations/runner.py
  • test/test_cache_sqlite_integrity.py
🧰 Additional context used
📓 Path-based instructions (6)
test/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Organize tests into categories with pytest markers to avoid async event loop conflicts: unit tests (no mark), TUI tests (@pytest.mark.tui), browser tests (@pytest.mark.browser), and snapshot tests

Files:

  • test/test_cache_integration.py
  • test/test_project_display_name.py
  • test/test_integration_realistic.py
  • test/test_pagination.py
  • test/test_html_regeneration.py
  • test/test_cache.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use ruff for code formatting and linting, with ruff check --fix for automatic fixes
Use pyright and mypy for type checking in Python code
Target Python 3.10+ with support for modern Python features and type hints

Files:

  • test/test_cache_integration.py
  • test/test_project_display_name.py
  • test/test_integration_realistic.py
  • test/test_pagination.py
  • claude_code_log/cli.py
  • test/test_html_regeneration.py
  • claude_code_log/cache.py
  • test/test_cache.py
  • claude_code_log/tui.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
claude_code_log/cli.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Click for CLI interface and argument parsing in Python CLI files

Files:

  • claude_code_log/cli.py
claude_code_log/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use dateparser for natural language date parsing to support date range filtering with expressions like 'today', 'yesterday', 'last week', and relative dates

Files:

  • claude_code_log/cli.py
  • claude_code_log/cache.py
  • claude_code_log/tui.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
claude_code_log/tui.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Textual for implementing the interactive Terminal User Interface (TUI) in Python

Files:

  • claude_code_log/tui.py
claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Files:

  • claude_code_log/renderer.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Generate cache-aware HTML output with automatic cache validation and leverage existing cache system for fast loading
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Structure the project with separate modules for parser.py (data extraction), renderer.py (HTML generation), converter.py (high-level orchestration), cli.py (CLI interface), models.py (Pydantic data structures), tui.py (Textual TUI), and cache.py (cache management)

Applied to files:

  • test/test_cache_integration.py
  • test/test_integration_realistic.py
  • claude_code_log/cache.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Group messages chronologically by timestamp across all sessions and process entire project hierarchies with linked index pages

Applied to files:

  • claude_code_log/migrations/003_html_pagination.sql
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/templates/**/*.html : Use Jinja2 templates for HTML generation, including session navigation with table of contents, message rendering with different content types, and token display for individual messages and session totals

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to test/test_snapshot_html.py : Use syrupy for HTML snapshot regression testing to detect unintended changes in HTML output

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
  • test/test_html_regeneration.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/cli.py : Use Click for CLI interface and argument parsing in Python CLI files

Applied to files:

  • claude_code_log/cli.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/models.py : Use Pydantic models for parsing and validating transcript JSON data, including TranscriptEntry (union of UserTranscriptEntry, AssistantTranscriptEntry, SummaryTranscriptEntry), UsageInfo, and ContentItem

Applied to files:

  • claude_code_log/cache.py
  • claude_code_log/converter.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/renderer.py : Use mistune for quick Markdown rendering with syntax highlighting support in server-side template rendering

Applied to files:

  • claude_code_log/renderer.py
📚 Learning: 2025-11-30T17:16:32.495Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-30T17:16:32.495Z
Learning: Applies to claude_code_log/renderer_timings.py : Implement performance timing instrumentation via the CLAUDE_CODE_LOG_DEBUG_TIMING environment variable to identify performance bottlenecks in rendering phases, with detailed timing for initialization, deduplication, session summary processing, main message loop, and template rendering

Applied to files:

  • claude_code_log/renderer.py
🧬 Code graph analysis (8)
test/test_project_display_name.py (1)
claude_code_log/renderer.py (1)
  • get_project_display_name (94-122)
test/test_integration_realistic.py (1)
claude_code_log/cache.py (2)
  • get_library_version (119-162)
  • get_cached_project_data (604-676)
test/test_pagination.py (2)
claude_code_log/cache.py (10)
  • CacheManager (168-1184)
  • SessionCacheData (39-52)
  • get_page_count (1173-1184)
  • get_page_size_config (900-914)
  • update_page_cache (1010-1091)
  • get_page_data (916-958)
  • is_page_stale (1093-1144)
  • invalidate_all_pages (1146-1171)
  • get_all_pages (960-1008)
  • get_library_version (119-162)
claude_code_log/converter.py (3)
  • _get_page_html_path (107-114)
  • _assign_sessions_to_pages (117-151)
  • convert_jsonl_to_html (339-514)
claude_code_log/cli.py (2)
claude_code_log/cache.py (2)
  • get_working_directories (580-594)
  • CacheManager (168-1184)
claude_code_log/converter.py (1)
  • convert_jsonl_to_html (339-514)
test/test_html_regeneration.py (4)
test/conftest.py (1)
  • test_data_dir (11-13)
test/test_cache.py (1)
  • cache_manager (42-45)
test/test_pagination.py (1)
  • cache_manager (156-159)
claude_code_log/cache.py (2)
  • CacheManager (168-1184)
  • get_library_version (119-162)
test/test_cache.py (1)
claude_code_log/cache.py (4)
  • is_file_cached (340-361)
  • save_cached_entries (440-501)
  • get_cached_project_data (604-676)
  • clear_cache (678-686)
claude_code_log/tui.py (1)
claude_code_log/cache.py (1)
  • get_working_directories (580-594)
claude_code_log/renderer.py (1)
claude_code_log/renderer_timings.py (1)
  • timing_stat (88-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.10)
🔇 Additional comments (32)
claude_code_log/tui.py (1)

8-8: LGTM! Clean migration to SQLite-backed cache.

The changes correctly adapt to the centralized CacheManager:

  • Proper type annotation with List[str] instead of None
  • Direct call to cache_manager.get_working_directories() aligns with the SQLite cache schema
  • Exception handling maintains graceful fallback behavior

Based on the relevant code snippet, get_working_directories() queries distinct cwd values from the sessions table, which is the right approach for the cache migration.

Also applies to: 388-393

test/test_project_display_name.py (1)

118-140: LGTM! Excellent test coverage for temporary path filtering.

The two new tests appropriately verify the _is_tmp_path filtering logic:

  • test_tmp_paths_filtered_out confirms that real project paths are preferred over pytest/macOS temp directories
  • test_only_tmp_paths_falls_back ensures graceful fallback when only temporary paths are available

The test data uses realistic temporary path patterns, and the assertions clearly validate the expected behavior.

test/__snapshots__/test_snapshot_html.ambr (2)

4275-4348: LGTM! Snapshot correctly reflects pagination CSS.

The added page navigation styling across all snapshots correctly captures the expected HTML output for the new pagination feature. The CSS includes proper visual hierarchy, hover states, and responsive layout as described in the PR objectives.

Based on learnings, syrupy snapshot tests detect unintended HTML changes, and these updates align with the pagination implementation.

Also applies to: 9081-9154, 13989-14062, 18933-19006


11574-11574: LGTM! Test identifier improvement.

The title update from "tmp" to "test_multi_session_html0" improves test clarity and maintains consistency between the <title> tag and the H1 heading.

Also applies to: 14067-14067

claude_code_log/migrations/003_html_pagination.sql (1)

1-39: Well-structured pagination schema.

The migration correctly establishes pagination support with proper constraints:

  • Unique constraint on (project_id, page_number) prevents duplicate pages
  • Cascade deletes ensure cleanup when projects are removed
  • Indexes on foreign keys and session_id will optimize common queries

Note: page_sessions.session_id is TEXT without a foreign key to the sessions table. This appears intentional for flexibility (sessions can be recreated without breaking page mappings), but it means orphaned session references are possible if sessions are deleted without page regeneration.
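A periodic cleanup could detect such orphans with a LEFT JOIN. A minimal sketch against a simplified stand-in schema (the column sets here are assumptions, not the real migrations):

```python
import sqlite3

# Simplified stand-in schema; the real tables carry more columns/constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (project_id INTEGER, session_id TEXT);
    CREATE TABLE page_sessions (project_id INTEGER, page_number INTEGER,
                                session_id TEXT);
""")
conn.execute("INSERT INTO sessions VALUES (1, 'live')")
conn.executemany(
    "INSERT INTO page_sessions VALUES (1, 1, ?)", [("live",), ("deleted",)]
)

# Orphans: page mappings whose session row no longer exists.
orphans = [
    row[0]
    for row in conn.execute(
        """SELECT ps.session_id FROM page_sessions ps
           LEFT JOIN sessions s
             ON s.project_id = ps.project_id
            AND s.session_id = ps.session_id
           WHERE s.session_id IS NULL"""
    )
]
```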

test/test_cache_integration.py (3)

98-114: LGTM - Correct validation of --no-cache behavior.

The test properly verifies that:

  1. SQLite database file is created (infrastructure exists)
  2. Project data exists but has zero messages (cache not populated)

This correctly reflects the new centralized cache architecture where the database exists but caching is bypassed.


115-139: LGTM - Cache clearing test updated correctly.

The test now validates cache clearing through CacheManager's SQL-backed data rather than filesystem artifacts.


427-441: LGTM - Version upgrade test uses SQLite correctly.

The test properly validates version tracking through the CacheManager and cached_data.version property.

test/test_html_regeneration.py (4)

51-57: LGTM - Flexible assertion handles output format changes.

The updated assertion correctly accepts either the legacy per-file skip message or the new early-exit message, making the test resilient to output format changes while still validating the skip behavior.


321-361: LGTM - Comprehensive html_cache tracking test.

The test thoroughly validates that:

  • Combined transcript cache entries have source_session_id = None
  • Session HTML cache entries correctly link to their session IDs
  • Message counts are tracked in cache entries

363-415: Excellent incremental regeneration test.

This test is critical for validating the performance optimization. It properly:

  1. Creates isolated sessions in separate files
  2. Modifies only one session
  3. Asserts the unmodified session's HTML remains unchanged (same mtime)

The 1.1-second sleep ensures filesystem timestamp granularity is respected.
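The granularity concern can be demonstrated directly. A minimal sketch (the `unchanged` helper is hypothetical, mirroring the mtime comparison the test relies on):

```python
import os
import tempfile
import time


def unchanged(path: str, cached_mtime: float) -> bool:
    """mtime-based check: an equal mtime implies 'not regenerated'."""
    return os.path.getmtime(path) == cached_mtime


with tempfile.NamedTemporaryFile("w", delete=False, suffix=".html") as f:
    f.write("<html></html>")
    path = f.name

before = os.path.getmtime(path)
# Coarse filesystems may round mtimes to whole seconds; sleeping past that
# granularity guarantees a rewrite observably changes the timestamp.
time.sleep(1.1)
with open(path, "w") as f:
    f.write("<html>v2</html>")
changed = not unchanged(path, before)
os.unlink(path)
```

Without the sleep, a rewrite within the same second could leave the mtime unchanged and make the test flaky.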


417-492: LGTM - Version mismatch and staleness detection tests.

These tests validate important cache invalidation scenarios:

  • Library version changes trigger regeneration
  • Message count changes mark sessions as stale
test/test_integration_realistic.py (2)

242-261: LGTM - SQLite cache deletion test.

The test correctly validates that:

  1. cache.db is created after processing
  2. cache.db is deleted after --clear-cache

415-427: LGTM - Cache creation test updated for SQLite.

The test properly validates:

  1. SQLite cache.db exists after processing
  2. Per-project data is accessible via CacheManager
  3. Version and sessions structure are present
claude_code_log/cli.py (4)

259-302: LGTM - Cache clearing handles both SQLite and legacy JSON caches.

The implementation properly:

  1. Deletes the centralized SQLite cache.db
  2. Cleans up old per-project JSON cache directories for migration compatibility
  3. Handles both all-projects and single-directory modes

The inline import shutil is acceptable for lazy loading since it's only needed in cleanup paths.


190-210: LGTM - Direct use of get_working_directories().

Replacing project_cache.working_directories with cache_manager.get_working_directories() is cleaner and avoids loading the full project cache just to get directories.


586-593: LGTM - page_size correctly threaded to process_projects_hierarchy.

The parameter is properly passed through to the hierarchy processing function.


635-643: LGTM - page_size correctly threaded to convert_jsonl_to_html.

The parameter is properly passed for single directory/file processing.

test/test_pagination.py (3)

21-135: Session-to-page assignment tests look correct and align with _assign_sessions_to_pages semantics

The test matrix around single/multiple sessions, overflow, chronological ordering, and empty input closely matches the implementation in claude_code_log.converter._assign_sessions_to_pages, including the “overflow allowed but no session splitting” behavior. This gives good confidence that later pagination changes won’t silently break session grouping.


142-346: Solid coverage of page-level cache API behavior

The fixture + CacheManager tests exercise the full page-cache surface (get_page_count, get_page_size_config, update_page_cache, get_page_data, get_all_pages, is_page_stale, invalidate_all_pages) with realistic values. Using a temp project dir and verifying both count and per-page metadata (timestamps, token totals, session ordering) is a good fit for the new SQLite schema.


351-544: Pagination + converter integration tests are well-structured

The integration tests construct small synthetic projects to cover no-pagination, multi-page pagination, page-size changes, tiny page sizes (one session per page), and HTML-level checks for navigation and stats. These scenarios mirror the performance and UX concerns described in the PR objectives and should catch regressions in _generate_paginated_html, page_info, and page_stats wiring.

test/test_cache.py (7)

25-32: Fixture change correctly isolates SQLite DB per test

Creating a project subdirectory and relying on project_dir.parent / "cache.db" ensures each test instance gets its own database file under a unique temporary parent, which avoids cross-test coupling given the centralized cache design.


95-104: Initialization test nicely pins down new db_path contract

Asserting both project_path, library_version, and that db_path == temp_project_dir.parent / "cache.db" (and exists) captures the intended public behavior of CacheManager in the SQLite-backed world. This should make future refactors to cache location safer.


112-124: is_file_cached assertion strengthens save/load coverage

Checking cache_manager.is_file_cached(jsonl_path) immediately after save_cached_entries validates the new mtime-based cache-validity logic, not just round‑trip serialization, which is a good addition to this test.


186-205: Version-mismatch test correctly exercises cache-compatibility behavior

Validating that a CacheManager instantiated with a newer, but compatible, library version still sees the original cached project version (1.0.0) via get_cached_project_data is a good guard against accidentally wiping or rewriting caches on harmless upgrades.


259-277: clear_cache behavior is well specified

Asserting that is_file_cached becomes false and that cached_files and sessions are emptied (while get_cached_project_data() still returns a project record) precisely matches the documented intent of clear_cache to reset project data without deleting the DB itself.


580-587: Error-handling test for missing cache entries is appropriate

The combination of assert not cache_manager.is_file_cached(jsonl_path) and expecting load_cached_entries to return None when the file exists but was never cached aligns with the new SQLite-backed semantics and should prevent regressions in “cold cache” behavior.


599-618: Permission-handling test remains useful despite SQLite migration

Even though the cache is now centralized in cache.db, keeping this test around a read‑only cache directory under the project path is a good defensive check that unexpected filesystem layout or permissions don’t cause CacheManager construction to blow up.

claude_code_log/renderer.py (2)

277-306: Cached mistune renderer + timing hook are wired correctly

Using a module‑level _markdown_renderer via _get_markdown_renderer() avoids repeatedly constructing mistune instances and re‑adding the Pygments plugin, while render_markdown still wraps rendering in timing_stat("_markdown_timings") so the existing timing instrumentation continues to work. This matches the guideline to use mistune for fast markdown rendering with syntax highlighting, and should help with the markdown‑heavy bottleneck identified in the PR description.


3128-3303: generate_html pagination context (page_info/page_stats) is threaded through cleanly

Extending generate_html’s signature with optional page_info and page_stats and passing them directly into the transcript.html template fits well with the new pagination design. Existing call sites that only pass messages, title, and combined_transcript_link remain valid, while paginated callers (e.g., _generate_paginated_html via the converter) can now drive per‑page navigation and stats, as exercised by test/test_pagination.py.

claude_code_log/converter.py (2)

39-105: LGTM!

The GenerationStats dataclass provides clear tracking of cache usage, HTML regeneration, timing, and errors/warnings, with a well-formatted summary output. The design is clean and maintainable.


436-466: Pagination decision logic is well-designed.

Lines 444–451 enable pagination when total_message_count > page_size or existing_page_count > 1. This "sticky" behavior prevents toggling between paginated and non-paginated modes as message counts fluctuate around the threshold, which would otherwise cause unnecessary regeneration and confuse users with inconsistent navigation.
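The decision distills to a one-liner; a sketch assuming the same three inputs (the function name is illustrative, not the converter's actual API):

```python
def should_paginate(total_message_count: int,
                    page_size: int,
                    existing_page_count: int) -> bool:
    """Sticky pagination: once a project has been split into multiple
    pages, keep paginating even if the count dips back under page_size."""
    return total_message_count > page_size or existing_page_count > 1
```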

@daaain
Owner Author

daaain commented Dec 10, 2025

I'm a bit torn on this below. I myself have "cleanupPeriodDays": 99999, but the default is only 30 days log retention, so keeping deleted logs could be really nice.

But on the other hand, it could be confusing and unwanted behaviour and also adds complexity.

I'm leaning towards not doing it, but wanted to at least have it documented here.


Rearchitecture: SQLite Cache as Source of Truth for Archived Sessions

Problem

Claude Code has cleanupPeriodDays (default 30 days) that auto-deletes
JSONL files. Currently:

  • Cached data for deleted files becomes orphaned and inaccessible
  • TUI requires at least one JSONL file to start
  • Rendering is tightly coupled to filesystem discovery

Solution

Treat SQLite cache as canonical data store. Auto-detect "archived"
sessions (JSONL deleted, cache exists) and render them with visual
indicators.

User Requirements

  • Resume: Disable button for archived sessions (greyed out with
    tooltip)
  • UI treatment: Subtle indicator (📦) next to archived sessions
  • Detection: Auto-detect when JSONL missing but cache data exists

Implementation Plan

Phase 1: Schema & Core Infrastructure

1.1 Add Migration
(claude_code_log/migrations/004_archived_sessions.sql)

ALTER TABLE sessions ADD COLUMN is_archived INTEGER DEFAULT 0;
CREATE INDEX IF NOT EXISTS idx_sessions_archived ON sessions(is_archived);

1.2 Update SessionCacheData Model (claude_code_log/cache.py)

  • Add is_archived: bool = False field to SessionCacheData dataclass

1.3 Add CacheManager Methods (claude_code_log/cache.py)

  • detect_archived_sessions(jsonl_files: List[Path]) -> List[str]: Find
    sessions whose files no longer exist
  • mark_sessions_archived(session_ids: List[str]) -> None: Set
    is_archived = 1
  • load_archived_session_messages(session_id: str) ->
    List[TranscriptEntry]: Load messages from cache for archived session

Phase 2: Data Loading Flow

2.1 Update ensure_fresh_cache() (claude_code_log/converter.py)

  • After discovering JSONL files, call detect_archived_sessions()
  • Mark newly discovered archived sessions

2.2 Update load_directory_transcripts() (claude_code_log/parser.py)

  • Add include_archived: bool = True parameter
  • Load archived session messages from cache
  • Merge with live messages, sorted by timestamp

Phase 3: TUI Changes (claude_code_log/tui.py)

3.1 Remove JSONL File Requirement

 Update run_session_browser():

 # Allow starting with cached sessions even if no JSONL files
 if not jsonl_files and (not cached_sessions or not cached_sessions.sessions):
     print(f"Error: No sessions found in {project_path}")
     return None

3.2 Update populate_table()

  • Add 📦 prefix for archived sessions
  • Use CSS class for subtle visual distinction

3.3 Update action_resume_selected()

  • Check is_archived before attempting resume
  • Show notification: "Cannot resume archived session - source file
    deleted"

3.4 Update Expanded Row Content

  • Show "Archived" badge
  • Explain resume unavailability

Phase 4: HTML Template Changes

4.1 Session Navigation (templates/components/session_nav.html)

  • Add archived CSS class for archived sessions
  • Show 📦 indicator before session ID

4.2 Transcript Template (templates/transcript.html)

  • Add archived banner at top of page for archived sessions

4.3 Index Template (templates/index.html)

  • Show archived session count in project cards

4.4 CSS Styles

.session-link.archived { opacity: 0.8; border-left: 3px solid var(--warning-color); }
.archived-banner { background: var(--warning-bg); padding: 0.5em; margin-bottom: 1em; }


Phase 5: Testing

  • Unit tests for archive detection logic
  • TUI tests for archived session display and disabled resume
  • Snapshot tests for HTML output with archived indicator
  • Integration test: delete JSONL, verify session still renders

Files to Modify

  • claude_code_log/migrations/004_archived_sessions.sql: new file, schema migration
  • claude_code_log/cache.py: add is_archived field, detection/marking methods
  • claude_code_log/converter.py: integrate archive detection in ensure_fresh_cache()
  • claude_code_log/parser.py: update load_directory_transcripts() for archived sessions
  • claude_code_log/tui.py: remove JSONL barrier, show indicators, disable resume
  • claude_code_log/templates/components/session_nav.html: archived indicator
  • claude_code_log/templates/transcript.html: archived banner
  • claude_code_log/templates/index.html: archived count in project cards

Migration Strategy

  • Automatic: Migration 004 adds column with default 0
  • On first run post-migration, detect_archived_sessions() marks
    existing orphaned sessions
  • No manual user action required
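The detection step in 1.3 amounts to a set difference between cached source paths and files still on disk. A minimal sketch (the `cached` mapping shape is an assumption; the real method would query the sessions table):

```python
from pathlib import Path
from typing import Dict, List


def detect_archived_sessions(
    cached: Dict[str, Path], jsonl_files: List[Path]
) -> List[str]:
    """Return session IDs whose cached source file is no longer on disk.

    `cached` maps session_id -> the JSONL path recorded at cache time;
    `jsonl_files` is the current directory listing.
    """
    live = set(jsonl_files)
    return sorted(
        sid for sid, path in cached.items()
        if path not in live and not path.exists()
    )
```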

@daaain
Owner Author

daaain commented Dec 10, 2025

On the other hand, we could also use the SQLite db as is to restore deleted JSONL files (until manual cache reset):

Yes, we can fully restore JSONL files from the database!

Here's what we store:

  1. content column: Contains json.dumps(entry.model_dump()) - the
    complete Pydantic model serialisation of each entry. This is
    essentially the exact JSON that was in the original JSONL file.
  2. Indexed columns: We also extract and normalise certain fields
    for query performance (timestamp, session_id, uuid, tokens, etc.),
    but these are redundant with content.

To restore a JSONL file, you'd simply:

  # Query messages for a specific file
  rows = conn.execute(
      "SELECT content FROM messages WHERE file_id = ? ORDER BY timestamp",
      (file_id,)
  )
  # Write each row's content as a JSONL line
  with open(output_path, "w", encoding="utf-8") as f:
      for row in rows:
          f.write(row["content"] + "\n")

The only thing we'd lose is the exact order within same-timestamp
messages and exact whitespace formatting, but the semantic content
would be identical.

This is actually quite reassuring - it means the SQLite cache is a
true backup, not just metadata. A --restore-jsonl CLI command would
be straightforward to implement if you ever wanted that safety
net.
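The restore idea round-trips cleanly. A self-contained sketch with an in-memory stand-in for the messages table (schema simplified to just the columns used here):

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# In-memory stand-in for the messages table described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (file_id INTEGER, timestamp TEXT, content TEXT)"
)
entries = [
    {"uuid": "1", "timestamp": "2025-01-01T00:00:00Z", "type": "user"},
    {"uuid": "2", "timestamp": "2025-01-01T00:01:00Z", "type": "assistant"},
]
conn.executemany(
    "INSERT INTO messages VALUES (1, ?, ?)",
    [(e["timestamp"], json.dumps(e)) for e in entries],
)

# Restore: write each stored content row back out as one JSONL line.
out = Path(tempfile.mkdtemp()) / "restored.jsonl"
rows = conn.execute(
    "SELECT content FROM messages WHERE file_id = ? ORDER BY timestamp", (1,)
)
with open(out, "w", encoding="utf-8") as f:
    for (content,) in rows:
        f.write(content + "\n")

restored = [json.loads(line) for line in out.read_text().splitlines()]
```

Parsing the restored file yields the original entries, modulo the caveats above about same-timestamp ordering.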

@cboos cboos mentioned this pull request Dec 17, 2025
daaain added 2 commits January 3, 2026 22:15
- Migrate from JSON cache to SQLite for better performance and integrity
- Add database migrations system for schema versioning
- Implement incremental HTML regeneration (only rebuild changed sessions)
- Add pagination support for large projects
- Improve working directories handling
- Use explicit UTF-8 encoding for Windows compatibility
@daaain daaain force-pushed the feat/sqlite-cache branch from 936ac19 to a534b79 on January 3, 2026 at 22:57

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claude_code_log/converter.py (1)

980-1032: ensure_fresh_cache and has_cache_changes corrupt cache aggregates when called with date filters.

When from_date/to_date are provided:

  • Both functions treat date filters as invalidation reasons, forcing cache recomputation.
  • ensure_fresh_cache calls load_directory_transcripts(project_dir, cache_manager, from_date, to_date, ...), which loads filtered messages only via load_cached_entries_filtered.
  • _update_cache_with_session_data then calculates aggregates (message counts, timestamps, token totals) based on this subset, overwriting the global cache.

Subsequent calls without date filters will:

  • See cached_project_data present and skip cache update.
  • Use stale, subset-based aggregates in sessions and projects metadata (affecting pagination, HTML staleness checks, and index stats).

The fix requires removing date filters from invalidation logic in both ensure_fresh_cache and has_cache_changes, and always passing None, None to load_directory_transcripts to ensure cache aggregates reflect the full dataset. Date filtering should apply only at the presentation layer (via filter_messages_by_date), not during cache population.
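A minimal sketch of that separation, assuming string-comparable ISO timestamps (the function name is illustrative, not the converter's API):

```python
from typing import Dict, List, Tuple


def build_aggregates_and_view(
    messages: List[Dict[str, str]], from_ts: str, to_ts: str
) -> Tuple[int, List[Dict[str, str]]]:
    """Aggregates always come from the FULL message list; the date window
    is applied afterwards, purely at the presentation layer."""
    total_message_count = len(messages)  # cache aggregate: full dataset
    view = [m for m in messages if from_ts <= m["timestamp"] <= to_ts]
    return total_message_count, view
```

The cache is updated from `total_message_count`, while only `view` feeds the rendered page, so later unfiltered runs see correct aggregates.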

♻️ Duplicate comments (2)
claude_code_log/migrations/runner.py (1)

29-56: Migration application and recording aren’t atomic; risk of “applied but not recorded” state.

apply_migration() runs conn.executescript(content) and only then inserts into _schema_version and commits. If the insert or commit fails (disk full, constraint error, concurrent writer), the migration SQL may be applied while _schema_version doesn’t record it, leaving the DB in an inconsistent state and making future non‑idempotent migrations fragile. This matches a previously raised concern on this file.

To make migration + recording atomic, wrap them in a single transaction or savepoint. For example:

Suggested SAVEPOINT-based fix
 def apply_migration(
     conn: sqlite3.Connection, version: int, migration_path: Path
 ) -> None:
     """Apply a single migration and record it in _schema_version."""
     content = migration_path.read_text(encoding="utf-8")
     checksum = _compute_checksum(content)
 
-    # Execute the migration SQL
-    conn.executescript(content)
-
-    # Record the migration
-    conn.execute(
-        """
-        INSERT INTO _schema_version (version, filename, applied_at, checksum)
-        VALUES (?, ?, ?, ?)
-        """,
-        (version, migration_path.name, datetime.now().isoformat(), checksum),
-    )
-    conn.commit()
+    savepoint = f"migration_{version}"
+    try:
+        conn.execute(f"SAVEPOINT {savepoint}")
+        # Execute the migration SQL
+        conn.executescript(content)
+        # Record the migration
+        conn.execute(
+            """
+            INSERT INTO _schema_version (version, filename, applied_at, checksum)
+            VALUES (?, ?, ?, ?)
+            """,
+            (version, migration_path.name, datetime.now().isoformat(), checksum),
+        )
+        conn.execute(f"RELEASE SAVEPOINT {savepoint}")
+    except Exception as e:
+        conn.execute(f"ROLLBACK TO SAVEPOINT {savepoint}")
+        conn.execute(f"RELEASE SAVEPOINT {savepoint}")
+        raise RuntimeError(
+            f"Failed to apply/record migration {version}: {e}. "
+            "Database may be in inconsistent state."
+        ) from e

Please double-check the exact SQLite executescript transaction semantics against your target Python/SQLite versions before adopting this, as behavior can differ slightly across versions.

Also applies to: 92-111
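The effect of the savepoint can be shown with a deliberately failing migration. A simplified, runnable sketch (using execute rather than executescript, which sidesteps executescript's implicit commit; not the actual runner code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _schema_version (version INTEGER)")
conn.execute("CREATE TABLE t (id INTEGER)")


def apply_atomically(conn: sqlite3.Connection, version: int, sql: str) -> bool:
    """Run migration SQL and record it as one unit; roll both back on error."""
    try:
        conn.execute("SAVEPOINT mig")
        conn.execute(sql)  # the "migration"
        conn.execute("INSERT INTO _schema_version VALUES (?)", (version,))
        conn.execute("RELEASE SAVEPOINT mig")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK TO SAVEPOINT mig")
        conn.execute("RELEASE SAVEPOINT mig")
        return False


ok = apply_atomically(conn, 1, "INSERT INTO t VALUES (1)")
bad = apply_atomically(conn, 2, "INSERT INTO nonexistent VALUES (1)")
recorded = [r[0] for r in conn.execute("SELECT version FROM _schema_version")]
```

The failed migration leaves no `_schema_version` row behind, so there is no "applied but not recorded" state.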

claude_code_log/cache.py (1)

1095-1146: Paginated pages don’t become stale when session message counts change.

is_page_stale() fetches message_count from sessions for each session_id on the page but only checks whether the row exists. If a session gains or loses messages, html_pages.message_count and the HTML file stay unchanged, and the page is treated as "up_to_date". This can leave paginated combined transcripts out of sync with the SQLite cache.

You’re already computing page-level message_count in update_page_cache(), so a simple consistency check here would fix it.

One possible fix is to compare the current summed session counts against the stored page message count:

Suggested consistency check in is_page_stale
     # Check if any session on this page has changed
     with self._get_connection() as conn:
-        for session_id in page_data.session_ids:
-            row = conn.execute(
-                """SELECT message_count FROM sessions
-                       WHERE project_id = ? AND session_id = ?""",
-                (self._project_id, session_id),
-            ).fetchone()
-
-            if not row:
-                return True, "session_missing"
-
-            # We need to check if session content changed
-            # For now, just check if session exists
+        total_message_count = 0
+        for session_id in page_data.session_ids:
+            row = conn.execute(
+                """SELECT message_count FROM sessions
+                       WHERE project_id = ? AND session_id = ?""",
+                (self._project_id, session_id),
+            ).fetchone()
+
+            if not row:
+                return True, "session_missing"
+
+            total_message_count += row["message_count"] or 0
+
+        if total_message_count != page_data.message_count:
+            return True, "session_updated"
 
     return False, "up_to_date"

If you prefer to count messages after deduplication instead, you could also adjust _generate_paginated_html() so that page_message_count is computed from a consistent source and stored into html_pages.message_count, then compare against that here.

🧹 Nitpick comments (20)
claude_code_log/converter.py (2)

415-481: GenerationStats is a good foundation, but currently underused.

The dataclass looks well-structured for per-project reporting, but most fields (e.g., cache_time, render_time, combined_regenerated) aren’t populated yet, and summary() isn’t called. That’s fine for now but consider either wiring these fields into process_projects_hierarchy() or trimming unused attributes to reduce drift.


1294-1432: Return value of _generate_individual_session_files is unused; consider wiring into stats or dropping.

The helper now returns regenerated_count, but callers (e.g., convert_jsonl_to) ignore it and GenerationStats.sessions_regenerated is driven instead from a stale-session pre-scan. Either:

  • Use the returned count to update GenerationStats.sessions_regenerated, or
  • Make _generate_individual_session_files return None again to avoid confusion.
claude_code_log/html/templates/components/page_nav_styles.css (1)

1-74: LGTM - Well-structured pagination styling.

The CSS is clean and maintainable, with:

  • Semantic class naming following consistent patterns
  • Proper use of CSS variables for theming consistency
  • Modern flexbox layouts with responsive wrapping
  • Smooth hover transitions for better UX
  • Appropriate visual hierarchy through sizing and colors

Minor: Consider clarifying or removing empty pseudo-elements.

Lines 68-73 define ::before and ::after pseudo-elements with empty content:

.page-nav-link.prev::before {
    content: '';
}

.page-nav-link.next::after {
    content: '';
}

If these are placeholders for future icons or styling, consider adding a comment explaining the intent. Otherwise, they can be removed to simplify the stylesheet.

🔎 Optional: Add clarifying comment
+/* Placeholder pseudo-elements for potential arrow icons or spacing */
 .page-nav-link.prev::before {
     content: '';
 }

 .page-nav-link.next::after {
     content: '';
 }
test/test_pagination.py (1)

142-346: Page cache API tests are well structured but assume ordering semantics

The TestPageCacheMethods suite exercises read/write, staleness reasons, invalidation, and get_all_pages() in a way that tightly matches the new API. Note that the assertions on all_pages[0].page_number == 1 and [1].page_number == 2 implicitly require get_all_pages() to return rows ordered by page_number; keep that ordering guaranteed in the implementation or document it, since tests and callers now rely on it.

test/test_html_regeneration.py (2)

49-57: Skip-message assertions correctly handle both legacy and new messages

Relaxing the checks to accept either per-file “skipping regeneration” or the new “All HTML files are current” early-exit message makes the tests robust across the refactor while still verifying that regeneration is skipped. The approach of scanning mock_print.call_args_list by substring is fine for this use case.

Also applies to: 106-115


321-492: Incremental HTML cache tests thoroughly validate html_cache behavior

The TestIncrementalHtmlCache tests exercise combined vs per-session cache entries, selective regeneration based on per-session changes, version mismatches, and message-count-based staleness. They accurately mirror the CacheManager.get_html_cache / is_html_stale behavior and look correct. The 1.1s sleeps are appropriate given the 1s mtime tolerance, but will add some runtime; only worth revisiting if the suite becomes slow.

test/test_cache_sqlite_integrity.py (3)

91-177: SQL cascade, token-sum, and FK constraint tests are well targeted

The cascade-delete, token aggregation, and foreign-key tests exercise exactly the right invariants (projects → cached_files/messages/sessions, messages.file_id/project_id validity, and sum(input_tokens/output_tokens) correctness). Using _get_connection and _project_id is acceptable for this “white-box” integrity module, just be aware these tests will need updating if internal table names or relationships change.

Also applies to: 179-244, 246-286


288-402: Serialization round‑trip tests cover complex content types effectively

The round‑trip test for ToolUse, ToolResult, and Thinking content ensures that content JSON and the create_transcript_entry path preserve entry types and content item types. If you want even stronger guarantees later, you could also assert key fields (e.g., tool_use_id, thinking text) match, but the current checks are already valuable.


610-711: WAL, concurrency, and performance tests are reasonable but mildly timing‑sensitive

The WAL mode check and concurrent reader test validate important SQLite configuration choices and multi-threaded read safety. The “large dataset” performance test with 1k rows and a 2s upper bound is conservative enough for most CIs, but if you ever see flakes under load, consider relaxing the threshold or asserting only on successful completion plus a sanity bound on result size.
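As a reference for the configuration under test, a minimal stdlib sketch (not the project's actual setup code) of enabling WAL and reading from a second connection:

```python
# Minimal sketch: enable SQLite WAL mode, which allows concurrent
# readers alongside a single writer. db_path is an arbitrary temp file,
# not the project's cache.db location.
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(db_path)
# The PRAGMA returns the journal mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
conn.commit()

# A second connection can read while the first stays open.
reader = sqlite3.connect(db_path)
rows = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
reader.close()
```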

test/test_integration_realistic.py (1)

845-860: UTF‑8 encoding usage is now consistent for new writes

The updated write_text/open(..., encoding="utf-8") calls for JSONL and HTML writes make the tests more robust across platforms and match the rest of the codebase’s encoding discipline. No behavior changes beyond safer I/O.

Also applies to: 902-903, 959-961, 1507-1508, 1523-1540

claude_code_log/cli.py (3)

186-255: Relative‑match discovery now correctly leverages cached working directories

The changes in _find_relative_matches to use CacheManager.get_working_directories() (and to lazily build the cache via convert_jsonl_to_html(..., silent=True) when empty) are consistent with the new SQLite cache design and should make CWD matching more accurate once sessions are cached. This does mean the first run for uncached projects can be a bit heavier, but that’s expected for priming the cache.


257-315: _clear_caches now handles both SQLite and legacy JSON caches

For --all-projects, deleting the shared cache.db and then cleaning per‑project cache directories is the right migration path from JSON to SQLite. For a single directory, using CacheManager.clear_cache() and then removing any legacy cache directory preserves other projects’ DB data while cleaning the one project. The behavior matches the updated tests.


459-487: The --page-size option is correctly wired, but consider adding input validation

The --page-size parameter is properly threaded through both process_projects_hierarchy() and convert_jsonl_to(), allowing CLI users to tune pagination of combined transcripts. Currently, it accepts any integer without validation. Consider constraining it to positive values with click.IntRange(min=1) to avoid unexpected behavior from zero or negative inputs, which would cause every session to be on its own page or trigger odd edge cases in the pagination logic.
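For illustration, a hypothetical validator mirroring what `click.IntRange(min=1)` would enforce; with click itself, the whole check collapses to the `type=` argument shown in the comment:

```python
# Hypothetical helper mirroring click.IntRange(min=1) semantics for
# --page-size; with click the equivalent is simply:
#   @click.option("--page-size", type=click.IntRange(min=1), default=20)
def validate_page_size(value: int) -> int:
    """Reject zero/negative page sizes before pagination logic runs."""
    if value < 1:
        raise ValueError(f"--page-size must be >= 1, got {value}")
    return value
```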

claude_code_log/html/renderer.py (1)

475-535: Pagination metadata plumbing into the renderer is backward‑compatible

Extending HtmlRenderer.generate and generate_html with page_info and page_stats and adding them to the Jinja context cleanly supports paginated transcripts without breaking existing call sites (new params are optional and appended). The docstrings match the new behavior, and generate_session still calls generate via keyword args so it remains unaffected.

Also applies to: 604-628

test/test_cache_integration.py (3)

88-114: --no-cache CLI test captures intended “no new data written” semantics

test_cli_no_cache_flag now asserts that after --clear-cache and a subsequent run with --no-cache, the shared cache.db exists but the project’s total_message_count remains 0. This matches CacheManager.clear_cache() plus a non-caching run and is a reasonable contract for “no-cache” mode.


115-140: Converter/CLI cache-on vs cache-off tests align with SQLite behavior but hard-code version

The tests that assert cache presence (cache.db + cached_files entries) vs emptiness (total_message_count == 0) correctly reflect the difference between use_cache=True and use_cache=False. They currently construct CacheManager(..., "1.0.0"); to reduce coupling to _is_cache_version_compatible rules, you might consider using get_library_version() (or patching it) here as you do elsewhere, but it’s not strictly required given the current compatibility logic.

Also applies to: 196-231


232-266: Hierarchy and version‑upgrade integration tests exercise the right cache touchpoints

test_process_projects_hierarchy_with_cache and test_cache_version_upgrade_scenario validate that project hierarchies populate cache.db and that a pre-existing project row (created under an “old” version) doesn’t prevent later runs under a “new” version from proceeding. If you later extend _is_cache_version_compatible with more breaking ranges, these tests will be valuable to catch regressions.

Also applies to: 431-442

claude_code_log/utils.py (2)

59-66: Consider adding Windows temp path patterns.

The current patterns only cover macOS and Unix temp directories, but the CI matrix for this PR already runs on windows-latest, so Windows paths are worth handling too. Consider adding patterns like "/AppData/Local/Temp/" or checking against tempfile.gettempdir() for a more robust check.

🔎 Optional: Platform-agnostic temp path detection
+import tempfile
+
 def _is_temp_path(path_str: str) -> bool:
     """Check if a path is a temporary/test path that should be filtered out."""
     temp_patterns = [
         "/private/var/folders/",  # macOS temp
         "/tmp/",  # Unix temp
         "/var/folders/",  # macOS temp (alternate)
+        "/AppData/Local/Temp/",  # Windows temp
     ]
-    return any(pattern in path_str for pattern in temp_patterns)
+    # Also check against system temp directory
+    system_temp = tempfile.gettempdir()
+    return any(pattern in path_str for pattern in temp_patterns) or path_str.startswith(system_temp)

81-104: Consider extracting duplicate fallback logic.

The fallback conversion logic (removing leading - and replacing - with /) appears twice: lines 87-90 and 100-104. This could be extracted to a small helper for maintainability.

🔎 Optional: Extract fallback logic
+def _convert_project_dir_to_path(project_dir_name: str) -> str:
+    """Convert a project directory name to a display path."""
+    display_name = project_dir_name
+    if display_name.startswith("-"):
+        display_name = display_name[1:].replace("-", "/")
+    return display_name
+
+
 def get_project_display_name(
     project_dir_name: str, working_directories: Optional[list[str]] = None
 ) -> str:
     ...
     if working_directories:
         real_dirs = [wd for wd in working_directories if not _is_temp_path(wd)]

         if not real_dirs:
-            display_name = project_dir_name
-            if display_name.startswith("-"):
-                display_name = display_name[1:].replace("-", "/")
-            return display_name
+            return _convert_project_dir_to_path(project_dir_name)
         ...
     else:
-        display_name = project_dir_name
-        if display_name.startswith("-"):
-            display_name = display_name[1:].replace("-", "/")
-        return display_name
+        return _convert_project_dir_to_path(project_dir_name)
test/test_cache.py (1)

434-516: Consider extracting the repeated patched_method logic to a helper.

The same patched_method implementation is duplicated across three tests (test_breaking_change_exact_version_incompatible, test_breaking_change_pattern_matching, test_multiple_breaking_changes) with only the breaking_changes dict varying. Extracting to a parameterized helper would reduce duplication.

🔎 Optional: Extract version check helper
def _create_version_checker(cache_manager, breaking_changes):
    """Create a version compatibility checker with custom breaking changes."""
    def patched_method(cache_version):
        if cache_version == cache_manager.library_version:
            return True
        from packaging import version
        cache_ver = version.parse(cache_version)
        current_ver = version.parse(cache_manager.library_version)
        for breaking_version_pattern, min_required in breaking_changes.items():
            min_required_ver = version.parse(min_required)
            if current_ver >= min_required_ver:
                if breaking_version_pattern.endswith(".x"):
                    major_minor = breaking_version_pattern[:-2]
                    if str(cache_ver).startswith(major_minor):
                        return False
                else:
                    breaking_ver = version.parse(breaking_version_pattern)
                    if cache_ver <= breaking_ver:
                        return False
        return True
    return patched_method

Then in tests:

cache_manager._is_cache_version_compatible = _create_version_checker(
    cache_manager, {"0.3.3": "0.3.4"}
)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 936ac19 and a534b79.

📒 Files selected for processing (28)
  • claude_code_log/cache.py
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/html/renderer.py
  • claude_code_log/html/templates/components/page_nav_styles.css
  • claude_code_log/html/templates/transcript.html
  • claude_code_log/migrations/001_initial_schema.sql
  • claude_code_log/migrations/002_html_cache.sql
  • claude_code_log/migrations/003_html_pagination.sql
  • claude_code_log/migrations/__init__.py
  • claude_code_log/migrations/runner.py
  • claude_code_log/renderer.py
  • claude_code_log/tui.py
  • claude_code_log/utils.py
  • test/__snapshots__/test_snapshot_html.ambr
  • test/__snapshots__/test_snapshot_markdown.ambr
  • test/test_cache.py
  • test/test_cache_integration.py
  • test/test_cache_sqlite_integrity.py
  • test/test_html_regeneration.py
  • test/test_integration_realistic.py
  • test/test_message_types.py
  • test/test_pagination.py
  • test/test_performance.py
  • test/test_project_display_name.py
  • test/test_project_matching.py
  • test/test_sidechain_agents.py
  • test/test_tui.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • claude_code_log/migrations/003_html_pagination.sql
  • test/test_performance.py
  • claude_code_log/migrations/002_html_cache.sql
  • test/__snapshots__/test_snapshot_html.ambr
  • claude_code_log/migrations/001_initial_schema.sql
  • claude_code_log/migrations/__init__.py
  • test/test_project_matching.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • test/test_message_types.py
  • test/test_pagination.py
  • test/test_cache_integration.py
  • claude_code_log/tui.py
  • claude_code_log/cli.py
  • test/test_integration_realistic.py
  • test/test_project_display_name.py
  • test/test_cache_sqlite_integrity.py
  • test/test_tui.py
  • test/test_sidechain_agents.py
  • test/test_html_regeneration.py
  • claude_code_log/html/renderer.py
  • claude_code_log/renderer.py
  • claude_code_log/migrations/runner.py
  • claude_code_log/cache.py
  • test/test_cache.py
  • claude_code_log/utils.py
  • claude_code_log/converter.py
claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Files:

  • claude_code_log/renderer.py
🧠 Learnings (6)
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • test/__snapshots__/test_snapshot_markdown.ambr
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : When adding new message types or modifying CSS class generation in renderer.py, ensure the timeline's message type detection logic in timeline.html is updated accordingly to maintain feature parity

Applied to files:

  • claude_code_log/html/templates/transcript.html
  • claude_code_log/renderer.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/renderer.py : When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Applied to files:

  • claude_code_log/html/templates/transcript.html
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : Use Playwright to test browser runtime features and message filtering consistency in the timeline component

Applied to files:

  • claude_code_log/html/templates/transcript.html
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_message_types.py
  • test/test_pagination.py
  • test/test_cache_integration.py
  • test/test_integration_realistic.py
  • test/test_project_display_name.py
  • test/test_cache_sqlite_integrity.py
  • test/test_tui.py
  • test/test_sidechain_agents.py
  • test/test_html_regeneration.py
  • test/test_cache.py
📚 Learning: 2025-11-30T23:24:07.840Z
Learnt from: cboos
Repo: daaain/claude-code-log PR: 54
File: claude_code_log/renderer.py:2912-2945
Timestamp: 2025-11-30T23:24:07.840Z
Learning: In claude_code_log/renderer.py, the agentId field is currently only set on Task tool_result messages, not on tool_use messages, because the agentId is generated after the tool_use is logged. The _reorder_sidechain_template_messages function relies on this to avoid duplicate sidechain insertion.

Applied to files:

  • test/test_sidechain_agents.py
🧬 Code graph analysis (13)
test/test_message_types.py (1)
claude_code_log/converter.py (1)
  • load_transcript (111-288)
test/test_pagination.py (2)
claude_code_log/cache.py (2)
  • SessionCacheData (39-52)
  • get_library_version (119-162)
claude_code_log/converter.py (2)
  • _get_page_html_path (483-490)
  • _assign_sessions_to_pages (493-527)
test/test_cache_integration.py (1)
claude_code_log/cache.py (2)
  • CacheManager (168-1186)
  • get_cached_project_data (604-676)
claude_code_log/tui.py (1)
claude_code_log/cache.py (1)
  • get_working_directories (580-594)
claude_code_log/cli.py (2)
claude_code_log/cache.py (3)
  • get_working_directories (580-594)
  • CacheManager (168-1186)
  • clear_cache (678-686)
claude_code_log/converter.py (1)
  • convert_jsonl_to_html (716-740)
test/test_integration_realistic.py (1)
claude_code_log/cache.py (3)
  • CacheManager (168-1186)
  • get_library_version (119-162)
  • get_cached_project_data (604-676)
test/test_project_display_name.py (1)
claude_code_log/utils.py (1)
  • get_project_display_name (69-104)
test/test_cache_sqlite_integrity.py (1)
claude_code_log/cache.py (9)
  • SessionCacheData (39-52)
  • save_cached_entries (440-501)
  • update_session_cache (503-536)
  • _get_connection (190-199)
  • load_cached_entries (363-378)
  • get_cached_project_data (604-676)
  • get_cache_stats (718-748)
  • get_working_directories (580-594)
  • is_file_cached (340-361)
test/test_html_regeneration.py (1)
claude_code_log/cache.py (3)
  • get_library_version (119-162)
  • get_html_cache (752-774)
  • is_html_stale (808-869)
claude_code_log/renderer.py (2)
claude_code_log/html/renderer.py (1)
  • is_outdated (587-598)
claude_code_log/markdown/renderer.py (1)
  • is_outdated (884-901)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (5)
  • AssistantTranscriptEntry (175-178)
  • QueueOperationTranscriptEntry (203-220)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
test/test_cache.py (3)
test/test_cache_sqlite_integrity.py (2)
  • cache_manager (37-39)
  • temp_project_dir (28-33)
test/test_pagination.py (2)
  • cache_manager (156-159)
  • temp_project_dir (143-146)
claude_code_log/cache.py (4)
  • is_file_cached (340-361)
  • save_cached_entries (440-501)
  • get_cached_project_data (604-676)
  • clear_cache (678-686)
claude_code_log/utils.py (1)
claude_code_log/cache.py (1)
  • SessionCacheData (39-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.11)
🔇 Additional comments (31)
test/__snapshots__/test_snapshot_markdown.ambr (1)

321-327: Snapshot header rename is consistent with test name.

The updated header better reflects the multi-session markdown test identifier; no further changes needed.

test/test_tui.py (1)

90-96: Explicit UTF‑8 encoding on test files is a good hardening.

Using encoding="utf-8" for both the JSONL fixture and the non-directory test file avoids platform-default surprises when data contains non-ASCII.

Also applies to: 796-801

test/test_message_types.py (1)

151-186: Missing-file tests correctly pin load_transcript behavior.

These tests exercise both noisy and silent modes, asserting an empty list and appropriate warning behavior; they match the new FileNotFoundError handling in load_transcript.

test/test_sidechain_agents.py (1)

19-35: UTF‑8 encoding on sidechain fixtures is appropriate.

Specifying encoding="utf-8" on all test data reads/writes makes these sidechain/agent tests robust to locale differences and non-ASCII content.

Also applies to: 66-80, 103-115, 131-145, 176-178, 285-299

claude_code_log/converter.py (3)

152-160: Graceful handling of deleted JSONL files is correct and matches tests.

Catching FileNotFoundError in load_transcript and returning an empty list (with an optional warning) handles racey session cleanup without failing the whole conversion, and aligns with the new tests.
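The pattern can be sketched as follows; the function name and return shape here are illustrative assumptions, not the actual load_transcript signature:

```python
# Illustrative sketch of the "tolerate deleted JSONL" pattern; names
# are assumptions, not the converter's real API.
from pathlib import Path
from typing import Any


def load_entries(jsonl_path: Path, silent: bool = False) -> list[Any]:
    """Return raw entry lines, or [] if the file vanished mid-run."""
    try:
        text = jsonl_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        if not silent:
            print(f"Warning: {jsonl_path} disappeared; skipping")
        return []
    return [line for line in text.splitlines() if line.strip()]
```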


530-714: Pagination pipeline is consistent; only staleness logic (in CacheManager) needs tightening.

The _get_page_html_path, _assign_sessions_to_pages, and _generate_paginated_html functions cleanly group sessions into pages without splitting sessions, reuse generate_html with page_info/page_stats, and update html_pages/page_sessions via update_page_cache. With the is_page_stale fix in the cache layer, this should give you efficient, per-page incremental regeneration.


1434-1774: Project‑hierarchy summary structure looks good; printing respects “summary only” intent but not full silence.

process_projects_hierarchy() now tracks per-project GenerationStats, surfaces warnings/errors, and prints a concise global summary, which is useful. Note that silent currently suppresses per‑file logging (via convert_jsonl_to) but not per‑project status lines or the final summary, which matches the docstring (“show summary only”) but not a “no output at all” interpretation. If you ever need a fully quiet mode, a second flag (e.g., quiet) may be clearer.

claude_code_log/tui.py (2)

8-8: LGTM - Import addition aligns with type annotation usage.

The List import is used for the type hint at line 524, making the code more explicit about the expected type.


524-526: LGTM - Clean migration to CacheManager API.

The refactor improves code clarity by:

  • Using a more precise type annotation (List[str] instead of implicit None)
  • Calling get_working_directories() directly instead of accessing via cached project data
  • Maintaining exception handling to preserve fallback behavior

The implementation correctly aligns with the new CacheManager-based architecture.

claude_code_log/html/templates/transcript.html (2)

20-20: LGTM - Standard CSS inclusion for pagination styling.

The include follows the established pattern for component-based CSS organization in this template.


27-51: LGTM - Well-structured pagination UI.

The page navigation block is properly implemented with:

  • Conditional rendering via {% if page_info %} for backward compatibility
  • Clean semantic HTML structure with appropriate CSS classes
  • Nested conditionals for optional elements (stats, prev/next links)
  • Consistent with the template's existing patterns
claude_code_log/renderer.py (1)

2308-2320: LGTM - Clean convenience wrapper for HTML staleness checks.

The function provides a clean public API for checking HTML file versions:

  • Local import pattern correctly avoids circular dependencies
  • Simple delegation to HtmlRenderer.is_outdated()
  • Clear docstring explaining return values

The implementation is straightforward and correct. This likely supports the HTML caching infrastructure introduced in the PR.

test/test_project_display_name.py (2)

118-128: LGTM - Test correctly validates temporary path filtering.

The test ensures that common temporary paths (pytest, macOS temp directories) are properly filtered out, with the real project directory being selected for the display name. This aligns with the new filtering logic in utils.py.


130-140: LGTM - Test correctly validates fallback behavior.

The test ensures proper fallback to project directory name conversion when all working directories are temporary paths. This validates the edge case handling in the filtering logic.

test/test_pagination.py (3)

24-35: _get_page_html_path tests accurately codify filename contract

The three tests cover page 1 and higher-page naming and directly mirror the converter implementation; no issues here.


40-140: Pagination assignment tests match the new overflow semantics

The _assign_sessions_to_pages scenarios (single/multiple sessions, large sessions, overflow, chronological ordering, empty input) line up with the implementation (add-then-check, allow overflow to keep sessions intact). This is a solid, readable test matrix.
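The add-then-check semantics can be sketched like this (function name and data shapes are assumptions, not the converter's actual code):

```python
# Sketch of add-then-check pagination: a session is appended to the
# current page first, then the page is closed if it reached the limit,
# so one oversized session overflows its page instead of being split.
def assign_sessions_to_pages(
    session_counts: list[tuple[str, int]], page_size: int
) -> list[list[str]]:
    pages: list[list[str]] = []
    current: list[str] = []
    count = 0
    for session_id, messages in session_counts:
        current.append(session_id)
        count += messages
        if count >= page_size:
            pages.append(current)
            current, count = [], 0
    if current:
        pages.append(current)
    return pages
```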


351-549: Integration pagination tests give good end‑to‑end coverage

The converter integration tests cover no‑pagination, multi‑page generation, page_size reconfiguration, tiny page sizes, navigation links, and stats rendering. They closely reflect the intended semantics of _assign_sessions_to_pages and the new HTML navigation, and look correct.

test/test_html_regeneration.py (1)

448-492: Message-count based staleness detection test matches cache semantics

test_html_cache_message_count_validation correctly drives ensure_fresh_cache to update session metadata and then asserts that is_html_stale flags the session as session_updated once the DB message_count diverges from the cached HTML count. This is a precise check of the intended invariant.

test/test_cache_sqlite_integrity.py (2)

453-574: Ordering and NULL‑token aggregate tests match cache design

The timestamp-ordering and NULL-token handling tests correctly validate that load_cached_entries uses ORDER BY timestamp NULLS LAST and that aggregates use COALESCE(SUM(...), 0) semantics. These directly align with the SQL in CacheManager and look sound.
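Both idioms can be demonstrated with a stdlib-only snippet (assuming the bundled SQLite is >= 3.30, which NULLS LAST requires):

```python
# Demonstrates ORDER BY ... NULLS LAST (SQLite >= 3.30) and
# COALESCE(SUM(...), 0) turning a NULL aggregate into 0.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (ts TEXT, input_tokens INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("2025-01-02", 5), (None, None), ("2025-01-01", 7)],
)

order = [
    row[0]
    for row in conn.execute("SELECT ts FROM messages ORDER BY ts NULLS LAST")
]
# SUM over a single NULL value is NULL; COALESCE maps it to 0.
total = conn.execute(
    "SELECT COALESCE(SUM(input_tokens), 0) FROM messages WHERE ts IS NULL"
).fetchone()[0]
conn.close()
```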


887-923: Migration integrity tests nicely pin down schema-version behavior

The _schema_version tests ensure migrations record checksums and are applied exactly once, which is critical for long-term cache stability. The assertions on version>0, .sql filenames, and 64-char checksums are appropriate.

test/test_integration_realistic.py (2)

238-251: CLI clear‑cache integration tests correctly target the new SQLite cache

TestCLIWithProjectsDir.test_clear_cache_with_projects_dir now asserts that cache.db is created under the projects dir after processing and removed after --clear-cache. This matches _clear_caches behavior (delete shared DB for --all-projects) and is a good high‑level safeguard.

Also applies to: 266-268


421-434: Real‑data cache tests validate both DB presence and version recording

TestCacheWithRealData.test_cache_creation_all_projects and test_cache_version_stored now assert that a shared cache.db exists and that CacheManager.get_cached_project_data() returns a non‑None structure with a non‑empty version equal to get_library_version(). This ties the realistic-data path neatly to the new SQLite cache semantics.

Also applies to: 473-478

claude_code_log/utils.py (2)

7-10: LGTM! Correct use of TYPE_CHECKING for circular import avoidance.

The TYPE_CHECKING guard correctly allows type hints without runtime circular dependency issues.


191-226: LGTM! Solid implementation for multi-type entry handling.

The late import pattern correctly avoids circular dependencies at runtime while maintaining type safety via TYPE_CHECKING. The duck-typing approach with getattr and hasattr handles the union type gracefully.

Minor observation: Line 218's getattr(entry, "timestamp", "") is slightly redundant since hasattr already confirmed the attribute exists, but it's a safe defensive pattern.
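A toy sketch of the duck-typing pattern described above (the entry classes are hypothetical stand-ins for the real union types in the models):

```python
from typing import Any


class TranscriptEntry:
    """Stand-in for an entry type that carries a timestamp."""

    def __init__(self, timestamp: str) -> None:
        self.timestamp = timestamp


class SummaryEntry:
    """Stand-in for an entry type with no timestamp attribute."""


def entry_sort_key(entry: Any) -> str:
    # hasattr guards the union type; getattr's default is the defensive
    # redundancy noted above (harmless, but never actually triggered here)
    if hasattr(entry, "timestamp"):
        return getattr(entry, "timestamp", "")
    return ""


entries = [SummaryEntry(), TranscriptEntry("2025-01-01T00:00:00Z")]
keys = [entry_sort_key(e) for e in entries]
```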

test/test_cache.py (7)

26-33: LGTM! Good test isolation pattern.

Creating a project subdirectory ensures each test gets a unique db_path (at parent/cache.db), preventing test interference when using the shared SQLite database.


108-113: LGTM! Clear validation of SQLite database location.

The test properly verifies the database is created at the expected parent-level path.


138-167: LGTM! Direct SQLite verification is appropriate for testing internal storage.

The test correctly queries the database to verify message storage structure. The NULLS LAST syntax is valid in SQLite 3.30+ (per retrieved learnings). Good isolation via project_id filtering.


169-187: Mtime-based test may be flaky on fast filesystems.

The time.sleep(1.1) works in most cases, but some filesystems (especially in CI with containers) may have lower mtime resolution or caching that could cause intermittent failures.

Consider whether you've observed any flakiness in CI. If so, a more robust approach would be to directly manipulate the cached mtime in the database or use os.utime() to explicitly set a different modification time.
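The `os.utime()` alternative avoids sleeping entirely and works regardless of filesystem mtime resolution, roughly like this:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "session.jsonl"
    f.write_text('{"type":"user"}\n', encoding="utf-8")

    cached_mtime = f.stat().st_mtime
    # Instead of time.sleep(1.1), push the mtime forward explicitly so a
    # staleness check sees a change even on coarse-resolution filesystems.
    os.utime(f, (cached_mtime + 10, cached_mtime + 10))

    is_modified = f.stat().st_mtime > cached_mtime
```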


189-207: LGTM! Version compatibility test is well-structured.

The test correctly validates that compatible version upgrades preserve the cache data without requiring re-creation.


263-285: LGTM! Comprehensive cache clearing test.

Good test structure with explicit precondition verification before the clear operation, and thorough postcondition checks on both file cache status and project data.


593-603: LGTM! Clear test for uncached file handling.

The test properly distinguishes between a file existing on disk and having a cache entry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
test/__snapshots__/test_snapshot_html.ambr (1)

14150-14236: LGTM: Consistent snapshot updates with minor suggestion.

The CSS and title updates maintain consistency across the snapshot. The empty page navigation block correctly reflects a test case without pagination controls.

Optional consideration: The comment at line 14225 mentions "revealed via in-place editing when new page is created." This approach requires that the HTML structure and class names remain stable. Consider ensuring the in-place editing logic is resilient to template changes, or document the contract between the template and the editing logic.

claude_code_log/converter.py (2)

483-491: Add input validation for page_number.

_get_page_html_path doesn't validate that page_number is positive. Passing 0 or negative values would produce incorrect filenames like "combined_transcripts_0.html" or "combined_transcripts_-1.html".

🔎 Suggested validation
 def _get_page_html_path(page_number: int) -> str:
     """Get the HTML filename for a given page number.
 
     Page 1 is combined_transcripts.html, page 2+ are combined_transcripts_N.html
     """
+    if page_number < 1:
+        raise ValueError(f"Page number must be >= 1, got {page_number}")
     if page_number == 1:
         return "combined_transcripts.html"
     return f"combined_transcripts_{page_number}.html"

537-572: Consider validating page_size parameter.

_assign_sessions_to_pages doesn't validate that page_size > 0. A zero or negative page_size would cause logical issues (sessions would never trigger page closure, or every session would close a page).

🔎 Suggested validation
 def _assign_sessions_to_pages(
     sessions: Dict[str, SessionCacheData], page_size: int
 ) -> List[List[str]]:
     """Assign sessions to pages, never splitting sessions across pages.
 
     Args:
         sessions: Dict mapping session_id to SessionCacheData
         page_size: Maximum messages per page (overflow allowed to keep sessions intact)
 
     Returns:
         List of pages, each containing a list of session_ids
     """
+    if page_size < 1:
+        raise ValueError(f"page_size must be >= 1, got {page_size}")
     pages: List[List[str]] = []
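For context, the never-split contract this validation guards can be sketched as a standalone function (a simplification operating on plain message counts rather than SessionCacheData):

```python
from typing import Dict, List


def assign_sessions_to_pages(
    session_counts: Dict[str, int], page_size: int
) -> List[List[str]]:
    """Greedy page assignment that never splits a session across pages.

    A page is closed before adding a session that would overflow it, but a
    single session larger than page_size still gets a page of its own.
    """
    if page_size < 1:
        raise ValueError(f"page_size must be >= 1, got {page_size}")
    pages: List[List[str]] = []
    current: List[str] = []
    count = 0
    for session_id, n in session_counts.items():
        if current and count + n > page_size:
            pages.append(current)
            current, count = [], 0
        current.append(session_id)
        count += n
    if current:
        pages.append(current)
    return pages


pages = assign_sessions_to_pages({"a": 1500, "b": 800, "c": 100}, page_size=2000)
oversized = assign_sessions_to_pages({"big": 5000}, page_size=2000)
```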
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a534b79 and 46b976c.

📒 Files selected for processing (5)
  • claude_code_log/converter.py
  • claude_code_log/html/templates/components/page_nav_styles.css
  • claude_code_log/html/templates/transcript.html
  • test/__snapshots__/test_snapshot_html.ambr
  • test/test_pagination.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • claude_code_log/html/templates/components/page_nav_styles.css
  • claude_code_log/html/templates/transcript.html
  • test/test_pagination.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • claude_code_log/converter.py
🧠 Learnings (2)
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : Use Playwright to test browser runtime features and message filtering consistency in the timeline component

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.10)
🔇 Additional comments (17)
test/__snapshots__/test_snapshot_html.ambr (4)

4290-4368: LGTM: Well-structured page navigation CSS.

The CSS styling for paginated transcripts is well-organized with clear hierarchy, appropriate visual feedback (hover effects, transitions), and thoughtful details like the glassmorphic background and shadow effects. The approach aligns with the pagination feature described in the PR objectives.


9172-9250: LGTM: Expected snapshot duplication.

The CSS block is correctly duplicated in this test case snapshot, ensuring the "Edge Cases" test captures the complete expected HTML output with page navigation styles.


11721-11721: LGTM: Improved test specificity.

Updating the title to include the specific test project name improves snapshot clarity and makes test failures easier to identify.


19169-19255: LGTM: Complete snapshot coverage.

The final test case snapshot correctly includes the page navigation CSS, ensuring all HTML generation test cases capture the pagination feature additions.

claude_code_log/converter.py (13)

152-160: Good defensive handling of race conditions.

The try/except block correctly handles the case where a file is deleted between directory globbing and opening (e.g., during Claude Code session cleanup). Returning an empty list is appropriate, and the warning message is clear and conditionally suppressed.


415-481: LGTM - Well-structured stats tracking.

The GenerationStats dataclass is well-designed with appropriate defaults and helper methods. The summary() method produces clear, multi-line output for per-project reporting.


690-703: Pagination approach successfully addresses previous concerns.

The implementation uses the is_last_page flag (line 692) with the next_link always provided (line 697). The template can hide the next link with a "last-page" CSS class when is_last_page is True. When a new page is created, _enable_next_link_on_previous_page() removes this class (lines 702-703), revealing the link without full page regeneration.

This elegant solution avoids 404 errors while minimizing regeneration cost, as discussed in the past review comments.
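A hedged sketch of what the in-place edit could look like (the class token and filenames are assumptions based on the snapshot, not the actual implementation):

```python
import tempfile
from pathlib import Path


def enable_next_link(html_path: Path) -> bool:
    """Reveal the pre-rendered next link by dropping the 'last-page' class.

    Avoids regenerating the previous page when a new page is created; relies
    on the template emitting exactly this class token.
    """
    html = html_path.read_text(encoding="utf-8")
    updated = html.replace(
        'class="page-nav-link next last-page"', 'class="page-nav-link next"'
    )
    if updated == html:
        return False  # already enabled (or the template drifted)
    html_path.write_text(updated, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "combined_transcripts.html"
    page.write_text(
        '<a class="page-nav-link next last-page" '
        'href="combined_transcripts_2.html">Next</a>',
        encoding="utf-8",
    )
    first = enable_next_link(page)   # class removed, link revealed
    second = enable_next_link(page)  # nothing left to do
    final_html = page.read_text(encoding="utf-8")
```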


770-786: LGTM - Consistent page_size parameter addition.

The page_size parameter with a default of 2000 is consistently added to both convert_jsonl_to_html and convert_jsonl_to, maintaining backward compatibility.

Also applies to: 799-814


850-870: Effective early-exit optimization for unchanged projects.

The cache-aware short-circuit correctly skips expensive message loading when:

  1. Cache is fresh (cache_was_updated == False)
  2. No date filters applied
  3. Combined HTML is current (not stale and not outdated)
  4. Either individual sessions aren't being generated OR no sessions are stale

This optimization significantly improves performance for large multi-project hierarchies, as noted in PR objectives (e.g., "rescan of 24 projects processed in ~1.8s").


905-975: Well-designed pagination mode selection.

The pagination decision logic (lines 913-921) is sound:

  • Enables pagination for HTML directory mode without date filters
  • Uses pagination when messages exceed page_size OR when existing pages exist
  • The existing_page_count > 1 check prevents mode flip-flopping, maintaining pagination once enabled even if message count drops

The dual code paths (paginated vs. single-file) are appropriately guarded and handle cache updates correctly.


993-1024: LGTM - Fast cache change detection.

has_cache_changes() provides efficient mtime-based invalidation checking without loading messages. The logic correctly handles:

  • Missing cache (return True)
  • Date filters (return True)
  • Modified files (check via cache_manager)
  • Stale cache detection (zero messages but files exist)

The function complements ensure_fresh_cache() by providing a fast pre-check before expensive message loading.
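The gating above can be sketched roughly like this (the cached-mtime mapping is a hypothetical stand-in for CacheManager's internal storage):

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional


def has_cache_changes(
    project_dir: Path,
    cached_mtimes: Optional[Dict[str, float]],
    has_date_filters: bool,
) -> bool:
    """Fast mtime-based pre-check before expensive message loading."""
    if cached_mtimes is None:  # no cache at all
        return True
    if has_date_filters:  # filtered output can't reuse the combined HTML
        return True
    jsonl_files = [
        f
        for f in project_dir.glob("*.jsonl")
        if not f.name.startswith("agent-")  # agent files are never loaded
    ]
    if jsonl_files and not cached_mtimes:  # files exist but nothing cached
        return True
    return any(
        cached_mtimes.get(f.name) != f.stat().st_mtime for f in jsonl_files
    )


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    session = project / "session-1.jsonl"
    session.write_text('{"type":"user"}\n', encoding="utf-8")
    cached = {"session-1.jsonl": session.stat().st_mtime}

    fresh = has_cache_changes(project, cached, has_date_filters=False)
    no_cache = has_cache_changes(project, None, has_date_filters=False)
    os.utime(session, (0, 0))  # simulate a modified file
    modified = has_cache_changes(project, cached, has_date_filters=False)
```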


1026-1037: LGTM - Improved documentation clarity.

The updated docstring clearly explains the return value and relationship with has_cache_changes(), helping developers understand the two-phase cache checking pattern (fast mtime check → expensive message loading).


1340-1478: LGTM - Enhanced session file generation with return value.

The function improvements are well-designed:

  1. Returns int (regenerated session count) for statistics tracking
  2. Adds silent parameter for suppressing per-session output
  3. Filters sessions to only tracked ones (line 1380): session_ids = session_ids & set(session_data.keys())
  4. Uses cache-aware staleness checks (is_html_stale) for incremental regeneration
  5. Updates html_cache after regeneration with accurate message counts from cache

The cache integration enables fine-grained HTML regeneration, as intended by the SQLite cache refactor.


1480-1536: LGTM - Enhanced hierarchy processing with comprehensive statistics.

The refactored process_projects_hierarchy adds valuable improvements:

  1. Per-project timing and statistics via GenerationStats
  2. silent flag (default True) for suppressing per-file verbosity while keeping per-project summaries (aligns with past review discussion)
  3. page_size parameter support (default 2000)
  4. Aggregated metrics: total projects, projects with updates, total sessions

The initialization section is well-organized and sets up the necessary tracking structures.


1550-1621: Efficient dual-path processing with correct cache routing.

The fast/slow path logic is well-implemented:

Fast path (lines 1581-1586):

  • Skips regeneration for unchanged projects
  • Reports cached status with timing

Slow path (lines 1587-1621):

  • Updates cache and regenerates output
  • Tracks file updates and session regenerations
  • Reports detailed progress

Cache staleness check (lines 1560-1571):
Correctly distinguishes paginated vs. non-paginated projects:

  • Paginated projects: checks is_page_stale(1, page_size) for page 1
  • Non-paginated projects: checks is_html_stale(output_path.name, None) for single-file cache

This dual-cache-table approach aligns with the PR's pagination architecture.


1623-1782: Robust project data collection with cache-first approach.

The implementation correctly prioritizes cached data while maintaining a fallback:

Primary path (lines 1623-1680):

  • Uses cached_project_data for efficient aggregates
  • Filters sessions to exclude warmup-only and agent-only sessions (line 1673-1674)
  • Builds project summary from cache without re-parsing messages

Fallback path (lines 1682-1771):

  • Prints warning about cache absence (rare case)
  • Loads and processes messages directly
  • Ensures cache is updated with session data (line 1689-1691), including working_directories
  • Manually calculates aggregates

Exception handling (lines 1773-1782):

  • Records error in stats
  • Prints detailed warning with traceback
  • Continues processing remaining projects (fail-safe behavior)

This defensive design ensures the multi-project scan completes even if individual projects fail.


1784-1822: LGTM - Clear summary reporting with appropriate index regeneration.

Index generation (lines 1784-1798):

  • Regenerates when renderer outdated, date filters applied, or any cache updated
  • any_cache_updated flag ensures index reflects new data even when no files changed in some projects

Summary output (lines 1799-1821):

  • Reports errors/warnings per project (lines 1807-1811)
  • Provides global summary with elapsed time, update count, and index status
  • Output is always shown (consistent with silent flag design: suppress per-file logs, show per-project summaries)

The comprehensive reporting helps users understand what work was performed, which is especially valuable for large multi-project scans.

Two fixes for cache invalidation issues:

1. Filter agent files in modification check
   - get_modified_files() checked all .jsonl including agent-*.jsonl
   - But load_directory_transcripts() excludes agent files
   - Result: unreferenced agent files always marked as "modified"

2. Skip archived sessions in staleness check (preserve data)
   - When JSONL files are deleted, cached sessions remain
   - Instead of pruning, skip them in get_stale_sessions()
   - Show "N archived" count in output for visibility
   - Preserves data for potential future archive/restore features

Output now shows:
  project-name: cached, 3 archived (0.0s)
@cboos
Copy link
Collaborator

cboos commented Jan 5, 2026

So, as I was testing #76, I also could observe the behavior of the new cache and paginated combined transcripts, and all seems to work great! The cache.db takes up less space than the combined cache/*.json, but it's still quite large. I wonder, have you experimented with some form of compression of the raw JSON content? I see you have content JSON NOT NULL, but I don't see any use of json_* SQL functions, so it could as well have been a content BLOB NOT NULL with the content compressed with zlib or similar... Maybe the savings could be significant.

The only concern I have is that we're using ~/.claude/projects as our playground... At the very least, we should name the db file claude-code-log-cache.db, I think.

@cboos cboos mentioned this pull request Jan 5, 2026
* Add archived project support and session deletion

Features:
- Show fully archived projects (cached but no JSONL files) in TUI and HTML index
- Projects with [ARCHIVED] badge in project selector and greyed-out styling in index
- Archived projects automatically open in archived view mode
- Delete sessions from cache via 'd' key with confirmation dialog
- Restore JSONL now produces compact JSON format matching original files
- Project list refreshes after restore to update archived status

Implementation:
- Add get_all_cached_projects() to discover archived projects from cache.db
- Add delete_session() and delete_project() methods to CacheManager
- Update CLI to discover and track archived projects, refresh on return
- Update SessionBrowser to handle is_archived_project flag
- Add DeleteConfirmScreen modal for delete confirmation

Tests:
- Add tests for compact JSON export, session/project deletion
- Add tests for get_all_cached_projects with active/archived detection
- Add TUI test for archived project loading in archived mode

* Simplify views to show both archived and current + implement delete as well as archive and restore
@daaain
Copy link
Owner Author

daaain commented Jan 10, 2026

That is a very good point about giving the cache a more unique name like claude-code-log-cache.db!

I'll also switch the message.content column to JSONB to see how much smaller it is, but zlib just on that column isn't a bad idea either, since it shouldn't have much performance impact on day-to-day usage: the column is only read when (re)generating the HTML / Markdown files.
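The zlib-on-BLOB idea can be round-tripped in a few lines (illustrative schema, not the actual migration):

```python
import json
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (uuid TEXT PRIMARY KEY, content BLOB NOT NULL)")

entry = {"type": "assistant", "message": {"content": "x" * 2000}}
raw = json.dumps(entry).encode("utf-8")
compressed = zlib.compress(raw, level=6)

conn.execute("INSERT INTO messages VALUES (?, ?)", ("uuid-1", compressed))

# Decompress only on read, i.e. when (re)generating HTML/Markdown output
(blob,) = conn.execute(
    "SELECT content FROM messages WHERE uuid = 'uuid-1'"
).fetchone()
restored = json.loads(zlib.decompress(blob).decode("utf-8"))

savings = 1 - len(compressed) / len(raw)
```

Actual savings depend heavily on the message content; repetitive tool output tends to compress far better than short chat turns.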


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
test/test_cache_integration.py (1)

98-114: --no-cache test is self-fulfilling due to CacheManager initialization side effects; cannot verify the flag actually prevents caching.

CacheManager.__init__ unconditionally calls _init_database() (which runs migrations and creates cache.db) and _ensure_project_exists() (which inserts/updates the project record). When the test instantiates CacheManager after running --no-cache, it creates the cache database regardless of what the --no-cache flag did. The test cannot distinguish between cache.db created by the flag (indicating a bug) and cache.db created by the subsequent CacheManager instantiation (expected).

Check that cache.db does not exist or has not been modified before instantiating CacheManager to verify that --no-cache actually prevented cache creation. Alternatively, mock or isolate the cache path to avoid side effects from initialization.

claude_code_log/cli.py (3)

225-294: Relative matching shouldn’t generate HTML as a side effect.

In _find_relative_matches, calling convert_jsonl_to_html(project_dir, silent=True) (Line 242) can generate/overwrite outputs just to populate cache metadata. Prefer ensure_fresh_cache(project_dir, cache_manager, silent=True) (cache-only) to populate working_directories.


498-526: --page-size should be validated (> 0) before use.

A value of 0 or negative will produce odd pagination behavior. Add a click constraint or an early check (and fail fast with a clear error).

Proposed patch
@@
 @click.option(
     "--page-size",
     type=int,
     default=2000,
     help="Maximum messages per page for combined transcript (default: 2000). Sessions are never split across pages.",
 )
@@
 def main(
@@
     page_size: int,
@@
 ) -> None:
@@
     try:
+        if page_size <= 0:
+            raise click.BadParameter("--page-size must be a positive integer", param_hint="--page-size")

Also applies to: 668-735


54-106: Pass projects_dir to _launch_tui_with_cache_check or provide it to CacheManager explicitly.

CacheManager(project_path, ...) incorrectly assumes the cache database is at project_path.parent / "cache.db". For archived projects, project_path may be the original stored path (e.g., /Users/dain/workspace/project1), making project_path.parent point to the wrong location. The actual shared database is always at projects_dir / "cache.db". Either pass projects_dir as a parameter to _launch_tui_with_cache_check and forward it to CacheManager, or modify CacheManager to accept an explicit db_path parameter.

claude_code_log/converter.py (1)

1528-1973: Fast-path project processing is promising, but it depends on page/session staleness being correct.

Given that is_page_stale() currently doesn’t detect session changes, the “nothing to do” path (Lines 1663-1671) risks skipping required page regeneration. Once page staleness compares real content (message counts / timestamps), this path should become reliable.

🤖 Fix all issues with AI agents
In @claude_code_log/cache.py:
- Around line 380-439: In load_cached_entries_filtered, the SQL date bounds are
built with from_dt.isoformat()/to_dt.isoformat() which can mismatch stored
timestamps that end with 'Z'; normalize the bounds to UTC 'Z' form before
appending to params (e.g. ensure from_dt/to_dt are timezone-aware UTC via
from_dt = from_dt.astimezone(timezone.utc) or set tzinfo=timezone.utc, then
produce a bound like from_dt.isoformat().replace('+00:00','Z') or use strftime
to output YYYY-MM-DDTHH:MM:SS[.fff]Z) so lexicographic SQL comparisons match
stored JSON values; update imports to include datetime.timezone and adjust where
from_dt/to_dt are parsed and appended to params, leaving the rest of
load_cached_entries_filtered and the use of _get_file_id/_deserialize_entry
unchanged.
- Around line 1326-1370: delete_session currently removes messages, html_cache
and sessions but leaves cached_files entries intact so is_file_cached() can
still return True and load_cached_entries() yields empty results; update
delete_session to also invalidate the file cache by deleting rows from
cached_files for the same project_id and source_session_id (i.e., execute
"DELETE FROM cached_files WHERE project_id = ? AND source_session_id = ?" using
self._project_id and session_id) so future runs will reparse the JSONL and
rebuild the session; ensure this deletion happens before committing and
alongside the existing deletions.
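The timestamp normalization called for in the first item above can be sketched as (helper name is hypothetical):

```python
from datetime import datetime, timezone


def to_z_bound(dt: datetime) -> str:
    """Normalize a bound to the 'Z'-suffixed UTC form stored in the cache,
    so lexicographic SQL comparisons against stored timestamps hold."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive inputs are UTC
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


naive_bound = to_z_bound(datetime(2025, 1, 2, 10, 30))
aware_bound = to_z_bound(datetime(2025, 1, 2, 10, 30, tzinfo=timezone.utc))
```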

In @claude_code_log/converter.py:
- Around line 910-942: When use_pagination is true but cached_data is None,
session_data is set to {} which leads _generate_paginated_html to produce empty
pages; instead, detect missing cached_data and derive session_data from messages
(e.g., group messages by session id or date and build the same session
ordering/metadata shape expected by _generate_paginated_html) before calling
_generate_paginated_html so pagination is based on messages rather than an empty
cache; keep the cache_manager.get_cached_project_data() call but fallback to
building session metadata from messages and pass that session_data into
_generate_paginated_html.

In @claude_code_log/tui.py:
- Around line 1036-1041: Replace bare except: pass blocks in the UI/cache update
sections with explicit exception handling: catch Exception as e (not bare
except), log the exception via the module logger or self.logger with context
(e.g., when calling self.populate_table() / self.update_stats()), and add a
short comment explaining why suppression is acceptable; where possible narrow
the except to specific errors (e.g., AttributeError, OSError) for the
cache-building and UI update blocks and re-raise unexpected exceptions. Ensure
each occurrence (the archived-project UI update around
self.populate_table()/self.update_stats(), the cache building block, and the
other UI update block) is updated to log e and include a brief comment stating
why errors can be ignored if widgets are unmounted.
🟡 Minor comments (12)
test/test_cache_integration.py-835-848 (1)

835-848: get_all_cached_projects test may fail if returned paths are normalized/realpathed.

Comparing str(active_dir) to returned strings is brittle if production code uses Path.resolve(), os.path.realpath, or different path separators. Prefer normalizing both sides (e.g., Path(p).resolve()).

Proposed fix
- project_paths = {p[0] for p in projects}
- assert str(active_dir) in project_paths
- assert str(archived_dir) in project_paths
+ project_paths = {Path(p[0]).resolve() for p in projects}
+ assert active_dir.resolve() in project_paths
+ assert archived_dir.resolve() in project_paths

Committable suggestion skipped: line range outside the PR's diff.

test/__snapshots__/test_snapshot_html.ambr-464-487 (1)

464-487: Archived card dimming via opacity: 0.6 affects all child text visibility; consider a more targeted visual treatment.

When opacity: 0.6 is applied to .project-card.archived, the entire subtree is dimmed, not just the background: the project name (blue link), stats text, and the badge all become harder to read (opacity < 1 also creates a new stacking context). Use an alternative approach such as adjusting only background-color and keeping text at full opacity, or applying opacity only to a pseudo-element, to preserve text contrast.

test/test_tui.py-1621-1682 (1)

1621-1682: Restore test is good; ensure JSONL content is parseable if future code reads it.

export_session_to_jsonl is mocked to return ['{"type":"user"}'] (Line 1649), which is likely not parseable by current Pydantic models if any code path later loads it. Consider returning a minimal valid entry (sessionId/timestamp/message/etc.) to keep the test robust against refactors.

claude_code_log/cli.py-28-52 (1)

28-52: _discover_projects(): good direction; ensure stable ordering + no duplicates.

Consider returning project_dirs in a deterministic order (e.g., sorted) after appending archived paths, and guarding against duplicate paths differing only by string form (symlink/resolve differences).

claude_code_log/converter.py-998-1029 (1)

998-1029: has_cache_changes() includes agent files; ensure this matches intended invalidation policy.

Elsewhere you explicitly exclude agent-*.jsonl from modified-file checks. Here it uses glob("*.jsonl") (Line 1012). If this function is used for gating work, it can reintroduce “agent-only change triggers work” behavior.

test/test_tui.py-923-974 (1)

923-974: Archived-project test setup contradicts its own premise (creates a JSONL file).

test_archived_project_loads_archived_sessions says “no JSONL files” but creates session-123.jsonl (Lines 929-932). This can mask behavior differences in code paths that key off glob("*.jsonl"). Prefer creating only the directory and relying on cache mocks.

test/test_tui.py-90-106 (1)

90-106: Fixture split by session is good; ensure “summary” placement matches real data.

Writing one JSONL per session is a solid realism improvement. One concern: the fixture writes "type": "summary" into session-456.jsonl regardless of sessionId (Line 102-105). If summaries can live outside the session file in real datasets, this coupling may hide bugs in session discovery/summary association.

claude_code_log/cache.py-1389-1434 (1)

1389-1434: Archived project detection is OK, but can get expensive.

get_all_cached_projects() calls glob("*.jsonl") per project path (Line 1421). On large caches this can become a bottleneck. Consider storing has_jsonl/is_archived as a derived DB field updated during scans, or batching filesystem checks.

test/test_tui.py-16-16 (1)

16-16: Good: ProjectSelector is now exercised, but this is a public-surface signal.

Since tests now import ProjectSelector directly, consider whether claude_code_log.tui intends to expose it as part of its supported API surface (or if tests should import from an internal module path instead).

test/test_tui.py-1684-1935 (1)

1684-1935: Several ProjectSelector tests write JSON without explicit encoding.

A few Path.write_text(...) calls omit encoding="utf-8" (e.g., Lines 1694, 1698, 1741-1742, 1803, 1834, 1919). For consistency with the rest of the suite (and Windows stability), set encoding explicitly.

Proposed patch
-            (project1 / "session-1.jsonl").write_text('{"type":"user"}\n')
+            (project1 / "session-1.jsonl").write_text('{"type":"user"}\n', encoding="utf-8")
@@
-            (project2 / "session-2.jsonl").write_text('{"type":"user"}\n')
+            (project2 / "session-2.jsonl").write_text('{"type":"user"}\n', encoding="utf-8")
@@
-            jsonl1.write_text('{"type":"user"}\n')
-            jsonl2.write_text('{"type":"user"}\n')
+            jsonl1.write_text('{"type":"user"}\n', encoding="utf-8")
+            jsonl2.write_text('{"type":"user"}\n', encoding="utf-8")
@@
-            jsonl.write_text('{"type":"user"}\n')
+            jsonl.write_text('{"type":"user"}\n', encoding="utf-8")
@@
-            (project_path / "session-1.jsonl").write_text('{"type":"user"}\n')
+            (project_path / "session-1.jsonl").write_text('{"type":"user"}\n', encoding="utf-8")
claude_code_log/cache.py-119-163 (1)

119-163: Narrow exception handling for better diagnostics; tomllib suggestion not applicable here.

Since toml is already a declared dependency and the project targets Python 3.10+, using tomllib (only available in 3.11+) would add conditional logic without benefit. However, the broad except Exception blocks in this function mask the actual error (e.g., toml.TomlDecodeError, FileNotFoundError, import errors). Catch only expected exceptions so that unexpected failures surface and are easier to debug.

claude_code_log/cache.py-189-205 (1)

189-205: Move PRAGMA journal_mode=WAL to initialization with environment-aware error handling.

Setting PRAGMA journal_mode = WAL on every connection (line 195) is redundant since WAL mode is persistent at the database level. Set it once during _init_database() instead, with graceful error handling for environments that don't support WAL (read-only filesystems, network mounts, etc.).
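A sketch of init-time WAL setup with the suggested fallback (function name is illustrative):

```python
import sqlite3
import tempfile
from pathlib import Path


def init_database(db_path: Path) -> sqlite3.Connection:
    """Open the cache DB, enabling WAL once at init time.

    journal_mode=WAL is persistent at the database level, so per-connection
    pragmas are redundant. Fall back silently where WAL is unsupported
    (read-only or network filesystems).
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode = WAL")
    except sqlite3.OperationalError:
        pass  # keep the default rollback journal
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = init_database(Path(tmp) / "claude-code-log-cache.db")
    (journal_mode,) = conn.execute("PRAGMA journal_mode").fetchone()
    conn.close()
```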

🧹 Nitpick comments (8)
test/test_cache_integration.py (2)

149-174: Write JSONL with explicit encoding="utf-8" and newline="\n" everywhere (cross-platform stability).

New tests mix explicit UTF-8 in some places and default encoding in many others; JSONL fixtures with non-ASCII content will intermittently fail on Windows/macOS locales.

Proposed fix (apply pattern everywhere you write JSONL)
- with open(jsonl_file, "w") as f:
+ with open(jsonl_file, "w", encoding="utf-8", newline="\n") as f:
     for entry in sample_jsonl_data:
         f.write(json.dumps(entry) + "\n")

Also applies to: 205-213, 226-231, 242-266, 325-328, 457-461, 491-497, 519-522, 548-551, 576-579, 599-602, 627-633, 658-661, 706-709, 736-739, 784-787, 823-826


445-616: Archive/restore workflow: good coverage, but restoration should preserve JSONL file invariants explicitly.

Consider writing restored JSONL with encoding="utf-8" + newline="\n" (and ideally ensuring a trailing newline) to avoid platform diffs; also consider asserting that restore preserves ordering guarantees expected by the loader (timestamp + stable tie-breaker if applicable).

claude_code_log/html/templates/components/project_card_styles.css (1)

116-138: Avoid opacity for archived cards (it reduces text contrast); prefer non-destructive styling.

Consider removing opacity and instead adjust border/background (or add a subtle grayscale) so the text remains fully legible.

Possible tweak
 .project-card.archived {
-    opacity: 0.6;
-    background-color: #f5f5f522;
+    background-color: #f5f5f522;
+    border-left-color: #88888855;
 }
 
 .project-card.archived:hover {
-    opacity: 0.8;
+    /* keep full text contrast */
+    background-color: #f5f5f533;
 }
test/__snapshots__/test_snapshot_html.ambr (1)

4318-4396: Page navigation CSS is duplicated across contexts; consider centralizing and avoid no-op pseudo-elements.

The .page-navigation block matches the paginated combined transcript UI, but it’s repeated in multiple snapshots—if this originates from templates, consider consolidating into a shared stylesheet/component to avoid style drift. Also, .page-nav-link.prev::before and .page-nav-link.next::after set content: '' (Line 4385-4391 and repeats), which is a no-op; either remove those rules or render meaningful arrow content/labels in HTML/CSS.

Also applies to: 9200-9278, 14178-14256, 19197-19275

test/test_tui.py (1)

1192-1531: Archive/Delete modal flow coverage is strong; add a few “state changed” assertions.

The keypress-driven modal tests are valuable. A small improvement: after confirming archive/delete, also assert the in-memory session lists update (e.g., session removed from app.sessions / app.archived_sessions) to ensure UI state stays consistent with filesystem/cache operations.

claude_code_log/cli.py (1)

296-348: Cache cleanup update is good; consider also cleaning derived outputs tied to cache.db deletion.

Deleting the shared cache.db is correct for “clear all”. If HTML/page caches depend on DB state, consider whether you also want to remove combined_transcripts*.html / session-*.html during --clear-cache --all-projects (or at least message that outputs may now be stale).

claude_code_log/converter.py (1)

420-486: GenerationStats looks fine; tiny nit: default_factory can be simplified.

field(default_factory=list) is enough (no need for lambda: []).

claude_code_log/tui.py (1)

8-8: Optional: Modernize type hints.

Line 8 imports List from typing, but line 1192 is the only usage (List[str]). Modern Python supports list[str] directly.

♻️ Proposed modernization
-from typing import Any, ClassVar, List, Optional, cast
+from typing import Any, ClassVar, Optional, cast

And at line 1192:

-        working_directories: List[str] = []
+        working_directories: list[str] = []
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43f07af and 4a7ce7a.

📒 Files selected for processing (10)
  • claude_code_log/cache.py
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/html/templates/components/project_card_styles.css
  • claude_code_log/html/templates/index.html
  • claude_code_log/tui.py
  • dev-docs/restoring-archived-sessions.md
  • test/__snapshots__/test_snapshot_html.ambr
  • test/test_cache_integration.py
  • test/test_tui.py
✅ Files skipped from review due to trivial changes (1)
  • dev-docs/restoring-archived-sessions.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • claude_code_log/cache.py
  • test/test_cache_integration.py
  • test/test_tui.py
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/tui.py
🧠 Learnings (5)
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : When adding new message types or modifying CSS class generation in renderer.py, ensure the timeline's message type detection logic in timeline.html is updated accordingly to maintain feature parity

Applied to files:

  • claude_code_log/html/templates/index.html
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/renderer.py : When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Applied to files:

  • claude_code_log/html/templates/index.html
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : Use Playwright to test browser runtime features and message filtering consistency in the timeline component

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • test/__snapshots__/test_snapshot_html.ambr
  • test/test_tui.py
  • claude_code_log/cli.py
  • claude_code_log/tui.py
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_cache_integration.py
  • test/test_tui.py
🧬 Code graph analysis (3)
test/test_cache_integration.py (1)
claude_code_log/cache.py (8)
  • CacheManager (168-1386)
  • get_cached_project_data (604-676)
  • get_archived_sessions (939-978)
  • export_session_to_jsonl (980-1011)
  • load_session_entries (1013-1036)
  • delete_session (1326-1369)
  • delete_project (1371-1386)
  • get_all_cached_projects (1389-1433)
test/test_tui.py (1)
claude_code_log/tui.py (5)
  • ProjectSelector (35-394)
  • SessionBrowser (919-1803)
  • run_session_browser (1825-1868)
  • populate_table (108-168)
  • populate_table (1102-1179)
claude_code_log/tui.py (1)
claude_code_log/cache.py (10)
  • CacheManager (168-1386)
  • get_library_version (119-162)
  • get_cached_project_data (604-676)
  • clear_cache (678-686)
  • export_session_to_jsonl (980-1011)
  • SessionCacheData (39-52)
  • get_archived_sessions (939-978)
  • get_working_directories (580-594)
  • load_session_entries (1013-1036)
  • delete_session (1326-1369)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.10)
🔇 Additional comments (17)
claude_code_log/html/templates/index.html (1)

62-70: Archived marker wiring looks consistent (class + badge + hint gated).

Template change is minimal and should compose cleanly with existing project card rendering.

test/test_cache_integration.py (2)

434-437: No changes needed—test correctly assumes CacheManager initialization creates the project row.

The test is correct. CacheManager.__init__ calls _ensure_project_exists(), which always creates a project record in the database if one doesn't exist (via _create_project()). This happens synchronously during initialization, not during conversion/scan. Therefore, get_cached_project_data() will return the project row with the correct version immediately after CacheManager is instantiated, without requiring any additional cache population step.


115-140: No issue found; test correctly asserts cache state.

After --clear-cache, clear_cache() deletes the old project data and creates a new project record. This means get_cached_project_data() always returns a non-None ProjectCache (with empty cached_files and sessions dicts). The test correctly asserts this state: the cache exists but contains no files or sessions.

test/__snapshots__/test_snapshot_html.ambr (2)

1679-1685: Index snapshot changes for “open combined transcript” hint look consistent.

The whitespace/hint placement around .transcript-link-hint appears intentional and shouldn’t affect semantics.

Also applies to: 1698-1704


11749-11749: Title update in snapshot looks intentional and consistent with the new fixture name.

<title> changing to Claude Transcripts - test_multi_session_html0 aligns with the updated snapshot content.

test/test_tui.py (2)

812-813: Good: explicit UTF-8 when writing files.

Keeps tests stable across platforms/locales.


976-1190: Remove this comment — the assertion pattern is not brittle.

The code inserts plain Python strings into DataTable cells (not Rich.Text objects), and [ARCHIVED] is literal text, not Rich markup. When get_row_at() returns these plain strings, str() will reliably preserve the literal [ARCHIVED] prefix. The concern about markup representation changing does not apply here since rendering happens at display time, and these tests check the cell values at insertion time, which remain stable.

Likely an incorrect or invalid review comment.

claude_code_log/cache.py (2)

1436-1445: Export list looks consistent with new API.

__all__ includes the new models and discovery helper as expected.


440-498: All constraints and SQL features are properly implemented.

The migrations confirm:

  • cached_files enforces UNIQUE(project_id, file_name) for the INSERT OR REPLACE at line 440
  • html_cache enforces UNIQUE(project_id, html_path) for the INSERT OR REPLACE at line 791
  • html_pages enforces UNIQUE(project_id, page_number) for the INSERT OR REPLACE at line 1178

The NULLS LAST syntax used in ORDER BY clauses (lines 374, 433, 996, 1032) is supported in SQLite 3.30.0+ (October 2019). Python 3.10+ (required by pyproject.toml) ships with sqlite3 3.36.0+, which provides full support for this syntax.
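For reference, the NULLS LAST behavior those ORDER BY clauses rely on can be demonstrated in a few lines; summary entries with no timestamp sort after dated messages without any COALESCE tricks (table shape is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (uuid TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("a", "2025-01-02T00:00:00Z"), ("b", None), ("c", "2025-01-01T00:00:00Z")],
)
# NULLS LAST requires SQLite >= 3.30.0 (bundled with Python 3.10+).
rows = conn.execute(
    "SELECT uuid FROM messages ORDER BY timestamp ASC NULLS LAST"
).fetchall()
conn.close()
print([r[0] for r in rows])  # ['c', 'a', 'b']
```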

claude_code_log/converter.py (3)

157-165: Good: handles JSONL deletion race cleanly.

Catching FileNotFoundError around open is the right fix for transient deletions during scans.
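A minimal sketch of the pattern; the `load_entries` name is illustrative, not the converter's actual function:

```python
import json
import tempfile
from pathlib import Path

def load_entries(jsonl_path: Path) -> list[dict]:
    # Tolerate files vanishing between directory scan and open
    # (e.g. an external cleanup deleting old JSONL logs mid-run).
    try:
        with open(jsonl_path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    except FileNotFoundError:
        # File was deleted after discovery; treat as empty.
        return []

missing = Path(tempfile.mkdtemp()) / "gone.jsonl"
print(load_entries(missing))  # []
```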


542-577: Paging assignment seems OK (session boundary preserved).

Overflow-per-session behavior is consistent with “never split sessions” intent.


1345-1483: Per-session incremental regeneration is a good step forward.

The move to is_html_stale() per session + updating html_cache will reduce unnecessary regeneration.

claude_code_log/tui.py (5)

1681-1694: Verify project archive state transition logic.

After restoring a single session, line 1685-1686 sets self.is_archived_project = False. This flag appears to indicate whether the project directory contains any JSONL files (all archived) vs. some JSONL files exist (mixed or current).

The logic is correct if is_archived_project means "project directory has zero JSONL files" (since we just created one). However, if it means "all sessions are archived," then restoring one session when others remain archived would incorrectly flip the flag.

The subsequent load_sessions() call will recompute session states, so this may work correctly in practice. Consider adding a comment clarifying the intended semantics of is_archived_project to prevent future confusion.


1076-1092: Good handling of test environment edge case.

The comment at lines 1077-1078 clearly explains why archived session detection is conditional on valid_session_ids (JSONL files) being present. This handles test environments where cached sessions exist without corresponding JSONL files by treating them as current rather than archived, allowing tests to proceed normally.

This design choice is well-documented and makes sense in context.


1044-1050: Correct implementation of agent file filtering.

Lines 1045-1049 properly exclude agent-*.jsonl files from cache invalidation checks, as these files are loaded via session references rather than directly. This prevents false cache invalidation and aligns with the PR objectives.

The inline comment clearly explains the rationale.


232-252: Excellent use of check_action for contextual command availability.

Both ProjectSelector.check_action (lines 232-252) and SessionBrowser.check_action (lines 1621-1638) properly restrict actions based on the current item's state (archived vs. current). This provides good UX by:

  • Preventing archive actions on already-archived items
  • Preventing restore actions on current items
  • Preventing resume actions on archived sessions (no JSONL to resume from)

The conditional action availability makes the interface more intuitive and prevents error states.

Also applies to: 1621-1638


519-917: Well-designed confirmation screens with context-aware options.

The five new modal confirmation screens (Archive/Delete/Restore for both sessions and projects) follow consistent design patterns and provide excellent UX:

  1. Smart option presentation: Delete screens show different options based on archived state:

    • Archived items: Only "delete cache" (permanent warning)
    • Current items: "cache only" vs "both JSONL + cache" options
  2. Appropriate visual styling: Warning/error/success colors match action severity

  3. Consistent interaction patterns: y/n/Enter/Escape bindings work intuitively

The contextual messaging helps users understand the consequences of their actions, especially the permanence of deleting archived items that have no JSONL backup.

@daaain
Owner Author

daaain commented Jan 10, 2026

I read up a bit on it: JSONB support is relatively recent (SQLite 3.45.0, 2024-01-15), and between the SQLite versions shipped with various Python installers (but not all) and OS versions, I don't want to get into the weeds, so I'll try zlib instead.
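The zlib approach can be sketched as a compress-on-write / decompress-on-read round trip (the entry shape here is illustrative):

```python
import json
import zlib

entry = {"type": "assistant", "message": {"content": "x" * 500}}

# Compact JSON first, then zlib-compress for BLOB storage.
raw = json.dumps(entry, separators=(",", ":")).encode("utf-8")
blob = zlib.compress(raw)

# Round-trip back when loading from the cache.
restored = json.loads(zlib.decompress(blob).decode("utf-8"))

assert restored == entry
# Compressed size is far smaller for repetitive JSON payloads.
print(len(raw), len(blob))
```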

@daaain
Owner Author

daaain commented Jan 10, 2026

Oh, huge difference with the zlib-compressed message content JSON: 431MB instead of 1.3GB!



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
claude_code_log/migrations/001_initial_schema.sql (1)

6-21: Redundant index on unique column.

idx_projects_path on line 21 is unnecessary since project_path already has a UNIQUE constraint (line 8), which implicitly creates an index in SQLite.

♻️ Suggested fix
 );

-CREATE INDEX IF NOT EXISTS idx_projects_path ON projects(project_path);
test/test_pagination.py (1)

66-76: Assertion may be too permissive.

Using sorted(pages[0]) on line 76 masks the actual ordering of sessions. Since sessions should be in chronological order (as tested in test_sessions_sorted_chronologically), this test should verify the exact order rather than just membership.

♻️ Suggested fix
         assert len(pages) == 1
-        assert sorted(pages[0]) == ["s1", "s2", "s3"]
+        # Sessions should be in chronological order
+        assert pages[0] == ["s1", "s2", "s3"]
test/test_cache_integration.py (1)

395-411: Test may be testing obsolete cache format.

This corruption test targets the old JSON-based cache directory (project/cache/*.json), but the PR migrates to SQLite. The test passes because it gracefully handles the missing/empty cache directory, but it no longer tests actual cache corruption recovery.

Consider either:

  1. Removing this test as the JSON cache format is deprecated
  2. Updating it to test SQLite database corruption scenarios
♻️ Suggested update for SQLite corruption test
     def test_cache_corruption_recovery(self, setup_test_project):
-        """Test recovery from corrupted cache files."""
+        """Test recovery from corrupted SQLite cache database."""
         project_dir = setup_test_project

         # Create initial cache
         convert_jsonl_to_html(input_path=project_dir, use_cache=True)

-        # Corrupt cache file
-        cache_dir = project_dir / "cache"
-        cache_files = list(cache_dir.glob("*.json"))
-        if cache_files:
-            cache_file = [f for f in cache_files if f.name != "index.json"][0]
-            cache_file.write_text("corrupted json data", encoding="utf-8")
+        # Corrupt SQLite database
+        db_path = project_dir.parent / "claude-code-log-cache.db"
+        if db_path.exists():
+            db_path.write_bytes(b"corrupted database content")

         # Should recover gracefully
         output = convert_jsonl_to_html(input_path=project_dir, use_cache=True)
         assert output.exists()
claude_code_log/cache.py (1)

1398-1454: Silent exception handling may hide issues.

The get_all_cached_projects function silently catches all exceptions on line 1439 and returns an empty list. While this prevents crashes, it could hide legitimate issues (e.g., database corruption, permission errors).

Consider logging the exception for debugging purposes.

♻️ Suggested improvement
+    import logging
+
     try:
         conn = sqlite3.connect(db_path, timeout=30.0)
         # ... existing code ...
-    except Exception:
-        pass
+    except Exception as e:
+        logging.debug(f"Failed to query cached projects from {db_path}: {e}")

     return result
claude_code_log/converter.py (2)

1586-1606: Remove redundant import.

json is already imported at the top of the file (line 4), so the local import on line 1594 is unnecessary.

♻️ Suggested fix
 def _get_cleanup_period_days() -> Optional[int]:
     """Read cleanupPeriodDays from Claude Code settings.

     Checks ~/.claude/settings.json for the cleanupPeriodDays setting.

     Returns:
         The configured cleanup period in days, or None if not set/readable.
     """
-    import json
-
     settings_path = Path.home() / ".claude" / "settings.json"
     if not settings_path.exists():
         return None

645-656: Simplify token access pattern for consistency with rest of codebase.

The fields in UsageInfo are Optional[int] with default None, so the second or 0 is necessary to handle None values. However, the getattr call with default 0 is redundant—the attribute always exists on UsageInfo instances. Use usage.input_tokens or 0 directly (as done elsewhere in this file, lines ~515-525), instead of getattr(usage, "input_tokens", 0) or 0.
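A minimal illustration of the suggested pattern; the `UsageInfo` fields here mirror the Optional[int] defaults described above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UsageInfo:
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None

usage = UsageInfo(input_tokens=120)

# The attribute always exists on the dataclass, so getattr(...)
# adds nothing; `or 0` alone handles the None default.
total = (usage.input_tokens or 0) + (usage.output_tokens or 0)
print(total)  # 120
```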

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a7ce7a and 3101323.

📒 Files selected for processing (9)
  • claude_code_log/cache.py
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • claude_code_log/migrations/001_initial_schema.sql
  • claude_code_log/tui.py
  • test/test_cache.py
  • test/test_cache_integration.py
  • test/test_integration_realistic.py
  • test/test_pagination.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/test_integration_realistic.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • test/test_pagination.py
  • test/test_cache.py
  • claude_code_log/cache.py
  • test/test_cache_integration.py
  • claude_code_log/converter.py
  • claude_code_log/cli.py
  • claude_code_log/tui.py
🧠 Learnings (3)
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_pagination.py
  • test/test_cache.py
  • test/test_cache_integration.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/renderer.py : When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Applied to files:

  • claude_code_log/converter.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • claude_code_log/cli.py
  • claude_code_log/tui.py
🧬 Code graph analysis (4)
test/test_pagination.py (2)
claude_code_log/cache.py (7)
  • SessionCacheData (40-53)
  • get_page_count (1315-1326)
  • get_page_size_config (1042-1056)
  • update_page_cache (1152-1233)
  • get_page_data (1058-1100)
  • is_page_stale (1235-1286)
  • get_all_pages (1102-1150)
claude_code_log/converter.py (2)
  • _get_page_html_path (488-495)
  • _assign_sessions_to_pages (542-576)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (4)
  • AssistantTranscriptEntry (175-178)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
test/test_cache_integration.py (1)
claude_code_log/cache.py (7)
  • get_cached_project_data (616-688)
  • get_archived_sessions (951-990)
  • export_session_to_jsonl (992-1013)
  • load_session_entries (1015-1038)
  • delete_session (1328-1378)
  • delete_project (1380-1395)
  • get_all_cached_projects (1398-1442)
claude_code_log/cli.py (3)
claude_code_log/cache.py (5)
  • CacheManager (169-1395)
  • get_all_cached_projects (1398-1442)
  • get_library_version (120-163)
  • get_cached_project_data (616-688)
  • get_modified_files (608-614)
claude_code_log/converter.py (1)
  • ensure_fresh_cache (1132-1184)
claude_code_log/tui.py (2)
  • run_session_browser (1827-1870)
  • run_project_selector (1808-1824)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.10)
🔇 Additional comments (46)
claude_code_log/migrations/001_initial_schema.sql (1)

24-114: Well-structured schema with appropriate normalization.

The schema design is solid:

  • Proper foreign key relationships with ON DELETE CASCADE ensure data integrity
  • Token counts are denormalized at session level for efficient aggregation queries
  • Indexes cover the primary query patterns (by timestamp, project, file, session, uuid)
  • Using BLOB for compressed content is appropriate for the zlib-compressed JSON storage
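The cascade behavior can be verified in a few lines; note that SQLite requires `PRAGMA foreign_keys = ON` per connection for cascades to fire (table shapes simplified from the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, project_path TEXT UNIQUE);
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    session_id TEXT
);
""")
conn.execute("INSERT INTO projects (id, project_path) VALUES (1, '/p')")
conn.execute("INSERT INTO sessions (project_id, session_id) VALUES (1, 's1')")

conn.execute("DELETE FROM projects WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
conn.close()
print(remaining)  # 0 -- child rows removed by the cascade
```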
test/test_pagination.py (3)

142-346: LGTM!

Fixtures and TestPageCacheMethods provide good coverage of the page cache API:

  • Empty state handling
  • Page creation and retrieval
  • Staleness detection with page size changes
  • Invalidation and cleanup

351-550: LGTM!

The _create_session_messages helper and TestPaginationIntegration class provide excellent end-to-end coverage including:

  • Single page vs multi-page scenarios
  • Page size configuration changes
  • Navigation link presence
  • Statistics in generated HTML

553-736: LGTM!

The remaining test classes provide good coverage:

  • TestNextLinkInPlaceUpdate: Verifies in-place HTML modification for next links
  • TestPaginationNextLinkVisibility: Ensures correct visibility state across pages
  • TestPaginationFallbackWithoutCache: Critical test ensuring the system degrades gracefully when cache data is unavailable
test/test_cache_integration.py (4)

85-191: LGTM!

The CLI integration tests correctly verify the new SQLite cache behavior:

  • Cache database creation at parent level
  • Cache clearing semantics
  • Multi-project caching with --all-projects
  • Date filtering with cached data

193-359: LGTM!

The converter and performance integration tests appropriately verify:

  • Cache creation and utilization during conversion
  • Hierarchy processing with multi-project caching
  • Graceful handling of cache bypass with use_cache=False

The performance test correctly focuses on completion rather than timing assertions, which would be flaky in CI.


445-853: LGTM!

Excellent test coverage for the archived sessions functionality:

  • Detection of archived sessions after JSONL deletion
  • Partial archival scenarios
  • Full export/restore workflow with data integrity verification
  • Message ordering preservation
  • Session and project deletion with cascade behavior
  • Edge cases like empty sessions and non-existent entries

855-913: LGTM!

The TestGetAllCachedProjects class properly verifies:

  • Discovery of both active and archived projects with correct is_archived flags
  • Graceful handling of empty directories (no cache DB)
  • Graceful handling of nonexistent directories
claude_code_log/cli.py (6)

28-52: LGTM!

The _discover_projects function correctly:

  • Identifies active projects (directories with JSONL files)
  • Identifies archived projects from cache (in DB but no JSONL files)
  • Prevents duplicates by checking against active project paths
  • Returns both lists for downstream use

54-106: LGTM!

The updated _launch_tui_with_cache_check properly differentiates:

  • Archived projects: Load directly from cache, error if no sessions found
  • Active projects: Check for modified files, rebuild cache if needed

The user feedback messages are clear and informative.


225-293: LGTM!

The refactoring to use get_working_directories() directly instead of loading the full ProjectCache is more efficient for this use case. The logic remains correct for relative path matching.


296-354: LGTM!

The _clear_caches function now properly handles:

  • Deletion of the centralized SQLite cache database
  • Cleanup of legacy per-project JSON cache directories
  • Both --all-projects mode and single directory mode

This ensures clean migration from the old cache format.


498-503: LGTM!

The --page-size option is well-documented with a sensible default (2000 messages) and clear help text explaining that sessions are never split across pages.


553-640: LGTM!

The TUI flow updates correctly:

  • Re-discover projects on each loop iteration (handles restores/changes)
  • Pass archived_projects set to project selector
  • Determine is_archived state before launching session browser
  • Handle the "back_to_projects" navigation flow
test/test_cache.py (4)

26-34: LGTM!

The fixture correctly creates a project subdirectory so each test gets a unique database path (temp_dir/project/ results in temp_dir/claude-code-log-cache.db), ensuring test isolation.


141-170: LGTM!

The test_message_storage_with_timestamps test properly validates the SQLite storage structure by:

  • Querying the database directly
  • Filtering by project_id for test isolation
  • Using ORDER BY timestamp NULLS LAST (supported since SQLite 3.30.0 per learnings)
  • Verifying correct timestamp ordering and NULL handling for summaries

266-328: Excellent edge case test!

This test catches an important bug where timestamps with 'Z' suffix could be incorrectly excluded due to string comparison issues with .999999 microseconds format. The detailed assertion message helps diagnose failures.


330-704: LGTM!

The remaining test classes properly validate:

  • Library version detection with multiple fallback strategies
  • Cache version compatibility with breaking change patterns
  • Error handling for missing cache entries (SQLite-aware assertions)

The version compatibility tests are thorough, covering exact matches, patterns, and multiple breaking changes.

claude_code_log/cache.py (11)

56-86: LGTM!

The new HtmlCacheEntry and PageCacheData models are well-defined with appropriate fields for tracking HTML generation state and pagination metadata.


120-163: LGTM!

The get_library_version function provides robust version detection with three fallback strategies:

  1. Package metadata (for installed packages)
  2. importlib.resources + toml (for development)
  3. File-relative path (final fallback)

Each strategy is properly wrapped in try/except to gracefully fall through.
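A condensed sketch of the first fallback step; the sentinel value is illustrative, and the real code continues on to pyproject.toml parsing rather than returning one:

```python
import importlib.metadata

def detect_version(package: str) -> str:
    # First try installed package metadata; fall through to the
    # next strategy (here, a sentinel) when not installed.
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return "0.0.0-dev"

print(detect_version("definitely-not-installed-xyz"))  # 0.0.0-dev
```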


169-261: LGTM!

The CacheManager initialization is well-designed:

  • Uses context manager pattern for safe connection handling
  • Enables foreign keys and WAL mode for integrity and concurrency
  • Runs migrations automatically on initialization
  • Properly handles version incompatibility by clearing stale cache data

263-328: LGTM!

The serialization/deserialization methods are well-implemented:

  • Properly extracts type-specific fields for each TranscriptEntry variant
  • Uses zlib.compress for efficient BLOB storage (addresses PR concern about DB size)
  • Uses compact JSON format (separators=(",", ":"))
  • Deserialization leverages the create_transcript_entry factory for clean type dispatch

343-364: LGTM!

The is_file_cached method correctly handles:

  • Missing project ID (returns False)
  • Non-existent files (returns False)
  • Modification time comparison with 1-second tolerance (handles filesystem precision differences)
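The tolerance check can be sketched as follows; `is_file_fresh` is an illustrative name, not the cache's actual method:

```python
import tempfile
from pathlib import Path

def is_file_fresh(path: Path, cached_mtime: float, tolerance: float = 1.0) -> bool:
    # Filesystems store mtimes at different precisions (FAT: 2s,
    # some network mounts: 1s), so compare with a tolerance
    # instead of exact equality.
    try:
        current = path.stat().st_mtime
    except FileNotFoundError:
        return False
    return abs(current - cached_mtime) <= tolerance

f = Path(tempfile.mkdtemp()) / "a.jsonl"
f.write_text("{}\n", encoding="utf-8")
mtime = f.stat().st_mtime
print(is_file_fresh(f, mtime))        # True
print(is_file_fresh(f, mtime - 5.0))  # False
```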

383-450: LGTM!

The load_cached_entries_filtered method properly handles:

  • Date parsing with dateparser for flexible user input
  • Normalization to UTC 'Z' format for consistent string comparison with stored timestamps
  • NULL timestamp handling (summaries are always included)
  • SQL-based filtering for efficiency

This addresses the edge case tested in test_filtered_loading_with_z_suffix_boundary.
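The normalization idea can be sketched without dateparser using only the standard library; `normalize_to_z` is an illustrative name:

```python
from datetime import datetime, timezone

def normalize_to_z(value: str) -> str:
    # Normalize an ISO-8601 timestamp to UTC with a 'Z' suffix so
    # plain string comparison against stored timestamps is safe.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

a = normalize_to_z("2025-06-01T12:00:00+02:00")
b = normalize_to_z("2025-06-01T10:00:00Z")
print(a, b, a == b)  # identical after normalization
```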


515-688: LGTM!

The session and project management methods are well-implemented:

  • update_session_cache: Uses INSERT OR REPLACE for clean upsert semantics
  • update_project_aggregates: Updates denormalized counts efficiently
  • get_working_directories: Simple distinct query for CWD matching
  • get_cached_project_data: Properly assembles the full ProjectCache from multiple tables

762-925: LGTM!

The HTML cache methods provide comprehensive staleness detection:

  • is_html_stale: Checks library version, file existence, embedded version, and message count changes
  • get_stale_sessions: Properly skips archived sessions to avoid perpetual regeneration attempts
  • Uses the is_html_outdated helper from renderer for embedded version checking

951-1038: LGTM!

The archived session methods provide the foundation for the restore workflow:

  • get_archived_sessions: Returns session data for sessions whose JSONL files were deleted
  • export_session_to_jsonl: Returns compact JSON strings ready for file restoration
  • load_session_entries: Deserializes entries for HTML/Markdown rendering of archived sessions

1040-1326: LGTM!

The page cache methods support the pagination feature:

  • update_page_cache: Maintains page metadata and session mappings with proper upsert logic
  • is_page_stale: Detects configuration changes, version mismatches, and missing sessions
  • invalidate_all_pages: Returns paths for downstream cleanup when re-pagination is needed
  • Session ordering is preserved through the session_order column

1328-1395: LGTM!

The delete methods properly clean up related data:

  • delete_session: Removes messages, HTML cache entries, cached_files entry, and session record
  • delete_project: Relies on CASCADE delete for clean removal

The explicit deletion in delete_session (rather than relying only on CASCADE) ensures the cached_files entry is removed, which is critical for is_file_cached() behavior as tested in test_delete_session_invalidates_file_cache.
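The mixed explicit/CASCADE cleanup can be illustrated with a simplified schema (table and column shapes here are assumptions for illustration, not the actual migrations):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys=ON")
conn.executescript(
    """
    CREATE TABLE sessions (session_id TEXT PRIMARY KEY);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        session_id TEXT REFERENCES sessions(session_id) ON DELETE CASCADE
    );
    CREATE TABLE cached_files (
        file_path TEXT PRIMARY KEY,
        session_id TEXT  -- no FK: must be cleaned up explicitly
    );
    """
)
conn.execute("INSERT INTO sessions VALUES ('s1')")
conn.execute("INSERT INTO messages (session_id) VALUES ('s1')")
conn.execute("INSERT INTO cached_files VALUES ('/p/s1.jsonl', 's1')")


def delete_session(session_id):
    # Explicitly remove rows in tables not covered by CASCADE,
    # then delete the parent row; CASCADE takes out messages.
    conn.execute("DELETE FROM cached_files WHERE session_id = ?", (session_id,))
    conn.execute("DELETE FROM sessions WHERE session_id = ?", (session_id,))
    conn.commit()


delete_session("s1")
```

Leaving the `cached_files` row behind would make `is_file_cached()` report a re-created JSONL file as already cached, which is exactly the bug the test guards against.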

claude_code_log/converter.py (8)

420-486: LGTM!

The GenerationStats dataclass is well-structured with proper use of field(default_factory=...) for mutable defaults. The summary method provides useful diagnostic output.


542-577: LGTM!

The pagination logic correctly ensures sessions are never split across pages, allowing pages to overflow the limit rather than breaking session continuity.
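The overflow-rather-than-split behavior can be sketched as (an illustrative standalone function, not the converter's actual code):

```python
def paginate_sessions(session_sizes, page_size):
    """Group (session_id, message_count) pairs into pages of roughly
    page_size messages, never splitting a session across pages.
    A page may overflow the limit to keep a session whole."""
    pages, current, current_count = [], [], 0
    for session_id, count in session_sizes:
        if current and current_count + count > page_size:
            pages.append(current)
            current, current_count = [], 0
        current.append(session_id)
        current_count += count
    if current:
        pages.append(current)
    return pages
```

A single session larger than `page_size` simply gets a page to itself, which is the only sensible outcome once splitting is ruled out.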


676-862: LGTM!

The paginated HTML generation handles edge cases well: page size changes trigger full regeneration, orphan pages are cleaned up, and empty projects still get a single page. The staleness checks avoid unnecessary regeneration.


157-166: LGTM!

Good defensive handling for the race condition where a file may be deleted between discovery and opening. This is a common issue with log files that can be cleaned up by external processes.
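The defensive pattern amounts to catching `FileNotFoundError` at open time rather than trusting an earlier existence check (a minimal sketch; the helper name is illustrative):

```python
from pathlib import Path


def read_jsonl_lines(path: Path) -> list[str]:
    """Read a JSONL file, tolerating deletion between discovery and open."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    except FileNotFoundError:
        # File was removed by an external process after discovery; skip it.
        return []
```

Checking `path.exists()` first would not close the race, since the file can vanish between the check and the `open()` call.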


1099-1130: LGTM!

The fast cache invalidation check correctly defers expensive message loading while conservatively detecting when updates are needed.


1007-1024: LGTM!

The pagination decision logic correctly handles both the initial case (messages exceed page_size) and the hysteresis case (existing pages should be maintained even if message count decreases).


1686-1760: LGTM!

The refactored stats aggregation provides clear fast/slow paths, correctly handles both paginated and non-paginated projects, and tracks archived session counts appropriately.


1973-2028: No changes needed—CacheManager already handles archived projects without filesystem access.

CacheManager stores the project path as a string identifier in the SQLite database and performs all operations (initialization, cache lookup, working directory retrieval) purely through database queries. It never accesses the filesystem for the project_path, so non-existent directories pose no issue for archived projects. The exception handling at lines 1976–1983 provides additional safety if any unexpected errors occur.

Likely an incorrect or invalid review comment.

claude_code_log/tui.py (9)

100-168: LGTM!

The archived project handling in ProjectSelector correctly shows indicators, skips cache building for archived projects, and properly strips the indicator when resolving selections.


519-917: LGTM!

The confirmation modal screens follow consistent patterns, properly scope CSS, and implement appropriate keybindings. The conditional behavior for archived vs non-archived items is correctly handled.


1078-1095: Potential inconsistency in archived session detection.

When valid_session_ids is empty (no JSONL files), all cached sessions are treated as "current" rather than "archived". The comment mentions test environments, but this contradicts the archival model where sessions without JSONL files should be archived.

Consider whether this special case is intentional or if it should instead mark all sessions as archived when no JSONL files exist (similar to is_archived_project handling above).


1827-1864: LGTM!

The run_session_browser function properly handles archived projects that may not have JSONL files or even the project directory, falling back to cached data when available.


1623-1640: LGTM!

The check_action method correctly gates actions based on session state - archive and resume only work for current sessions, restore only works for archived sessions.


1135-1181: LGTM!

The merged display of current and archived sessions maintains chronological ordering and provides clear visual distinction with the [ARCHIVED] prefix.


309-340: The clear_cache() method exists in the CacheManager class and is correctly implemented. No action required.


1654-1671: The export_session_to_jsonl() method exists in the CacheManager class and has the correct signature. It returns List[str] containing decompressed JSON strings, which matches the usage in the code—iterating over the list and writing each string as a line in the JSONL file is correct.


1780-1784: delete_session() method exists and works as expected.

The CacheManager.delete_session() method is implemented at claude_code_log/cache.py:1328 with the correct signature def delete_session(self, session_id: str) -> bool:. It returns True on successful deletion and False if the session is not found or the project ID is invalid. The code correctly captures and uses the boolean return value.

@daaain
Copy link
Owner Author

daaain commented Jan 10, 2026

Update: the new db is actually 592MB after I realised that there were some sessions I only had in the old cache and had to restore 🙀 This is an amazing coincidence: I've been working on a new app that uses Claude Code as an inference backend, and a rogue local installation must have cleared out my old sessions as I was experimenting with it!


Anyway, bit less than half size is still pretty good going!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
test/test_cli.py (1)

84-94: Helper function is well-structured but could specify encoding.

The create_project_with_jsonl helper writes JSONL without explicit encoding. For consistency with other test files and to prevent potential encoding issues on Windows, consider specifying UTF-8 encoding.

Suggested fix
 def create_project_with_jsonl(
     projects_dir: Path, name: str, jsonl_data: list[dict]
 ) -> Path:
     """Helper to create a project directory with JSONL file."""
     project_dir = projects_dir / name
     project_dir.mkdir(exist_ok=True)
     jsonl_file = project_dir / "session-1.jsonl"
-    with open(jsonl_file, "w") as f:
+    with open(jsonl_file, "w", encoding="utf-8") as f:
         for entry in jsonl_data:
             f.write(json.dumps(entry) + "\n")
     return project_dir
test/test_renderer_timings.py (2)

12-19: Empty test body for default state.

The test_debug_timing_disabled_by_default test has no assertions. While the docstring explains the limitation, consider either adding an assertion or marking it as a documentation-only test.

Suggested approach
     def test_debug_timing_disabled_by_default(self):
         """DEBUG_TIMING is False by default."""
-        # Import with fresh module state
-
-        # Note: We can't easily test the default since the module is already loaded
-        # This test just documents the expected default behavior
-        # The actual value depends on environment at import time
+        # Note: We can't easily test the true default since the module is already
+        # loaded. This test verifies the behavior when env var is unset.
+        import os
+        import importlib
+        import claude_code_log.renderer_timings as rt
+        
+        # Ensure env var is not set
+        os.environ.pop("CLAUDE_CODE_LOG_DEBUG_TIMING", None)
+        importlib.reload(rt)
+        
+        # Should default to False when env var is not set
+        assert rt.DEBUG_TIMING is False

165-204: Consider timing reliability for timing_stat tests.

The test uses time.sleep(0.01) and asserts duration >= 0.01. While generally reliable, this could be flaky under heavy system load. Consider using a slightly larger margin or a more lenient assertion.

Suggested improvement for test reliability
     def test_tracks_operation_timing_when_enabled(
         self, monkeypatch: pytest.MonkeyPatch
     ):
         """Tracks operation timing when DEBUG_TIMING enabled."""
         monkeypatch.setenv("CLAUDE_CODE_LOG_DEBUG_TIMING", "1")

         import importlib
         import claude_code_log.renderer_timings as rt

         importlib.reload(rt)
         rt._timing_data.clear()
         rt._timing_data["_test_timings"] = []
         rt._timing_data["_current_msg_id"] = "msg-123"

         with rt.timing_stat("_test_timings"):
             time.sleep(0.01)

         assert len(rt._timing_data["_test_timings"]) == 1
         duration, msg_id = rt._timing_data["_test_timings"][0]
-        assert duration >= 0.01
+        assert duration >= 0.009  # Allow small margin for system load
         assert msg_id == "msg-123"
test/test_migrations.py (1)

186-200: Potential race condition in pending migrations test.

The test closes the connection after _ensure_schema_version_table, then calls run_migrations(db_path) which creates a new connection, then reconnects to check pending. While this works, the pattern of reconnecting could miss edge cases.

Consider using consistent connection handling
     def test_none_pending_after_all_applied(self, tmp_path: Path):
         """No migrations pending after all applied."""
         db_path = tmp_path / "test.db"
-        conn = sqlite3.connect(db_path)
 
         # Run all migrations
         run_migrations(db_path)
 
         # Reconnect and check
         conn = sqlite3.connect(db_path)
-        pending = get_pending_migrations(conn)
-        assert len(pending) == 0
-
-        conn.close()
+        try:
+            pending = get_pending_migrations(conn)
+            assert len(pending) == 0
+        finally:
+            conn.close()
claude_code_log/cache.py (3)

442-452: Consider the to_date time handling.

When to_date is a specific time (e.g., "2025-06-08 15:00"), the code still sets it to end-of-day (23:59:59). This may be intentional for inclusive day filtering, but if precise time filtering is desired, you might want to skip the replacement for timestamps that already include a time component.
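One possible refinement, sketched here under the assumption that the raw user input string is available (the helper name and heuristic are illustrative, not the module's code):

```python
from datetime import datetime


def normalize_to_date(to_date: datetime, raw: str) -> datetime:
    """Extend a bare date to end-of-day for inclusive filtering,
    but keep an explicitly given time untouched."""
    has_time = any(sep in raw for sep in (":", "T"))
    if has_time:
        return to_date
    return to_date.replace(hour=23, minute=59, second=59)
```

This keeps the current inclusive-day behavior for inputs like `2025-06-08` while honoring precise cut-offs like `2025-06-08 15:00`.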


521-544: Consider executemany for batch insertion.

The loop inserts entries one at a time. For large files with many messages, using executemany with a list of parameter tuples could improve performance. This is a minor optimization and may not be critical given the current use patterns.
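The suggested batching looks roughly like this (illustrative table shape, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (session_id TEXT, timestamp TEXT, content BLOB)")

rows = [("s1", f"2025-01-01T00:00:{i:02d}Z", b"...") for i in range(60)]
# executemany runs the prepared statement once per tuple inside a single
# transaction, avoiding a Python/SQLite round trip per row.
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
conn.commit()
```

Since the per-entry serialization still happens in Python either way, the win here is mostly in statement preparation and transaction overhead.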


1459-1481: Silent exception handling may hide issues.

The broad except Exception: pass at line 1480-1481 silently swallows all errors, including potential database corruption or schema issues. Consider logging the exception for debugging purposes.

Proposed improvement
-    except Exception:
-        pass
+    except Exception as e:
+        import logging
+        logging.debug(f"Failed to query cached projects: {e}")
 
     return result
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3101323 and 479e71b.

📒 Files selected for processing (9)
  • claude_code_log/cache.py
  • claude_code_log/cli.py
  • test/conftest.py
  • test/test_cache.py
  • test/test_cache_integration.py
  • test/test_cache_sqlite_integrity.py
  • test/test_cli.py
  • test/test_migrations.py
  • test/test_renderer_timings.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • test/test_cache_sqlite_integrity.py
  • test/test_cli.py
  • test/test_cache_integration.py
  • claude_code_log/cli.py
  • test/test_cache.py
  • test/test_renderer_timings.py
  • claude_code_log/cache.py
  • test/conftest.py
  • test/test_migrations.py
🧠 Learnings (5)
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_cache_sqlite_integrity.py
  • test/test_cli.py
  • test/test_cache_integration.py
  • test/test_cache.py
  • test/test_renderer_timings.py
  • test/conftest.py
  • test/test_migrations.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • claude_code_log/cli.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : Use Playwright to test browser runtime features and message filtering consistency in the timeline component

Applied to files:

  • test/test_renderer_timings.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : When adding new message types or modifying CSS class generation in renderer.py, ensure the timeline's message type detection logic in timeline.html is updated accordingly to maintain feature parity

Applied to files:

  • test/test_renderer_timings.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/renderer.py : When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Applied to files:

  • test/test_renderer_timings.py
🧬 Code graph analysis (5)
test/test_cache_sqlite_integrity.py (2)
claude_code_log/cache.py (9)
  • SessionCacheData (41-54)
  • save_cached_entries (483-544)
  • update_session_cache (546-579)
  • _get_connection (222-231)
  • load_cached_entries (397-412)
  • load_cached_entries_filtered (414-481)
  • get_cache_stats (761-791)
  • get_working_directories (623-637)
  • is_file_cached (374-395)
claude_code_log/models.py (9)
  • AssistantMessageModel (131-141)
  • AssistantTranscriptEntry (175-178)
  • TextContent (55-59)
  • ThinkingContent (107-110)
  • ToolResultContent (99-104)
  • ToolUseContent (92-96)
  • UsageInfo (81-89)
  • UserMessageModel (123-128)
  • UserTranscriptEntry (168-172)
test/test_cache.py (2)
test/test_cache_integration.py (2)
  • parent (118-119)
  • exists (121-122)
claude_code_log/cache.py (2)
  • CacheManager (190-1426)
  • get_all_cached_projects (1429-1483)
test/test_renderer_timings.py (2)
claude_code_log/renderer_timings.py (4)
  • set_timing_var (24-32)
  • log_timing (36-84)
  • timing_stat (88-110)
  • report_timing_statistics (113-134)
claude_code_log/renderer.py (1)
  • get (121-132)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (5)
  • AssistantTranscriptEntry (175-178)
  • QueueOperationTranscriptEntry (203-220)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
test/test_migrations.py (1)
claude_code_log/migrations/runner.py (10)
  • _compute_checksum (16-18)
  • _ensure_schema_version_table (29-56)
  • _parse_migration_number (21-26)
  • apply_migration (92-110)
  • get_applied_migrations (59-65)
  • get_available_migrations (68-82)
  • get_current_version (159-163)
  • get_pending_migrations (85-89)
  • run_migrations (135-156)
  • verify_migrations (113-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.11)
🔇 Additional comments (53)
test/conftest.py (2)

4-9: LGTM! Clean TYPE_CHECKING pattern for type hints.

The TYPE_CHECKING block correctly imports CacheManager only for type annotation purposes, avoiding runtime import overhead and potential circular dependency issues.


17-51: Well-designed fixtures for test isolation.

The three new fixtures provide proper SQLite database isolation for parallel test execution:

  • isolated_cache_dir creates a unique project directory
  • isolated_db_path provides a unique database path
  • isolated_cache_manager combines both with local import to avoid circular dependencies

The local import of CacheManager inside the fixture body is the correct approach when the type is only needed in a TYPE_CHECKING block.

test/test_cli.py (5)

21-41: Good fixture design for CLI test isolation.

The ProjectsSetup class and cli_projects_setup fixture properly encapsulate both the projects directory and isolated database path, using environment variable override for cache isolation. This enables parallel test execution without database conflicts.


106-162: Comprehensive project discovery tests.

The TestDiscoverProjects class thoroughly tests:

  • Active project discovery with JSONL files
  • Archived project detection from cache
  • Empty directory handling

Good coverage of the discovery logic including edge cases.


165-241: Cache clearing tests are well-structured.

Tests properly verify:

  • Single project cache clearing
  • All projects cache clearing (database deletion)
  • No-op behavior for single file paths

The assertions correctly check cache state before and after operations.


313-429: Good CLI integration test coverage.

The main command tests cover key scenarios including help output, default behavior, cache/HTML clearing flags, format options, no-cache mode, and custom output paths. The assertions appropriately verify both exit codes and expected side effects.


432-469: Error handling tests appropriately verify graceful degradation.

Tests verify the CLI handles edge cases without crashing:

  • Invalid format options
  • Empty project directories
  • Malformed JSONL content

Using exit_code in (0, 1) is reasonable for tests where the exact behavior may vary based on error handling paths.

test/test_renderer_timings.py (3)

20-59: Good coverage of DEBUG_TIMING environment variable.

Tests correctly verify that various truthy values (1, true, yes, TRUE) enable the debug timing flag. The module reload pattern ensures fresh state for each test.


62-89: Thorough tests for set_timing_var behavior.

Tests verify both enabled and disabled states, ensuring variables are stored only when DEBUG_TIMING is active.


207-291: Comprehensive report_timing_statistics tests.

Tests verify output format, sorting behavior, empty handling, and the 10-slowest limit. The string position assertions for sorting order are clever and effective.

test/test_cache_sqlite_integrity.py (9)

29-32: Local fixture properly overrides conftest fixture.

The local cache_manager fixture with explicit db_path ensures test isolation while using the same pattern as the conftest fixtures.


84-176: Thorough cascade delete verification.

The test properly verifies that deleting a project cascades to all related records (files, messages, sessions). Direct SQL verification ensures the cascade behavior works at the database level.


247-286: Good foreign key constraint tests.

Tests verify that SQLite enforces referential integrity by expecting sqlite3.IntegrityError for invalid foreign key references.


289-406: Excellent serialization round-trip test coverage.

Tests verify that complex content types (tool_use, tool_result, thinking) survive JSON serialization and deserialization. The field-by-field comparison is appropriate since default values may differ.


629-636: WAL mode verification is straightforward and correct.

Simple test confirms WAL journal mode is enabled as expected.


639-688: Concurrent reader test is well-designed.

The test verifies that multiple reader threads can access the database simultaneously without blocking. Using 5 threads with a 5-second timeout is reasonable for this verification.


690-736: Performance test uses appropriate scale.

Using 1000 entries (reduced from 10k per comment) balances coverage with test execution time. The 2-second threshold is reasonable for filtered loading.


900-928: Consider potential flakiness in mtime detection test.

The test uses time.sleep(1.1) to ensure different mtime. While this should be sufficient on most systems, filesystem mtime resolution varies. The existing approach is reasonable but worth noting.


930-968: Migration integrity tests are comprehensive.

Tests verify checksum storage, SHA256 format, and idempotent migration application. These are important for ensuring database schema evolution works correctly.

test/test_cache_integration.py (6)

17-23: Clean container class for test setup data.

ProjectSetup provides a simple container for bundling related test context.


96-139: Well-designed Path-like wrapper for backward compatibility.

ProjectWithCache implements the essential Path-like interface (__fspath__, __truediv__, glob, iterdir, etc.) to maintain compatibility with existing tests while carrying the additional db_path context. This is a pragmatic approach to avoid rewriting all dependent tests.


526-728: Comprehensive archived sessions integration tests.

Excellent coverage of the archive workflow:

  • Session archival when JSONL files are deleted
  • Partial archival with some files remaining
  • Export to JSONL for restoration
  • Loading session entries for rendering
  • Full archive/restore workflow end-to-end
  • Message order preservation
  • Empty/non-existent session handling
  • Compact JSON export format verification

These tests thoroughly validate the new archive functionality.


810-891: Critical test for delete_session cache invalidation.

The test_delete_session_invalidates_file_cache test verifies an important behavior: that deleting a session also removes the cached_files entry. The detailed docstring explains the bug this prevents. Good defensive testing.


936-994: Good coverage for get_all_cached_projects function.

Tests verify:

  • Finding both active and archived projects
  • Empty directory handling
  • Non-existent directory handling

The is_archived flag verification correctly checks project state based on JSONL file presence.


476-492: Test targets outdated JSON cache structure; SQLite backend doesn't create cache files in project directory.

The test assumes a cache subdirectory with JSON files at project_dir / "cache", but the current implementation stores all cache data in a SQLite database (claude-code-log-cache.db). The cache_dir.glob("*.json") call will return an empty list, making the if cache_files: condition always False. This test doesn't actually exercise SQLite corruption recovery and should be updated to test the actual cache backend.

Likely an incorrect or invalid review comment.

test/test_migrations.py (6)

23-44: Thorough migration filename parsing tests.

Tests cover standard format, multiple underscores, and invalid formats with appropriate error handling verification.


47-67: Good checksum computation tests.

Verifies consistency, content sensitivity, and SHA256 output format (64 hex characters).


70-113: Schema version table tests cover upgrade path.

Tests verify both fresh table creation and upgrade from old format (adding checksum column). This ensures smooth migration from previous schema versions.


232-268: Clever approach to testing modified migration warnings.

Using a fake checksum to simulate a modified migration file effectively tests the verification logic without actually modifying real migration files.


271-315: Comprehensive run_migrations tests.

Tests verify:

  • All pending migrations are applied
  • Core tables are created
  • Multiple runs are idempotent
  • Database file is created if missing

This covers the essential migration runner behaviors.


318-367: Good current version tests.

Tests verify zero for fresh database, highest version detection (even with out-of-order insertion), and alignment with available migrations after execution.

test/test_cache.py (7)

26-33: Good fixture update for SQLite isolation.

Creating a project subdirectory ensures each test gets a unique parent directory for the shared cache database, enabling proper test isolation.


98-116: Updated initialization and path tests reflect SQLite backend.

Tests correctly verify that the SQLite database is created at the parent level (parent/claude-code-log-cache.db).


141-170: Direct SQL verification is appropriate for storage tests.

The test uses ORDER BY timestamp NULLS LAST which is supported in SQLite 3.30.0+ (per retrieved learnings). This approach validates the internal storage structure directly.


266-328: Excellent edge case test for Z suffix timestamp boundary.

This test addresses a specific bug where timestamps ending with 'Z' (e.g., 23:59:59Z) could be incorrectly excluded due to string comparison issues with .999999 microsecond suffixes. The detailed docstring explains the problem being tested.


330-352: Clear cache test updated for SQLite semantics.

The test correctly verifies that cache clearing removes files and sessions from the SQLite database while maintaining the project record.


657-704: Error handling tests are thorough.

Tests cover missing cache entries, missing JSONL files, and directory permission issues with appropriate fallback behavior.


707-792: Comprehensive env var precedence tests.

The TestCachePathEnvVar class thoroughly tests the cache path resolution priority:

  1. Explicit db_path parameter (highest)
  2. CLAUDE_CODE_LOG_CACHE_PATH environment variable
  3. Default location (parent/claude-code-log-cache.db)

Tests also verify that get_all_cached_projects respects the same precedence rules.

claude_code_log/cli.py (7)

20-25: LGTM!

Import changes correctly bring in the new SQLite-based cache utilities needed for the refactored cache system.


33-56: LGTM!

The _discover_projects function correctly identifies active projects (with JSONL files) and archived projects (cached but without JSONL files). The logic properly differentiates between them using the cache's is_archived flag and cross-referencing with active project paths.


59-111: LGTM!

The archived project handling is well-implemented. The function correctly:

  • Skips JSONL file operations for archived projects
  • Loads session data directly from cache for archived projects
  • Provides appropriate error messaging when archived projects have no cached sessions
  • Maintains existing cache refresh logic for non-archived projects

239-260: LGTM!

Using get_working_directories() instead of loading the full project cache is more efficient for this specific use case. The fallback logic to build cache and re-fetch directories is preserved correctly.


306-352: LGTM!

The cache clearing logic properly handles the migration from JSON-based to SQLite-based caching:

  • Deletes the centralized SQLite database for all-projects mode
  • Cleans up legacy JSON cache directories for backwards compatibility
  • Uses appropriate error handling with warnings for non-critical failures

503-508: LGTM!

The --page-size CLI option is well-defined with a sensible default of 2000 messages per page. The help text clearly explains that sessions are never split across pages, which is an important constraint for pagination.


558-642: LGTM!

The TUI flow correctly handles archived projects throughout:

  • Re-discovers projects on each loop iteration to reflect changes from restore operations
  • Consistently passes archived_projects set to the project selector
  • Correctly determines is_archived status via set membership before launching TUI
  • All three TUI branch cases (single project, single match, multiple projects) are handled uniformly
claude_code_log/cache.py (8)

57-87: LGTM!

The HtmlCacheEntry and PageCacheData Pydantic models are well-structured with appropriate fields for tracking HTML generation metadata and paginated transcript pages. The models include all necessary information for cache invalidation decisions (timestamps, message counts, library version, session IDs).


170-184: LGTM!

The get_cache_db_path function correctly implements the configuration priority (env var > default) with a unique, descriptive filename as discussed in the PR.


221-231: LGTM!

The connection management is well-implemented:

  • Context manager ensures connections are properly closed
  • WAL mode enables concurrent reads while writing
  • Foreign keys enforcement maintains referential integrity
  • 30-second timeout handles contention gracefully

294-359: LGTM!

The serialization approach is well-designed:

  • Extracts queryable fields (timestamp, session_id, tokens) into indexed columns
  • Stores full content as zlib-compressed JSON for space efficiency
  • Deserialization correctly uses the factory pattern from relevant code snippets
  • Compact JSON format (separators=(",", ":")) minimizes storage overhead

863-863: Lazy import avoids circular dependency.

The from .renderer import is_html_outdated is placed inside the method to avoid circular imports. This is a reasonable pattern, but consider documenting this dependency relationship.


1486-1496: LGTM!

The __all__ export list properly exposes all public entities needed by consumers of this module.


121-164: No action required. The toml package is already listed in the project dependencies (toml>=0.10.2), so the function's use of it in fallback scenarios is safe and properly supported.

Likely an incorrect or invalid review comment.


1393-1398: The filename assumption is correct. Session JSONL files are indeed named {session_id}.jsonl throughout the codebase (verified in claude_code_log/tui.py lines 376, 1667, 1725, 1775), so the delete_session method's deletion logic will properly clean up the corresponding cached_files entry without orphaning it.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @claude_code_log/cache.py:
- Around line 1388-1438: The delete_session method fails to remove page_sessions
entries pointing to the removed session; update the delete_session function to
also execute a DELETE FROM page_sessions WHERE project_id = ? AND session_id = ?
using self._project_id and session_id (before committing), so page_sessions no
longer reference the deleted session; ensure this new delete runs alongside the
existing deletes (messages, html_cache, cached_files, sessions) and that
self._update_last_updated(conn) and conn.commit() remain after all deletions.
🧹 Nitpick comments (5)
claude_code_log/cache.py (5)

121-164: Consider consolidating duplicate toml imports.

The import toml statement appears twice within the function (lines 135 and 152). While this works due to Python's import caching, consolidating the import at the top of the function or using a single try-except block would be cleaner.

♻️ Suggested refactor
 def get_library_version() -> str:
     """Get the current library version from package metadata or pyproject.toml."""
     # First try to get version from installed package metadata
     try:
         from importlib.metadata import version as get_version
 
         return get_version("claude-code-log")
     except Exception:
         # Package not installed or other error, continue to file-based detection
         pass
 
+    # Import toml once for file-based detection methods
+    try:
+        import toml
+    except ImportError:
+        return "unknown"
+
     # Second approach: Use importlib.resources for more robust package location detection
     try:
         from importlib import resources
-        import toml
 
         # Get the package directory and navigate to parent for pyproject.toml
         package_files = resources.files("claude_code_log")

254-256: Avoid printing to stdout in library code.

Using print() for version mismatch notification may interfere with non-CLI consumers of this library. Consider using logging module or allowing callers to handle this via a callback/return value.

♻️ Suggested refactor
+import logging
+
+logger = logging.getLogger(__name__)
+
 # In _ensure_project_exists:
                 # Check version compatibility
                 if not self._is_cache_version_compatible(cached_version):
-                    print(
+                    logger.info(
                         f"Cache version incompatible: {cached_version} -> {self.library_version}, invalidating cache"
                     )

442-452: Simplify redundant conditional branches.

Both branches of the if-else at lines 445-452 set the same time (23:59:59.999999), making the condition unnecessary.

♻️ Suggested simplification
         if to_date:
             to_dt = dateparser.parse(to_date)
             if to_dt:
-                if to_date in ["today", "yesterday"] or "days ago" in to_date:
-                    to_dt = to_dt.replace(
-                        hour=23, minute=59, second=59, microsecond=999999
-                    )
-                else:
-                    to_dt = to_dt.replace(
-                        hour=23, minute=59, second=59, microsecond=999999
-                    )
+                # Always set to end of day for "to" date bound
+                to_dt = to_dt.replace(
+                    hour=23, minute=59, second=59, microsecond=999999
+                )

527-547: Consider using executemany() for batch inserts.

Inserting entries one at a time in a loop is functional but slower for large files. Using executemany() with a list of serialized entries would improve performance.

♻️ Batch insert optimization
             # Delete existing messages for this file
             conn.execute("DELETE FROM messages WHERE file_id = ?", (file_id,))
 
-            # Insert all entries in a batch
-            for entry in entries:
-                serialized = self._serialize_entry(entry, file_id)
-                conn.execute(
-                    """
-                    INSERT INTO messages (
-                        project_id, file_id, type, timestamp, session_id,
-                        ...
-                    ) VALUES (
-                        :project_id, :file_id, :type, :timestamp, :session_id,
-                        ...
-                    )
-                    """,
-                    serialized,
-                )
+            # Insert all entries in a batch using executemany
+            serialized_entries = [self._serialize_entry(entry, file_id) for entry in entries]
+            conn.executemany(
+                """
+                INSERT INTO messages (
+                    project_id, file_id, type, timestamp, session_id,
+                    _uuid, _parent_uuid, _is_sidechain, _user_type, _cwd, _version,
+                    _is_meta, _agent_id, _request_id,
+                    input_tokens, output_tokens, cache_creation_tokens, cache_read_tokens,
+                    _leaf_uuid, _level, _operation, content
+                ) VALUES (
+                    :project_id, :file_id, :type, :timestamp, :session_id,
+                    :_uuid, :_parent_uuid, :_is_sidechain, :_user_type, :_cwd, :_version,
+                    :_is_meta, :_agent_id, :_request_id,
+                    :input_tokens, :output_tokens, :cache_creation_tokens, :cache_read_tokens,
+                    :_leaf_uuid, :_level, :_operation, :content
+                )
+                """,
+                serialized_entries,
+            )

1509-1510: Silent exception swallowing may hide issues.

The broad except Exception: pass silently ignores all errors including database corruption, permission issues, or unexpected failures. Consider logging the exception for debugging purposes.

♻️ Add logging for exceptions
+import logging
+
+logger = logging.getLogger(__name__)
+
     except Exception:
-        pass
+        logger.debug("Failed to load cached projects", exc_info=True)
 
     return result
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 479e71b and 4695f79.

📒 Files selected for processing (3)
  • claude_code_log/cache.py
  • claude_code_log/tui.py
  • test/test_pagination.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • claude_code_log/cache.py
  • test/test_pagination.py
  • claude_code_log/tui.py
🧠 Learnings (2)
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_pagination.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • claude_code_log/tui.py
🧬 Code graph analysis (1)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (5)
  • AssistantTranscriptEntry (175-178)
  • QueueOperationTranscriptEntry (203-220)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
🔇 Additional comments (18)
claude_code_log/cache.py (1)

1319-1332: LGTM - SQL parameterization is correctly applied.

The IN clause placeholders are safely generated from the list length, and actual session IDs are passed as bound parameters, preventing SQL injection.
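
The safe pattern being praised here — generating one `?` per value and binding the values as parameters, never interpolating them into the SQL string — looks roughly like this (table and column names are illustrative):

```python
import sqlite3


def fetch_sessions(conn: sqlite3.Connection, project_id: int, session_ids: list[str]):
    """Build an IN clause from bound parameters, not string interpolation."""
    # One "?" per value; an empty list would need guarding, since "IN ()" is invalid SQL
    placeholders = ",".join("?" for _ in session_ids)
    query = (
        "SELECT session_id FROM sessions "
        f"WHERE project_id = ? AND session_id IN ({placeholders})"
    )
    return conn.execute(query, [project_id, *session_ids]).fetchall()
```

Only the placeholder count is derived from user input; the values themselves go through SQLite's parameter binding, which is what prevents injection.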

test/test_pagination.py (6)

1-19: LGTM - Clean imports and test setup.

The imports are well-organized, pulling in necessary fixtures and the specific functions under test (_get_page_html_path, _assign_sessions_to_pages).


37-139: Good test coverage for pagination logic edge cases.

The tests comprehensively cover:

  • Single/multiple sessions below threshold
  • Overflow behavior when sessions exceed page size
  • Chronological ordering
  • Empty input handling
  • Complex multi-page scenarios

155-159: Fixture patches version but passes same value to CacheManager.

The cache_manager fixture patches get_library_version to return mock_version, then passes the same mock_version to CacheManager. This is correct but the patch may be unnecessary since the version is explicitly passed. However, keeping the patch is safer in case internal code paths call get_library_version().


276-301: Test for session_missing correctly isolates the check.

The test creates a page cache entry but doesn't add the session to the sessions table, correctly testing that is_page_stale detects missing sessions. The mock of is_html_outdated properly isolates the staleness check.


549-594: Good integration test coverage for pagination scenarios.

Tests correctly:

  • Create realistic JSONL data with helper function
  • Verify single vs. multiple page generation based on page_size
  • Check for presence of expected HTML files

705-770: Thorough edge case testing for next link updates.

Tests properly cover:

  • Removing last-page class when enabling next link
  • No modification when already visible
  • Graceful handling of missing files and invalid page numbers
claude_code_log/tui.py (11)

65-78: LGTM - Clean initialization with optional archived_projects parameter.

The default None with fallback to empty set handles the optional parameter correctly, maintaining backward compatibility.


287-293: Good error handling with individual file tracking.

The try-except per file with tracking of succeeded/failed files provides good UX - partial failures are reported clearly without stopping the entire operation.


340-342: Cache deletion order is correct.

Cache is cleared first regardless of option (cache_only or both), which is the desired behavior. File deletion follows if requested.


412-424: Good idempotency check before restoration.

The code correctly filters out sessions that already exist as JSONL files, preventing accidental overwrites and providing clear feedback when nothing needs restoration.


1117-1122: Good exclusion of agent files from session loading.

Agent files (agent-*.jsonl) are correctly excluded since they're loaded via session references, not as standalone sessions.


1694-1711: Well-designed action gating based on session state.

The check_action method correctly:

  • Allows archive/resume only for current sessions (with JSONL)
  • Allows restore only for archived sessions (cache-only)
  • Returns True for unhandled actions as a safe default

1735-1741: LGTM - Safe file writing with directory creation.

The code ensures the project directory exists before writing, handles the case where the project directory was deleted, and writes messages correctly with newline separators.


1836-1867: Delete confirmation callback handles all cases correctly.

The callback:

  • Only attempts JSONL deletion for "both" option
  • Uses CacheManager.delete_session() for cache cleanup
  • Reports what was actually deleted
  • Clears selection and refreshes UI

1599-1611: Correct dual-path loading for archived vs. current sessions.

The code properly distinguishes:

  • Archived sessions: loads messages from cache via load_session_entries()
  • Current sessions: loads messages from JSONL files via load_directory_transcripts()

This ensures archived sessions can still be rendered to HTML/Markdown.


1898-1935: Good handling of archived projects in entry point.

The run_session_browser function correctly:

  • Allows archived projects to load from cache even if the directory doesn't exist
  • Attempts cache lookup before showing "does not exist" error
  • Checks for cached sessions when no JSONL files are found for archived projects

286-316: Non-atomic file deletion is acceptable for TUI context.

The file deletion loop isn't atomic (if interrupted, some files may be deleted but project not marked archived). This is acceptable because:

  1. Atomic deletion of multiple files isn't possible without a transaction manager
  2. The partial state is recoverable - user can archive again
  3. The code correctly only marks as archived if ALL files deleted

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/test_cache_integration.py (1)

476-492: Stale test: references old JSON-based cache structure.

This test looks for project_dir / "cache" / "*.json" files, but the new implementation uses SQLite. The test will silently pass without corrupting anything since cache_files will be empty, making the if cache_files: block a no-op.

🔧 Update test for SQLite corruption recovery
     def test_cache_corruption_recovery(self, setup_test_project):
         """Test recovery from corrupted cache files."""
-        project_dir = setup_test_project
+        project_dir = setup_test_project.project_dir
+        db_path = setup_test_project.db_path
 
         # Create initial cache
         convert_jsonl_to_html(input_path=project_dir, use_cache=True)
 
-        # Corrupt cache file
-        cache_dir = project_dir / "cache"
-        cache_files = list(cache_dir.glob("*.json"))
-        if cache_files:
-            cache_file = [f for f in cache_files if f.name != "index.json"][0]
-            cache_file.write_text("corrupted json data", encoding="utf-8")
+        # Corrupt SQLite database
+        assert db_path.exists()
+        with open(db_path, "r+b") as f:
+            f.seek(100)  # Skip SQLite header
+            f.write(b"corrupted data here")
 
-        # Should recover gracefully
-        output = convert_jsonl_to_html(input_path=project_dir, use_cache=True)
-        assert output.exists()
+        # Should recover gracefully (may recreate database)
+        try:
+            output = convert_jsonl_to_html(input_path=project_dir, use_cache=True)
+            assert output.exists()
+        except Exception:
+            # Some form of error handling is acceptable
+            pass
🧹 Nitpick comments (3)
claude_code_log/cache.py (3)

444-452: Redundant conditional branches - both execute identical code.

The if and else branches both set to_dt to end of day (23:59:59.999999). The condition check serves no purpose since both paths do the same thing.

♻️ Simplify by removing the dead branch
             if to_dt:
-                if to_date in ["today", "yesterday"] or "days ago" in to_date:
-                    to_dt = to_dt.replace(
-                        hour=23, minute=59, second=59, microsecond=999999
-                    )
-                else:
-                    to_dt = to_dt.replace(
-                        hour=23, minute=59, second=59, microsecond=999999
-                    )
+                to_dt = to_dt.replace(
+                    hour=23, minute=59, second=59, microsecond=999999
+                )

528-547: Consider executemany for batch inserts.

Individual execute() calls in a loop work correctly but are less efficient for large entry counts. Using executemany() with a list of pre-serialized dictionaries would reduce Python↔SQLite round-trips.

♻️ Batch insert optimization
-            for entry in entries:
-                serialized = self._serialize_entry(entry, file_id)
-                conn.execute(
-                    """
-                    INSERT INTO messages (
-                        project_id, file_id, type, timestamp, session_id,
-                        _uuid, _parent_uuid, _is_sidechain, _user_type, _cwd, _version,
-                        _is_meta, _agent_id, _request_id,
-                        input_tokens, output_tokens, cache_creation_tokens, cache_read_tokens,
-                        _leaf_uuid, _level, _operation, content
-                    ) VALUES (
-                        :project_id, :file_id, :type, :timestamp, :session_id,
-                        :_uuid, :_parent_uuid, :_is_sidechain, :_user_type, :_cwd, :_version,
-                        :_is_meta, :_agent_id, :_request_id,
-                        :input_tokens, :output_tokens, :cache_creation_tokens, :cache_read_tokens,
-                        :_leaf_uuid, :_level, :_operation, :content
-                    )
-                    """,
-                    serialized,
-                )
+            serialized_entries = [self._serialize_entry(entry, file_id) for entry in entries]
+            conn.executemany(
+                """
+                INSERT INTO messages (
+                    project_id, file_id, type, timestamp, session_id,
+                    _uuid, _parent_uuid, _is_sidechain, _user_type, _cwd, _version,
+                    _is_meta, _agent_id, _request_id,
+                    input_tokens, output_tokens, cache_creation_tokens, cache_read_tokens,
+                    _leaf_uuid, _level, _operation, content
+                ) VALUES (
+                    :project_id, :file_id, :type, :timestamp, :session_id,
+                    :_uuid, :_parent_uuid, :_is_sidechain, :_user_type, :_cwd, :_version,
+                    :_is_meta, :_agent_id, :_request_id,
+                    :input_tokens, :output_tokens, :cache_creation_tokens, :cache_read_tokens,
+                    :_leaf_uuid, :_level, :_operation, :content
+                )
+                """,
+                serialized_entries,
+            )

1516-1517: Silent except Exception: pass may hide important errors.

Database corruption, permission errors, or schema issues will be silently ignored, making debugging difficult. Consider logging the exception or at least the exception type.

♻️ Add minimal error logging
+    import logging
+    logger = logging.getLogger(__name__)
+
     try:
         conn = sqlite3.connect(actual_db_path, timeout=30.0)
         # ... existing code ...
     except Exception:
-        pass
+        logger.debug("Failed to read cached projects from %s", actual_db_path, exc_info=True)

Or, if adding logging feels heavyweight here, at least narrow the exception types:

-    except Exception:
-        pass
+    except (sqlite3.Error, OSError):
+        # Database unavailable or corrupted - return empty list
+        pass
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4695f79 and dd4c596.

📒 Files selected for processing (2)
  • claude_code_log/cache.py
  • test/test_cache_integration.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • claude_code_log/cache.py
  • test/test_cache_integration.py
🧠 Learnings (1)
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_cache_integration.py
🧬 Code graph analysis (1)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (5)
  • AssistantTranscriptEntry (175-178)
  • QueueOperationTranscriptEntry (203-220)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.11)
🔇 Additional comments (4)
claude_code_log/cache.py (1)

1-87: Well-structured SQLite cache implementation.

The migration to SQLite with zlib compression, WAL mode, and proper foreign key constraints is a solid architectural choice. The data models are clean and the overall design supports the archival, pagination, and version compatibility features well.

test/test_cache_integration.py (3)

526-696: Excellent test coverage for archived sessions workflow.

The tests comprehensively cover the archive-restore lifecycle: caching → file deletion → archive detection → export → restore → verification. The test at line 648 (test_full_archive_and_restore_workflow) is particularly valuable as an end-to-end integration test.


1001-1059: Good coverage for get_all_cached_projects.

Tests properly verify the function handles:

  • Mixed active/archived projects with correct is_archived flags
  • Empty directories (no cache database)
  • Nonexistent directories

96-139: Clever Path-like wrapper for backward compatibility.

The ProjectWithCache class implementing __fspath__, __truediv__, and other Path-like methods allows existing tests to continue working while providing access to db_path when needed. This is a pragmatic approach to incremental migration.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @claude_code_log/cache.py:
- Around line 241-266: Replace the print() in _ensure_project_exists with the
library logger and include a UTC timestamp: obtain or use the module/class
logger (e.g., self.logger or logging.getLogger(__name__)), call logger.warning
or logger.info instead of print, and prepend or format the message with
datetime.datetime.now(datetime.timezone.utc).isoformat() to produce an explicit
UTC timestamp; make the same replacement for the other print occurrences
referenced around lines 269-296 (the blocks that call
_is_cache_version_compatible, _clear_project_data, and _create_project) so all
cache-invalidation messages use the logger + UTC timestamp consistently.
- Around line 1087-1102: The get_page_size_config() implementation returns the
oldest page because the query orders by page_number ASC; update the SQL in
get_page_size_config() (and keep LIMIT 1) to order by page_number DESC so it
selects the most recent page's page_size_config, and adjust the docstring only
if you choose a different “most recent” definition (e.g., use updated_at) to
ensure the comment matches the query.
- Around line 417-477: The bug in load_cached_entries_filtered() is that
tz-aware parsed datetimes are being formatted with a "Z" without converting them
to UTC, shifting filter bounds; change the handling of from_dt and to_dt so that
if dt.tzinfo is not None you call dt = dt.astimezone(timezone.utc) (and if
dt.tzinfo is None keep the existing replace(tzinfo=timezone.utc) behavior), then
format to_bound/from_bound from that UTC-converted dt; update the branches that
set from_bound and to_bound (and ensure timezone is imported where needed) so
comparisons use true UTC "Z" timestamps.
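
The normalization this fix asks for can be sketched with a minimal helper (illustrative only, not the actual load_cached_entries_filtered code):

```python
from datetime import datetime, timezone


def to_utc_bound(dt: datetime) -> str:
    """Normalize a parsed datetime to a true UTC 'Z' timestamp for comparison."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)  # convert aware datetimes to UTC
    else:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive datetimes are already UTC
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Without the astimezone() call, a datetime parsed as 12:00+02:00 would be stamped "12:00Z" instead of the correct "10:00Z", shifting every filter bound by the local offset.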
🧹 Nitpick comments (5)
test/test_cache_integration.py (4)

25-50: Make JSONL I/O encoding consistent (avoid locale-dependent failures).
A lot of the new test scaffolding correctly isolates the DB path, but many JSONL writes still omit encoding="utf-8" (some places include it, others don’t). This can break on non-UTF-8 default locales and is hard to debug.

Proposed fix (apply pattern throughout file)
-    with open(jsonl_file, "w") as f:
+    with open(jsonl_file, "w", encoding="utf-8") as f:
         for entry in sample_jsonl_data:
             f.write(json.dumps(entry) + "\n")

Also applies to: 141-156, 214-251, 270-310, 311-353, 355-425, 443-507, 527-1060


96-140: Avoid passing ProjectWithCache where a Path is expected (or fully delegate).
ProjectWithCache is a partial Path façade; this is brittle if production code uses additional Path APIs (e.g., .resolve(), .suffix, .with_name(), etc.). Prefer consistently passing setup_test_project.project_dir into CacheManager / converter APIs, and keep the wrapper only for tests that truly need it.

Safer direction (example changes)
-        project_dir = setup_test_project
+        project_dir = setup_test_project.project_dir
...
-        cache_manager = CacheManager(project_dir, "1.0.0")
+        cache_manager = CacheManager(project_dir, "1.0.0")

Also applies to: 188-213, 252-269, 297-310, 426-442


355-425: “Performance” assertion is a no-op; consider asserting cache behavior instead.
assert second_run_time >= 0 will always pass and doesn’t validate caching. If you want a stable integration assertion, consider checking something deterministic (e.g., that cache contains files/sessions, or that “stale sessions” is empty on second run).
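
One generic way to make such an assertion deterministic — a hypothetical sketch that does not use the project's real API — is to count how often the expensive path runs instead of timing it:

```python
def make_counting_loader():
    """Return a cached loader plus a counter recording expensive-path calls."""
    calls = {"n": 0}
    cache: dict[str, str] = {}

    def load(path: str) -> str:
        if path not in cache:
            calls["n"] += 1  # expensive parse happens only on a cache miss
            cache[path] = f"parsed:{path}"
        return cache[path]

    return load, calls
```

A test can then assert that calls["n"] stays at 1 after a second load of the same path — a check that cannot flake the way wall-clock timing can.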


500-507: Catching only FileNotFoundError can hide real regressions.
If empty projects are an accepted input, assert the expected output behavior; otherwise, assert the exception. The current try/except makes it too easy for unrelated failures to slip by.

claude_code_log/cache.py (1)

124-168: Use tomllib with fallback to toml for Python 3.10 compatibility, and log failures.

Since the project supports Python 3.10+ (not 3.11+), tomllib cannot be the sole choice, but attempting it first with a fallback to the already-required toml dependency is sound. More importantly, all three exception handlers currently swallow errors silently, masking version detection failures. Add logging using the available logger to make failures visible for debugging.

Suggested fix
 def get_library_version() -> str:
     """Get the current library version from package metadata or pyproject.toml."""
     # First try to get version from installed package metadata
     try:
         from importlib.metadata import version as get_version
 
         return get_version("claude-code-log")
     except Exception:
-        # Package not installed or other error, continue to file-based detection
-        pass
+        logger.debug("Failed to detect version from package metadata.", exc_info=True)
 
     # Second approach: Use importlib.resources for more robust package location detection
     try:
         from importlib import resources
-        import toml
+        try:
+            import tomllib as toml  # py>=3.11
+        except ImportError:  # pragma: no cover
+            import toml  # type: ignore[no-redef]
 
         # Get the package directory and navigate to parent for pyproject.toml
         package_files = resources.files("claude_code_log")
         # Convert to Path to access parent reliably
         package_root = Path(str(package_files)).parent
         pyproject_path = package_root / "pyproject.toml"
 
         if pyproject_path.exists():
-            with open(pyproject_path, "r", encoding="utf-8") as f:
+            with open(pyproject_path, "rb") as f:
                 pyproject_data = toml.load(f)
             return pyproject_data.get("project", {}).get("version", "unknown")
     except Exception:
-        pass
+        logger.debug("Failed to detect version from package files.", exc_info=True)
 
     # Final fallback: Try to read from pyproject.toml using file-relative path
     try:
-        import toml
+        try:
+            import tomllib as toml  # py>=3.11
+        except ImportError:  # pragma: no cover
+            import toml  # type: ignore[no-redef]
 
         project_root = Path(__file__).parent.parent
         pyproject_path = project_root / "pyproject.toml"
 
         if pyproject_path.exists():
-            with open(pyproject_path, "r", encoding="utf-8") as f:
+            with open(pyproject_path, "rb") as f:
                 pyproject_data = toml.load(f)
             return pyproject_data.get("project", {}).get("version", "unknown")
     except Exception:
-        pass
+        logger.debug("Failed to detect version from file-relative path.", exc_info=True)
 
     return "unknown"
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd4c596 and f8a074a.

📒 Files selected for processing (2)
  • claude_code_log/cache.py
  • test/test_cache_integration.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • test/test_cache_integration.py
  • claude_code_log/cache.py
🧠 Learnings (1)
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_cache_integration.py
🧬 Code graph analysis (2)
test/test_cache_integration.py (1)
claude_code_log/cache.py (6)
  • get_cached_project_data (661-733)
  • export_session_to_jsonl (1037-1058)
  • load_session_entries (1060-1083)
  • delete_session (1385-1442)
  • is_file_cached (377-398)
  • get_all_cached_projects (1462-1516)
claude_code_log/cache.py (4)
claude_code_log/factories/transcript_factory.py (1)
  • create_transcript_entry (219-238)
claude_code_log/migrations/runner.py (1)
  • run_migrations (135-156)
claude_code_log/models.py (4)
  • AssistantTranscriptEntry (175-178)
  • SummaryTranscriptEntry (181-186)
  • SystemTranscriptEntry (189-200)
  • UserTranscriptEntry (168-172)
claude_code_log/renderer.py (2)
  • get (121-132)
  • is_html_outdated (2308-2320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.14)

Comment on lines +241 to +266
def _ensure_project_exists(self) -> None:
    """Ensure project record exists and get its ID."""
    project_path_str = str(self.project_path)

    with self._get_connection() as conn:
        row = conn.execute(
            "SELECT id, version FROM projects WHERE project_path = ?",
            (project_path_str,),
        ).fetchone()

        if row:
            self._project_id = row["id"]
            cached_version = row["version"]

            # Check version compatibility
            if not self._is_cache_version_compatible(cached_version):
                print(
                    f"Cache version incompatible: {cached_version} -> {self.library_version}, invalidating cache"
                )
                self._clear_project_data(conn)
                self._project_id = self._create_project(conn)
        else:
            self._project_id = self._create_project(conn)

        conn.commit()


⚠️ Potential issue | 🟡 Minor

Avoid print() in library code; use logger and write UTC timestamps.
print() in _ensure_project_exists() will pollute CLI/TUI output and is hard to test. Also, timestamps are currently local-time datetime.now().isoformat(); using UTC avoids cross-machine weirdness.

Proposed fix
-        now = datetime.now().isoformat()
+        now = datetime.now(timezone.utc).isoformat()
@@
-                    print(
-                        f"Cache version incompatible: {cached_version} -> {self.library_version}, invalidating cache"
-                    )
+                    logger.warning(
+                        "Cache version incompatible: %s -> %s; invalidating cache",
+                        cached_version,
+                        self.library_version,
+                    )

Also applies to: 269-296

🤖 Prompt for AI Agents
In @claude_code_log/cache.py around lines 241 - 266, Replace the print() in
_ensure_project_exists with the library logger and include a UTC timestamp:
obtain or use the module/class logger (e.g., self.logger or
logging.getLogger(__name__)), call logger.warning or logger.info instead of
print, and prepend or format the message with
datetime.datetime.now(datetime.timezone.utc).isoformat() to produce an explicit
UTC timestamp; make the same replacement for the other print occurrences
referenced around lines 269-296 (the blocks that call
_is_cache_version_compatible, _clear_project_data, and _create_project) so all
cache-invalidation messages use the logger + UTC timestamp consistently.
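As a standalone illustration of the pattern the fix asks for — logger instead of print(), with an explicit UTC timestamp — here is a minimal sketch; the helper name is hypothetical, not the cache module's actual API:

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def warn_cache_incompatible(cached_version: str, current_version: str) -> str:
    """Log a cache-invalidation warning via the logging module (not print),
    stamped with an explicit UTC timestamp."""
    ts = datetime.now(timezone.utc).isoformat()
    message = (
        f"{ts} Cache version incompatible: "
        f"{cached_version} -> {current_version}; invalidating cache"
    )
    logger.warning(message)
    return message
```

Routing through `logging` keeps CLI/TUI output clean and lets tests assert on records via `caplog` instead of capturing stdout.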

Comment on lines +359 to +363
def _deserialize_entry(self, row: sqlite3.Row) -> TranscriptEntry:
    """Convert SQLite row back to TranscriptEntry."""
    content_dict = json.loads(zlib.decompress(row["content"]).decode("utf-8"))
    return create_transcript_entry(content_dict)


⚠️ Potential issue | 🟠 Major

Handle corrupted content rows (zlib/json) so partial corruption doesn’t crash reads.
SQLite can stay readable even if individual BLOBs are corrupted; _deserialize_entry() will currently raise and fail the whole operation. Consider treating failures as a cache-miss signal (and possibly invalidating the project).

Proposed fix (local hardening)
 def _deserialize_entry(self, row: sqlite3.Row) -> TranscriptEntry:
     """Convert SQLite row back to TranscriptEntry."""
-    content_dict = json.loads(zlib.decompress(row["content"]).decode("utf-8"))
-    return create_transcript_entry(content_dict)
+    try:
+        content_dict = json.loads(zlib.decompress(row["content"]).decode("utf-8"))
+        return create_transcript_entry(content_dict)
+    except (zlib.error, UnicodeDecodeError, json.JSONDecodeError, ValueError) as e:
+        raise sqlite3.DatabaseError(f"Corrupted cached message content: {e}") from e

Also applies to: 400-477, 1037-1084
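The proposed hardening can be demonstrated in isolation; `serialize_content`/`deserialize_content` below are illustrative names, not the module's real ones:

```python
import json
import sqlite3
import zlib


def serialize_content(entry: dict) -> bytes:
    """Compress a message dict the way the cache stores BLOBs."""
    return zlib.compress(json.dumps(entry).encode("utf-8"))


def deserialize_content(blob: bytes) -> dict:
    """Decompress and parse a cached message BLOB; surface corruption as a
    DatabaseError so callers can treat it as a cache miss rather than crash."""
    try:
        return json.loads(zlib.decompress(blob).decode("utf-8"))
    except (zlib.error, UnicodeDecodeError, json.JSONDecodeError, ValueError) as e:
        raise sqlite3.DatabaseError(f"Corrupted cached message content: {e}") from e
```

Narrowing the exception types matters: a bare `except Exception` would also swallow programming errors, while these four cover the realistic BLOB-corruption failure modes.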

Comment on lines 417 to +477
def load_cached_entries_filtered(
    self, jsonl_path: Path, from_date: Optional[str], to_date: Optional[str]
) -> Optional[List[TranscriptEntry]]:
    """Load cached entries with SQL-based timestamp filtering."""
    if not self.is_file_cached(jsonl_path):
        return None

    # If no date filtering needed, fall back to regular loading
    if not from_date and not to_date:
        return self.load_cached_entries(jsonl_path)

    file_id = self._get_file_id(jsonl_path)
    if file_id is None:
        return None

    # Parse dates
    import dateparser

    from_dt = None
    to_dt = None

    if from_date:
        from_dt = dateparser.parse(from_date)
        if from_dt and (
            from_date in ["today", "yesterday"] or "days ago" in from_date
        ):
            from_dt = from_dt.replace(hour=0, minute=0, second=0, microsecond=0)

    if to_date:
        to_dt = dateparser.parse(to_date)
        if to_dt:
            to_dt = to_dt.replace(hour=23, minute=59, second=59, microsecond=999999)

    # Build query with SQL-based filtering
    sql = "SELECT content FROM messages WHERE file_id = ?"
    params: List[Any] = [file_id]

    if from_dt:
        # Normalize to UTC 'Z' format for consistent string comparison
        # with stored timestamps (which use 'Z' suffix from JSONL)
        if from_dt.tzinfo is None:
            from_dt = from_dt.replace(tzinfo=timezone.utc)
        from_bound = from_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        # Include entries with NULL timestamp (like summaries) OR within date range
        sql += " AND (timestamp IS NULL OR timestamp >= ?)"
        params.append(from_bound)

    if to_dt:
        # Normalize to UTC 'Z' format for consistent string comparison
        if to_dt.tzinfo is None:
            to_dt = to_dt.replace(tzinfo=timezone.utc)
        to_bound = to_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        sql += " AND (timestamp IS NULL OR timestamp <= ?)"
        params.append(to_bound)

    sql += " ORDER BY timestamp NULLS LAST"

    with self._get_connection() as conn:
        rows = conn.execute(sql, params).fetchall()

    return [self._deserialize_entry(row) for row in rows]

⚠️ Potential issue | 🟠 Major

Timezone bounds are wrong for tz-aware inputs in load_cached_entries_filtered().
If dateparser.parse() returns a tz-aware datetime (e.g., user passes “2023-01-01 10:00 PST”), the code formats it as ...Z without converting to UTC, which shifts the effective filter window.

Proposed fix
         if from_dt:
             # Normalize to UTC 'Z' format for consistent string comparison
             # with stored timestamps (which use 'Z' suffix from JSONL)
-            if from_dt.tzinfo is None:
-                from_dt = from_dt.replace(tzinfo=timezone.utc)
+            if from_dt.tzinfo is None:
+                from_dt = from_dt.replace(tzinfo=timezone.utc)
+            else:
+                from_dt = from_dt.astimezone(timezone.utc)
             from_bound = from_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
@@
         if to_dt:
             # Normalize to UTC 'Z' format for consistent string comparison
-            if to_dt.tzinfo is None:
-                to_dt = to_dt.replace(tzinfo=timezone.utc)
+            if to_dt.tzinfo is None:
+                to_dt = to_dt.replace(tzinfo=timezone.utc)
+            else:
+                to_dt = to_dt.astimezone(timezone.utc)
             to_bound = to_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
What `dateparser` version is used, and does the project expect `--from-date/--to-date` values with explicit timezones (e.g., “PST”)?
🤖 Prompt for AI Agents
In @claude_code_log/cache.py around lines 417 - 477, The bug in
load_cached_entries_filtered() is that tz-aware parsed datetimes are being
formatted with a "Z" without converting them to UTC, shifting filter bounds;
change the handling of from_dt and to_dt so that if dt.tzinfo is not None you
call dt = dt.astimezone(timezone.utc) (and if dt.tzinfo is None keep the
existing replace(tzinfo=timezone.utc) behavior), then format to_bound/from_bound
from that UTC-converted dt; update the branches that set from_bound and to_bound
(and ensure timezone is imported where needed) so comparisons use true UTC "Z"
timestamps.
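The normalization the fix describes can be factored into a small standalone helper (the function name is hypothetical):

```python
from datetime import datetime, timezone, timedelta


def to_utc_bound(dt: datetime) -> str:
    """Normalize a parsed datetime to a UTC 'Z'-suffixed string bound.
    Naive datetimes are assumed to already be UTC; tz-aware ones are
    converted, so e.g. 10:00 PST becomes 18:00Z rather than 10:00Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```

The key difference from the buggy version: `replace(tzinfo=...)` only relabels a naive datetime, while `astimezone(...)` actually shifts the clock time of an aware one.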

Comment on lines +1087 to +1102
def get_page_size_config(self) -> Optional[int]:
    """Get the configured page size from the most recent page, if any."""
    if self._project_id is None:
        return None

    with self._get_connection() as conn:
        row = conn.execute(
            """SELECT page_size_config FROM html_pages
               WHERE project_id = ?
               ORDER BY page_number ASC
               LIMIT 1""",
            (self._project_id,),
        ).fetchone()

    return row["page_size_config"] if row else None


⚠️ Potential issue | 🟠 Major

get_page_size_config() returns the oldest page, not the “most recent”.
The query orders page_number ASC but the docstring says “most recent page”.

Proposed fix
-                """SELECT page_size_config FROM html_pages
-                   WHERE project_id = ?
-                   ORDER BY page_number ASC
-                   LIMIT 1""",
+                """SELECT page_size_config FROM html_pages
+                   WHERE project_id = ?
+                   ORDER BY page_number DESC
+                   LIMIT 1""",
🤖 Prompt for AI Agents
In @claude_code_log/cache.py around lines 1087 - 1102, The
get_page_size_config() implementation returns the oldest page because the query
orders by page_number ASC; update the SQL in get_page_size_config() (and keep
LIMIT 1) to order by page_number DESC so it selects the most recent page's
page_size_config, and adjust the docstring only if you choose a different “most
recent” definition (e.g., use updated_at) to ensure the comment matches the
query.

Comment on lines +1385 to +1443
def delete_session(self, session_id: str) -> bool:
    """Delete a session and its messages from cache.

    Args:
        session_id: The session ID to delete

    Returns:
        True if session was deleted, False if not found
    """
    if self._project_id is None:
        return False

    with self._get_connection() as conn:
        # Check if session exists
        row = conn.execute(
            "SELECT id FROM sessions WHERE project_id = ? AND session_id = ?",
            (self._project_id, session_id),
        ).fetchone()

        if not row:
            return False

        # Delete messages for this session
        conn.execute(
            "DELETE FROM messages WHERE project_id = ? AND session_id = ?",
            (self._project_id, session_id),
        )

        # Delete HTML cache entries for this session
        conn.execute(
            "DELETE FROM html_cache WHERE project_id = ? AND source_session_id = ?",
            (self._project_id, session_id),
        )

        # Delete page_sessions entries referencing this session
        conn.execute(
            """DELETE FROM page_sessions WHERE session_id = ?
               AND page_id IN (SELECT id FROM html_pages WHERE project_id = ?)""",
            (session_id, self._project_id),
        )

        # Delete cached_files entry for this session's JSONL file
        # File name pattern is {session_id}.jsonl
        conn.execute(
            "DELETE FROM cached_files WHERE project_id = ? AND file_name = ?",
            (self._project_id, f"{session_id}.jsonl"),
        )

        # Delete the session record
        conn.execute(
            "DELETE FROM sessions WHERE project_id = ? AND session_id = ?",
            (self._project_id, session_id),
        )

        self._update_last_updated(conn)
        conn.commit()

    return True


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Checking migrations for messages table structure ==="
find . -type f -name "*.py" -path "*/migrations*" | head -20

echo -e "\n=== 2) Searching for JSONL file naming patterns ==="
rg -n '{session_id}\.jsonl|f"\{session_id\}\.jsonl"|session_id.*\.jsonl' --type py -A 2 -B 2 | head -50

echo -e "\n=== 3) Checking where cached_files table is used ==="
rg -n 'cached_files' --type py -B 2 -A 2 | head -80

echo -e "\n=== 4) Searching for session file creation/writing logic ==="
rg -n 'write.*jsonl|jsonl.*write|\.jsonl' --type py -B 2 -A 2 | head -100

echo -e "\n=== 5) Checking file structure of cache.py around session methods ==="
rg -n 'def.*session|JSONL|jsonl' claude_code_log/cache.py -B 1 -A 1 | head -50

Repository: daaain/claude-code-log

Length of output: 415


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Checking migration files ==="
cat claude_code_log/migrations/runner.py | head -100

echo -e "\n=== 2) Searching for jsonl file naming (fixed regex) ==="
rg -n 'jsonl' claude_code_log/cache.py -B 2 -A 2

echo -e "\n=== 3) Finding where session files are created/saved ==="
rg -n 'session_id.*\.jsonl|\.jsonl.*session' --type py | head -30

echo -e "\n=== 4) Check cached_files creation/insertion ==="
rg -n 'INSERT.*cached_files|cached_files.*INSERT' --type py -B 3 -A 3 | head -60

echo -e "\n=== 5) Search for any session file path construction ==="
rg -n 'session.*path|path.*session|file.*name.*session' --type py | head -40

echo -e "\n=== 6) Check if there are tests for delete_session ==="
rg -n 'delete_session|test.*session' --type py | head -30

Repository: daaain/claude-code-log

Length of output: 25875


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check if delete_session also deletes the actual JSONL file ==="
rg -n 'def delete_session' claude_code_log/cache.py -A 80 | head -100

echo -e "\n=== 2) Search for where delete_session is called ==="
rg -n 'delete_session\(' --type py | head -30

echo -e "\n=== 3) Check for tests specifically testing delete_session ==="
rg -n 'test.*delete.*session|delete_session' test/ --type py -B 2 -A 10 | head -80

echo -e "\n=== 4) Look for file deletion in relation to session deletion ==="
rg -n 'unlink\|remove\|Path.*session.*delete' --type py | head -40

echo -e "\n=== 5) Check if there's a specific pattern for managing session JSONL file lifecycle ==="
rg -n 'session.*\.jsonl.*delete|delete.*session.*file' --type py -B 2 -A 2 | head -40

Repository: daaain/claude-code-log

Length of output: 8961


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Look at tui.py delete_session action to see full context ==="
rg -n 'def action_delete_session' claude_code_log/tui.py -A 50 | head -70

echo -e "\n=== 2) Check if JSONL files are ever created with names other than {session_id}.jsonl ==="
rg -n 'write.*text\|\.write_text' claude_code_log/cache.py | grep -i jsonl || echo "No matches in cache.py"

echo -e "\n=== 3) Search for any session file with different naming ==="
rg -n '\.jsonl' claude_code_log/ --type py | grep -v 'f"{session_id}.jsonl' | grep -v 'jsonl_path' | head -20

echo -e "\n=== 4) Look at migration files to confirm file_name column in cached_files ==="
find claude_code_log/migrations -name "*.sql" -type f -exec cat {} \; | head -200

echo -e "\n=== 5) Check if there are any unique constraints on cached_files ==="
rg -n 'UNIQUE\|PRIMARY KEY' claude_code_log/migrations/ -A 2 -B 2 | head -50

Repository: daaain/claude-code-log

Length of output: 11006


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check if cached_files entries are only created for session JSONL files ==="
rg -n 'INSERT INTO cached_files|save_cached_entries' claude_code_log/cache.py -B 5 -A 10 | head -50

echo -e "\n=== 2) Check for any agent file handling in cache.py ==="
rg -n 'agent' claude_code_log/cache.py | head -20

echo -e "\n=== 3) Verify the UNIQUE constraint on cached_files is enforced ==="
rg -n 'UNIQUE.*project_id.*file_name' claude_code_log/migrations/ -B 5 -A 5

echo -e "\n=== 4) Check tests for delete_session to see expected behavior ==="
rg -n 'def test.*delete.*session' test/test_cache_integration.py -A 30 | head -100

echo -e "\n=== 5) Verify if a session can map to multiple cached_files ==="
rg -n 'cached_files.*session|session.*cached_files' claude_code_log/cache.py -B 2 -A 2 | head -40

Repository: daaain/claude-code-log

Length of output: 7985


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check if sessions table has any reference to source file ==="
rg -n 'sessions.*table|CREATE TABLE.*sessions' claude_code_log/migrations/ -A 30 | head -50

echo -e "\n=== 2) Check how session_id is extracted from JSONL files ==="
rg -n 'sessionId|session_id' claude_code_log/cache.py -B 2 -A 2 | grep -A 3 -B 3 'extract\|parse' | head -40

echo -e "\n=== 3) Verify that a JSONL file name is always {session_id}.jsonl ==="
rg -n 'def.*session|session_id.*=' claude_code_log/converter.py | head -30

echo -e "\n=== 4) Check if there's a way to query cached_files by session_id ==="
rg -n 'session_id.*cached_files|cached_files.*session_id' claude_code_log/cache.py -B 2 -A 2

echo -e "\n=== 5) Look at how messages are linked to files in database ==="
rg -n 'file_id|file_name.*session' claude_code_log/cache.py | head -30

Repository: daaain/claude-code-log

Length of output: 4208


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check the relationship between messages, sessions, and cached_files ==="
rg -n 'messages.*session_id|session_id.*messages' claude_code_log/migrations/ -B 5 -A 5 | head -60

echo -e "\n=== 2) Can a single session_id have messages from different file_ids? ==="
rg -n 'file_id.*session_id|session_id.*file_id' claude_code_log/migrations/ -B 3 -A 3

echo -e "\n=== 3) Look at how converter creates sessions and maps to files ==="
rg -n 'add_session|update_session' claude_code_log/converter.py -B 2 -A 8 | head -50

echo -e "\n=== 4) Verify if file_name in cached_files is always {session_id}.jsonl ==="
cat claude_code_log/cache.py | sed -n '479,520p'

echo -e "\n=== 5) Check if a single session can be stored across multiple JSONL files ==="
rg -n 'session.*append|multiple.*session|session.*merge' claude_code_log/ --type py | head -20

Repository: daaain/claude-code-log

Length of output: 998


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check if save_cached_entries validates filename matches session_id ==="
rg -n 'def save_cached_entries' claude_code_log/cache.py -A 40 | head -50

echo -e "\n=== 2) Search for any logic that creates JSONL files with custom names ==="
rg -n 'Path.*\.jsonl|\.jsonl.*Path' claude_code_log/converter.py -B 2 -A 2 | head -40

echo -e "\n=== 3) Check if there's any code that could generate sessions from multiple files ==="
rg -n 'session.*aggregate|aggregate.*session|merge.*session' claude_code_log/ --type py | head -20

echo -e "\n=== 4) Verify that a session ID always comes from the file being processed ==="
cat claude_code_log/converter.py | sed -n '1110,1160p'

echo -e "\n=== 5) Check the relationship between file processing and session creation ==="
rg -n 'def load_transcript|def process.*jsonl' claude_code_log/converter.py -A 5 | head -40

Repository: daaain/claude-code-log

Length of output: 740


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check if save_cached_entries validates filename matches session_id ==="
rg -n 'def save_cached_entries' claude_code_log/cache.py -A 40 | head -50

echo -e "\n=== 2) Search for any logic that creates JSONL files with custom names ==="
rg -n 'Path.*jsonl|jsonl.*Path' claude_code_log/converter.py -B 2 -A 2 | head -40

echo -e "\n=== 3) Check if there's any code that could generate sessions from multiple files ==="
rg -n 'session.*aggregate|aggregate.*session|merge.*session' claude_code_log/ --type py | head -20

echo -e "\n=== 4) Verify relationship: one JSONL file = one session ==="
sed -n '1110,1160p' claude_code_log/converter.py

echo -e "\n=== 5) Check the key assumption: file_name column in cached_files stores just filename ==="
sed -n '479,510p' claude_code_log/cache.py

Repository: daaain/claude-code-log

Length of output: 5401


delete_session() assumes {session_id}.jsonl file naming; verify this contract is validated.

The code at lines 1426-1431 deletes the cached_files entry by matching f"{session_id}.jsonl", but there is no validation that this file_name actually exists in the cached_files table or corresponds to the session being deleted. If a JSONL file is named differently, the cached_files entry will silently remain, leaving stale cache. While the entire codebase consistently uses the {session_id}.jsonl naming pattern, consider adding explicit validation in delete_session() to confirm the cached_files entry exists before attempting deletion, or use a more robust approach (e.g., query cached_files for the session's actual messages and delete via file_id).

@daaain
Owner Author

daaain commented Jan 12, 2026

I'm pretty happy with this branch now @cboos, added a lot more tests and also addressed all your feedback from the archive management PR, here, and even some bits from your scratchpad.

Basically, there's full archive management, more intuitive TUI (with esc / enter navigation, etc), cache file having a unique name by default and supporting setting an alternative path outside of ~/.claude (which was also super useful for better testing!), compressed JSON data to halve disk space usage, etc.

I think this is also going to be a great foundation for more advanced functionality that needs fast access to the logs in any shape and form!

@cboos
Collaborator

cboos commented Jan 13, 2026

I'm pretty happy with this branch now @cboos, added a lot more tests and also addressed all your feedback from the archive management PR, here, and even some bits from your scratchpad. [...]

Sounds great! I can't wait to test it... but I'll probably have to wait for this WE ;-) Feel free to merge in the meantime.
You might even want to cut a 0.10 release, but then with a caveat that Markdown preview in the TUI is "experimental" (very slow on large sessions, crash when following the "Back to combined sessions" link, no toggle for the outline...).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
claude_code_log/tui.py (5)

108-116: Clamp column widths to avoid negative/invalid widths on small terminals.
self.size.width - 13 can go <= 0.

Proposed fix
@@
-        table.add_column("Project", width=self.size.width - 13)
+        table.add_column("Project", width=max(20, self.size.width - 13))
         table.add_column("Sessions", width=10)

1507-1519: format_timestamp() “short” vs “long” format currently identical.
Both branches return "%m-%d %H:%M", so time_width = 16 doesn’t buy you anything.

Proposed fix
@@
-            elif short_format:
-                return dt.strftime("%m-%d %H:%M")
-            else:
-                return dt.strftime("%m-%d %H:%M")
+            elif short_format:
+                return dt.strftime("%m-%d %H:%M")
+            else:
+                return dt.strftime("%Y-%m-%d %H:%M")

1728-1803: Optimize session message loading for current sessions to avoid loading all messages

For current sessions, load_directory_transcripts() loads all sessions from all .jsonl files, which renderer.generate_session() then filters by session_id internally. While correctness is preserved through the renderer's filtering, this is inefficient when exporting a single session. Consider loading only the specific session's file (e.g., {project_path}/{session_id}.jsonl) for current sessions, similar to how archived sessions load from the cache manager by ID.


108-203: Fix ambiguous project selection (directory name collisions can select the wrong project).

The current implementation resolves row selection by parsing and matching against project_path.name, which fails when multiple projects have the same directory name (e.g., /path/a/project and /path/b/project). The first collision match would always be selected, making duplicate-named projects unselectable.

Replace the text-parsing approach with direct row-index-to-list-index mapping. Since populate_table() adds rows sequentially without sorting or filtering, the table row order is always guaranteed to match self.projects order. This eliminates parsing overhead and collision risk entirely.

Proposed fix
     def _update_selected_project_from_cursor(self) -> None:
         """Update the selected project based on the current cursor position."""
         try:
             table = cast(DataTable[str], self.query_one("#projects-table", DataTable))
-            row_data = table.get_row_at(table.cursor_row)
-            if row_data:
-                # Extract project display from the first column
-                project_display = str(row_data[0]).strip()
-
-                # Remove the arrow indicator if present
-                if project_display.startswith("→"):
-                    project_display = project_display[1:].strip()
-
-                # Remove the archived indicator if present
-                if project_display.endswith(" [ARCHIVED]"):
-                    project_display = project_display[:-11].strip()
-
-                # Find the matching project path
-                for project_path in self.projects:
-                    if project_path.name == project_display:
-                        self.selected_project_path = project_path
-                        break
+            row = table.cursor_row
+            if 0 <= row < len(self.projects):
+                self.selected_project_path = self.projects[row]
         except Exception:
             # If widget not mounted yet or we can't get the row data, don't update selection
             pass

1362-1547: Use Textual row keys to track full session IDs and eliminate prefix-matching ambiguity.

Prefix-matching with startswith() can select the wrong session if session IDs share the same 8-character prefix. Additionally, the loop through session dictionaries on every cursor movement adds unnecessary O(n) complexity.

Pass the full session_id as the row key when calling table.add_row(...):

table.add_row(
    session_id[:8],
    preview,
    start_time,
    end_time,
    str(session_data.message_count),
    token_display,
    key=session_id,  # Use full session ID as row key
)

Then retrieve it in _update_selected_session_from_cursor() using the Textual API instead of scanning:

row_key = table.get_row_key(table.cursor_row)
self.selected_session_id = str(row_key)
claude_code_log/renderer.py (1)

1491-1557: Harden _filter_messages() against string content for consistency with _render_messages().

While message.message.content should always be list[ContentItem] for user/assistant messages after factory processing (via create_message_content()), iterating a string as a list would cause characters to be processed, breaking has_tool_items detection and extract_text_content() behavior. Apply the same defensive pattern already used in _render_messages():

Proposed fix
@@
-        else:
-            message_content = message.message.content
+        else:
+            raw_content = message.message.content
+            message_content = raw_content if isinstance(raw_content, list) else []
@@
-        text_content = extract_text_content(message_content)
+        text_content = (
+            extract_text_content(message_content)
+            if message_content
+            else (str(raw_content) if raw_content is not None else "")
+        )
🤖 Fix all issues with AI agents
In @claude_code_log/tui.py:
- Around line 571-619: The _split_into_pages function currently uses a capturing
split re.split(r"(\n(?=## ))", ...) which yields delimiter-only chunks and
causes odd boundaries; replace that with a multiline zero-width split like
re.split(r"(?m)(?=^##\s)', content) or otherwise split on a non-capturing
lookahead so you don't get standalone "\n" tokens, and ensure you trim or skip
empty tokens and preserve section headings when building pages (handle oversized
sections by splitting on lines as you already do). Also verify and update usages
of viewer.document.update to use the supported public Textual API for the
installed version (e.g., viewer.update or recreating the widget / calling the
documented method) rather than directly mutating viewer.document; locate
references to viewer.document.update in the same module and swap to the public
method or widget-recreation approach if Textual v6.5.0 requires it.
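The splitting part of this fix can be shown on its own; the function name is illustrative, not the TUI module's actual `_split_into_pages`:

```python
import re


def split_on_headings(content: str) -> list[str]:
    """Split markdown on level-2 headings using a zero-width multiline
    lookahead, so no delimiter-only "\n" chunks are produced and each
    section keeps its own "## " heading."""
    parts = re.split(r"(?m)(?=^## )", content)
    return [p for p in parts if p.strip()]
```

Because the lookahead consumes nothing, the heading stays attached to its section, unlike the capturing split `r"(\n(?=## ))"`, which emits the newline as a separate list element.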

In @test/test_tui.py:
- Around line 1690-1920: Several tests call Path.write_text(...) to create JSONL
files without explicitly setting encoding; update those write_text calls in
tests like test_archive_project_action, test_delete_project_cache_only,
test_delete_project_both, test_restore_project_not_archived_shows_warning and
other occurrences in test_tui.py to pass encoding="utf-8" (e.g.,
Path(...).write_text('{"type":"user"}\n', encoding="utf-8")) so file writes are
consistent and explicit across ProjectSelector-related tests.
🧹 Nitpick comments (3)
test/test_message_types.py (1)

176-187: Consider verifying no warning is printed in silent mode.

The test validates that silent=True returns an empty list, but doesn't verify that no warning message was printed. This would strengthen the test coverage for the silent behavior.

♻️ Suggested enhancement
-def test_load_transcript_missing_file_silent_mode():
+def test_load_transcript_missing_file_silent_mode(capsys: CaptureFixture[str]):
     """Test that load_transcript handles missing files in silent mode."""
     nonexistent_file = Path("/tmp/nonexistent-session-xyz789.jsonl")
     # Ensure it doesn't exist
     if nonexistent_file.exists():
         nonexistent_file.unlink()

     # Should return empty list without printing
     messages = load_transcript(nonexistent_file, silent=True)
     assert messages == [], f"Expected empty list, got {messages}"

+    # Should NOT print any warning
+    captured = capsys.readouterr()
+    assert "Warning" not in captured.out, "Silent mode should not print warnings"
+
     print("✓ Test passed: Missing file in silent mode returns empty list")
claude_code_log/tui.py (1)

254-317: File operations: consider atomic JSONL restore + avoid “disappearing project” when file deletion fails.

  • Restore writes directly to the final *.jsonl path; a crash/exception can leave a partial file.
  • In delete-project (result == "both"), if some JSONL deletions fail you still remove the project from the list, which can hide a still-present project directory from the UI.

Suggested tweaks:

  • Write to *.jsonl.tmp then Path.replace() to make restore atomic.
  • If any JSONL delete fails, consider keeping the project in self.projects (or re-scan) so the UI reflects reality.

Also applies to: 333-382, 399-466
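For reference, a minimal sketch of the temp-file-plus-`Path.replace()` pattern suggested above (the helper name and demo paths are hypothetical, not from the PR):

```python
# Hypothetical helper: write JSONL lines to a temp sibling, then rename.
# Path.replace() is atomic on POSIX, so readers never see a partial file.
from pathlib import Path
import tempfile

def restore_jsonl_atomically(target: Path, lines: list[str]) -> None:
    """Write JSONL lines to target via a temp file + atomic rename."""
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text("\n".join(lines) + "\n", encoding="utf-8")
    tmp.replace(target)  # atomically replaces any existing file

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "session-123.jsonl"
        restore_jsonl_atomically(out, ['{"type":"user"}', '{"type":"assistant"}'])
        print(out.read_text(encoding="utf-8").count("\n"))  # → 2
```

On a crash mid-write only the `.tmp` file is left behind; the final `*.jsonl` path is either absent or complete.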

test/test_tui.py (1)

150-277: Tests may be flaky due to fixed pause(0.1/0.2/1.0) timings; prefer waiting on observable UI state.
Where possible, wait for the table to have expected row_count, or for a widget to exist, instead of sleeping.

(Not sure what the best “wait for” primitive is in Textual v6.5.0 test pilot—confirm available helpers and I can suggest concrete replacements.)

Also applies to: 326-377, 585-595
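As a starting point, a generic poll-until helper that only assumes `pilot.pause(delay)` from the Textual test pilot (which these tests already call); whether Textual v6.5.0 ships a better built-in wait primitive still needs confirming:

```python
# Generic "wait for observable state" helper; `pause` is injected so it
# works with any awaitable sleep (e.g. pilot.pause in Textual tests).
import asyncio
from typing import Awaitable, Callable

async def wait_until(
    condition: Callable[[], bool],
    pause: Callable[[float], Awaitable[None]],
    timeout: float = 2.0,
    interval: float = 0.05,
) -> None:
    """Poll `condition` until it is true or `timeout` seconds elapse."""
    elapsed = 0.0
    while not condition():
        if elapsed >= timeout:
            raise TimeoutError("condition not met within timeout")
        await pause(interval)
        elapsed += interval
```

Usage in a test would then look like `await wait_until(lambda: table.row_count == 2, pilot.pause)` instead of a fixed `pilot.pause(0.2)`.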

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8a074a and 4ef769f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • claude_code_log/converter.py
  • claude_code_log/factories/user_factory.py
  • claude_code_log/html/renderer_code.py
  • claude_code_log/html/tool_formatters.py
  • claude_code_log/html/utils.py
  • claude_code_log/renderer.py
  • claude_code_log/tui.py
  • pyproject.toml
  • stubs/pygments/__init__.pyi
  • stubs/pygments/formatter.pyi
  • stubs/pygments/formatters/__init__.pyi
  • stubs/pygments/lexer.pyi
  • stubs/pygments/lexers/__init__.pyi
  • stubs/pygments/util.pyi
  • test/test_cache.py
  • test/test_index_timezone.py
  • test/test_message_types.py
  • test/test_tui.py
✅ Files skipped from review due to trivial changes (1)
  • stubs/pygments/lexer.pyi
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use -n auto flag for parallel test execution with pytest in Python projects (e.g., uv run pytest -n auto -m "not (tui or browser)" -v)
Use ruff format for code formatting in Python projects
Use ruff check --fix for linting and automatic fixes in Python projects
Use uv run pyright or uv run ty check for type checking in Python projects

Files:

  • claude_code_log/factories/user_factory.py
  • test/test_tui.py
  • claude_code_log/html/tool_formatters.py
  • test/test_cache.py
  • claude_code_log/html/utils.py
  • claude_code_log/converter.py
  • claude_code_log/renderer.py
  • test/test_index_timezone.py
  • claude_code_log/tui.py
  • claude_code_log/html/renderer_code.py
  • test/test_message_types.py
claude_code_log/renderer.py

📄 CodeRabbit inference engine (CLAUDE.md)

When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Files:

  • claude_code_log/renderer.py
🧠 Learnings (5)
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Implement interactive TUI (Terminal User Interface) for browsing Claude Code sessions with real-time navigation, summaries, and quick actions

Applied to files:

  • test/test_tui.py
  • claude_code_log/tui.py
📚 Learning: 2025-12-09T23:52:47.578Z
Learnt from: daaain
Repo: daaain/claude-code-log PR: 59
File: test/test_cache.py:135-165
Timestamp: 2025-12-09T23:52:47.578Z
Learning: SQLite supports NULLS FIRST and NULLS LAST in ORDER BY since v3.30.0 (Oct 2019). Do not flag SQL that uses these clauses as an error when reviewing Python tests or code that interacts with SQLite. If reviewing SQL strings, verify the target SQLite version supports NULLS FIRST/LAST and ensure the syntax is used correctly for the database in use.

Applied to files:

  • test/test_tui.py
  • test/test_cache.py
  • test/test_index_timezone.py
  • test/test_message_types.py
📚 Learning: 2025-11-30T23:24:07.840Z
Learnt from: cboos
Repo: daaain/claude-code-log PR: 54
File: claude_code_log/renderer.py:2912-2945
Timestamp: 2025-11-30T23:24:07.840Z
Learning: In claude_code_log/renderer.py, the agentId field is currently only set on Task tool_result messages, not on tool_use messages, because the agentId is generated after the tool_use is logged. The _reorder_sidechain_template_messages function relies on this to avoid duplicate sidechain insertion.

Applied to files:

  • claude_code_log/converter.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/templates/components/timeline.html : When adding new message types or modifying CSS class generation in renderer.py, ensure the timeline's message type detection logic in timeline.html is updated accordingly to maintain feature parity

Applied to files:

  • claude_code_log/renderer.py
  • claude_code_log/html/renderer_code.py
📚 Learning: 2025-12-31T16:37:39.494Z
Learnt from: CR
Repo: daaain/claude-code-log PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-31T16:37:39.494Z
Learning: Applies to claude_code_log/renderer.py : When adding new message types or modifying CSS class generation, ensure the timeline's message type detection logic is updated accordingly and that the filter is applied consistently to messages in both the main transcript and timeline

Applied to files:

  • claude_code_log/renderer.py
  • claude_code_log/html/renderer_code.py
🧬 Code graph analysis (9)
stubs/pygments/formatters/__init__.pyi (1)
stubs/pygments/formatter.pyi (1)
  • Formatter (5-7)
claude_code_log/factories/user_factory.py (1)
claude_code_log/models.py (1)
  • ImageContent (70-78)
stubs/pygments/lexers/__init__.pyi (1)
stubs/pygments/lexer.pyi (1)
  • Lexer (5-7)
claude_code_log/html/tool_formatters.py (1)
claude_code_log/html/utils.py (1)
  • escape_html (191-198)
test/test_cache.py (1)
claude_code_log/cache.py (5)
  • is_file_cached (377-398)
  • get_cached_project_data (661-733)
  • load_cached_entries_filtered (417-477)
  • CacheManager (193-1459)
  • get_all_cached_projects (1462-1516)
claude_code_log/html/utils.py (7)
stubs/pygments/__init__.pyi (1)
  • highlight (5-5)
stubs/pygments/lexer.pyi (1)
  • Lexer (5-7)
stubs/pygments/lexers/__init__.pyi (2)
  • get_lexer_by_name (11-11)
  • TextLexer (7-9)
stubs/pygments/formatter.pyi (1)
  • Formatter (5-7)
stubs/pygments/formatters/__init__.pyi (1)
  • HtmlFormatter (7-16)
stubs/pygments/util.pyi (1)
  • ClassNotFound (3-6)
claude_code_log/renderer_timings.py (1)
  • timing_stat (88-110)
claude_code_log/renderer.py (1)
claude_code_log/html/renderer.py (1)
  • is_outdated (587-598)
claude_code_log/html/renderer_code.py (5)
stubs/pygments/__init__.pyi (1)
  • highlight (5-5)
stubs/pygments/lexer.pyi (1)
  • Lexer (5-7)
stubs/pygments/formatter.pyi (1)
  • Formatter (5-7)
stubs/pygments/formatters/__init__.pyi (1)
  • HtmlFormatter (7-16)
stubs/pygments/util.pyi (1)
  • ClassNotFound (3-6)
test/test_message_types.py (1)
claude_code_log/converter.py (1)
  • load_transcript (116-293)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (ubuntu-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.12)
  • GitHub Check: test (windows-latest, 3.13)
  • GitHub Check: test (windows-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (windows-latest, 3.14)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (ubuntu-latest, 3.14)
  • GitHub Check: test (windows-latest, 3.10)
  • GitHub Check: test (ubuntu-latest, 3.10)
🔇 Additional comments (47)
test/test_index_timezone.py (1)

14-14: LGTM! Type ignore cleanup is appropriate.

Removing type: ignore[call-non-callable] comments from pytest.skip() calls is correct cleanup. Modern pytest type stubs properly define skip as a callable, so these suppressions are no longer needed.

Also applies to: 26-26, 71-71, 85-85

stubs/pygments/lexers/__init__.pyi (1)

1-20: LGTM!

The type stubs correctly define TextLexer as extending Lexer, and the function signatures for get_lexer_by_name and get_all_lexers accurately reflect the pygments API. The return type for get_all_lexers properly captures the 4-tuple structure (name, aliases, patterns, mimetypes).

stubs/pygments/util.pyi (1)

1-6: LGTM!

The ClassNotFound exception stub correctly inherits from Exception and provides adequate typing for error handling in code that catches this pygments-specific exception.

stubs/pygments/formatter.pyi (1)

1-7: LGTM!

The base Formatter class stub is correctly defined with the **options pattern matching the pygments API, and properly serves as the parent class for HtmlFormatter.

stubs/pygments/__init__.pyi (1)

1-5: LGTM!

The highlight function stub correctly captures the core API. Using Any for lexer/formatter parameters is acceptable for minimal stubs, though you could optionally use Lexer and Formatter types for stricter checking if desired.

stubs/pygments/formatters/__init__.pyi (1)

1-16: LGTM!

The HtmlFormatter stub provides excellent typing coverage for the commonly used parameters. The linenos: bool | Literal["table", "inline"] union type correctly captures that this parameter accepts boolean values or specific string literals, which matches the pygments API behavior.

pyproject.toml (2)

57-59: LGTM! Consistent with existing Pyright configuration.

The extra-paths = ["stubs"] setting aligns with the Pyright stubPath = "stubs" configuration on line 65, ensuring both type checkers use the same custom stubs for untyped libraries.


119-119: The ty>=0.0.11 constraint is valid and current.

Version 0.0.11 is confirmed to exist on PyPI and is the latest available version. No changes needed.

claude_code_log/html/tool_formatters.py (2)

545-545: LGTM — Unicode handling preserved.

The ensure_ascii=False correctly preserves non-ASCII characters in JSON output, which is important for internationalization.

Note: The AI summary incorrectly states that ensure_ascii was removed and defaults to True. The code clearly retains ensure_ascii=False.


561-564: LGTM — Cleaner type handling.

The explicit cast(object, value) satisfies static type checkers without relying on # type: ignore comments. The added comment clarifies the fallback intent.

claude_code_log/factories/user_factory.py (2)

447-449: LGTM!

The type ignore removal is appropriate here. The hasattr(item, "text") check on line 447 ensures the attribute exists at runtime, and the explicit : str annotation handles the type.


463-465: Remove or clarify the "duck-typed" terminology—the code is type-safe.

The concern about item.model_dump() failing is unfounded. Because content_list is typed as list[ContentItem] (a Union of Pydantic models), every item is guaranteed to be a Pydantic BaseModel. This elif branch handles other Union members that happen to have compatible source and type == "image" attributes, not arbitrary duck-typed objects. Since model_dump() is a Pydantic method available on all Union members, the code is type-safe. The comment would be clearer if it described the actual scenario: "Convert image-like Pydantic models to our ImageContent model."

Likely an incorrect or invalid review comment.

test/test_message_types.py (3)

7-8: LGTM!

Clean import of CaptureFixture for proper typing of the pytest fixture parameter.


153-174: LGTM!

This test properly verifies the race condition handling where a file may be deleted between glob and open. The assertions correctly check both the return value (empty list) and the warning output.


193-194: LGTM!

Appropriate comment explaining why the capsys-dependent test is skipped when running directly.

claude_code_log/html/renderer_code.py (4)

15-20: LGTM!

Clean import structure using explicit Pygments public API types (Lexer, Formatter) instead of relying on broad type-ignores. This improves type safety and makes the code more maintainable.


54-54: LGTM!

Proper use of underscore prefix for unused loop variables (_name, _mimetypes), following Python conventions.


98-121: LGTM!

Good refactoring pattern: initializing a default TextLexer before the try block ensures the fallback is always available without duplicating fallback logic. The explicit Lexer type annotation improves clarity.


124-133: LGTM!

The explicit Formatter type annotation and removal of type-ignore on the highlight call align with the improved typing throughout this module.

claude_code_log/html/utils.py (3)

203-208: LGTM!

Import structure matches the pattern established in renderer_code.py, maintaining consistency across the codebase for Pygments type usage.


219-233: LGTM!

The default TextLexer initialization pattern and explicit type annotations are consistent with the changes in renderer_code.py. Good code consistency across the HTML rendering modules.


445-447: Pragmatic workaround for Jinja2's strict globals typing.

The Any cast is a reasonable approach to bypass Jinja2's overly strict type definition for env.globals. The comment explains the intent clearly.

test/test_cache.py (7)

26-33: LGTM!

Good approach: creating a nested project subdirectory ensures each test gets a unique db_path at the parent level, providing proper test isolation for the shared SQLite database.


104-116: LGTM!

Tests correctly verify the new db_path property and its location at the parent directory level, aligned with the SQLite-backed cache design.


141-170: LGTM!

Direct SQLite queries are appropriate here to verify the internal storage structure. Filtering by project_id ensures test isolation when using a shared database. The use of NULLS LAST is valid (supported since SQLite 3.30.0). Based on learnings, this SQL syntax is correct.


266-328: Excellent edge case coverage.

This test catches a subtle bug where ISO format with microseconds (.999999) vs UTC Z suffix could cause incorrect timestamp comparisons. The detailed assertion message will help debug any regressions.


330-352: LGTM!

Test properly verifies that clear_cache removes cached files and sessions while preserving the project structure in the SQLite database.


660-670: LGTM!

Test renamed to better reflect the scenario (missing cache entry rather than corrupted file), appropriate for the SQLite-backed design where file corruption is less relevant.


707-792: Comprehensive environment variable test coverage.

The tests properly verify the precedence order: explicit db_path > CLAUDE_CODE_LOG_CACHE_PATH env var > default location. Using monkeypatch for environment manipulation is the correct pytest approach.

claude_code_log/converter.py (15)

6-9: LGTM!

Import additions are appropriate for the new functionality: dataclass for GenerationStats, cache-related imports, and is_html_outdated for HTML regeneration checks.

Also applies to: 23-28, 39-39


157-166: LGTM!

Proper handling of the race condition where a file may be deleted between glob discovery and file open. The graceful degradation to an empty list with an optional warning is the right approach.


420-486: LGTM!

GenerationStats is a well-designed dataclass for tracking per-project generation metrics. The summary() method provides a clean, formatted output for reporting.


542-576: Session assignment may produce pages larger than expected.

The logic adds sessions to the current page and then checks if the limit is exceeded. This means a page can contain significantly more messages than page_size if a large session is added. While the docstring notes "overflow allowed to keep sessions intact," consider whether there should be an upper bound or warning for extremely large sessions.

Consider whether a session with, say, 50,000 messages should be allowed on a single page, or if there should be a secondary split mechanism or warning for such cases.
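A minimal sketch of the assignment strategy as described (hypothetical standalone function, with an added warning for sessions larger than a whole page):

```python
# Sessions are appended whole; a page closes once it crosses page_size
# (overflow allowed to keep sessions intact); oversized sessions warn.
def assign_sessions_to_pages(
    session_sizes: list[int], page_size: int
) -> list[list[int]]:
    """Group per-session message counts into pages, never splitting a session."""
    pages: list[list[int]] = []
    current: list[int] = []
    current_total = 0
    for size in session_sizes:
        if size > page_size:
            print(f"warning: session with {size} messages exceeds page size {page_size}")
        current.append(size)
        current_total += size
        if current_total >= page_size:
            pages.append(current)
            current, current_total = [], 0
    if current:
        pages.append(current)
    return pages

print(assign_sessions_to_pages([30, 30, 50, 10], 50))  # → [[30, 30], [50], [10]]
```

This makes the failure mode concrete: a single 50,000-message session still lands on one page, which is where a secondary split or warning could hook in.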


579-674: LGTM!

Good fallback implementation for building session data when cache is unavailable. The logic correctly filters warmup sessions and extracts token usage from assistant messages.


726-732: LGTM!

Proper cleanup of orphan pages when page count decreases. This prevents stale HTML files from accumulating when sessions are deleted or pagination configuration changes.


793-805: Page navigation logic is sound.

The page_info dict correctly provides previous/next links with appropriate is_last_page flag. Enabling the previous page's next link when creating a new page ensures navigation stays consistent.


952-972: Good early exit optimization.

This fast path avoids expensive message loading when all HTML files are current. The check order (combined stale → session stale) is logical.


1015-1023: Verify pagination threshold behavior.

Pagination is enabled when total_message_count > page_size OR existing_page_count > 1. The second condition preserves pagination for projects that were previously paginated but now have fewer messages. This seems intentional to avoid constantly switching between modes.


1099-1129: LGTM!

has_cache_changes provides a fast mtime-only check without loading messages, enabling efficient early-exit paths in the pipeline.


1456-1460: Good API enhancement.

Returning the count of regenerated sessions enables better progress reporting in the calling code.


1561-1577: LGTM!

Proper tracking of HTML cache updates per session, enabling incremental regeneration on subsequent runs.


1586-1627: Helpful user guidance for archived sessions.

Reading cleanupPeriodDays from settings and printing a note with a link to restoration documentation improves the user experience when encountering archived sessions.


1973-2028: LGTM!

Archived project handling is well-integrated: cached data is retrieved, displayed with [ARCHIVED] indicator, and added to project summaries with is_archived: True flag for UI differentiation.


2055-2073: LGTM!

Clean summary output with timing, error/warning reporting, and the archived sessions note when applicable.

claude_code_log/renderer.py (2)

1738-1753: Good: _render_messages() now safely handles list vs string content.
The list-vs-string branching + chunk_message_content() integration makes message handling clearer and more robust.


2308-2320: is_html_outdated() helper looks fine; consider reusing get_renderer("html") if instantiation gets hot.
Probably negligible, but worth keeping in mind if this ends up on a tight loop.

test/test_tui.py (1)

90-107: Fixture update to per-session JSONL (+ explicit UTF-8) is a solid improvement.
Matches real-world layout and removes implicit-encoding ambiguity.

Comment on lines +571 to +619
def _split_into_pages(self, content: str) -> list[str]:
    """Split markdown content into pages by character count.

    Splits at section boundaries (## ) when possible to avoid
    cutting mid-section, but will split within sections if
    a single section exceeds PAGE_SIZE_CHARS.
    """
    import re

    if len(content) <= self.PAGE_SIZE_CHARS:
        return [content]

    pages: list[str] = []
    current_page = ""

    # Split by level 2 headings, keeping the delimiter
    sections = re.split(r"(\n(?=## ))", content)

    for section in sections:
        if not section:
            continue

        # If adding this section exceeds page size
        if len(current_page) + len(section) > self.PAGE_SIZE_CHARS:
            # If current page has content, save it
            if current_page.strip():
                pages.append(current_page)
            current_page = ""

            # If section itself exceeds page size, split it by lines
            if len(section) > self.PAGE_SIZE_CHARS:
                lines = section.split("\n")
                for line in lines:
                    if len(current_page) + len(line) + 1 > self.PAGE_SIZE_CHARS:
                        if current_page.strip():
                            pages.append(current_page)
                        current_page = line + "\n"
                    else:
                        current_page += line + "\n"
            else:
                current_page = section
        else:
            current_page += section

    # Don't forget the last page
    if current_page.strip():
        pages.append(current_page)

    return pages if pages else [content]

⚠️ Potential issue | 🟡 Minor

Pagination split logic + Textual API usage need a quick sanity check.

  • re.split(r"(\n(?=## ))", content) likely produces delimiter-only chunks ("\n") and can lead to odd page boundaries.
  • viewer.document.update(...) may be Textual-version sensitive (and document might not be the supported public API).

If Textual v6.5.0 supports it, prefer updating via a public method/property (or recreating the widget) and use a multiline heading split like (?m)(?=^##\s) to avoid delimiter tokens.

Also applies to: 724-744
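A quick demo of the difference between the two split patterns:

```python
# Capturing split vs. zero-width multiline lookahead split.
import re

content = "intro\n## A\nbody a\n## B\nbody b"

# Capturing split: delimiter-only "\n" tokens show up in the result.
with_delims = re.split(r"(\n(?=## ))", content)
# Zero-width lookahead: clean section chunks, no delimiter tokens.
clean = re.split(r"(?m)(?=^##\s)", content)

print(with_delims)  # → ['intro', '\n', '## A\nbody a', '\n', '## B\nbody b']
print(clean)        # → ['intro\n', '## A\nbody a\n', '## B\nbody b']
```

The second form keeps each heading attached to its section, so the page builder never has to special-case stray `"\n"` chunks.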

🤖 Prompt for AI Agents
In @claude_code_log/tui.py around lines 571 - 619, The _split_into_pages
function currently uses a capturing split re.split(r"(\n(?=## ))", ...) which
yields delimiter-only chunks and causes odd boundaries; replace that with a
multiline zero-width split like re.split(r"(?m)(?=^##\s)", content) or otherwise
split on a non-capturing lookahead so you don't get standalone "\n" tokens, and
ensure you trim or skip empty tokens and preserve section headings when building
pages (handle oversized sections by splitting on lines as you already do). Also
verify and update usages of viewer.document.update to use the supported public
Textual API for the installed version (e.g., viewer.update or recreating the
widget / calling the documented method) rather than directly mutating
viewer.document; locate references to viewer.document.update in the same module
and swap to the public method or widget-recreation approach if Textual v6.5.0
requires it.

Comment on lines +1690 to +1920
"""Test that Enter key selects the highlighted project."""
with tempfile.TemporaryDirectory() as temp_dir:
project1 = Path(temp_dir) / "project1"
project1.mkdir()
(project1 / "session-1.jsonl").write_text('{"type":"user"}\n')

project2 = Path(temp_dir) / "project2"
project2.mkdir()
(project2 / "session-2.jsonl").write_text('{"type":"user"}\n')

app = ProjectSelector(
projects=[project1, project2],
matching_projects=[],
archived_projects=set(),
)

async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select first project and press Enter
await pilot.press("enter")
await pilot.pause(0.1)

@pytest.mark.asyncio
async def test_escape_key_quits(self):
"""Test that Escape key quits the application."""
with tempfile.TemporaryDirectory() as temp_dir:
project1 = Path(temp_dir) / "project1"
project1.mkdir()

app = ProjectSelector(
projects=[project1],
matching_projects=[],
archived_projects=set(),
)

async with app.run_test() as pilot:
await pilot.pause(0.2)

# Press Escape to quit
await pilot.press("escape")
await pilot.pause(0.1)

@pytest.mark.asyncio
async def test_archive_project_action(self):
"""Test archiving a project deletes JSONL files."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "project1"
project_path.mkdir()
jsonl1 = project_path / "session-1.jsonl"
jsonl2 = project_path / "session-2.jsonl"
jsonl1.write_text('{"type":"user"}\n')
jsonl2.write_text('{"type":"user"}\n')

app = ProjectSelector(
projects=[project_path],
matching_projects=[],
archived_projects=set(),
)

async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select the project
app.selected_project_path = project_path

# Both JSONL files should exist
assert jsonl1.exists()
assert jsonl2.exists()

# Press 'a' to archive and then confirm
await pilot.press("a")
await pilot.pause(0.1)
await pilot.press("y")
await pilot.pause(0.1)

# JSONL files should be deleted
assert not jsonl1.exists()
assert not jsonl2.exists()

# Project should now be in archived set
assert project_path in app.archived_projects

@pytest.mark.asyncio
async def test_archive_project_already_archived_shows_warning(self):
"""Test archiving an already archived project shows warning."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "project1"
project_path.mkdir()

app = ProjectSelector(
projects=[project_path],
matching_projects=[],
archived_projects={project_path}, # Already archived
)

async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select the archived project
app.selected_project_path = project_path

# Try to archive - should show warning
await pilot.press("a")
await pilot.pause(0.1)

@pytest.mark.asyncio
async def test_delete_project_cache_only(self):
"""Test deleting project cache only keeps JSONL files."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "project1"
project_path.mkdir()
jsonl = project_path / "session-1.jsonl"
jsonl.write_text('{"type":"user"}\n')

app = ProjectSelector(
projects=[project_path],
matching_projects=[],
archived_projects=set(),
)

with patch.object(CacheManager, "clear_cache"):
async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select the project
app.selected_project_path = project_path

# Press 'd' to delete and choose cache only
await pilot.press("d")
await pilot.pause(0.1)
await pilot.press("c") # Cache only
await pilot.pause(0.1)

# JSONL file should still exist
assert jsonl.exists()

@pytest.mark.asyncio
async def test_delete_project_both(self):
"""Test deleting project cache and JSONL files."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "project1"
project_path.mkdir()
jsonl = project_path / "session-1.jsonl"
jsonl.write_text('{"type":"user"}\n')

app = ProjectSelector(
projects=[project_path],
matching_projects=[],
archived_projects=set(),
)

with patch.object(CacheManager, "clear_cache"):
async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select the project
app.selected_project_path = project_path

assert jsonl.exists()

# Press 'd' to delete and choose both
await pilot.press("d")
await pilot.pause(0.1)
await pilot.press("b") # Both
await pilot.pause(0.1)

# JSONL file should be deleted
assert not jsonl.exists()

@pytest.mark.asyncio
async def test_restore_project_creates_directory(self):
"""Test restoring a project creates directory if missing."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "deleted_project"
# Don't create the directory

mock_session_data = {
"session-123": SessionCacheData(
session_id="session-123",
first_timestamp="2025-01-01T10:00:00Z",
last_timestamp="2025-01-01T10:01:00Z",
message_count=1,
first_user_message="Test",
total_input_tokens=10,
total_output_tokens=10,
),
}

app = ProjectSelector(
projects=[project_path],
matching_projects=[],
archived_projects={project_path}, # Archived project
)

with (
patch.object(CacheManager, "get_cached_project_data") as mock_cache,
patch.object(
CacheManager,
"export_session_to_jsonl",
return_value=['{"type":"user"}'],
),
):
mock_cache.return_value = Mock(sessions=mock_session_data)

async with app.run_test() as pilot:
await pilot.pause(0.2)

# Select the project
app.selected_project_path = project_path

# Directory should not exist
assert not project_path.exists()

# Press 'r' to restore and confirm
await pilot.press("r")
await pilot.pause(0.1)
await pilot.press("y")
await pilot.pause(0.1)

# Directory should now exist
assert project_path.exists()

@pytest.mark.asyncio
async def test_restore_project_not_archived_shows_warning(self):
"""Test restoring a non-archived project shows warning."""
with tempfile.TemporaryDirectory() as temp_dir:
project_path = Path(temp_dir) / "project1"
project_path.mkdir()
(project_path / "session-1.jsonl").write_text('{"type":"user"}\n')


⚠️ Potential issue | 🟡 Minor

Make test file writes consistently UTF-8 (a few write_text() calls omit encoding).
Some tests use Path.write_text('...') without encoding="utf-8" while others specify it.

Example patch pattern
-            (project1 / "session-1.jsonl").write_text('{"type":"user"}\n')
+            (project1 / "session-1.jsonl").write_text('{"type":"user"}\n', encoding="utf-8")

(Apply similarly to the other write_text(...jsonl...) sites.)

🤖 Prompt for AI Agents
In @test/test_tui.py around lines 1690 - 1920, Several tests call
Path.write_text(...) to create JSONL files without explicitly setting encoding;
update those write_text calls in tests like test_archive_project_action,
test_delete_project_cache_only, test_delete_project_both,
test_restore_project_not_archived_shows_warning and other occurrences in
test_tui.py to pass encoding="utf-8" (e.g.,
Path(...).write_text('{"type":"user"}\n', encoding="utf-8")) so file writes are
consistent and explicit across ProjectSelector-related tests.

@daaain
Owner Author

daaain commented Jan 13, 2026

You might even want to cut a 0.10 release, but then with a caveat that Markdown preview in the TUI is "experimental" (very slow on large sessions, crash when following the "Back to combined sessions" link, no toggle for the outline...).

Oh right! Today I added pagination to the Markdown view in TUI, heading links scroll, links pop up a message saying it won't work, the ToC outline can be toggled now 😸

Will just play a bit more with it in the coming days and do another sweep of CodeRabbit reviews, I'll be away with work so weekend to merge sounds good.

I'm thinking of doing this as a 1.0.0 release as it's kind of a breaking change from the previous JSON cache?

3 participants