Skip to content

fix: consolidate test infrastructure, harden parallel spec workflows#80

Merged
maxritter merged 3 commits intomainfrom
dev
Feb 20, 2026
Merged

fix: consolidate test infrastructure, harden parallel spec workflows#80
maxritter merged 3 commits intomainfrom
dev

Conversation

@maxritter
Copy link
Owner

@maxritter maxritter commented Feb 20, 2026

Summary

  • Test consolidation: Merged duplicate pilot/tests/hooks/ into pilot/hooks/tests/, fixing 16 broken tests from API drift (exit codes, mock paths, removed functions). All 1154 Python tests pass.
  • CI/pre-commit coverage: Added pilot/hooks/tests/ to CI pipeline and installer/tests/unit/ trigger to pre-commit hook.
  • Parallel spec agents: Added native background: true to all 4 verification agent definitions; updated spec-plan, spec-verify, and task-and-workflow rules for consistency (run_in_background=true kept as fallback).
  • Playwright session isolation: Added -s=$PILOT_SESSION_ID session isolation to spec-verify, spec-implement, playwright-cli, testing, and verification rules to prevent cross-session browser interference.
  • Installer improvements: Moved settings.json to plugin directory (no more three-way merge into ~/.claude/settings.json), added migration for old merged settings, fixed Vexor install UX (model pre-download is best-effort).
  • Hook hardening: File checker warnings are now non-blocking (additionalContext); EnterPlanMode/ExitPlanMode blocking strengthened with exit code 2 and clearer messaging.

Test plan

  • All 1154 Python tests pass (installer/tests/unit/ + launcher/tests/unit/ + pilot/hooks/tests/)
  • Pre-commit hook runs and passes for both launcher/hook and installer changes
  • CI pipeline runs all three test directories
  • PR bot reviews pass

Summary by CodeRabbit

  • Bug Fixes

    • Installer is more resilient: embedding model pre-download is best-effort and will download on first use if needed.
  • Chores

    • Settings are now stored under the plugin directory; existing user settings are automatically migrated and legacy Pilot-managed defaults are pruned.
  • UI

    • Removed the "g r" keyboard shortcut.
  • Tests

    • Added/updated tests for settings migration and resilient installer behavior.

- Merge duplicate test directories (pilot/tests/hooks/ → pilot/hooks/tests/)
  fixing 16 broken tests from API drift (exit codes, mock paths, removed functions)
- Add pilot/hooks/tests/ to CI pipeline and installer tests to pre-commit
- Add `background: true` to all 4 verification agents for native parallel execution
- Update spec-plan, spec-verify, and task-and-workflow rules for background agent consistency
- Move settings.json installation to plugin directory, migrate old merged settings
- Make file_checker warnings non-blocking (additionalContext instead of block)
- Strengthen EnterPlanMode/ExitPlanMode blocking with exit code 2
- Add playwright-cli session isolation to spec-verify, spec-implement, and testing rules
- Fix Vexor install UX: model pre-download is best-effort, not a failure
@vercel
Copy link

vercel bot commented Feb 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
claude-pilot Ignored Ignored Feb 20, 2026 9:08am

Request Review

@claude
Copy link

claude bot commented Feb 20, 2026

Claude finished @maxritter's task in 4m 9s —— View job


Code Review

A well-scoped PR with clean implementations across all areas. All 1154 tests pass and the consolidation rationale is sound. Below are observations by area.


Test Consolidation (pilot/tests/hooks/pilot/hooks/tests/)

Positive: The centralized conftest.py approach cleanly eliminates per-file sys.path.insert boilerplate. Co-locating tests with the code they test is the right structure.

Minor — _run_with_input duplicated across 3 test classes:

pilot/hooks/tests/test_tool_redirect.py:88-94, :116-122, :158-164 — identical helper defined three times. Consider a module-level helper or a shared base class:

def _run_redirect(tool_name: str, tool_input: dict | None = None) -> int:
    hook_data = {"tool_name": tool_name}
    if tool_input is not None:
        hook_data["tool_input"] = tool_input
    with patch("sys.stdin", StringIO(json.dumps(hook_data))):
        return run_tool_redirect()

Fix this →


Installer: Settings Migration (claude_files.py:526-571)

Positive: The migration logic is correct and well-tested. The three test cases (user customizations survive, no-change deletes old file, no baseline = clean install) cover the key scenarios.

Observation — shallow comparison for nested objects (_migrate_old_settings):

installer/steps/claude_files.py:557-561:

for key, value in current.items():
    if key not in baseline:
        user_settings[key] = value
    elif value != baseline[key]:
        user_settings[key] = value

If a user modified one nested key (e.g. env.DISABLE_TELEMETRY) while Pilot manages the whole env dict, the entire env object is preserved — including unchanged Pilot-managed entries like env.NEW_VAR_ADDED_BY_PILOT. Since this is a one-time migration (the baseline file is deleted after running), over-preservation is the safer trade-off. Not a bug, but worth noting that a deep-merge approach (like merge_settings already uses) would be more precise for nested keys if this matters post-migration.

_get_forced_claude_version() reads ~/.claude/settings.json (dependencies.py:97-104):

settings_path = Path.home() / ".claude" / "settings.json"

After this PR, settings live in ~/.claude/pilot/settings.json. If users upgrade and their ~/.claude/settings.json is cleaned up by migration, this function will no longer find FORCE_CLAUDE_VERSION. Consider also checking ~/.claude/pilot/settings.json as a fallback, or documenting that this env var must be set via another mechanism.

Fix this →


Hook Hardening (tool_redirect.py)

Positive: Exit code 2 from block() is correct behavior — it signals an intentional block rather than a script error. The red stderr message makes debugging straightforward. Tests are updated to assert result == 2.

Unused process_settings() in claude_files.py:

installer/steps/claude_files.py:54-57 — this function exists at module level but is only called in tests. The _update_plugin_settings method uses patch_claude_paths() but not process_settings() for re-serialization. If the function is purely for test/utility purposes, consider moving it to the test file or removing it from the module.


Pre-commit Hook (.githooks/pre-commit)

Minor — tail -5 may truncate meaningful failure context:

.githooks/pre-commit:17:

uv run pytest launcher/tests/ pilot/hooks/tests/ -q --tb=short 2>&1 | tail -5

With set -eo pipefail, failures propagate correctly. However, tail -5 limits output to 5 lines, which may hide the root cause of test failures (e.g., import errors, fixture failures). tail -20 or tail -30 would be more useful for debugging without significantly affecting terminal noise on passing runs.


Parallel Spec / Playwright Session Isolation

Positive: The session isolation changes are thorough and consistent across spec-verify.md, playwright-cli.md, verification.md, and testing.md. The PW_SESSION="${PILOT_SESSION_ID:-default}" fallback pattern is correct. Requiring session close after E2E testing (playwright-cli -s="$PW_SESSION" close) prevents leaked browser instances.

Positive: The background: true frontmatter on all 4 agent definitions is cleaner than relying on call-site run_in_background=true. Keeping both as redundancy is a prudent safety net.


CI Pipeline (.github/workflows/release.yml)

Adding pilot/hooks/tests/ to the CI run correctly closes the gap where the newly consolidated tests would have been missed.


Summary

Area Status
Test consolidation ✅ Clean, correct structure
Settings migration ✅ Correct, well-tested; one forward-compat concern
Hook hardening ✅ Correct behavior
Playwright session isolation ✅ Comprehensive and consistent
Pre-commit hook ✅ Works; tail -5 is minor UX concern
CI coverage ✅ Properly updated

Issues requiring attention before merge: None — all findings are minor improvements or observations. The FORCE_CLAUDE_VERSION forward-compatibility concern is the one worth tracking if users rely on that feature, but it likely only affects edge-case pinned installs.

@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Migrate Pilot-managed settings from /.claude/settings.json into a plugin-specific file (/.claude/pilot/settings.json), add post-install settings migration and path-patching, change Vexor local-model setup to best-effort, and adjust CI pre-commit and release workflows to gate and run tests for launcher/hooks and installer changes.

Changes

Cohort / File(s) Summary
CI/CD Pipeline Configuration
.githooks/pre-commit, .github/workflows/release.yml
Replaced Python-change gate with LAUNCHER_CHANGED and added INSTALLER_CHANGED to trigger installer unit tests; release workflow test paths expanded to include pilot/hooks/tests/ and coverage for pilot.hooks.
Installer — settings & post-install logic
installer/steps/claude_files.py
Removed _install_settings and explicit "settings" category handling; added _update_plugin_settings and _migrate_old_settings; post-install now updates plugin settings paths and migrates/prunes old Pilot-managed keys from ~/.claude/settings.json into ~/.claude/pilot/settings.json.
Installer — dependency handling (Vexor)
installer/steps/dependencies.py
Treat Vexor local-model setup as best-effort: failures log that the embedding model will download on first use and return success so install continues across MLX/fallback paths.
Installer tests — settings migration
installer/tests/unit/steps/test_claude_files.py
Updated expectations to write settings to ~/.claude/pilot/settings.json; added TestSettingsMigration validating migration of user customizations, pruning of Pilot-managed keys, baseline cleanup, and skip-on-clean-install.
Installer tests — Vexor install
installer/tests/unit/steps/test_dependencies.py
Added tests for Vexor install failure scenarios asserting return behavior and UI messaging when local download or binary install fails.
Console tests
console/tests/ui/search-removal.test.ts
Removed existence-check test and existsSync import; added assertion to ensure "g r" is not present in shortcuts source.

Sequence Diagram(s)

sequenceDiagram
    participant Installer
    participant HomeFS as Home FS (~/.claude)
    participant PluginDir as Plugin Dir (~/.claude/pilot)
    participant Logger

    Installer->>HomeFS: check for baseline file (pilot_baseline)
    alt baseline exists
        Installer->>HomeFS: read baseline and current settings
        Installer->>PluginDir: write migrated settings (preserve user keys, remove Pilot-managed keys)
        Installer->>HomeFS: delete baseline file
        Logger->>Installer: info migration complete
    else no baseline
        Installer->>Logger: no migration needed (clean install)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

released

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title partially addresses consolidating test infrastructure but does not capture the full scope of changes, including settings migration, Vexor installation hardening, and hook enhancements.
Docstring Coverage ✅ Passed Docstring coverage is 95.24% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
.github/workflows/release.yml (1)

136-138: Coverage reporting doesn't include the newly added pilot/hooks/tests/ directory.

Tests in pilot/hooks/tests/ will run, but --cov=installer --cov=launcher won't measure coverage for pilot.hooks code. If coverage data for that module matters, add --cov=pilot.hooks (or --cov=pilot).

-          python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \
-            --cov=installer --cov=launcher \
+          python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \
+            --cov=installer --cov=launcher --cov=pilot.hooks \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/release.yml around lines 136 - 138, The pytest command in
the release workflow runs tests under pilot/hooks/tests but does not collect
coverage for that package; update the pytest invocation (the line starting with
"python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/
-v \") to include coverage for the pilot package by adding either
--cov=pilot.hooks or --cov=pilot so the pilot.hooks code is measured alongside
--cov=installer and --cov=launcher.
installer/steps/dependencies.py (1)

385-388: Consider extracting the repeated best-effort pattern into a small helper.

The same 3-line block appears four times. A tiny wrapper would reduce duplication and centralize the message:

♻️ Optional refactor
+def _best_effort_model_setup(ui: Any = None, **kwargs: Any) -> None:
+    """Attempt local model download; log info and continue on failure."""
+    if not _setup_vexor_local_model(ui, **kwargs):
+        if ui:
+            ui.info("Embedding model will download on first use")

Then each call site becomes _best_effort_model_setup(ui) or _best_effort_model_setup(ui, device="mlx").

Also applies to: 410-413, 421-424, 427-430

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@installer/steps/dependencies.py` around lines 385 - 388, There’s a repeated
3-line “best-effort” pattern after model setup calls (example: the block that
checks if not _setup_vexor_local_model(ui): if ui: ui.info("Embedding model will
download on first use") return True) — extract that into a small helper (e.g.
_best_effort_model_setup(ui, device=None)) that runs the call, logs the
"Embedding model will download on first use" message when the setup returns
False, and returns the boolean result; then replace the four duplicated blocks
(the call sites around _setup_vexor_local_model(ui) and the other three similar
setup calls) with a single call to the helper, passing device or context when
needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/release.yml:
- Around line 136-138: The pytest command in the release workflow runs tests
under pilot/hooks/tests but does not collect coverage for that package; update
the pytest invocation (the line starting with "python3 -m pytest
installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \") to include
coverage for the pilot package by adding either --cov=pilot.hooks or --cov=pilot
so the pilot.hooks code is measured alongside --cov=installer and
--cov=launcher.

In `@installer/steps/dependencies.py`:
- Around line 385-388: There’s a repeated 3-line “best-effort” pattern after
model setup calls (example: the block that checks if not
_setup_vexor_local_model(ui): if ui: ui.info("Embedding model will download on
first use") return True) — extract that into a small helper (e.g.
_best_effort_model_setup(ui, device=None)) that runs the call, logs the
"Embedding model will download on first use" message when the setup returns
False, and returns the boolean result; then replace the four duplicated blocks
(the call sites around _setup_vexor_local_model(ui) and the other three similar
setup calls) with a single call to the helper, passing device or context when
needed.

The directory existence check fails in CI due to git-crypt decryption
creating placeholder files. The remaining 7 tests already verify Search
is fully removed from exports, imports, routes, and navigation.
@claude
Copy link

claude bot commented Feb 20, 2026

Claude finished @maxritter's task in 42s —— View job


Incremental Review

  • Analyzing new push (changes to console/tests/ui/search-removal.test.ts)
  • Review delta changes
  • Post findings

New push: console/tests/ui/search-removal.test.ts

Change: Removed the flaky existsSync directory-presence test and its now-unused existsSync import.

- import { readFileSync, existsSync } from "fs";
+ import { readFileSync } from "fs";

- it("Search view directory no longer exists", () => {
-   const searchViewExists = existsSync("src/ui/viewer/views/Search");
-   expect(searchViewExists).toBe(false);
- });

No new issues introduced. This is a correct cleanup — testing for the absence of a directory via existsSync is brittle (path-sensitive, can vary by environment) and redundant when the content-based tests already verify no references to SearchView remain in the codebase.


Still open (from prior review)

  • pilot/hooks/tests/test_tool_redirect.py_run_with_input defined identically in 3 test classes
  • installer/steps/dependencies.py_get_forced_claude_version() reads only ~/.claude/settings.json, not the new ~/.claude/pilot/settings.json
  • installer/steps/claude_files.pyprocess_settings() exists at module level but appears unused outside tests
  • .githooks/pre-committail -5 may truncate useful failure context
  • .github/workflows/release.yml:136-138pilot/hooks/tests/ runs in CI but no --cov=pilot.hooks coverage tracking

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
console/tests/ui/search-removal.test.ts (1)

49-56: ⚠️ Potential issue | 🟡 Minor

Test name claims to verify the 'g r' key binding, but no assertion covers it.

The description "keyboard shortcuts do not include 'g r' for Search" implies the shortcut key itself is checked, yet the assertions only look for the human-readable label ("Go to Search") and the action URI ("navigate:/search"). The actual "g r" token in shortcuts.ts is never asserted, leaving a silent coverage gap for the key binding.

🔧 Proposed fix — assert the key string or tighten the test name

Option A — add the missing key assertion:

  it("keyboard shortcuts do not include 'g r' for Search", () => {
    const source = readFileSync(
      "src/ui/viewer/constants/shortcuts.ts",
      "utf-8"
    );
    expect(source).not.toContain("Go to Search");
    expect(source).not.toContain("navigate:/search");
+   expect(source).not.toContain('"g r"');   // or whatever literal is used in shortcuts.ts
  });

Option B — rename to match what is actually tested:

- it("keyboard shortcuts do not include 'g r' for Search", () => {
+ it("keyboard shortcuts do not include Search label or navigate:/search action", () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@console/tests/ui/search-removal.test.ts` around lines 49 - 56, The test
"keyboard shortcuts do not include 'g r' for Search" fails to assert the actual
key token; update the test in search-removal.test.ts to either (A) add an
assertion that the shortcuts source (from
"src/ui/viewer/constants/shortcuts.ts") does not contain the key string "g r"
(or whatever exact token is used for that binding) in addition to the existing
checks for "Go to Search" and "navigate:/search", or (B) change the test name to
reflect that it only checks for the label and URI (e.g., "keyboard shortcuts do
not include Search label or action"). Locate the test function (it("keyboard
shortcuts do not include 'g r' for Search", ...)) and modify it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@console/tests/ui/search-removal.test.ts`:
- Around line 49-56: The test "keyboard shortcuts do not include 'g r' for
Search" fails to assert the actual key token; update the test in
search-removal.test.ts to either (A) add an assertion that the shortcuts source
(from "src/ui/viewer/constants/shortcuts.ts") does not contain the key string "g
r" (or whatever exact token is used for that binding) in addition to the
existing checks for "Go to Search" and "navigate:/search", or (B) change the
test name to reflect that it only checks for the label and URI (e.g., "keyboard
shortcuts do not include Search label or action"). Locate the test function
(it("keyboard shortcuts do not include 'g r' for Search", ...)) and modify it
accordingly.

- Check ~/.claude/pilot/settings.json in _get_forced_claude_version fallback
- Add --cov=pilot.hooks to CI pytest coverage tracking
- Extract duplicated _run_with_input helper to module-level function
- Increase pre-commit tail output from 5 to 20 lines
- Add missing "g r" key assertion in search-removal test
@maxritter maxritter merged commit df6fd5a into main Feb 20, 2026
5 of 11 checks passed
@maxritter maxritter deleted the dev branch February 20, 2026 09:08
@claude
Copy link

claude bot commented Feb 20, 2026

Incremental Review

  • Analyzing new push
  • Review changed files
  • Post findings

View job run

github-actions bot pushed a commit that referenced this pull request Feb 20, 2026
## [6.9.1](v6.9.0...v6.9.1) (2026-02-20)

### Bug Fixes

* consolidate test infrastructure, harden parallel spec workflows ([#80](#80)) ([df6fd5a](df6fd5a))
* make vexor model pre-download best-effort during installation ([4b2f2da](4b2f2da))
* use sys.executable instead of uv in spec_validators tests ([9145e19](9145e19))
@github-actions
Copy link
Contributor

🎉 This PR is included in version 6.9.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant