fix: consolidate test infrastructure, harden parallel spec workflows by maxritter · Pull Request #80 · maxritter/pilot-shell

maxritter · 2026-02-20T08:38:36Z

Summary

Test consolidation: Merged duplicate pilot/tests/hooks/ into pilot/hooks/tests/, fixing 16 broken tests from API drift (exit codes, mock paths, removed functions). All 1154 Python tests pass.
CI/pre-commit coverage: Added pilot/hooks/tests/ to CI pipeline and installer/tests/unit/ trigger to pre-commit hook.
Parallel spec agents: Added native background: true to all 4 verification agent definitions; updated spec-plan, spec-verify, and task-and-workflow rules for consistency (run_in_background=true kept as fallback).
Playwright session isolation: Added -s=$PILOT_SESSION_ID session isolation to spec-verify, spec-implement, playwright-cli, testing, and verification rules to prevent cross-session browser interference.
Installer improvements: Moved settings.json to plugin directory (no more three-way merge into ~/.claude/settings.json), added migration for old merged settings, fixed Vexor install UX (model pre-download is best-effort).
Hook hardening: File checker warnings are now non-blocking (additionalContext); EnterPlanMode/ExitPlanMode blocking strengthened with exit code 2 and clearer messaging.

Test plan

All 1154 Python tests pass (installer/tests/unit/ + launcher/tests/unit/ + pilot/hooks/tests/)
Pre-commit hook runs and passes for both launcher/hook and installer changes
CI pipeline runs all three test directories
PR bot reviews pass

Summary by CodeRabbit

Bug Fixes
- Installer is more resilient: embedding model pre-download is best-effort and will download on first use if needed.
Chores
- Settings are now stored under the plugin directory; existing user settings are automatically migrated and legacy Pilot-managed defaults are pruned.
UI
- Removed the "g r" keyboard shortcut.
Tests
- Added/updated tests for settings migration and resilient installer behavior.

- Merge duplicate test directories (pilot/tests/hooks/ → pilot/hooks/tests/) fixing 16 broken tests from API drift (exit codes, mock paths, removed functions) - Add pilot/hooks/tests/ to CI pipeline and installer tests to pre-commit - Add `background: true` to all 4 verification agents for native parallel execution - Update spec-plan, spec-verify, and task-and-workflow rules for background agent consistency - Move settings.json installation to plugin directory, migrate old merged settings - Make file_checker warnings non-blocking (additionalContext instead of block) - Strengthen EnterPlanMode/ExitPlanMode blocking with exit code 2 - Add playwright-cli session isolation to spec-verify, spec-implement, and testing rules - Fix Vexor install UX: model pre-download is best-effort, not a failure

vercel · 2026-02-20T08:38:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
claude-pilot	Ignored		Feb 20, 2026 9:08am

claude · 2026-02-20T08:38:55Z

Claude finished @maxritter's task in 4m 9s —— View job

Code Review

A well-scoped PR with clean implementations across all areas. All 1154 tests pass and the consolidation rationale is sound. Below are observations by area.

Test Consolidation (`pilot/tests/hooks/` → `pilot/hooks/tests/`)

Positive: The centralized conftest.py approach cleanly eliminates per-file sys.path.insert boilerplate. Co-locating tests with the code they test is the right structure.

Minor — _run_with_input duplicated across 3 test classes:

pilot/hooks/tests/test_tool_redirect.py:88-94, :116-122, :158-164 — identical helper defined three times. Consider a module-level helper or a shared base class:

def _run_redirect(tool_name: str, tool_input: dict | None = None) -> int:
    hook_data = {"tool_name": tool_name}
    if tool_input is not None:
        hook_data["tool_input"] = tool_input
    with patch("sys.stdin", StringIO(json.dumps(hook_data))):
        return run_tool_redirect()

Fix this →

Installer: Settings Migration (`claude_files.py:526-571`)

Positive: The migration logic is correct and well-tested. The three test cases (user customizations survive, no-change deletes old file, no baseline = clean install) cover the key scenarios.

Observation — shallow comparison for nested objects (_migrate_old_settings):

installer/steps/claude_files.py:557-561:

for key, value in current.items():
    if key not in baseline:
        user_settings[key] = value
    elif value != baseline[key]:
        user_settings[key] = value

If a user modified one nested key (e.g. env.DISABLE_TELEMETRY) while Pilot manages the whole env dict, the entire env object is preserved — including unchanged Pilot-managed entries like env.NEW_VAR_ADDED_BY_PILOT. Since this is a one-time migration (the baseline file is deleted after running), over-preservation is the safer trade-off. Not a bug, but worth noting that a deep-merge approach (like merge_settings already uses) would be more precise for nested keys if this matters post-migration.

_get_forced_claude_version() reads ~/.claude/settings.json (dependencies.py:97-104):

settings_path = Path.home() / ".claude" / "settings.json"

After this PR, settings live in ~/.claude/pilot/settings.json. If users upgrade and their ~/.claude/settings.json is cleaned up by migration, this function will no longer find FORCE_CLAUDE_VERSION. Consider also checking ~/.claude/pilot/settings.json as a fallback, or documenting that this env var must be set via another mechanism.

Fix this →

Hook Hardening (`tool_redirect.py`)

Positive: Exit code 2 from block() is correct behavior — it signals an intentional block rather than a script error. The red stderr message makes debugging straightforward. Tests are updated to assert result == 2.

Unused process_settings() in claude_files.py:

installer/steps/claude_files.py:54-57 — this function exists at module level but is only called in tests. The _update_plugin_settings method uses patch_claude_paths() but not process_settings() for re-serialization. If the function is purely for test/utility purposes, consider moving it to the test file or removing it from the module.

Pre-commit Hook (`.githooks/pre-commit`)

Minor — tail -5 may truncate meaningful failure context:

.githooks/pre-commit:17:

uv run pytest launcher/tests/ pilot/hooks/tests/ -q --tb=short 2>&1 | tail -5

With set -eo pipefail, failures propagate correctly. However, tail -5 limits output to 5 lines, which may hide the root cause of test failures (e.g., import errors, fixture failures). tail -20 or tail -30 would be more useful for debugging without significantly affecting terminal noise on passing runs.

Parallel Spec / Playwright Session Isolation

Positive: The session isolation changes are thorough and consistent across spec-verify.md, playwright-cli.md, verification.md, and testing.md. The PW_SESSION="${PILOT_SESSION_ID:-default}" fallback pattern is correct. Requiring session close after E2E testing (playwright-cli -s="$PW_SESSION" close) prevents leaked browser instances.

Positive: The background: true frontmatter on all 4 agent definitions is cleaner than relying on call-site run_in_background=true. Keeping both as redundancy is a prudent safety net.

CI Pipeline (`.github/workflows/release.yml`)

Adding pilot/hooks/tests/ to the CI run correctly closes the gap where the newly consolidated tests would have been missed.

Summary

Area	Status
Test consolidation	✅ Clean, correct structure
Settings migration	✅ Correct, well-tested; one forward-compat concern
Hook hardening	✅ Correct behavior
Playwright session isolation	✅ Comprehensive and consistent
Pre-commit hook	✅ Works; `tail -5` is minor UX concern
CI coverage	✅ Properly updated

Issues requiring attention before merge: None — all findings are minor improvements or observations. The FORCE_CLAUDE_VERSION forward-compatibility concern is the one worth tracking if users rely on that feature, but it likely only affects edge-case pinned installs.

coderabbitai · 2026-02-20T08:38:55Z

Caution

Review failed

The pull request is closed.

Walkthrough

Migrate Pilot-managed settings from ~~/.claude/settings.json into a plugin-specific file (~~/.claude/pilot/settings.json), add post-install settings migration and path-patching, change Vexor local-model setup to best-effort, and adjust CI pre-commit and release workflows to gate and run tests for launcher/hooks and installer changes.

Changes

Cohort / File(s)	Summary
CI/CD Pipeline Configuration `.githooks/pre-commit`, `.github/workflows/release.yml`	Replaced Python-change gate with `LAUNCHER_CHANGED` and added `INSTALLER_CHANGED` to trigger installer unit tests; release workflow test paths expanded to include `pilot/hooks/tests/` and coverage for `pilot.hooks`.
Installer — settings & post-install logic `installer/steps/claude_files.py`	Removed `_install_settings` and explicit "settings" category handling; added `_update_plugin_settings` and `_migrate_old_settings`; post-install now updates plugin settings paths and migrates/prunes old Pilot-managed keys from `~/.claude/settings.json` into `~/.claude/pilot/settings.json`.
Installer — dependency handling (Vexor) `installer/steps/dependencies.py`	Treat Vexor local-model setup as best-effort: failures log that the embedding model will download on first use and return success so install continues across MLX/fallback paths.
Installer tests — settings migration `installer/tests/unit/steps/test_claude_files.py`	Updated expectations to write settings to `~/.claude/pilot/settings.json`; added `TestSettingsMigration` validating migration of user customizations, pruning of Pilot-managed keys, baseline cleanup, and skip-on-clean-install.
Installer tests — Vexor install `installer/tests/unit/steps/test_dependencies.py`	Added tests for Vexor install failure scenarios asserting return behavior and UI messaging when local download or binary install fails.
Console tests `console/tests/ui/search-removal.test.ts`	Removed existence-check test and `existsSync` import; added assertion to ensure `"g r"` is not present in shortcuts source.

Sequence Diagram(s)

sequenceDiagram
    participant Installer
    participant HomeFS as Home FS (~/.claude)
    participant PluginDir as Plugin Dir (~/.claude/pilot)
    participant Logger

    Installer->>HomeFS: check for baseline file (pilot_baseline)
    alt baseline exists
        Installer->>HomeFS: read baseline and current settings
        Installer->>PluginDir: write migrated settings (preserve user keys, remove Pilot-managed keys)
        Installer->>HomeFS: delete baseline file
        Logger->>Installer: info migration complete
    else no baseline
        Installer->>Logger: no migration needed (clean install)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix: consolidate test infrastructure, harden parallel spec workflows #80 — similar changes touching pre-commit gates, release workflow, and installer settings migration.
feat: global extended context toggle and compact settings UI #63 — earlier edits to .githooks/pre-commit altering test gating logic.

Suggested labels

released

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title partially addresses consolidating test infrastructure but does not capture the full scope of changes, including settings migration, Vexor installation hardening, and hook enhancements.
Docstring Coverage	✅ Passed	Docstring coverage is 95.24% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

.github/workflows/release.yml (1)

136-138: Coverage reporting doesn't include the newly added pilot/hooks/tests/ directory.

Tests in pilot/hooks/tests/ will run, but --cov=installer --cov=launcher won't measure coverage for pilot.hooks code. If coverage data for that module matters, add --cov=pilot.hooks (or --cov=pilot).

-          python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \
-            --cov=installer --cov=launcher \
+          python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \
+            --cov=installer --cov=launcher --cov=pilot.hooks \

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/workflows/release.yml around lines 136 - 138, The pytest command in
the release workflow runs tests under pilot/hooks/tests but does not collect
coverage for that package; update the pytest invocation (the line starting with
"python3 -m pytest installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/
-v \") to include coverage for the pilot package by adding either
--cov=pilot.hooks or --cov=pilot so the pilot.hooks code is measured alongside
--cov=installer and --cov=launcher.

installer/steps/dependencies.py (1)

385-388: Consider extracting the repeated best-effort pattern into a small helper.

The same 3-line block appears four times. A tiny wrapper would reduce duplication and centralize the message:

♻️ Optional refactor

+def _best_effort_model_setup(ui: Any = None, **kwargs: Any) -> None:
+    """Attempt local model download; log info and continue on failure."""
+    if not _setup_vexor_local_model(ui, **kwargs):
+        if ui:
+            ui.info("Embedding model will download on first use")

Then each call site becomes _best_effort_model_setup(ui) or _best_effort_model_setup(ui, device="mlx").

Also applies to: 410-413, 421-424, 427-430

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@installer/steps/dependencies.py` around lines 385 - 388, There’s a repeated
3-line “best-effort” pattern after model setup calls (example: the block that
checks if not _setup_vexor_local_model(ui): if ui: ui.info("Embedding model will
download on first use") return True) — extract that into a small helper (e.g.
_best_effort_model_setup(ui, device=None)) that runs the call, logs the
"Embedding model will download on first use" message when the setup returns
False, and returns the boolean result; then replace the four duplicated blocks
(the call sites around _setup_vexor_local_model(ui) and the other three similar
setup calls) with a single call to the helper, passing device or context when
needed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/release.yml:
- Around line 136-138: The pytest command in the release workflow runs tests
under pilot/hooks/tests but does not collect coverage for that package; update
the pytest invocation (the line starting with "python3 -m pytest
installer/tests/unit/ launcher/tests/unit/ pilot/hooks/tests/ -v \") to include
coverage for the pilot package by adding either --cov=pilot.hooks or --cov=pilot
so the pilot.hooks code is measured alongside --cov=installer and
--cov=launcher.

In `@installer/steps/dependencies.py`:
- Around line 385-388: There’s a repeated 3-line “best-effort” pattern after
model setup calls (example: the block that checks if not
_setup_vexor_local_model(ui): if ui: ui.info("Embedding model will download on
first use") return True) — extract that into a small helper (e.g.
_best_effort_model_setup(ui, device=None)) that runs the call, logs the
"Embedding model will download on first use" message when the setup returns
False, and returns the boolean result; then replace the four duplicated blocks
(the call sites around _setup_vexor_local_model(ui) and the other three similar
setup calls) with a single call to the helper, passing device or context when
needed.

The directory existence check fails in CI due to git-crypt decryption creating placeholder files. The remaining 7 tests already verify Search is fully removed from exports, imports, routes, and navigation.

claude · 2026-02-20T08:50:12Z

Claude finished @maxritter's task in 42s —— View job

Incremental Review

Analyzing new push (changes to console/tests/ui/search-removal.test.ts)
Review delta changes
Post findings

New push: `console/tests/ui/search-removal.test.ts`

Change: Removed the flaky existsSync directory-presence test and its now-unused existsSync import.

- import { readFileSync, existsSync } from "fs";
+ import { readFileSync } from "fs";

- it("Search view directory no longer exists", () => {
-   const searchViewExists = existsSync("src/ui/viewer/views/Search");
-   expect(searchViewExists).toBe(false);
- });

No new issues introduced. This is a correct cleanup — testing for the absence of a directory via existsSync is brittle (path-sensitive, can vary by environment) and redundant when the content-based tests already verify no references to SearchView remain in the codebase.

Still open (from prior review)

pilot/hooks/tests/test_tool_redirect.py — _run_with_input defined identically in 3 test classes
installer/steps/dependencies.py — _get_forced_claude_version() reads only ~/.claude/settings.json, not the new ~/.claude/pilot/settings.json
installer/steps/claude_files.py — process_settings() exists at module level but appears unused outside tests
.githooks/pre-commit — tail -5 may truncate useful failure context
.github/workflows/release.yml:136-138 — pilot/hooks/tests/ runs in CI but no --cov=pilot.hooks coverage tracking

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

console/tests/ui/search-removal.test.ts (1)
49-56: ⚠️ Potential issue | 🟡 Minor

Test name claims to verify the 'g r' key binding, but no assertion covers it.

The description "keyboard shortcuts do not include 'g r' for Search" implies the shortcut key itself is checked, yet the assertions only look for the human-readable label ("Go to Search") and the action URI ("navigate:/search"). The actual "g r" token in shortcuts.ts is never asserted, leaving a silent coverage gap for the key binding.
🔧 Proposed fix — assert the key string or tighten the test name

Option A — add the missing key assertion:
  it("keyboard shortcuts do not include 'g r' for Search", () => {
    const source = readFileSync(
      "src/ui/viewer/constants/shortcuts.ts",
      "utf-8"
    );
    expect(source).not.toContain("Go to Search");
    expect(source).not.toContain("navigate:/search");
+   expect(source).not.toContain('"g r"');   // or whatever literal is used in shortcuts.ts
  });
Option B — rename to match what is actually tested:
- it("keyboard shortcuts do not include 'g r' for Search", () => {
+ it("keyboard shortcuts do not include Search label or navigate:/search action", () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@console/tests/ui/search-removal.test.ts` around lines 49 - 56, The test
"keyboard shortcuts do not include 'g r' for Search" fails to assert the actual
key token; update the test in search-removal.test.ts to either (A) add an
assertion that the shortcuts source (from
"src/ui/viewer/constants/shortcuts.ts") does not contain the key string "g r"
(or whatever exact token is used for that binding) in addition to the existing
checks for "Go to Search" and "navigate:/search", or (B) change the test name to
reflect that it only checks for the label and URI (e.g., "keyboard shortcuts do
not include Search label or action"). Locate the test function (it("keyboard
shortcuts do not include 'g r' for Search", ...)) and modify it accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@console/tests/ui/search-removal.test.ts`:
- Around line 49-56: The test "keyboard shortcuts do not include 'g r' for
Search" fails to assert the actual key token; update the test in
search-removal.test.ts to either (A) add an assertion that the shortcuts source
(from "src/ui/viewer/constants/shortcuts.ts") does not contain the key string "g
r" (or whatever exact token is used for that binding) in addition to the
existing checks for "Go to Search" and "navigate:/search", or (B) change the
test name to reflect that it only checks for the label and URI (e.g., "keyboard
shortcuts do not include Search label or action"). Locate the test function
(it("keyboard shortcuts do not include 'g r' for Search", ...)) and modify it
accordingly.

- Check ~/.claude/pilot/settings.json in _get_forced_claude_version fallback - Add --cov=pilot.hooks to CI pytest coverage tracking - Extract duplicated _run_with_input helper to module-level function - Increase pre-commit tail output from 5 to 20 lines - Add missing "g r" key assertion in search-removal test

claude · 2026-02-20T09:08:21Z

Incremental Review

Analyzing new push
Review changed files
Post findings

View job run

## [6.9.1](v6.9.0...v6.9.1) (2026-02-20) ### Bug Fixes * consolidate test infrastructure, harden parallel spec workflows ([#80](#80)) ([df6fd5a](df6fd5a)) * make vexor model pre-download best-effort during installation ([4b2f2da](4b2f2da)) * use sys.executable instead of uv in spec_validators tests ([9145e19](9145e19))

github-actions · 2026-02-20T09:42:05Z

🎉 This PR is included in version 6.9.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

fix: remove flaky existsSync check from search-removal test

b3c1e6f

The directory existence check fails in CI due to git-crypt decryption creating placeholder files. The remaining 7 tests already verify Search is fully removed from exports, imports, routes, and navigation.

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

maxritter merged commit df6fd5a into main Feb 20, 2026
5 of 11 checks passed

maxritter deleted the dev branch February 20, 2026 09:08

github-actions bot added the released label Feb 20, 2026

coderabbitai bot mentioned this pull request Feb 24, 2026

feat: rename Claude Pilot to Pilot Shell #86

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: consolidate test infrastructure, harden parallel spec workflows#80

fix: consolidate test infrastructure, harden parallel spec workflows#80
maxritter merged 3 commits intomainfrom
dev

maxritter commented Feb 20, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Feb 20, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 20, 2026 •

edited

Loading

Review failed

Uh oh!

coderabbitai bot left a comment

Uh oh!

claude bot commented Feb 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

claude bot commented Feb 20, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

maxritter commented Feb 20, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

vercel bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Test Consolidation (pilot/tests/hooks/ → pilot/hooks/tests/)

Installer: Settings Migration (claude_files.py:526-571)

Hook Hardening (tool_redirect.py)

Pre-commit Hook (.githooks/pre-commit)

Parallel Spec / Playwright Session Isolation

CI Pipeline (.github/workflows/release.yml)

Summary

Uh oh!

coderabbitai bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Incremental Review

New push: console/tests/ui/search-removal.test.ts

Still open (from prior review)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Incremental Review

Uh oh!

github-actions bot commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

maxritter commented Feb 20, 2026 •

edited by coderabbitai bot

Loading

vercel bot commented Feb 20, 2026 •

edited

Loading

claude bot commented Feb 20, 2026 •

edited

Loading

Test Consolidation (`pilot/tests/hooks/` → `pilot/hooks/tests/`)

Installer: Settings Migration (`claude_files.py:526-571`)

Hook Hardening (`tool_redirect.py`)

Pre-commit Hook (`.githooks/pre-commit`)

CI Pipeline (`.github/workflows/release.yml`)

coderabbitai bot commented Feb 20, 2026 •

edited

Loading

claude bot commented Feb 20, 2026 •

edited

Loading

New push: `console/tests/ui/search-removal.test.ts`

claude bot commented Feb 20, 2026 •

edited

Loading