
test(e2e): expand agent-pane coverage, add release report, fix harness flakiness #356

Merged
vanzue merged 41 commits into main from dev/vanzue/e2e-coverage-and-stability on Jun 25, 2026

vanzue commented on Jun 24, 2026


What

Expands and hardens the ItE2E end-to-end suite, adds a release-report generator, and
fixes the test-harness flakiness that was producing spurious failures. Net result on the full
Feature run: 0 failed (down from 9), with only honest, environment-gated skips.

All changes are under test/e2e/ (PowerShell harness + tests) — no product code is touched.

Coverage added / strengthened

  • Strengthened weak agent-pane tests (previously only token assertions):
    • /model now opens and asserts the real model picker (not just "menu renders").
    • Open/hide exercised at all four pane positions (was right+bottom only).
    • Tab-close now asserts the pre-warmed helper is actually torn down (descendant wta.exe count).
    • Real Shift+Enter via a win32-input-mode sequence (new Send-AgentShiftEnter), instead of substituting a plain Enter.
  • New suites:
    • Feature.ShellIntegration.Tests.ps1 — §3 OSC 133 marks (success/failure) + non-integrated cmd.exe safety (deterministic).
    • Feature.AgentProposedCommand.Tests.ps1 — §2 agent-proposed command Insert/Run into the shell pane via the non-autofix chat path.
    • Feature.AgentMatrix.Tests.ps1 — §2/§3 Claude/Codex/Gemini chat + autofix through the ACP adapter, per-CLI auth-gated (runs only when the CLI is installed and authenticated, else skips with the reason recorded).
    • §2 "View switch preserves input" (draft survives a session-view round-trip) in Feature.SessionList.Tests.ps1.
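The real Shift+Enter change relies on win32-input-mode, in which each key record travels as `ESC [ Vk ; Sc ; Uc ; Kd ; Cs ; Rc _` (virtual-key code, scan code, Unicode char, key-down flag, control-key state, repeat count). A minimal Python sketch of encoding Shift+Enter that way follows; it is not the PowerShell `Send-AgentShiftEnter` helper, and whether that helper also emits separate Shift key records is not stated here.

```python
# Encode key events in Windows Terminal's win32-input-mode format:
#   ESC [ Vk ; Sc ; Uc ; Kd ; Cs ; Rc _
VK_RETURN = 0x0D        # virtual-key code for Enter
SCAN_RETURN = 0x1C      # set-1 scan code for Enter
SHIFT_PRESSED = 0x0010  # dwControlKeyState flag meaning "Shift is held"


def win32_key(vk: int, sc: int, uc: int, key_down: bool, cs: int = 0, rc: int = 1) -> str:
    """Return one win32-input-mode key record as an escape sequence."""
    return f"\x1b[{vk};{sc};{uc};{1 if key_down else 0};{cs};{rc}_"


def shift_enter() -> str:
    """Key-down then key-up for Enter, both carrying the Shift modifier state."""
    down = win32_key(VK_RETURN, SCAN_RETURN, ord("\r"), True, SHIFT_PRESSED)
    up = win32_key(VK_RETURN, SCAN_RETURN, ord("\r"), False, SHIFT_PRESSED)
    return down + up
```

Because the modifier travels in the `Cs` field, the TUI can distinguish Shift+Enter from a plain Enter. A bare `\r` cannot carry that information.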

Release report generator

test/e2e/New-ReleaseReport.ps1 (+ release-coverage-map.psd1) turns doc/release-check-list.md
into a clean, human-facing report driven purely by test results:

  • all [UT✓]/[E2E]/[MANUAL] jargon stripped,
  • [x] = verified by automation (UT or E2E), ⚠️ AUTOMATION FAILED = a test ran and failed, plain [ ] = verify manually,
  • conservative mapping (unmapped → manual, never a false [x]); merges multiple results files.
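The conservative classification rule can be sketched in Python. This models only the override-map path, with a hypothetical map shape (checklist title to test-name regex) and hypothetical result strings. It does not reproduce `New-ReleaseReport.ps1`, which also does title-substring matching.

```python
import re
from enum import Enum


class Status(Enum):
    VERIFIED = "[x]"
    FAILED = "⚠️ AUTOMATION FAILED"
    MANUAL = "[ ]"


def classify(title: str, coverage_map: dict[str, str], results: dict[str, str]) -> Status:
    """coverage_map: checklist title -> regex over test names (hypothetical shape).
    results: test name -> "Passed" | "Failed" | "Skipped"."""
    pattern = coverage_map.get(title)
    if pattern is None:
        return Status.MANUAL  # unmapped: never guess a [x]
    outcomes = [r for name, r in results.items()
                if re.search(pattern, name, re.IGNORECASE)]
    if "Failed" in outcomes:
        return Status.FAILED    # a mapped test ran and failed
    if "Passed" in outcomes:
        return Status.VERIFIED
    return Status.MANUAL        # mapped, but only skipped or never run
```

A skip never produces a `[x]`. An item is credited only when a mapped test actually passed.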

Latest run: 104 verified / 0 failed / 131 manual (of 235).

Stability fixes (the flaky-failure root causes)

  • Stale-instance cleanup: Start-Terminal now closes any leftover IT window (store + dev) before launching. A prior test's crashed BeforeAll therefore can't leave behind a window that the single-instance launch attaches to in a broken state (new-tab fails with CreateTab E_FAIL 0x80004005) or that collides on the shared per-brand COM CLSID.
  • Deterministic Wait-AgentReady: judges readiness by the user-visible connected input placeholder ("Ask anything, / for commands..", rendered only in ConnectionState::Connected) rather than by an internal session-registry artifact, and returns the instant it is observed. Fixes the agent-pane readiness timeout flake on initial connect and on reconnect after /restart or a settings change.
  • WSL-autofix Describe now skips (try/catch) instead of failing when a build can't create a wsl.exe tab; Shift+Enter injection retried under load.
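Replacing a fixed delay with a poll that returns on first observation, bounded by a deadline and cut short by a logged fatal error, looks roughly like the following Python sketch. The callables, the 20 s default and the 0.25 s interval are assumptions for illustration, not the values used by `Wait-AgentReady`.

```python
import time
from typing import Callable, Optional


def wait_until_ready(read_buffer: Callable[[], str],
                     read_fatal: Callable[[], Optional[str]],
                     is_ready: Callable[[str], bool],
                     timeout_s: float = 20.0,
                     interval_s: float = 0.25,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until is_ready(buffer) holds; False on a fatal marker or timeout."""
    deadline = clock() + timeout_s
    while True:
        if is_ready(read_buffer()):
            return True   # return the instant readiness is observed
        if read_fatal() is not None:
            return False  # logged auth/fatal failure: stop waiting early
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the helper testable without real waits. The caller should assert the boolean rather than discard it.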

Verification

| Full Feature run | pass | fail | skip |
|---|---|---|---|
| before stale-cleanup | 80 | 9 | 8 |
| after stale-cleanup | 91 | 3 | 3 |
| after readiness fix | 89 | 0 | 8 |

The 8 skips are honest, environment-gated ones: Gemini unauthenticated, WSL has no in-distro shell
integration, wta sessions list is identity-gated, and autofix LLM variance where the agent
returns an explanation instead of a card.

Follow-ups (not in this PR)

  • Filed #346: autofix card not dismissed by Esc (intermittent; E2E-discovered).
  • The autofix LLM-variance skips (agent returns "explain", not a card) are non-deterministic and
    slow; the planned mock ACP agent — Form B (doc/specs/mock-acp-agent.md,
    mock-acp-agent.exe as a custom: agent) would make that surface deterministic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vanzue and others added 12 commits June 24, 2026 11:30
…oposed-command suites

- Remove redundant fixed Start-Sleep in AutofixPane (rely on poll-based Assert-Pane).
- Strengthen /model picker, all-four pane positions, helper-cleanup, real Shift+Enter
  (new Send-AgentShiftEnter win32-input-mode helper).
- Add Feature.ShellIntegration.Tests.ps1 (OSC 133 marks + cmd.exe missing-integration safety).
- Add Feature.AgentProposedCommand.Tests.ps1 (non-autofix chat Insert/Run recommendation card).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the post-merge convention so the new ShellIntegration and
AgentProposedCommand suites honor ITE2E_PACKAGE instead of hardcoding Store.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a deterministic SessionList test: a typed-but-unsubmitted draft survives a
round-trip through the session view (open + Esc back to chat). No LLM involved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-gated)

Cover §2 Claude/Codex/Gemini chat through the IT agent pane's ACP adapters. Each
per-CLI Context runs only when the CLI is installed AND authenticated (print-mode
auth probe at discovery), else skips with the reason recorded. Verified live:
Claude + Codex pass, Gemini skips (installed but unauthenticated here).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…x/Gemini)

New-ReleaseReport.ps1 turns the release checklist into a clean human-facing report
driven purely by test results: tags (UT/E2E/MANUAL) stripped; [x]=automation passed,
'AUTOMATION FAILED'=test failed, plain [ ]=not covered, verify manually. Mapping is
title-substring + a curated override map (release-coverage-map.psd1), conservative by
design (unmapped -> manual, never a false [x]).

Also extend the agent matrix with a per-CLI autofix case (§3 Autofix with Claude/Codex/Gemini).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- New-ReleaseReport.ps1 accepts multiple -ResultsXml (later overrides earlier per
  test name), so an isolated re-run of a flaky suite layers onto the full run.
- release-coverage-map.psd1: map a few passing tests whose names differ from their
  checklist titles (Focus hotkey, Model control/changes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
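Layering an isolated re-run onto a full run amounts to a per-test-name dictionary update applied in argument order. A Python sketch follows. It assumes NUnit-style `test-case` elements carrying `name` and `result` attributes, and the XML snippets in the test are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_results(xml_text: str) -> dict[str, str]:
    """Map test name -> result string from NUnit-style <test-case> elements."""
    root = ET.fromstring(xml_text)
    return {tc.get("name"): tc.get("result") for tc in root.iter("test-case")}


def merge_results(*xml_texts: str) -> dict[str, str]:
    """Later files override earlier ones for the same test name."""
    merged: dict[str, str] = {}
    for text in xml_texts:                # pass the full run first, re-runs after
        merged.update(load_results(text))
    return merged
```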
…b E_FAIL flake)

A prior test whose AfterAll/Stop-Terminal didn't run (e.g. a BeforeAll that threw)
leaves an IT window behind. The single-instance AUMID launch then hands off to that
stale, often half-initialised window (Launched=false), so the harness drives a broken
instance where new-tab returns CreateTab E_FAIL (0x80004005); and because the store and
dev packages share one per-brand COM CLSID, a stale window of the OTHER package steals
wtcli's CoCreateInstance and misroutes every call.

Start-Terminal now calls Stop-StaleItInstances first, closing every leftover IT window
(store + dev, matched by *IntelligentTerminal* install location only — never the user's
stock WT) so each launch is deterministic and freshly-owned. Verified: a simulated
leftover is cleaned and the fresh instance's new-tab wsl.exe succeeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
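The safety property of the cleanup is that only processes whose install location contains `IntelligentTerminal` are selected, so stock Windows Terminal is never touched. This can be modeled in Python as a case-insensitive substring filter, mirroring PowerShell's case-insensitive `-like '*IntelligentTerminal*'`. The process records and paths below are hypothetical. Enumerating and closing real windows is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Proc:
    pid: int
    exe_path: str  # install location of the running executable


def stale_it_instances(procs: Iterable[Proc], marker: str = "IntelligentTerminal") -> list[int]:
    """PIDs whose install path contains the IT marker (store or dev); stock WT is excluded."""
    needle = marker.lower()
    return [p.pid for p in procs if needle in p.exe_path.lower()]
```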
- WSL autofix Describe: wrap the WSL-pane setup in try/catch so a build that can't create a
  wsl.exe tab via the protocol (stale dev pkg predating OSC 9001 -> CreateTab E_FAIL) SKIPS
  via the per-It guards instead of failing the Describe in BeforeAll.
- Shift+Enter on a live session row: skip if no selectable row; retry the raw win32-input
  keystroke up to 3x while polling for the view to dismiss (the injection can drop under load).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
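"Retry the keystroke up to 3x while polling" means re-sending only when the expected effect has not appeared within a polling window. A Python sketch follows. The poll count and interval are illustrative assumptions, not the harness's values.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], None],
                    dismissed: Callable[[], bool],
                    attempts: int = 3,
                    polls_per_attempt: int = 10,
                    interval_s: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Send, then poll for the effect; re-send only if it never showed up."""
    for _ in range(attempts):
        send()
        for _ in range(polls_per_attempt):
            if dismissed():
                return True
            sleep(interval_s)
    return False
```

Checking for the effect before re-sending avoids delivering a second keystroke after the first one has already landed.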
…st map

- Generator: an originally-ticked [x] item is verified by an automated unit test, so credit
  it as passed (unless a mapped E2E test failed). Unit tests are automation; the human needn't
  re-verify them.
- Map: drop backticks from keys (the report strips them from titles, so backticked keys never
  matched -> false manual); add /model, Shift+Enter, Autofix-with-Copilot, FRE auto-error
  on-variants, session-mgmt-choice-persists, packaging/logging name mismatches.
- Net on the last full run: 79 -> 104 verified, 156 -> 130 manual.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
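The backtick bug is a normalization mismatch: titles were stripped but map keys were not, so lookups silently fell through to "manual." The fix is to apply one normalizer to both sides. A Python sketch follows. The tag set comes from the description; collapsing whitespace is an added assumption.

```python
import re

TAG_RE = re.compile(r"\[(?:UT✓|E2E|MANUAL)\]\s*")


def normalize_title(title: str) -> str:
    """Strip checklist tags and backticks, then collapse runs of whitespace."""
    title = TAG_RE.sub("", title)
    title = title.replace("`", "")
    return " ".join(title.split())
```

Normalizing both the checklist titles and the coverage-map keys with the same function makes a backticked key and its rendered title compare equal.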
…tion)

The agent-pane-readiness flake: Wait-AgentReady matched the helper-log 'acp_initialize'
marker, which fires several seconds BEFORE the helper writes its session origin
('recording agent-pane session origin') -> the jsonl that Get-AgentPaneSession reads. So it
returned ready too early and the next agent-pane call (Send-AgentKey/Open-SessionList) raced
a not-yet-written record and timed out. The agent_status connected/failed event is NOT
broadcast to wtcli listen (verified), so events can't be used.

Wait-AgentReady now polls Get-AgentPaneSession (the exact precondition every primitive needs:
a recorded, running pane session) and returns the instant it resolves — deterministic, not a
fixed delay — for both the initial connect and a reconnect after /restart or a settings-driven
rebuild (newest running record wins). A logged auth/fatal failure short-circuits. The
AgentRestart test now waits for reconnect-readiness after the settings change before driving
the menu. Verified: 3/3 consecutive green (the test previously flaked on a 20s timeout).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… registry

Per review: gating Wait-AgentReady on Get-AgentPaneSession (which reads the
agent-pane-sessions.jsonl session registry) is verifying a feature with that same feature —
if the registry breaks, the gate false-readies or hangs and masks the bug.

Wait-AgentReady now matches the agent pane buffer for the connected input placeholder
('Ask anything, / for commands..'), which the TUI renders ONLY in ConnectionState::Connected
(ui/input.rs:62; the connecting/disconnected placeholders are distinct strings). That is the
user-visible ground truth of 'ready to chat', independent of the session-tracking feature, and
still returns the instant it's observed (deterministic). Auth/fatal log markers short-circuit.
Verified: AgentRestart 2/2 green (initial connect + reconnect).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 24, 2026 14:00

Copilot AI left a comment

Pull request overview

This PR expands and hardens the PowerShell-driven ItE2E end-to-end suite under test/e2e/, adds a release-report generator that maps checklist items to test outcomes, and reduces harness/test flakiness by replacing fixed sleeps with polling and by proactively cleaning up stale Intelligent Terminal instances before launch.

Changes:

  • Added new E2E suites for shell integration (OSC 133), agent-proposed command Insert/Run, and an auth-gated multi-agent (Claude/Codex/Gemini) matrix.
  • Hardened existing agent-pane coverage (pane positions, /model picker, draft preservation, real Shift+Enter injection) and reduced fixed-delay sleeps in favor of polling.
  • Added New-ReleaseReport.ps1 + a coverage map to generate a checklist-like release report from NUnit/Pester results.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| test/e2e/tests/Feature.ShellIntegration.Tests.ps1 | New suite validating OSC 133 success/failure marks and cmd.exe “no shell integration” safety. |
| test/e2e/tests/Feature.SessionList.Tests.ps1 | Adds a deterministic assertion that draft input survives a session-view round-trip. |
| test/e2e/tests/Feature.AutofixPane.Tests.ps1 | Replaces fixed sleeps with polling; wraps WSL setup to skip cleanly when unsupported. |
| test/e2e/tests/Feature.AgentRestart.Tests.ps1 | Adds readiness gating around settings-driven reconnect and uses real Shift+Enter injection with retries. |
| test/e2e/tests/Feature.AgentProposedCommand.Tests.ps1 | New suite covering Insert/Run recommendation cards via the non-autofix chat path. |
| test/e2e/tests/Feature.AgentPaneInteraction.Tests.ps1 | Expands open/hide coverage across all pane positions; strengthens /model assertions; verifies helper teardown on tab close. |
| test/e2e/tests/Feature.AgentMatrix.Tests.ps1 | New auth-gated Claude/Codex/Gemini chat + autofix coverage through the ACP adapter. |
| test/e2e/release-coverage-map.psd1 | New checklist-title → test-name regex mapping for release report generation. |
| test/e2e/README.md | Updates suite inventory and status/coverage description to reflect new tests and gating behavior. |
| test/e2e/New-ReleaseReport.ps1 | New script to generate a human-facing release checklist report from NUnit/Pester results + mapping. |
| test/e2e/ItE2E/Public/Harness.ps1 | Adds stale-instance cleanup and makes Start-Terminal always clear stale IT instances before config/launch. |
| test/e2e/ItE2E/Public/AgentInput.ps1 | Adds Send-AgentShiftEnter using win32-input-mode raw sequences. |
| test/e2e/ItE2E/Public/Agent.ps1 | Reworks Wait-AgentReady to gate on the user-visible connected placeholder instead of internal artifacts. |
| test/e2e/ItE2E/ItE2E.psm1 | Exports new public harness/input helpers (Stop-StaleItInstances, Send-AgentShiftEnter). |

Comment thread test/e2e/ItE2E/Public/Agent.ps1 Outdated
…ilot review)

Match 'Ask anything … for commands' in order on one line instead of either fragment
anywhere in the captured scrollback, so stray transcript/help text can't false-positive the
readiness gate. Verified: AgentRestart 2/2 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
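The tightening here replaces "either fragment anywhere in the scrollback" with "both fragments, in order, on one line." In Python terms, `.` does not cross `\n`, and scanning line by line also handles `\r\n`. A sketch of the stricter check:

```python
import re

# Both fragments, in this order, on the same rendered line.
READY_RE = re.compile(r"Ask anything.*for commands")


def placeholder_visible(buffer: str) -> bool:
    """True only if some single line contains the connected placeholder in order."""
    return any(READY_RE.search(line) for line in buffer.splitlines())
```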

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comment thread test/e2e/tests/Feature.AgentProposedCommand.Tests.ps1
…pilot review)

Split the Insert and Run cases into separate Describes, each with its own fresh terminal
(matching Feature.AutofixPane). With the shared terminal, a prior card's
'Run command'/'Insert in Terminal' text lingered in the scrollback and could co-occur with the
next case's marker (echoed in the prompt) to false-positive the card-readiness check before a
fresh card rendered. Verified: both cases 2/2 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comment thread test/e2e/ItE2E/Public/Harness.ps1 Outdated
Comment thread test/e2e/tests/Feature.AgentMatrix.Tests.ps1 Outdated
Comment thread test/e2e/tests/Feature.AgentMatrix.Tests.ps1 Outdated
…ss comment (Copilot review)

- AgentMatrix: report a PRECISE per-agent skip reason (not installed vs installed-but-
  unauthenticated vs package missing) via Set-ItResult instead of a boolean Context -Skip, so CI
  shows why; no terminal is launched when skipping. Re-checks package presence in BeforeAll
  because a script-scoped var from BeforeDiscovery does not persist into the run phase (only the
  -ForEach data does).
- AgentMatrix: chat assertion uses a word-boundary match instead of a bare '7'.
- Harness: correct the Stop-StaleItInstances comment - it makes -ColdStart redundant, but
  -ShowFre still controls whether the FRE overlay is shown.

Verified: Claude/Codex chat+autofix pass; Gemini skips with "installed but not authenticated".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
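The two AgentMatrix changes can be sketched in Python. The first is a precise per-agent skip reason instead of a boolean gate. The check order and the reason strings other than "installed but not authenticated" are assumptions. The second is a word-boundary match, so a "17" or "1970" in the reply no longer counts as the answer 7.

```python
import re
from typing import Optional


def skip_reason(package_present: bool, cli_installed: bool, authenticated: bool) -> Optional[str]:
    """Return why an agent case should skip, or None if it can run (check order assumed)."""
    if not package_present:
        return "IT package missing"
    if not cli_installed:
        return "not installed"
    if not authenticated:
        return "installed but not authenticated"
    return None


def answer_contains_seven(reply: str) -> bool:
    """Match a standalone 7, not a 7 embedded in a larger number or word."""
    return re.search(r"\b7\b", reply) is not None
```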

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

vanzue and others added 2 commits June 24, 2026 23:38
…te checklist

Per design decision: Copilot is the primary agent and its full behaviour (chat, autofix,
insert/run, permission, render, slash, sessions) is covered in depth by the copilot-only
suites. All built-in agents share the same agent-pane -> helper -> master -> agent-CLI (ACP)
path; the only per-agent difference is the spawned command. So we stop re-testing every
behaviour per agent.

- Feature.AgentMatrix.Tests.ps1: collapsed from a per-agent (Claude/Codex/Gemini) x
  chat+autofix matrix into ONE consolidated test case that, for each installed+authenticated
  non-Copilot agent, does a single connect + chat round-trip in its own fresh terminal; skips
  when none is available.
- doc/release-check-list.md: collapsed the per-agent items (Claude/Codex/Gemini chat,
  autofix, delegate, installed, hook-install; and the custom-agent behavioural items) into
  single consolidated items, keeping Copilot as the primary and the config/selection/tracking
  items. Total 235 -> 220 items.
- release-coverage-map.psd1 + README updated to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tention, /sessions)

Adds three high-confidence, deterministic E2E cases (no mock/agent variance)
plus one mapping fix, expanding genuine release-checklist coverage:

- §9 "WT_COM_CLSID is injected": read $env:WT_COM_CLSID back from a shell pane
  and assert a braced CLSID, proving WT injects protocol discovery into panes.
- §10 "Old log cleanup is safe": seed a sentinel in the running version's log
  dir + a stale other-version dir, restart the build, assert the running
  version's logs survive and the stale version dir is pruned wholesale.
- §4 "Slash command works": /sessions opens the session view (the command-menu
  path, complementing the existing button path).
- §10 "Early startup failures are logged": coverage-map override (the test
  "...would be logged" already exists and passes).

All three new cases validated live against the Store package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
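"Assert a braced CLSID" means checking the standard registry GUID form: 8-4-4-4-12 hex digits inside braces. A Python sketch of that check on a value read back from a pane follows. The sample GUID in the test is arbitrary.

```python
import re

BRACED_CLSID = re.compile(r"\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\}")


def is_braced_clsid(value: str) -> bool:
    """True if the whole (trimmed) value is a brace-wrapped GUID."""
    return BRACED_CLSID.fullmatch(value.strip()) is not None
```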
Copilot AI review requested due to automatic review settings June 25, 2026 00:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comment thread test/e2e/ItE2E/Public/Agent.ps1
Comment thread test/e2e/tests/Feature.AutofixPane.Tests.ps1
…ader ctor throws (Copilot review)

- Feature.AutofixPane: CardShown and the two other card-detection predicates (lines 30/102/271)
  hard-coded the English "Run command|Insert in Terminal" labels, so they'd mis-skip/fail on
  non-en-US machines. Added an exported Get-RecommendationCardRegex helper (EITHER button label,
  localized across all bundled locales via Get-WtaLocalizedTextRegex, en-US fallback) and routed
  all three through it. Verified it matches the en-US card line; the variance-skip path still works
  live.
- Wait-AgentReady: if the StreamReader ctor throws, $fs was left undisposed (file-handle leak).
  Wrapped the reader in a nested try/finally so $fs is always disposed (double-dispose after the
  reader closes it is a safe no-op).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
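A locale-robust card check comes down to an alternation of every bundled translation of either button label, with the en-US labels always included as a fallback and each label regex-escaped. A Python sketch follows. The German strings in the test are hypothetical placeholders, not the product's actual translations.

```python
import re

EN_US_LABELS = ("Run command", "Insert in Terminal")


def recommendation_card_regex(labels_by_locale: dict[str, list[str]]) -> re.Pattern:
    """Match EITHER button label in any bundled locale, plus the en-US fallback."""
    labels = set(EN_US_LABELS)
    for locale_labels in labels_by_locale.values():
        labels.update(locale_labels)
    # Longest first, then alphabetical, so the generated pattern is deterministic.
    ordered = sorted(labels, key=lambda s: (-len(s), s))
    return re.compile("|".join(re.escape(label) for label in ordered))
```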

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Comment thread test/e2e/ItE2E/Public/Agent.ps1
…tRegex (Copilot review)

Double-quoted YAML scalars were captured raw, so a value containing \" (e.g. setup.subtitle.*
"Your agent \"%{agent}\" …") kept the literal backslashes — the generated regex then looked for
backslashes absent from the rendered UI text, breaking locale-robust assertions for such keys.
Now the double-quoted branch unescapes \" \\ \n \t \r (\x -> x) before the value is regex-escaped.

Verified: setup.subtitle.copilot_missing no longer yields a regex containing \" (the escape is
resolved to a literal "), while the keys the tests actually use (no backslashes) are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
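The unescape rule described above (`\"`, `\\`, `\n`, `\t`, `\r`, and any other `\x` becomes `x`) can be sketched in Python. This is the simplified rule from the commit message, not a full YAML parser: real YAML gives `\x`, `\u` and others hex meanings.

```python
import re

_SIMPLE = {"n": "\n", "t": "\t", "r": "\r"}


def unescape_double_quoted(raw: str) -> str:
    """Resolve backslash escapes in a double-quoted YAML scalar (simplified rule)."""
    return re.sub(r"\\(.)", lambda m: _SIMPLE.get(m.group(1), m.group(1)), raw, flags=re.DOTALL)


def localized_regex(raw: str) -> str:
    """Unescape first, then regex-escape, so no literal backslashes leak into the pattern."""
    return re.escape(unescape_double_quoted(raw))
```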

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread test/e2e/tests/Feature.AutofixPane.Tests.ps1 Outdated
Comment thread test/e2e/tests/Feature.AutofixPane.Tests.ps1 Outdated
Comment thread test/e2e/tests/Feature.AgentProposedCommand.Tests.ps1 Outdated
Comment thread test/e2e/tests/Feature.AgentProposedCommand.Tests.ps1 Outdated
…ll (Copilot review)

Four BeforeAll blocks piped Wait-AgentReady to Out-Null, discarding its boolean — so an auth/fatal
connect failure would proceed in a not-ready state and surface later as opaque card-polling
failures. Assert | Should -BeTrue with a clear -Because in all four (AutofixPane card-render +
AutofixPane WSL setup, AgentProposedCommand Insert + Run), so a failed (auth/fatal) connect fails
immediately and attributably. (The WSL one is inside the best-effort try/catch, so a readiness
failure there is logged and degrades to a skip via the existing per-It $wslShell guards.) Verified
live: the Insert BeforeAll assertion passes when copilot connects.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comment thread test/e2e/ItE2E/Public/SessionList.ps1 Outdated
Comment thread test/e2e/tests/Feature.ShellIntegration.Tests.ps1 Outdated
…lure mark (Copilot review)

- Get-SessionViewRenderRegex: matched the FULL localized agents.footer_hint, but the TUI
  end-truncates that hint to the pane width (agents_view.rs render_footer_hint -> trunc), so the
  full line may never appear and Open/Test-SessionListShown could time out. Every bundled locale
  leads the hint with the invariant nav arrows "↑ ↓" (en "(↑ ↓ to navigate …)", zh "(↑ ↓ 导航 …)"),
  and being at the start they survive truncation — so match those (en-US footer words kept as an
  extra fallback). Verified live: the rendered footer matches; the slash-/sessions path is green.
- Feature.ShellIntegration failure-mark test: Wait-WtCommandFailure listened to the global
  vt_sequence stream, so an unrelated OSC 133;D mark could satisfy it. The event's `pane_id`
  equals the pane session_id (Get-ActivePane.session_id), whereas its `tab_id` is a GUID and
  Get-ActivePane/Get-WtTabs expose tab_id only as a numeric INDEX — so added a -PaneId filter to
  Wait-WtCommandFailure and scoped the assertion to the active pane. Verified live: passes scoped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
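Scoping the failure-mark assertion means ignoring OSC 133;D marks from other panes and accepting only a non-zero exit code from the target pane. A Python sketch follows. The event dictionaries (`pane_id`, `sequence`) are a hypothetical shape standing in for the `vt_sequence` events that `Wait-WtCommandFailure` consumes.

```python
import re
from typing import Iterable, Optional

# OSC 133 ; D [; <exit code>] marks the end of a command.
_D_MARK = re.compile(r"133;D(?:;(-?\d+))?")


def first_failure_in_pane(events: Iterable[dict], pane_id: str) -> Optional[int]:
    """Return the first non-zero exit code reported by a D mark from pane_id."""
    for ev in events:
        if ev.get("pane_id") != pane_id:
            continue  # a mark from another pane must not satisfy the wait
        m = _D_MARK.search(ev.get("sequence", ""))
        if m and m.group(1) is not None and int(m.group(1)) != 0:
            return int(m.group(1))
    return None
```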

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread test/e2e/tests/Feature.AutofixPane.Tests.ps1
Comment thread test/e2e/tests/Feature.AgentRestart.Tests.ps1 Outdated
…ish reconnect probe (Copilot review)

- Feature.AutofixPane Run-action: the card-detection predicate still hard-coded the English
  "Run command" label (missed in the earlier sweep). Routed it through Get-RecommendationCardRegex
  like the other card-detection sites so it's locale-robust.
- Feature.AgentRestart: removed the post-/restart `Test-Until … -match 'Ask anything|Copilot|Agent'`
  reconnect probe — it matched hard-coded English (not locale-robust) and was redundant with the
  Wait-AgentReady | Should -BeTrue gate immediately after, which is the deterministic
  reconnect-and-ready signal. Verified live: the restart case still passes (37s).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread test/e2e/tests/Feature.AutofixPane.Tests.ps1
Comment thread test/e2e/release-coverage-map.psd1 Outdated
…opilot restart (Copilot review)

The override regex 'connects and answers' for "Non-Copilot agents chat works" also matched the
Copilot restart test name "(/restart reconnects and answers)" — "reconnects and answers" contains
"connects and answers" — which could credit the checklist item from the wrong test in the report.
Anchored on "non-Copilot agent.*connects and answers" so it uniquely matches the AgentMatrix case.
Verified: matches the AgentMatrix name, does NOT match the Copilot restart name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
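The false credit arose because "reconnects" contains "connects", so an unanchored substring regex matched the wrong test. A Python sketch reproducing the bug and the fix follows, using `re.IGNORECASE` to mirror PowerShell's case-insensitive `-match`. Both test names are hypothetical stand-ins that keep the quoted fragments.

```python
import re

LOOSE = "connects and answers"
ANCHORED = "non-Copilot agent.*connects and answers"


def matches(pattern: str, test_name: str) -> bool:
    """Case-insensitive search, like PowerShell's -match."""
    return re.search(pattern, test_name, re.IGNORECASE) is not None


RESTART_NAME = "Copilot agent restart (/restart reconnects and answers)"
MATRIX_NAME = "each installed non-Copilot agent connects and answers"
```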

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comment thread test/e2e/ItE2E/Public/SessionList.ps1
vanzue merged commit 1b2fce9 into main on Jun 25, 2026
8 of 10 checks passed
vanzue deleted the dev/vanzue/e2e-coverage-and-stability branch on June 25, 2026 08:56

3 participants