Replace mobile-scan with platform-agnostic CI outer-loop failure scanner by kotlarmilos · Pull Request #127824 · dotnet/runtime

kotlarmilos · 2026-05-05T19:23:55Z

Summary

Generalizes the existing .github/workflows/mobile-scan.md (Apple mobile + Android only, daily, sonnet-4.5, helix-centric) into .github/workflows/ci-failure-scan.md. The new workflow scans every public outer-loop pipeline on dnceng-public/public, classifies each failure (build break vs test failure vs infra), and converges the pipeline to green by either filing tracking issues, filing Known Build Errors that Arcade Build Analysis can auto-match, opening companion PRs that skip the failing test against an existing tracking issue, or opening small product-fix PRs when a localized root cause is clear.

What changed

| Aspect | Before | After |
| --- | --- | --- |
| Pipelines scanned | runtime-extra-platforms (154) only | + JIT/GC/PGO stress, libraries-jitstress, jit-experimental, ilasm, jit-cfg, superpmi-replay, randomized stress (109–160, 230, 235) |
| Cadence | daily | every 12h (fuzzy schedule) |
| Model | claude-sonnet-4.5 | claude-sonnet-4.6 |
| Skill routing | mobile-platforms only | mobile-platforms (Apple/Android/WASM), jit-regression-test + ci-pipeline-monitor (JIT/GC/PGO), extensions-review / system-net-review where applicable |
| Failure classification | helix-workitem-only (silently no-op'd on build breaks) | explicit walk: build break (Send-to-Helix skipped) vs Phase-only failure vs Helix workitem vs infra |
| Outcomes | per-test PR or tracking issue | + Known Build Error issue (Arcade Build Analysis JSON format, exact 3-backtick fences) + companion skip-PR (csproj `<GCStressIncompatible>` for stress-incompatible JIT tests; `[ActiveIssue(..., TestPlatforms.<plat>)]` for unit tests) + small product-fix PR (≤20 lines, single file, non-API, non-JIT/GC/threading/security, with the failing test as evidence) |
| Convergence | none; the same failure was re-issued each run | two-pass: run N files tracking issue, run N+1 finds existing issue + still-failing test, opens companion PR scoped to allowed paths |
| PR `allowed-files` | `src/libraries/**/tests/**` | `src/libraries/**`, `src/coreclr/**`, `src/mono/**`, `src/tests/**`, `src/native/**`, `eng/testing/**` |
| Title discipline | none | every issue/PR title starts with `[ci-scan]`; titles use "Skip"/"Disable"/"Suppress", never "Mute" |
| Coverage discipline | none; failures were picked opportunistically | per-pipeline tally files; every signature is recorded as filed-issue / filed-PR / reused-existing / skipped-with-reason |
| Caps | 5 PRs / 3 issues | 10 PRs / 5 issues |
| Filename | `mobile-scan.{md,lock.yml}` | `ci-failure-scan.{md,lock.yml}` |
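
The title discipline row can be checked mechanically. The following is a minimal sketch of such a check, not the workflow's actual enforcement (which lives in the prompt); the function name `title_problems` and the exact wording of the messages are hypothetical.

```python
import re

# Prefix required by the "Title discipline" row.
PREFIX = "[ci-scan]"
# "Mute" and its inflections are forbidden; Skip/Disable/Suppress are the
# permitted verbs. The word-boundary match avoids flagging e.g. "immutable".
FORBIDDEN = re.compile(r"\bmut(e|es|ed|ing)\b", re.IGNORECASE)


def title_problems(title: str) -> list[str]:
    """Return the rule violations for a proposed issue/PR title (empty if none)."""
    problems = []
    if not title.startswith(PREFIX):
        problems.append("missing [ci-scan] prefix")
    if FORBIDDEN.search(title):
        problems.append('uses "Mute"; use Skip/Disable/Suppress instead')
    return problems
```

A title such as `[ci-scan] Skip Foo on tvOS` passes, while `Mute Foo` is flagged for both rules.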

Test runs

The test runs produced the following issues:
#127817
#127827
#127828
#127829
#127830
#127831

Security

No new secrets and no new actions are introduced relative to the previous workflow. The only changes are inside the engine, tools, safe-outputs, network, and prompt sections of the markdown. The PR allowed-files widening is the surface-area change: it lets the agent edit any path under src/libraries/**, src/coreclr/**, src/mono/**, src/tests/**, src/native/**, and eng/testing/** (including product/runtime source) to enable small, well-localized product fixes. The protected-files: blocked policy still prevents touching package.json, lockfiles, global.json, NuGet.Config, Directory.Packages.props, CODEOWNERS, and .github/ / .agents/ paths. CODEOWNERS-mandated review remains the hard gate before any merge.
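
To make the interaction between the widened allowed-files list and the protected-files policy concrete, here is a rough sketch of how such a path check could combine the two. It is an illustration under assumptions, not the safe-outputs implementation: the function `is_editable` is hypothetical, and lockfiles are left out because the description does not name them.

```python
from pathlib import PurePosixPath

# Prefixes the PR description lists as editable.
ALLOWED_PREFIXES = (
    "src/libraries/", "src/coreclr/", "src/mono/",
    "src/tests/", "src/native/", "eng/testing/",
)
# Files and directories the description says stay protected.
# (Lockfiles are also protected, but no specific names are given.)
PROTECTED_NAMES = {
    "package.json", "global.json", "NuGet.Config",
    "Directory.Packages.props", "CODEOWNERS",
}
PROTECTED_PREFIXES = (".github/", ".agents/")


def is_editable(path: str) -> bool:
    """Protected entries win over allowed prefixes; everything else is denied."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # refuse paths that could escape the repository root
    normalized = p.as_posix()
    if normalized.startswith(PROTECTED_PREFIXES) or p.name in PROTECTED_NAMES:
        return False
    return normalized.startswith(ALLOWED_PREFIXES)
```

Checking the protected entries first means a protected file name is refused even when it sits under an allowed prefix.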

dotnet-policy-service · 2026-05-05T19:25:07Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR replaces the prior mobile-only CI failure scanning workflow prompt with a platform-agnostic “CI Outer-Loop Failure Scanner” that targets multiple outer-loop pipelines and supports broader failure classification and convergence actions (issues, Known Build Errors, and muting PRs).

Changes:

Removes the old .github/workflows/mobile-scan.md prompt and introduces a new generalized .github/workflows/ci-failure-scan.md prompt with expanded pipeline coverage and updated run cadence.
Updates the corresponding generated workflow .github/workflows/ci-failure-scan.lock.yml to reflect the new workflow identity, schedule, model, and expanded safe-outputs configuration.

| File | Description |
| --- | --- |
| .github/workflows/mobile-scan.md | Deleted the mobile-only scanner prompt (replaced by platform-agnostic scanner). |
| .github/workflows/ci-failure-scan.md | New platform-agnostic scanner prompt (pipelines list, classification rules, Known Build Error guidance, convergence/muting guidance). |
| .github/workflows/ci-failure-scan.lock.yml | Regenerated/updated locked workflow to match the new prompt, schedule, model, and safe-outputs settings. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/ci-failure-scan.lock.yml:418

The Safe Outputs protection list is inconsistent: config.json's protected_files omits AGENTS.md, but GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG includes it later in the workflow. This likely leaves AGENTS.md unprotected for PR patches depending on which config is enforced. Align the two protected file lists (e.g., add AGENTS.md to the config.json list as well).

Files reviewed: 3/3 changed files
Comments generated: 2

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 4

vitek-karas

I like this - my main remaining question is the approach we want to take:

Do we prefer muting PRs (adding ActiveIssue or similar), which are not effective immediately and require human interaction to take effect?
Or do we prefer KBEs, which are effective immediately but will likely get a muting PR eventually anyway?

Side observation: Creating a KBE that is immediately active (so the issue has the right labels for Build Analysis to pick it up) has so far been reserved for contributors, since it effectively disables that test/failure for all PRs and potentially lets PRs introduce new breaks with the same signature.
Right now I don't know which would be better: letting the agent create KBEs, which reduces noise and keeps PR CI greener, or requiring human approval (either on the KBE or via a PR), which introduces a delay into the system but avoids the remote chance of introducing further breaks.

JulieLeeMSFT · 2026-05-05T22:28:07Z

@kotlarmilos, please add these pipelines:
| Pipeline | ID | Notes |
| --- | --- | --- |
| runtime-coreclr gc-simulator | 123 | |
| gc-standalone | 146 | ADO name differs from display name |
| runtime-coreclr crossgen2 | 124 | |
| runtime-coreclr r2r | 120 | |
| runtime-coreclr r2r-extra | 114 | |
| runtime-interpreter | 316 | ADO name differs from display name |
| runtime-libraries-interpreter | 330 | ADO name differs from display name |
| runtime-nativeaot-outerloop | 265 | |
| runtime-diagnostics | 309 | |
| runtime-coreclr outerloop | 108 | |

JulieLeeMSFT · 2026-05-05T22:55:12Z

For JIT pipelines, please add this information: failed pipelines, console log link, failed test legs (and failed test case names), and error message. You can add other details after that.
Please refer to the template in #125685.

Copilot

Copilot's findings

Comments suppressed due to low confidence (3)

.github/workflows/ci-failure-scan.lock.yml:418

The safe-outputs protected_files list includes "NuGet.Config", but the repo file is "NuGet.config" (lowercase 'c'). On case-sensitive filesystems this would leave NuGet.config unprotected for agent-authored PRs. Please update the protected file entry to match the actual filename (and consider including both casings if you want to be defensive).

.github/workflows/ci-failure-scan.lock.yml:1357

The safe-outputs protected_files list is duplicated (config.json earlier in the workflow, and this GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG). They diverge (e.g., AGENTS.md appears here but not in config.json), which risks future protection gaps. Please make these lists identical (or ensure one source of truth) so protected-file enforcement is consistent.

.github/workflows/ci-failure-scan.lock.yml:1357

This handler config also lists "NuGet.Config" as a protected file, but the repo file is "NuGet.config". On case-sensitive filesystems that means NuGet.config wouldn't be protected unless the casing matches. Please change this entry to "NuGet.config" (and keep it consistent with the config.json protected_files list).
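
Both kinds of finding above (diverging lists and case-only mismatches) are easy to detect with a small script. The sketch below is one possible way to do it; the helper names are hypothetical, and the inputs are assumed to have been extracted from the lock file beforehand.

```python
def compare_protected_lists(first: list[str], second: list[str]) -> dict[str, list[str]]:
    """Report entries present in only one of two protected-file lists."""
    a, b = set(first), set(second)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


def case_mismatches(protected: list[str], repo_files: list[str]) -> list[tuple[str, str]]:
    """Find protected entries that match a repo file only when case is ignored."""
    by_lower = {name.lower(): name for name in repo_files}
    mismatches = []
    for entry in protected:
        actual = by_lower.get(entry.lower())
        if actual is not None and actual != entry:
            mismatches.append((entry, actual))
    return mismatches
```

Run against the two lists from the lock file, the first helper would surface an entry such as AGENTS.md that appears in only one list, and the second would pair "NuGet.Config" with the on-disk "NuGet.config".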

Files reviewed: 3/3 changed files
Comments generated: 1

vitek-karas

Thanks a lot - this looks great.

Copilot

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 4

kotlarmilos · 2026-05-06T11:38:25Z

@vitek @JulieLeeMSFT Latest run created these:

[ci-scan] Known Build Error: GC assertion is_in_heap_range in runtime-interpreter and runtime-libraries-interpreter #127855
[ci-scan] Known Build Error: AsyncProfilerTests.RuntimeAsync_CallstackSimulation fails in runtime-nativeaot-outerloop on osx-arm #127856
[ci-scan] Known Build Error: Microsoft.Extensions.DependencyInjection.Tests crash on windows-arm64 under JIT stress #127857
[ci-scan] Known Build Error: Crossgen2 crashes with exitcode 139 (SIGSEGV) amd The file is not a ReadyToRun image error on osx-arm64 in r2r-extra #127858

vitek-karas · 2026-05-06T11:50:21Z

#127856 - the title is really weird - the safe output probably does something with issue titles which are too long. Maybe we need to instruct the agent to make sure the issue title is reasonably short.

But it's a nit - we can do this later.

- safe-outputs.create-issue.allowed-labels restricted to ["Known Build Error", "blocking-clean-ci"]
- safe-outputs.create-pull-request.allowed-labels restricted to [agentic-workflows]
- Stripped prose telling the model to add os-*/area-*/arch-* labels; added a single 'Labels (hard restriction)' clause under 'Outputs: title and labels'; KBE label line lists only the two permitted labels
- Added 'Signature specificity (mandatory)' subsection: rejects bare exit codes / generic tool-failed verbs / bare exception types; requires assertion text or test+exception message; forbids padding ErrorMessage arrays with generic tokens; instructs model to file a regular issue (not a KBE) when no specific signature exists

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-06T16:37:03Z

🤖 Copilot Code Review — PR #127824

Note

This review was generated by GitHub Copilot.

Holistic Assessment

Motivation: The PR generalizes the existing mobile-only CI failure scanner into a platform-agnostic outer-loop failure scanner covering 30+ pipelines. This is well-motivated — keeping outer-loop CI green requires systematic tracking, and the previous scope (mobile-only, single pipeline) was too narrow.

Approach: The approach evolves the existing gh-aw workflow definition by renaming/expanding the configuration and writing a comprehensive new prompt document. The two-pass KBE → muting-PR flow is a thoughtful design that works within the agent's permission constraints.

Summary: ⚠️ Needs Human Review. The implementation is well-structured and the prompt engineering is thorough, but the significantly expanded allowed-files scope for PRs and the broader write permissions deserve careful human assessment of blast radius risk.

Detailed Findings

⚠️ Expanded `allowed-files` scope — significant blast radius increase

The old workflow restricted PRs to src/libraries/**/tests/** and src/libraries/Common/tests/** (test files only). The new workflow allows modifications across:

src/libraries/** (includes production source, not just tests)
src/coreclr/** (full CoreCLR tree including product code)
src/mono/** (full Mono tree)
src/tests/**
src/native/**
eng/testing/**

While the prompt instructions say "small product fix opportunity" requires ≤ 20 lines, single file, non-API changes, the allowed-files enforcement is at the safe-outputs infrastructure level and permits much broader changes. The protected_path_prefixes only blocks .github/ and .agents/.

Question for maintainers: Is the team comfortable with an automated agent having write access to production source files in src/coreclr/**, src/mono/**, and src/libraries/** (non-test)? The prompt engineering constrains behavior, but the guardrails are soft (prompt-level) not hard (infrastructure-level). A misbehaving or prompt-injected agent could theoretically propose changes to production code within these paths.

✅ Label restriction — good hardening

The addition of allowed_labels on issues ("Known Build Error", "blocking-clean-ci") and on PRs ("agentic-workflows") is a sound security improvement. This prevents the agent from adding arbitrary area/OS/arch labels, which is appropriate given its automated nature.
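
As an illustration of what an allow-list on labels amounts to, the sketch below splits requested labels into kept and rejected sets. Whether the real safe-outputs layer drops disallowed labels or rejects the whole output is not stated in this PR, so this shows only the membership check; `split_labels` is a hypothetical name.

```python
# Allow-lists quoted in the review above.
ALLOWED_ISSUE_LABELS = frozenset({"Known Build Error", "blocking-clean-ci"})
ALLOWED_PR_LABELS = frozenset({"agentic-workflows"})


def split_labels(requested: list[str], allowed: frozenset[str]) -> tuple[list[str], list[str]]:
    """Partition requested labels into (kept, rejected), preserving order."""
    kept = [label for label in requested if label in allowed]
    rejected = [label for label in requested if label not in allowed]
    return kept, rejected
```

For example, an issue requesting "Known Build Error" plus an area label would keep only the former.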

✅ Two-pass KBE flow — well-designed constraint workaround

The two-pass design (KBE in run N, muting PR in run N+1) elegantly works around the `issues: write` permission limitation in the agent job. The 12-hour cadence makes the convergence time acceptable for outer-loop pipelines.
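
The two-pass flow, together with the per-run caps and the tally outcomes from the PR description, can be sketched as a small decision function. This is a simplified model under assumptions: `RunState`, `decide`, and the `has_open_pr` input are hypothetical, and the real agent follows prose instructions rather than code.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """Per-run counters and tally; the default caps mirror the PR description."""
    max_issues: int = 5
    max_prs: int = 10
    issues_filed: int = 0
    prs_filed: int = 0
    tally: dict[str, str] = field(default_factory=dict)


def decide(state: RunState, signature: str, has_open_issue: bool,
           has_open_pr: bool, still_failing: bool) -> str:
    """Pick an outcome for one failure signature and record it in the tally."""
    if not still_failing:
        outcome = "skipped: no longer failing"
    elif not has_open_issue:
        # Pass 1 (run N): no tracking issue yet, so file one if under the cap.
        if state.issues_filed < state.max_issues:
            state.issues_filed += 1
            outcome = "filed-issue"
        else:
            outcome = "skipped: cap reached"
    elif has_open_pr:
        outcome = "reused-existing"
    elif state.prs_filed < state.max_prs:
        # Pass 2 (run N+1): issue exists and the test still fails.
        state.prs_filed += 1
        outcome = "filed-PR"
    else:
        outcome = "skipped: cap reached"
    state.tally[signature] = outcome
    return outcome
```

Every signature ends up in the tally with exactly one outcome, which is the property the coverage discipline asks for.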

✅ Signature specificity rules — prevents overly broad KBEs

The "Signature specificity (mandatory)" section with explicit reject/prefer lists is a strong addition. It prevents generic patterns like exitcode: 139 from silencing unrelated failures.
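
A rough sketch of what such a specificity check might look like follows. The patterns and the generic-phrase list are illustrative assumptions, not the rules from the prompt, and `is_specific` is a hypothetical helper.

```python
import re

# A bare exit code such as "exitcode: 139" or "exit code 1".
BARE_EXIT_CODE = re.compile(r"^exit\s*code:?\s*-?\d+$", re.IGNORECASE)
# A bare (optionally namespaced) exception type with no message attached.
BARE_EXCEPTION = re.compile(r"^(?:[A-Za-z_]\w*\.)*\w*(?:Exception|Error)$")
# Generic "tool failed" phrases; an illustrative list only.
GENERIC_PHRASES = {"failed", "error", "crashed", "test failed",
                   "build failed", "command failed"}


def is_specific(signature: str) -> bool:
    """Return True when a signature is specific enough to use as a KBE match."""
    s = signature.strip()
    if not s:
        return False
    if BARE_EXIT_CODE.match(s) or BARE_EXCEPTION.match(s):
        return False
    return s.lower() not in GENERIC_PHRASES
```

Under these heuristics `exitcode: 139` and `System.NullReferenceException` are rejected, while an assertion text or an exception type followed by its message is accepted.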

💡 Coverage discipline documentation — thorough but long

The ci-failure-scan.md prompt is 408 lines. While thoroughness is valuable for an autonomous agent, the complexity increases maintenance burden. Consider whether some of the "Hard environment constraints" documentation could be moved to a shared reference rather than inline.

⚠️ Timeout increase (60 → 90 minutes) — cost/resource concern

With 30+ pipelines to scan, each potentially requiring multiple Helix API calls and log fetches, 90 minutes may still be tight (at most 3 min per pipeline on average), or conversely may be too generous for a workflow that runs every 12 hours. Worth monitoring after deployment.

💡 Model upgrade to claude-sonnet-4.6

The model upgrade from claude-sonnet-4.5 to claude-sonnet-4.6 is reasonable for a more capable agent handling the expanded scope.

✅ Safe-output caps — reasonable scaling

Increasing from 3 issues/5 PRs to 5 issues/10 PRs per run roughly doubles the caps, a modest increase relative to the ~6× increase in pipeline coverage. The prompt clearly documents cap-hit behavior ("skipped: cap reached").

⚠️ `max_patch_size: 1024` may be insufficient

With the new scope including potential product fixes (up to 20 lines in a single file), the 1024-byte patch size limit inherited from the old configuration may be too tight for some valid fixes. The muting PRs (adding [ActiveIssue] annotations) should fit, but the "small product fix" PRs mentioned in the prompt might not. Consider whether this limit needs adjustment.

✅ Concurrency group rename — clean migration

The rename from mobile-scan to ci-failure-scan in both the concurrency group and workflow ID is consistent throughout the lock file. No stale references remain.

Generated by Code Review for issue #127824

Copilot and others added 10 commits May 5, 2026 16:22

Generalize mobile-scan to platform-agnostic CI outer-loop scanner

1dfe322

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ci-failure-scan: handle build breaks, phase-only failures, canceled builds

4db1870

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Keep mobile-scan filename for dispatch compatibility

98391ac

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Iterate: variable-binding pattern + sonnet-4.6 model

f39fd49

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Iterate: require companion ActiveIssue PRs for stress-mode product bugs

d2c7a42

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix expression validator: drop ${{ }} wrapping around aw_id token

d7eb430

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drop literal expression syntax from prose

24eb059

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Two-pass approach: file issue first, file ActiveIssue PR on next run

f6dc05d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Make muting-PR step mandatory when tracking issue already exists

6e55e00

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Rename to ci-failure-scan to reflect platform-agnostic scope

745f130

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kotlarmilos requested a review from jeffhandley as a code owner May 5, 2026 19:23

Copilot AI review requested due to automatic review settings May 5, 2026 19:23

kotlarmilos requested a review from a team as a code owner May 5, 2026 19:23

Copilot started reviewing on behalf of kotlarmilos May 5, 2026 19:24

github-actions Bot added the area-Infrastructure label May 5, 2026

github-project-automation Bot added this to Runtime Infra May 5, 2026

dotnet-policy-service Bot assigned kotlarmilos May 5, 2026

Copilot AI reviewed May 5, 2026

Copilot and others added 2 commits May 5, 2026 21:53

Iterate: [ci-scan] prefix on all outputs, no Mute, fix KBE fence, coverage discipline

521872f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Temporarily revert filename for branch dispatch testing

44e81be

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 5, 2026 19:54

Copilot started reviewing on behalf of kotlarmilos May 5, 2026 19:54

Copilot AI reviewed May 5, 2026

Copilot and others added 2 commits May 5, 2026 22:27

Rename to ci-failure-scan.* (final state)

43479b2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drop production-source paths from allowed-files (test-only scope)

ce1557e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 5, 2026 20:36

Copilot started reviewing on behalf of kotlarmilos May 5, 2026 20:36

Address review: widen allowed-files to product/runtime sources, allow small product fixes, schedule every 12h

09f0a28

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kotlarmilos requested review from JulieLeeMSFT, PureWeen, matouskozak and vitek-karas May 5, 2026 20:44

vitek-karas reviewed May 5, 2026

JulieLeeMSFT reviewed May 5, 2026

JulieLeeMSFT reviewed May 5, 2026

Address review: KBE-first two-pass flow, JIT issue template, label discipline, more pipelines

c4bcd07

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vitek-karas reviewed May 6, 2026

Address review: explicit Linked KBE line, per-pass YES check, [ci-scan] on KBE title, dedup convergence statement

e42c9e9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 6, 2026 11:18

Copilot started reviewing on behalf of kotlarmilos May 6, 2026 11:18

Copilot AI reviewed May 6, 2026

Potential fix for pull request finding

784fbb6

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 6, 2026 11:27

Copilot started reviewing on behalf of kotlarmilos May 6, 2026 11:27

vitek-karas approved these changes May 6, 2026

Copilot AI reviewed May 6, 2026

kotlarmilos merged commit 4da638d into main May 6, 2026
23 checks passed

github-project-automation Bot moved this to Done in Runtime Infra May 6, 2026

kotlarmilos deleted the ci-failure-scan-workflow branch May 6, 2026 16:36

dotnet-maestro Bot mentioned this pull request May 7, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#6497

Merged
