Replace mobile-scan with platform-agnostic CI outer-loop failure scanner #127824

Merged
kotlarmilos merged 19 commits into main from ci-failure-scan-workflow
May 6, 2026
Member

@kotlarmilos kotlarmilos commented May 5, 2026

Summary

Generalizes the existing .github/workflows/mobile-scan.md (Apple mobile + Android only, daily, sonnet-4.5, helix-centric) into .github/workflows/ci-failure-scan.md. The new workflow scans every public outer-loop pipeline on dnceng-public/public, classifies each failure (build break vs test failure vs infra), and converges the pipeline to green by either filing tracking issues, filing Known Build Errors that Arcade Build Analysis can auto-match, opening companion PRs that skip the failing test against an existing tracking issue, or opening small product-fix PRs when a localized root cause is clear.

What changed

| Aspect | Before | After |
| --- | --- | --- |
| Pipelines scanned | runtime-extra-platforms (154) only | + JIT/GC/PGO stress, libraries-jitstress, jit-experimental, ilasm, jit-cfg, superpmi-replay, randomized stress (109–160, 230, 235) |
| Cadence | daily | every 12h (fuzzy schedule) |
| Model | claude-sonnet-4.5 | claude-sonnet-4.6 |
| Skill routing | mobile-platforms only | mobile-platforms (Apple/Android/WASM), jit-regression-test + ci-pipeline-monitor (JIT/GC/PGO), extensions-review / system-net-review where applicable |
| Failure classification | helix-workitem-only (silently no-op'd on build breaks) | explicit walk: build break (Send-to-Helix skipped) vs Phase-only failure vs Helix workitem vs infra |
| Outcomes | per-test PR or tracking issue | + Known Build Error issue (Arcade Build Analysis JSON format, exact 3-backtick fences) + companion skip-PR (csproj `<GCStressIncompatible>` for stress-incompatible JIT tests; `[ActiveIssue(..., TestPlatforms.<plat>)]` for unit tests) + small product-fix PR (≤20 lines, single file, non-API, non-JIT/GC/threading/security, with the failing test as evidence) |
| Convergence | none; same failure re-issued each run | two-pass: run N files tracking issue, run N+1 finds existing issue + still-failing test, opens companion PR scoped to allowed paths |
| PR allowed-files | `src/libraries/**/tests/**` | `src/libraries/**`, `src/coreclr/**`, `src/mono/**`, `src/tests/**`, `src/native/**`, `eng/testing/**` |
| Title discipline | none | every issue/PR title starts with `[ci-scan]`; titles use "Skip"/"Disable"/"Suppress", never "Mute" |
| Coverage discipline | none; picks failures opportunistically | per-pipeline tally files; every signature is recorded as filed-issue / filed-PR / reused-existing / skipped-with-reason |
| Caps | 5 PRs / 3 issues | 10 PRs / 5 issues |
| Filename | mobile-scan.{md,lock.yml} | ci-failure-scan.{md,lock.yml} |
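
For readers who want a concrete picture of the "Failure classification" row, the walk can be sketched roughly as below. This is only an illustration: `FailedJob`, its fields, `FailureKind`, and the order of the checks are hypothetical and are not taken from the workflow prompt, which is prose read by an agent rather than code.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    INFRA = "infra"
    BUILD_BREAK = "build break"
    HELIX_WORKITEM = "Helix workitem failure"
    PHASE_ONLY = "phase-only failure"


@dataclass
class FailedJob:
    # Hypothetical fields standing in for what the scanner reads from a pipeline run.
    infra_error: bool            # e.g. the agent or queue was unavailable
    send_to_helix_skipped: bool  # the build broke before tests were submitted
    failed_workitems: int        # Helix workitems that reported a failure


def classify(job: FailedJob) -> FailureKind:
    """Walk the checks in a fixed order (the order itself is an assumption)."""
    if job.infra_error:
        return FailureKind.INFRA
    if job.send_to_helix_skipped:
        return FailureKind.BUILD_BREAK
    if job.failed_workitems > 0:
        return FailureKind.HELIX_WORKITEM
    # The job failed, but nothing was skipped and no workitem failed.
    return FailureKind.PHASE_ONLY
```

The point of the explicit walk is that a build break no longer falls through silently: a job whose Send-to-Helix step was skipped gets its own category instead of being treated as "no failing workitems".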

Test runs

The test runs produced the following issues:
#127817
#127827
#127828
#127829
#127830
#127831

Security

No new secrets and no new actions are introduced relative to the previous workflow. The only changes are inside the engine, tools, safe-outputs, network, and prompt sections of the markdown. The widened PR allowed-files list is the surface-area change: it lets the agent edit any path under src/libraries, src/coreclr, src/mono, src/tests, src/native, and eng/testing (including product/runtime source) to enable small, well-localized product fixes. The protected-files: blocked policy still prevents touching package.json, lockfiles, global.json, NuGet.Config, Directory.Packages.props, CODEOWNERS, and .github/ / .agents/ paths. CODEOWNERS-mandated review remains the hard gate before any merge.

Copilot and others added 10 commits May 5, 2026 16:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uilds

@kotlarmilos kotlarmilos requested a review from jeffhandley as a code owner May 5, 2026 19:23
Copilot AI review requested due to automatic review settings May 5, 2026 19:23
@kotlarmilos kotlarmilos requested a review from a team as a code owner May 5, 2026 19:23
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the prior mobile-only CI failure scanning workflow prompt with a platform-agnostic “CI Outer-Loop Failure Scanner” that targets multiple outer-loop pipelines and supports broader failure classification and convergence actions (issues, Known Build Errors, and muting PRs).

Changes:

  • Removes the old .github/workflows/mobile-scan.md prompt and introduces a new generalized .github/workflows/ci-failure-scan.md prompt with expanded pipeline coverage and updated run cadence.
  • Updates the corresponding generated workflow .github/workflows/ci-failure-scan.lock.yml to reflect the new workflow identity, schedule, model, and expanded safe-outputs configuration.
| File | Description |
| --- | --- |
| .github/workflows/mobile-scan.md | Deleted the mobile-only scanner prompt (replaced by platform-agnostic scanner). |
| .github/workflows/ci-failure-scan.md | New platform-agnostic scanner prompt (pipelines list, classification rules, Known Build Error guidance, convergence/muting guidance). |
| .github/workflows/ci-failure-scan.lock.yml | Regenerated/updated locked workflow to match the new prompt, schedule, model, and safe-outputs settings. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/ci-failure-scan.lock.yml:418

  • The Safe Outputs protection list is inconsistent: config.json's protected_files omits AGENTS.md, but GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG includes it later in the workflow. This likely leaves AGENTS.md unprotected for PR patches depending on which config is enforced. Align the two protected file lists (e.g., add AGENTS.md to the config.json list as well).
  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Copilot and others added 2 commits May 5, 2026 21:53
…erage discipline

Copilot AI review requested due to automatic review settings May 5, 2026 19:54
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 4

Copilot and others added 2 commits May 5, 2026 22:27
Copilot AI review requested due to automatic review settings May 5, 2026 20:36
… small product fixes, schedule every 12h

Member

@vitek-karas vitek-karas left a comment

I like this - my main remaining question is the approach we want to take:

  • Do we prefer muting PRs (adding ActiveIssue or similar), which are not effective immediately and require human interaction to take effect?
  • Or do we prefer KBEs, which are effective immediately but will likely have a muting PR eventually anyway?

Side observation: Creating a KBE which is immediately active (so the issue has the right labels for build analysis to pick it up) is something which so far has been reserved for contributors, since it effectively disables that test/failure for all PRs and potentially allows PRs to introduce new breaks with the same signature.
Right now I don't know which would be better: letting the agent create KBEs, which reduces noise and keeps the PR CI greener, or requiring human approval (either on the KBE or via a PR), which introduces a delay into the system but avoids the remote chance of introducing further breaks.

@JulieLeeMSFT
Member

@kotlarmilos, please add these pipelines:

| Pipeline | ID | Notes |
| --- | --- | --- |
| runtime-coreclr gc-simulator | 123 | |
| gc-standalone | 146 | ADO name differs from display name |
| runtime-coreclr crossgen2 | 124 | |
| runtime-coreclr r2r | 120 | |
| runtime-coreclr r2r-extra | 114 | |
| runtime-interpreter | 316 | ADO name differs from display name |
| runtime-libraries-interpreter | 330 | ADO name differs from display name |
| runtime-nativeaot-outerloop | 265 | |
| runtime-diagnostics | 309 | |
| runtime-coreclr outerloop | 108 | |

@JulieLeeMSFT
Member

For JIT pipelines, please add this information: failed pipelines, console log link, failed test legs (and failed test case names), and error message. You can add other details after that.
Please refer to the template in #125685.

…scipline, more pipelines

…n] on KBE title, dedup convergence statement

Copilot AI review requested due to automatic review settings May 6, 2026 11:18
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (3)

.github/workflows/ci-failure-scan.lock.yml:418

  • The safe-outputs protected_files list includes "NuGet.Config", but the repo file is "NuGet.config" (lowercase 'c'). On case-sensitive filesystems this would leave NuGet.config unprotected for agent-authored PRs. Please update the protected file entry to match the actual filename (and consider including both casings if you want to be defensive).

.github/workflows/ci-failure-scan.lock.yml:1357

  • The safe-outputs protected_files list is duplicated (config.json earlier in the workflow, and this GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG). They diverge (e.g., AGENTS.md appears here but not in config.json), which risks future protection gaps. Please make these lists identical (or ensure one source of truth) so protected-file enforcement is consistent.

.github/workflows/ci-failure-scan.lock.yml:1357

  • This handler config also lists "NuGet.Config" as a protected file, but the repo file is "NuGet.config". On case-sensitive filesystems that means NuGet.config wouldn't be protected unless the casing matches. Please change this entry to "NuGet.config" (and keep it consistent with the config.json protected_files list).
  • Files reviewed: 3/3 changed files
  • Comments generated: 1
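
The two concerns in these suppressed findings (diverging protected-file lists and a casing mismatch) are easy to check mechanically. The sketch below is a hedged illustration only: the function names are hypothetical, the sample lists in the usage note are abbreviated examples rather than the real contents of the lock file, and it assumes protected-file matching is case-sensitive.

```python
def compare_protected_lists(config_json, handler_config):
    """Report entries that appear in only one of the two protected-file lists."""
    a, b = set(config_json), set(handler_config)
    return {
        "only_in_config_json": sorted(a - b),
        "only_in_handler_config": sorted(b - a),
    }


def casing_mismatches(protected, repo_files):
    """Find protected entries that match a repo file only when case is ignored."""
    by_lower = {name.lower(): name for name in repo_files}
    mismatches = []
    for entry in protected:
        actual = by_lower.get(entry.lower())
        if actual is not None and actual != entry:
            mismatches.append((entry, actual))
    return mismatches
```

For example, comparing `["global.json", "NuGet.Config"]` against `["global.json", "NuGet.Config", "AGENTS.md"]` reports `AGENTS.md` as present only in the handler config, and checking `"NuGet.Config"` against a repo that contains `NuGet.config` reports the pair as a casing mismatch.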

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 6, 2026 11:27
Member

@vitek-karas vitek-karas left a comment

Thanks a lot - this looks great.

Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 4

@vitek-karas
Member

#127856 - the title is really weird - the safe output probably does something with issue titles which are too long. Maybe we need to instruct the agent to make sure the issue title is reasonably short.

But it's a nit - we can do this later.
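
A title check along these lines could catch both the title-discipline rules from the PR description and over-long titles before an issue is filed. This is a hypothetical sketch: `check_title` and the 120-character limit are assumptions made for illustration, and nothing here reflects how the safe-output handler actually treats long titles.

```python
import re

PREFIX = "[ci-scan] "
MAX_TITLE_LENGTH = 120  # hypothetical limit; the real threshold is not stated in the PR
MUTE_WORD = re.compile(r"\bmut(e[sd]?|ing)\b", re.IGNORECASE)


def check_title(title):
    """Return a list of problems with an issue/PR title (empty list means OK)."""
    problems = []
    if not title.startswith(PREFIX):
        problems.append("missing [ci-scan] prefix")
    if MUTE_WORD.search(title):
        problems.append('uses "Mute"; prefer Skip/Disable/Suppress')
    if len(title) > MAX_TITLE_LENGTH:
        problems.append("title longer than %d characters" % MAX_TITLE_LENGTH)
    return problems
```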

- safe-outputs.create-issue.allowed-labels restricted to ["Known Build Error", "blocking-clean-ci"]
- safe-outputs.create-pull-request.allowed-labels restricted to [agentic-workflows]
- Stripped prose telling the model to add os-*/area-*/arch-* labels; added a single 'Labels (hard restriction)' clause under 'Outputs: title and labels'; KBE label line lists only the two permitted labels
- Added 'Signature specificity (mandatory)' subsection: rejects bare exit codes / generic tool-failed verbs / bare exception types; requires assertion text or test+exception message; forbids padding ErrorMessage arrays with generic tokens; instructs model to file a regular issue (not a KBE) when no specific signature exists

@kotlarmilos kotlarmilos merged commit 4da638d into main May 6, 2026
23 checks passed
@kotlarmilos kotlarmilos deleted the ci-failure-scan-workflow branch May 6, 2026 16:36
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🤖 Copilot Code Review — PR #127824

Note

This review was generated by GitHub Copilot.

Holistic Assessment

Motivation: The PR generalizes the existing mobile-only CI failure scanner into a platform-agnostic outer-loop failure scanner covering 30+ pipelines. This is well-motivated — keeping outer-loop CI green requires systematic tracking, and the previous scope (mobile-only, single pipeline) was too narrow.

Approach: The approach evolves the existing gh-aw workflow definition by renaming/expanding the configuration and writing a comprehensive new prompt document. The two-pass KBE → muting-PR flow is a thoughtful design that works within the agent's permission constraints.

Summary: ⚠️ Needs Human Review. The implementation is well-structured and the prompt engineering is thorough, but the significantly expanded allowed-files scope for PRs and the broader write permissions deserve careful human assessment of blast radius risk.


Detailed Findings

⚠️ Expanded allowed-files scope — significant blast radius increase

The old workflow restricted PRs to src/libraries/**/tests/** and src/libraries/Common/tests/** (test files only). The new workflow allows modifications across:

  • src/libraries/** (includes production source, not just tests)
  • src/coreclr/** (full CoreCLR tree including product code)
  • src/mono/** (full Mono tree)
  • src/tests/**
  • src/native/**
  • eng/testing/**

While the prompt instructions say "small product fix opportunity" requires ≤ 20 lines, single file, non-API changes, the allowed-files enforcement is at the safe-outputs infrastructure level and permits much broader changes. The protected_path_prefixes only blocks .github/ and .agents/.

Question for maintainers: Is the team comfortable with an automated agent having write access to production source files in src/coreclr/**, src/mono/**, and src/libraries/** (non-test)? The prompt engineering constrains behavior, but the guardrails are soft (prompt-level) not hard (infrastructure-level). A misbehaving or prompt-injected agent could theoretically propose changes to production code within these paths.
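
To make the blast-radius difference concrete, here is a small sketch that evaluates a few paths against the old and new allowed-files lists. It is an approximation under stated assumptions: the glob semantics (`**` crosses directory separators, `*` does not) and the helper names `glob_to_regex` and `is_allowed` are illustrative, and the real safe-outputs matcher may behave differently.

```python
import re

OLD_ALLOWED = ["src/libraries/**/tests/**", "src/libraries/Common/tests/**"]
NEW_ALLOWED = [
    "src/libraries/**", "src/coreclr/**", "src/mono/**",
    "src/tests/**", "src/native/**", "eng/testing/**",
]
PROTECTED_PREFIXES = [".github/", ".agents/"]


def glob_to_regex(pattern):
    """Translate a glob into a regex: '**' matches across '/', '*' stays within a segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_allowed(path, allowed_patterns):
    """A path is editable if no protected prefix applies and some allowed glob matches."""
    if any(path.startswith(prefix) for prefix in PROTECTED_PREFIXES):
        return False
    return any(glob_to_regex(p).fullmatch(path) for p in allowed_patterns)
```

Under these assumptions a product file such as `src/coreclr/jit/importer.cpp` is rejected by the old list and accepted by the new one, which is exactly the widening the finding describes.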

✅ Label restriction — good hardening

The addition of allowed_labels on issues ("Known Build Error", "blocking-clean-ci") and on PRs ("agentic-workflows") is a sound security improvement. This prevents the agent from adding arbitrary area/OS/arch labels, which is appropriate given its automated nature.

✅ Two-pass KBE flow — well-designed constraint workaround

The two-pass design (KBE in run N, muting PR in run N+1) elegantly works around the `issues: write` permission limitation in the agent job. The 12-hour cadence makes the convergence time acceptable for outer-loop pipelines.
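
The two-pass behavior can be pictured as a small decision function evaluated once per failure signature per run. This is a minimal sketch assuming only two inputs; `next_action` and its return values are hypothetical names, not identifiers from the workflow.

```python
def next_action(has_tracking_issue, still_failing):
    """Decide what a single scan run does for one failure signature."""
    if not still_failing:
        return "nothing-to-do"
    if not has_tracking_issue:
        return "file-tracking-issue"   # first pass (run N): tracking issue or KBE
    return "open-companion-pr"         # second pass (run N+1): muting/skip PR


# Simulate two consecutive runs for one signature that keeps failing.
issue_exists = False
actions = []
for _run in range(2):
    action = next_action(issue_exists, still_failing=True)
    actions.append(action)
    if action == "file-tracking-issue":
        issue_exists = True
```

With the 12-hour cadence, the second entry in `actions` corresponds to the run that happens roughly half a day after the issue was filed.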

✅ Signature specificity rules — prevents overly broad KBEs

The "Signature specificity (mandatory)" section with explicit reject/prefer lists is a strong addition. It prevents generic patterns like exitcode: 139 from silencing unrelated failures.
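
A rough filter in the spirit of these rules might look like the following. It is a sketch under assumptions: the regular expressions and the `GENERIC_PHRASES` set are invented for illustration and are not the reject/prefer lists from the prompt; only the `ErrorMessage` array concept comes from the PR discussion.

```python
import re

BARE_EXIT_CODE = re.compile(r"^exit\s*code:?\s*-?\d+$", re.IGNORECASE)
BARE_EXCEPTION = re.compile(r"^(\w+\.)*\w+Exception$")
GENERIC_PHRASES = {"failed", "error", "command failed", "test failed"}  # hypothetical examples


def is_specific(signature):
    """Reject bare exit codes, bare exception types, and generic failure phrases."""
    s = signature.strip()
    if not s:
        return False
    if BARE_EXIT_CODE.match(s) or BARE_EXCEPTION.match(s):
        return False
    return s.lower() not in GENERIC_PHRASES


def kbe_error_messages(candidates):
    """Keep only specific signatures; None means 'file a regular issue, not a KBE'."""
    specific = [c for c in candidates if is_specific(c)]
    return specific or None
```

Dropping generic entries instead of keeping them alongside a specific one mirrors the rule against padding the `ErrorMessage` array with generic tokens.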

💡 Coverage discipline documentation — thorough but long

The ci-failure-scan.md prompt is 408 lines. While thoroughness is valuable for an autonomous agent, the complexity increases maintenance burden. Consider whether some of the "Hard environment constraints" documentation could be moved to a shared reference rather than inline.

⚠️ Timeout increase (60 → 90 minutes) — cost/resource concern

With 30+ pipelines to scan, each potentially requiring multiple Helix API calls and log fetches, 90 minutes may still be tight (at 30 pipelines that is about 3 minutes per pipeline on average), or conversely may be too generous for a workflow that runs every 12 hours. Worth monitoring after deployment.

💡 Model upgrade to claude-sonnet-4.6

The model upgrade from claude-sonnet-4.5 to claude-sonnet-4.6 is reasonable for a more capable agent handling the expanded scope.

✅ Safe-output caps — reasonable scaling

Increasing from 3 issues/5 PRs to 5 issues/10 PRs per run scales proportionally with the ~6× increase in pipeline coverage. The prompt clearly documents cap-hit behavior ("skipped: cap reached").
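
How the caps interact with the coverage tally can be illustrated with a short sketch. The cap values come from the PR description; the `tally_run` function, its input shape, and the request kinds are hypothetical.

```python
MAX_ISSUES = 5   # per-run cap from the PR description
MAX_PRS = 10     # per-run cap from the PR description


def tally_run(requests):
    """requests: list of (signature, wanted) pairs, wanted in {"issue", "pr", "reuse"}."""
    tally = {}
    issues_filed = 0
    prs_filed = 0
    for signature, wanted in requests:
        if wanted not in ("issue", "pr", "reuse"):
            raise ValueError("unknown request kind: %r" % (wanted,))
        if wanted == "reuse":
            tally[signature] = "reused-existing"
        elif wanted == "issue" and issues_filed < MAX_ISSUES:
            issues_filed += 1
            tally[signature] = "filed-issue"
        elif wanted == "pr" and prs_filed < MAX_PRS:
            prs_filed += 1
            tally[signature] = "filed-PR"
        else:
            # Cap hit: the signature is still recorded so coverage stays complete.
            tally[signature] = "skipped: cap reached"
    return tally
```

Every signature ends up with exactly one recorded outcome, so a run that hits a cap still leaves a trace of what it could not file.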

⚠️ max_patch_size: 1024 may be insufficient

With the new scope including potential product fixes (up to 20 lines in a single file), the 1024-byte patch size limit inherited from the old configuration may be too tight for some valid fixes. The muting PRs (adding [ActiveIssue] annotations) should fit, but the "small product fix" PRs mentioned in the prompt might not. Consider whether this limit needs adjustment.

✅ Concurrency group rename — clean migration

The rename from mobile-scan to ci-failure-scan in both the concurrency group and workflow ID is consistent throughout the lock file. No stale references remain.

Generated by Code Review for issue #127824
