[ci-scanner] Improve workflow and fix detection auth by kotlarmilos · Pull Request #128125 · dotnet/runtime

kotlarmilos · 2026-05-13T08:28:13Z

Description

This PR improves the workflow based on past runs and fixes the "Security scanning requires review" caution banner that has been appearing on every issue and PR the scanner produces.

Changes

- Restructured body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section
- KBE template now includes the ## Error Details section that carries the build-analysis indicator block
- Area-path assignment is delegated to the runtime labeler bot so each issue ends up with a single area path
- Project linkage is left to the existing net-helix[bot] automation (which watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project within ~2s); the workflow only has to apply the label

Validation artifacts

Issues produced by workflow_dispatch runs of this branch: #128126, #128128, #128129

- Restructure ci-failure-scan.md body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section.
- Wire same-run project linkage for Known Build Error issues via update-project + temporary_id so Build Analysis picks them up.
- Add the Error Details template section so KBE issues carry the build-analysis indicator block.
- Delegate area path assignment to the runtime labeler bot (single area path per issue).
- Add an outage circuit breaker: when failures exceed the threshold, open one tracking issue and stop filing per-failure issues.
- Regenerate lock with gh-aw v0.71.5 and patch pat_pool into the detection job's needs list so the threat-detection PAT resolves correctly and the security-review caution banner stops appearing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-05-13T08:29:48Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR updates the CI Outer-Loop Failure Scanner’s gh-aw prompt/workflow to make triage output more deterministic, add same-run linkage of KBEs to a GitHub Project via a new update_project safe-output tool, and patch the generated lock workflow to fix threat-detection authentication.

Changes:

- Rewrites .github/workflows/ci-failure-scan.md into an explicit step-by-step decision flow with stricter “hard rules”, updated templates (including ## Error Details), and an outage circuit breaker.
- Adds update_project as an allowed safe-output operation and documents a temporary_id-based “create issue then attach to project” payload.
- Updates the compiled .lock.yml workflow to include the new tool and to ensure the threat-detection job depends on pat_pool so its COPILOT token expression resolves.


| File | Description |
| --- | --- |
| .github/workflows/ci-failure-scan.md | Major restructuring of the scanner prompt plus new templates and `update_project` safe-output integration guidance. |
| .github/workflows/ci-failure-scan.lock.yml | Regenerated workflow + manual patch to threat-detection `needs`, plus wiring for `update_project` in safe-outputs configuration. |

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 3

Same shape of bug as the detection patch: gh-aw v0.71.5 does not auto-wire pat_pool into safe_outputs.needs even though the job's env references needs.pat_pool.outputs.pat_number indirectly via the update-project github-token. Without this, safe_outputs.update-project fails with 'Input required and not supplied: github-token' because secrets.COPILOT_GITHUB_TOKEN is empty/stale in this repo.

- Add pat_pool to safe_outputs.needs.
- Replace secrets.COPILOT_GITHUB_TOKEN with the same case() expression used in engine.env in three places inside the Process Safe Outputs step: handler config JSON, GH_AW_PROJECT_GITHUB_TOKEN env, and the github-script with: github-token input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r project

The previous patch over-broadly switched the safe_outputs github-script 'with: github-token' to the pat_pool case() expression, which broke create_issue and create_pull_request: those use the same octokit, and the COPILOT_PAT_# tokens don't have repo issue/pr write scope for dotnet/runtime.

- Restore with: github-token to secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN (the standard pattern). The job-level permissions block grants issues:write / pull-requests:write so the workflow GITHUB_TOKEN can create issues and PRs.
- Keep GH_AW_PROJECT_GITHUB_TOKEN env on the case() expression so the update_project handler still has a PAT with project:write scope.
- Fix safe_outputs.permissions.issues regression from read to write (gh-aw dropped it to read when update-project was added; restoring to match upstream's pattern for create-issue safe-outputs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 4

Remove the outage circuit breaker step and its associated template. The per-run trip thresholds were too aggressive in practice: any significant outage immediately tripped them and produced one consolidated tracking issue instead of per-failure KBEs, which is not actionable. Renumber steps 5/6/7 to 4/5/6 accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The org "Known Build Errors" project (dotnet/projects/111) has an auto-add rule that pulls in any dotnet/runtime issue labeled "Known Build Error". The workflow's update_project handler was redundant with that rule and required a Projects v2 PAT scope the agent's token pool intentionally does not have, so update_project always failed at the GraphQL call. Removing it:

- Drops the safe-outputs.update-project block from the workflow.
- Drops the temporary_id / same-run project linkage instructions from the agent prompt; project membership is achieved purely by the Known Build Error label.
- Cleans up the lock: no more GH_AW_PROJECT_GITHUB_TOKEN env, no more update_project handler config, and gh-aw now derives safe_outputs.permissions.issues: write and the correct needs graph on its own. Only one manual patch remains (detection.needs += pat_pool) with an inline explanatory comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 2

- Clarify Step 5 branching: A/B/D/E/F are mutually exclusive; Branch C is an additive refinement of Branch B (resolves the contradiction between 'Exactly one branch fires' and 'Branch C emits B outputs PLUS...').
- KBE title template now only allows test-failure / hang forms, removing the inconsistent non-test form that contradicted Hard rule #7.
- Document that net-helix[bot] is the agent that watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project; the workflow does not need to do anything beyond applying the label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Drop the build-break / infra tracking-issue branches and route every actionable failure (test failure, hang, build break) through the same KBE template. Build Analysis matches both shapes via the JSON body, so a separate tracking-issue path added no value and produced issues that were not picked up by the project board.

- Hard rule rewritten: every actionable failure becomes a Known Build Error issue; infra-only failures with no stable signature skip emission entirely.
- Step 3 reframed as log-extraction guidance only; deadletter and infra-shaped no-helix failures record 'skipped: infra noise — no stable signature' in the tally.
- Step 5 collapsed from A/B/C/D/E/F to A/B/C. Branch A now covers test failures and build breaks (stable = >= 2 occurrences in window OR a build break failing all legs of the current build). Branch B carves out build breaks (no muting path for compile errors). Branch C extended to mechanical build-break fixes.
- KBE title template adds a third form for build breaks.
- Weak signature now skips emission instead of falling through to a tracking issue.
- Tracking issue templates (generic + JIT pipeline) removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
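The stability rule quoted for Branch A (at least two occurrences in the window, or a build break failing all legs of the current build) can be sketched as a small predicate. This is only an illustration: the `Failure` type, its field names, and `is_stable` are hypothetical and do not come from ci-failure-scan.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical summary of one failure signature (not part of the workflow)."""
    is_build_break: bool
    occurrences_in_window: int  # times the signature appeared in the scan window
    failed_legs: int            # legs of the current build that hit the failure
    total_legs: int             # legs in the current build


def is_stable(f: Failure) -> bool:
    """Stable = >= 2 occurrences in window, OR a build break failing all legs."""
    if f.occurrences_in_window >= 2:
        return True
    # A single occurrence only counts when it is a build break on every leg.
    return f.is_build_break and f.total_legs > 0 and f.failed_legs == f.total_legs
```

Under this reading, a test failure seen once is not stable even if it fails every leg, while a compile error seen once on all legs is.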

Copilot

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 3

Address review feedback: the prompt was using 'muting' internally while forbidding 'Mute'/'Muting' in the created PR/issue titles, which created unnecessary friction. Use 'test-disable' consistently so the agent's internal terminology matches the artifacts it produces.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-14T09:10:15Z

Caution

Security scanning requires review for Code Review

Details

The threat detection results could not be parsed. The workflow output should be reviewed before merging.

Review the workflow run logs for details.

Note

This review was generated by Copilot.

🤖 Copilot Code Review — PR #128125

Holistic Assessment

Motivation: The PR fixes a real, validated auth issue (the "Security scanning requires review" caution banner) and restructures the agent prompt for deterministic behavior. The three linked validation issues (#128126, #128128, #128129) demonstrate the fix works. Justified.

Approach: Two-part change — a surgical lock file fix (adding pat_pool to detection.needs:) and a comprehensive prompt rewrite. The lock file fix is the right minimal patch; the prompt rewrite converts loose prose into a numbered step-by-step flow with hard rules, which should reduce agent drift. Reasonable approach.

Summary: ⚠️ Needs Human Review. The code changes are correct for their stated purpose, but the lock file patch is inherently fragile (will be lost on next gh aw compile), and one unresolved reviewer comment about terminology remains. A human should confirm the prompt restructuring matches operational intent.

Detailed Findings

✅ Auth Fix (lock file) — Correct and well-documented

The pat_pool addition to detection.needs: is the right fix. The 20-line comment block clearly explains the root cause chain: missing needs → empty PAT number → malformed Authorization header → 400s → no THREAT_DETECTION_RESULT marker → caution banner. The comment also documents the upstream gap (gh-aw v0.71.5 fixed inference for the agent job but not detection) and that the patch must be re-applied after every gh aw compile. This is good engineering documentation.
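The first links of that root-cause chain can be modelled in a few lines. The sketch below is a simplified Python model, not the workflow's actual expression: it assumes the pool secrets are named `COPILOT_PAT_<n>` (inferred from the `COPILOT_PAT_#` wording in the commit messages) and that a job which does not list `pat_pool` in its `needs` sees an empty `pat_number`. The function names are hypothetical.

```python
def resolve_pat(needs: dict[str, dict[str, str]], secrets: dict[str, str]) -> str:
    """Model of picking a PAT from the pool based on the pat_pool job output.

    `needs` maps declared upstream jobs to their outputs. A job that is not
    declared contributes nothing, so the lookup falls back to an empty string.
    """
    pat_number = needs.get("pat_pool", {}).get("pat_number", "")
    # Assumed secret naming; with an empty number no secret matches.
    return secrets.get(f"COPILOT_PAT_{pat_number}", "")


def auth_header(token: str) -> str:
    """Build the Authorization header value the detection step would send."""
    return f"Bearer {token}"


secrets = {"COPILOT_PAT_2": "example-token"}
patched = auth_header(resolve_pat({"pat_pool": {"pat_number": "2"}}, secrets))
unpatched = auth_header(resolve_pat({}, secrets))  # pat_pool missing from needs
```

In this model `unpatched` is a header with no token after the scheme, which corresponds to the malformed Authorization header the comment block describes.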

⚠️ Lock file fragility — Manual patch risk (previously flagged)

As noted by a previous reviewer, this manual patch to the compiled lock file will be silently dropped on the next gh aw compile. The comment in the file documents this, but there's no automated guard. Consider tracking this with a repo issue or adding a CI check that asserts pat_pool is in detection.needs: to prevent silent regression. This is advisory, not blocking — the comment is sufficient documentation for now.

✅ Prompt restructuring — Improved determinism

The .md rewrite converts the previous prose-heavy instructions into a numbered step-by-step flow (Steps 1-6) with explicit branch decisions (A/B/C), literal templates, and hard rules. Key improvements:

- Hard rules are now numbered and upfront (rules 1-12)
- KBE templates include the new ## Error Details section for human-readable context
- Area-path assignment is explicitly delegated to the labeler bot
- Project linkage is explicitly delegated to net-helix[bot]
- Terminology changed from "muting PR" to "test-disable PR" (addressing vitek-karas feedback)
- Branch C is now correctly described as "a refinement of Branch B" rather than a separate branch

✅ KBE scope expansion — Build breaks now eligible

The old prompt restricted KBEs to test failures/hangs only. The new prompt allows [ci-scan] Build break: <short error description> titles with KBE bodies, which is a deliberate policy change. The ## Error Message JSON still carries the canonical signature for Build Analysis matching. This is consistent with the stated goal of making Build Analysis effective for all failure types.
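For illustration, a build-break KBE of this shape might be assembled as follows. The title format is the one quoted above. The JSON field names (`ErrorMessage`, `BuildRetry`) are an assumption about the Known Build Error convention, not something this PR page confirms, so they should be checked against the template in ci-failure-scan.md. The function name is hypothetical.

```python
import json


def build_break_issue(short_error: str, signature: str) -> tuple[str, str]:
    """Return (title, body) for a hypothetical build-break KBE issue."""
    title = f"[ci-scan] Build break: {short_error}"
    # Assumed field names; the real template may differ.
    payload = {"ErrorMessage": signature, "BuildRetry": False}
    fence = "`" * 3  # built at runtime to avoid nesting literal fences here
    body = "\n".join([
        "## Error Message",
        "",
        f"{fence}json",
        json.dumps(payload, indent=2),
        fence,
    ])
    return title, body
```

The point of the sketch is that the signature lives in the JSON block, so the title form can vary between test failures, hangs, and build breaks without affecting matching.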

💡 Cron schedule change — Trivial

Changed from :31 to :34 past the hour. Likely to avoid collision with another scheduled workflow. No impact.

✅ Output discipline section — Cleaner constraints

The new "Output discipline" section consolidates scattered rules into a clear list. The noop prohibition is retained but the steps now have explicit skipped: <reason> outcomes for signatures that don't warrant action, which resolves the previous contradiction flagged by reviewers.

⚠️ Minor: "Don't emit `noop`" vs existing-KBE outcomes

The output discipline says "Don't emit noop. Either a PR or an issue must come out of every actionable failure." But Step 4.4/4.5 can result in existing-PR #<n> which emits neither a new PR nor a new issue. The intent is clear (existing matches are valid outcomes), but the literal wording could still confuse an LLM agent. Consider rewording to: "Every actionable failure must result in a new issue, a new PR, or a recorded match to an existing one."
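The suggested rewording amounts to a check over the run's tally, which could be sketched as below. The `existing-PR #<n>` and `skipped: <reason>` outcome strings come from the review text. The `new-issue` and `new-PR` labels, the tally shape, and the function name are hypothetical stand-ins.

```python
import re

# `new-issue` and `new-PR` are placeholder labels invented for this sketch.
ACCEPTED = re.compile(r"^(new-issue|new-PR|existing-PR #\d+|skipped: .+)$")


def unresolved(tally: dict[str, str]) -> list[str]:
    """Return signatures whose recorded outcome is empty, `noop`, or unrecognized."""
    return [
        signature
        for signature, outcome in tally.items()
        if not ACCEPTED.match(outcome.strip())
    ]
```

An empty result means every signature ended in a new artifact, a recorded match to an existing one, or an explicit skip with a reason.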

Generated by Code Review for issue #128125

Copilot AI review requested due to automatic review settings May 13, 2026 08:28

Copilot started reviewing on behalf of kotlarmilos May 13, 2026 08:28

github-actions Bot added the area-Infrastructure label May 13, 2026

dotnet-policy-service Bot assigned kotlarmilos May 13, 2026

github-project-automation Bot added this to Runtime Infra May 13, 2026

kotlarmilos requested review from JanKrivanek, JulieLeeMSFT, PureWeen, jeffschwMSFT, matouskozak and vitek-karas May 13, 2026 08:30

kotlarmilos changed the title ~~[ci-scanner] address review feedback and fix detection auth~~ [ci-scanner] Improve workflow and fix detection auth May 13, 2026

kotlarmilos marked this pull request as ready for review May 13, 2026 08:31

kotlarmilos requested review from a team and jeffhandley as code owners May 13, 2026 08:31