Skip to content

[ci-scanner] Improve workflow and fix detection auth#128125

Merged
kotlarmilos merged 8 commits into
mainfrom
kotlarmilos/ci-scanner-feedback
May 14, 2026
Merged

[ci-scanner] Improve workflow and fix detection auth#128125
kotlarmilos merged 8 commits into
mainfrom
kotlarmilos/ci-scanner-feedback

Conversation

@kotlarmilos
Copy link
Copy Markdown
Member

@kotlarmilos kotlarmilos commented May 13, 2026

Description

This PR improves the workflow based on the past runs and fixes the security scanning requires review caution banner that has been appearing on every issue/PR the scanner produces.

Changes

  • Restructured body into a deterministic step-by-step flow with hard rules, branch decisions, literal inline templates, and an output discipline section
  • KBE template now includes the ## Error Details section that carries the build-analysis indicator block
  • Area-path assignment is delegated to the runtime labeler bot so each issue ends up with a single area path
  • Project linkage is left to the existing net-helix[bot] automation (which watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project within ~2s); the workflow only has to apply the label

Validation artifacts

Issues produced by workflow_dispatch runs of this branch:

- Restructure ci-failure-scan.md body into a deterministic step-by-step
  flow with hard rules, branch decisions, literal inline templates, and
  an output discipline section.
- Wire same-run project linkage for Known Build Error issues via
  update-project + temporary_id so Build Analysis picks them up.
- Add the Error Details template section so KBE issues carry the
  build-analysis indicator block.
- Delegate area path assignment to the runtime labeler bot (single area
  path per issue).
- Add an outage circuit breaker: when failures exceed the threshold,
  open one tracking issue and stop filing per-failure issues.
- Regenerate lock with gh-aw v0.71.5 and patch pat_pool into the
  detection job's needs list so the threat-detection PAT resolves
  correctly and the security-review caution banner stops appearing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@kotlarmilos kotlarmilos changed the title [ci-scanner] address review feedback and fix detection auth [ci-scanner] Improve workflow and fix detection auth May 13, 2026
@kotlarmilos kotlarmilos marked this pull request as ready for review May 13, 2026 08:31
@kotlarmilos kotlarmilos requested review from a team and jeffhandley as code owners May 13, 2026 08:31
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CI Outer-Loop Failure Scanner’s gh-aw prompt/workflow to make triage output more deterministic, add same-run linkage of KBEs to a GitHub Project via a new update_project safe-output tool, and patch the generated lock workflow to fix threat-detection authentication.

Changes:

  • Rewrites .github/workflows/ci-failure-scan.md into an explicit step-by-step decision flow with stricter “hard rules”, updated templates (including ## Error Details), and an outage circuit breaker.
  • Adds update_project as an allowed safe-output operation and documents a temporary_id-based “create issue then attach to project” payload.
  • Updates the compiled .lock.yml workflow to include the new tool and to ensure the threat-detection job depends on pat_pool so its COPILOT token expression resolves.
Show a summary per file
File Description
.github/workflows/ci-failure-scan.md Major restructuring of the scanner prompt plus new templates and update_project safe-output integration guidance.
.github/workflows/ci-failure-scan.lock.yml Regenerated workflow + manual patch to threat-detection needs, plus wiring for update_project in safe-outputs configuration.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml
Same shape of bug as the detection patch: gh-aw v0.71.5 does not
auto-wire pat_pool into safe_outputs.needs even though the job's env
references needs.pat_pool.outputs.pat_number indirectly via the
update-project github-token. Without this, safe_outputs.update-project
fails with 'Input required and not supplied: github-token' because
secrets.COPILOT_GITHUB_TOKEN is empty/stale in this repo.

- Add pat_pool to safe_outputs.needs.
- Replace secrets.COPILOT_GITHUB_TOKEN with the same case() expression
  used in engine.env in three places inside the Process Safe Outputs
  step: handler config JSON, GH_AW_PROJECT_GITHUB_TOKEN env, and the
  github-script with: github-token input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

…r project

The previous patch over-broadly switched the safe_outputs github-script
'with: github-token' to the pat_pool case() expression, which broke
create_issue and create_pull_request: those use the same octokit, and
the COPILOT_PAT_# tokens don't have repo issue/pr write scope for
dotnet/runtime.

- Restore with: github-token to secrets.GH_AW_GITHUB_TOKEN ||
  secrets.GITHUB_TOKEN (the standard pattern). The job-level
  permissions block grants issues:write / pull-requests:write so the
  workflow GITHUB_TOKEN can create issues and PRs.
- Keep GH_AW_PROJECT_GITHUB_TOKEN env on the case() expression so
  the update_project handler still has a PAT with project:write scope.
- Fix safe_outputs.permissions.issues regression from read to write
  (gh-aw dropped it to read when update-project was added; restoring to
  match upstream's pattern for create-issue safe-outputs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 09:05
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 4

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml Outdated
Comment thread .github/workflows/ci-failure-scan.lock.yml Outdated
Remove the outage circuit breaker step and its associated template.
The per-run trip thresholds were too aggressive in practice — any
significant outage immediately tripped them and produced one
consolidated tracking issue instead of per-failure KBEs, which is
not actionable. Renumber steps 5/6/7 to 4/5/6 accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

The org "Known Build Errors" project (dotnet/projects/111) has an
auto-add rule that pulls in any dotnet/runtime issue labeled
"Known Build Error". The workflow's update_project handler was
redundant with that rule and required a Projects v2 PAT scope the
agent's token pool intentionally does not have, so update_project
always failed at the GraphQL call.

Removing it:
- Drops the safe-outputs.update-project block from the workflow.
- Drops the temporary_id / same-run project linkage instructions from
  the agent prompt; project membership is achieved purely by the
  Known Build Error label.
- Cleans up the lock: no more GH_AW_PROJECT_GITHUB_TOKEN env, no more
  update_project handler config, and gh-aw now derives
  safe_outputs.permissions.issues: write and the correct needs graph
  on its own. Only one manual patch remains (detection.needs += pat_pool)
  with an inline explanatory comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 09:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md
@github-actions

This comment has been minimized.

- Clarify Step 5 branching: A/B/D/E/F are mutually exclusive; Branch C is an additive refinement of Branch B (resolves the contradiction between 'Exactly one branch fires' and 'Branch C emits B outputs PLUS...').

- KBE title template now only allows test-failure / hang forms, removing the inconsistent non-test form that contradicted Hard rule #7.

- Document that net-helix[bot] is the agent that watches the Known Build Error label and adds matching dotnet/runtime issues to the Known Build Errors org project; the workflow does not need to do anything beyond applying the label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

Drop the build-break / infra tracking-issue branches and route every
actionable failure (test failure, hang, build break) through the same
KBE template. Build Analysis matches both shapes via the JSON body, so
a separate tracking-issue path added no value and produced issues that
were not picked up by the project board.

- Hard rule rewritten: every actionable failure becomes a Known Build
  Error issue; infra-only failures with no stable signature skip
  emission entirely.
- Step 3 reframed as log-extraction guidance only; deadletter and
  infra-shaped no-helix failures record 'skipped: infra noise — no
  stable signature' in the tally.
- Step 5 collapsed from A/B/C/D/E/F to A/B/C. Branch A now covers test
  failures and build breaks (stable = >= 2 occurrences in window OR a
  build break failing all legs of the current build). Branch B carves
  out build breaks (no muting path for compile errors). Branch C
  extended to mechanical build-break fixes.
- KBE title template adds a third form for build breaks.
- Weak signature now skips emission instead of falling through to a
  tracking issue.
- Tracking issue templates (generic + JIT pipeline) removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 13, 2026 13:41
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.lock.yml
Comment thread .github/workflows/ci-failure-scan.md Outdated
Address review feedback: the prompt was using 'muting' internally while
forbidding 'Mute'/'Muting' in the created PR/issue titles, which created
unnecessary friction. Use 'test-disable' consistently so the agent's
internal terminology matches the artifacts it produces.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Caution

Security scanning requires review for Code Review

Details

The threat detection results could not be parsed. The workflow output should be reviewed before merging.

Review the workflow run logs for details.

Note

This review was generated by Copilot.

🤖 Copilot Code Review — PR #128125

Holistic Assessment

Motivation: The PR fixes a real, validated auth issue (the "Security scanning requires review" caution banner) and restructures the agent prompt for deterministic behavior. The three linked validation issues (#128126, #128128, #128129) demonstrate the fix works. Justified.

Approach: Two-part change — a surgical lock file fix (adding pat_pool to detection.needs:) and a comprehensive prompt rewrite. The lock file fix is the right minimal patch; the prompt rewrite converts loose prose into a numbered step-by-step flow with hard rules, which should reduce agent drift. Reasonable approach.

Summary: ⚠️ Needs Human Review. The code changes are correct for their stated purpose, but the lock file patch is inherently fragile (will be lost on next gh aw compile), and one unresolved reviewer comment about terminology remains. A human should confirm the prompt restructuring matches operational intent.


Detailed Findings

✅ Auth Fix (lock file) — Correct and well-documented

The pat_pool addition to detection.needs: is the right fix. The 20-line comment block clearly explains the root cause chain: missing needs → empty PAT number → malformed Authorization header → 400s → no THREAT_DETECTION_RESULT marker → caution banner. The comment also documents the upstream gap (gh-aw v0.71.5 fixed inference for the agent job but not detection) and that the patch must be re-applied after every gh aw compile. This is good engineering documentation.

⚠️ Lock file fragility — Manual patch risk (previously flagged)

As noted by a previous reviewer, this manual patch to the compiled lock file will be silently dropped on the next gh aw compile. The comment in the file documents this, but there's no automated guard. Consider tracking this with a repo issue or adding a CI check that asserts pat_pool is in detection.needs: to prevent silent regression. This is advisory, not blocking — the comment is sufficient documentation for now.

✅ Prompt restructuring — Improved determinism

The .md rewrite converts the previous prose-heavy instructions into a numbered step-by-step flow (Steps 1-6) with explicit branch decisions (A/B/C), literal templates, and hard rules. Key improvements:

  • Hard rules are now numbered and upfront (rules 1-12)
  • KBE templates include the new ## Error Details section for human-readable context
  • Area-path assignment is explicitly delegated to the labeler bot
  • Project linkage is explicitly delegated to net-helix[bot]
  • Terminology changed from "muting PR" to "test-disable PR" (addressing vitek-karas feedback)
  • Branch C is now correctly described as "a refinement of Branch B" rather than a separate branch

✅ KBE scope expansion — Build breaks now eligible

The old prompt restricted KBEs to test failures/hangs only. The new prompt allows [ci-scan] Build break: <short error description> titles with KBE bodies, which is a deliberate policy change. The ## Error Message JSON still carries the canonical signature for Build Analysis matching. This is consistent with the stated goal of making Build Analysis effective for all failure types.

💡 Cron schedule change — Trivial

Changed from :31 to :34 past the hour. Likely to avoid collision with another scheduled workflow. No impact.

✅ Output discipline section — Cleaner constraints

The new "Output discipline" section consolidates scattered rules into a clear list. The noop prohibition is retained but the steps now have explicit skipped: <reason> outcomes for signatures that don't warrant action, which resolves the previous contradiction flagged by reviewers.

⚠️ Minor: "Don't emit noop" vs existing-KBE outcomes

The output discipline says "Don't emit noop. Either a PR or an issue must come out of every actionable failure." But Step 4.4/4.5 can result in existing-PR #<n> which emits neither a new PR nor a new issue. The intent is clear (existing matches are valid outcomes), but the literal wording could still confuse an LLM agent. Consider rewording to: "Every actionable failure must result in a new issue, a new PR, or a recorded match to an existing one."

Generated by Code Review for issue #128125 · ● 2.7M ·

@kotlarmilos kotlarmilos merged commit 47e3ca7 into main May 14, 2026
28 checks passed
@kotlarmilos kotlarmilos deleted the kotlarmilos/ci-scanner-feedback branch May 14, 2026 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants