Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ Works with Claude Code, Cursor, Cline, GitHub Copilot, and other compatible agen
| [find-bugs](skills/find-bugs/SKILL.md) | Find bugs, security vulnerabilities, and code quality issues in local branch changes. |
| [gh-review-requests](skills/gh-review-requests/SKILL.md) | Fetch unread GitHub notifications for open PRs where review is requested from a specified team or opened by a team member. |
| [gha-security-review](skills/gha-security-review/SKILL.md) | GitHub Actions security review for workflow exploitation vulnerabilities. |
| [iterate-pr](skills/iterate-pr/SKILL.md) | Iterate on a PR until CI passes. |
| [iterate-pr](skills/iterate-pr/SKILL.md) | Iterate on a PR until CI passes and actionable review feedback is addressed. |
| [presentation-creator](skills/presentation-creator/SKILL.md) | Create data-driven presentation slides using React, Vite, and Recharts with Sentry branding. |
| [pr-writer](skills/pr-writer/SKILL.md) | Canonical workflow to create and update pull requests following Sentry conventions. |
| [prompt-optimizer](skills/prompt-optimizer/SKILL.md) | Optimize prompts with evals, model-family adapters, and exact external context references. |
Expand Down
217 changes: 83 additions & 134 deletions skills/iterate-pr/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,195 +1,144 @@
---
name: iterate-pr
description: Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.
description: Iterate on a PR until actionable CI passes and high/medium review feedback is addressed. Use for PR CI failures, review feedback, or green-check loops; do not wait for human approval, draft status, or merge gates.
---

# Iterate on PR Until CI Passes

Continuously iterate on the current branch until all CI checks pass and review feedback is addressed.
Goal: fix actionable CI failures and high/medium review feedback. Stop and report human approval, draft-readiness, and merge-readiness gates.

**Requires**: GitHub CLI (`gh`) authenticated.

**Requires**: The `uv` CLI for python package management, install guide at https://docs.astral.sh/uv/getting-started/installation/

**Important**: All scripts must be run from the repository root directory (where `.git` is located), not from the skill directory. Script paths like `scripts/fetch_pr_checks.py` are relative to this skill's root directory (the directory containing this SKILL.md), not relative to the target repository.
Requires:
- authenticated `gh`
- `uv`
- target repository root as cwd
- skill-root-relative script paths, for example `scripts/fetch_pr_checks.py`

## Bundled Scripts

### `scripts/fetch_pr_checks.py`

Fetches CI check status and extracts failure snippets from logs.

```bash
uv run scripts/fetch_pr_checks.py [--pr NUMBER]
```

Returns JSON:
```json
{
"pr": {"number": 123, "branch": "feat/foo"},
"summary": {"total": 5, "passed": 3, "failed": 2, "pending": 0},
"checks": [
{"name": "tests", "status": "fail", "log_snippet": "...", "run_id": 123},
{"name": "lint", "status": "pass"}
]
}
```

### `scripts/fetch_pr_feedback.py`

Fetches and categorizes PR review feedback using the [LOGAF scale](https://develop.sentry.dev/engineering-practices/code-review/#logaf-scale).
| Script | Run | Output |
|--------|-----|--------|
| `scripts/fetch_pr_checks.py` | `uv run scripts/fetch_pr_checks.py [--pr NUMBER]` | JSON: `pr`, `summary`, `checks`, failure snippets |
| `scripts/fetch_pr_feedback.py` | `uv run scripts/fetch_pr_feedback.py [--pr NUMBER]` | JSON buckets: `high`, `medium`, `low`, `bot`, `resolved` |
| `scripts/monitor_pr_checks.py` | `uv run scripts/monitor_pr_checks.py [--pr NUMBER]` | terminal marker plus tab-separated checks |
| `scripts/reply_to_thread.py` | `uv run scripts/reply_to_thread.py THREAD_ID BODY [...]` | JSON reply results |

```bash
uv run scripts/fetch_pr_feedback.py [--pr NUMBER]
```

Returns JSON with feedback categorized as:
- `high` - Must address before merge (`h:`, blocker, changes requested)
- `medium` - Should address (`m:`, standard feedback)
- `low` - Optional (`l:`, nit, style, suggestion)
- `bot` - Informational automated comments (Codecov, Dependabot, etc.)
- `resolved` - Already resolved threads

Review bot feedback (from Sentry, Warden, Cursor, Bugbot, CodeQL, etc.) appears in `high`/`medium`/`low` with `review_bot: true` — it is NOT placed in the `bot` bucket.

### `scripts/monitor_pr_checks.py`

Monitors PR checks until they all reach a terminal state. Retries transient `gh` failures, treats `skipping` and `cancel` as terminal states, and waits for checks to register after a fresh push instead of exiting early.

```bash
uv run scripts/monitor_pr_checks.py [--pr NUMBER]
```
Check summary fields include `failed`, `pending`, `actionable_pending`, and `human_gate_pending`.

Prints one terminal marker followed by a tab-separated check summary:
Monitor markers:
- `ALL_CHECKS_PASSED`
- `CHECKS_DONE_WITH_FAILURES`
- `NO_CHECKS_REGISTERED`
- `DRAFT_PR_WITH_NO_CHECKS`
- `CHECKS_BLOCKED_BY_REVIEW_GATE`

## Workflow

### 1. Identify PR

Run:
```bash
gh pr view --json number,url,headRefName
gh pr view --json number,url,headRefName,isDraft,reviewDecision
```

Stop if no PR exists for the current branch.
Stop when:
- no PR exists
- draft PR has no checks after monitor grace period: report `DRAFT_PR_WITH_NO_CHECKS`

### 2. Gather Review Feedback
Draft rule: inspect existing checks/feedback only. Do not mark ready for review unless asked.

Run `scripts/fetch_pr_feedback.py` to get categorized feedback already posted on the PR.
### 2. Handle Feedback

### 3. Handle Feedback by LOGAF Priority
Run `uv run scripts/fetch_pr_feedback.py [--pr NUMBER]`.

**Auto-fix (no prompt):**
- `high` - must address (blockers, security, changes requested)
- `medium` - should address (standard feedback)
| Bucket | Action |
|--------|--------|
| `high` | fix |
| `medium` | fix |
| `low` | ask user which to address |
| `bot` | skip informational comments |
| `resolved` | skip |

When fixing feedback:
- Understand the root cause, not just the surface symptom
- Check for similar issues in nearby code or related files
- Fix all instances, not just the one mentioned
Feedback fix checklist:
- verify root cause
- search related code
- fix all instances
- for `review_bot: true`: fix real issues, explain false positives

This includes review bot feedback (items with `review_bot: true`). Treat it the same as human feedback:
- Real issue found → fix it
- False positive → skip, but explain why
- Never silently ignore review bot feedback — always verify the finding

**Prompt user for selection:**
- `low` - present numbered list and ask which to address:

```
Low-priority prompt format:
```text
Found 3 low-priority suggestions:
1. [l] "Consider renaming this variable" - @reviewer in api.py:42
2. [nit] "Could use a list comprehension" - @reviewer in utils.py:18
3. [style] "Add a docstring" - @reviewer in models.py:55

Which would you like to address? (e.g., "1,3" or "all" or "none")
Which should I address? ("1,3", "all", or "none")
```

**Skip silently:**
- `resolved` threads
- `bot` comments (informational only — Codecov, Dependabot, etc.)
### 3. Check CI Status

### 4. Check CI Status
Run `uv run scripts/fetch_pr_checks.py [--pr NUMBER]`.

Run `scripts/fetch_pr_checks.py` to get structured failure data.
| State | Action |
|-------|--------|
| `failed > 0` and `actionable_pending == 0` | fix failures |
| `actionable_pending > 0` | wait; poll feedback while waiting |
| `pending > 0` and `actionable_pending == 0` | report `CHECKS_BLOCKED_BY_REVIEW_GATE` |
| no checks after grace period | report `NO_CHECKS_REGISTERED` or `DRAFT_PR_WITH_NO_CHECKS` |
| all actionable checks passed | run post-CI feedback check |

**Wait if pending:** If review bot checks (sentry, warden, cursor, bugbot, seer, codeql) are still running, wait before proceeding—they post actionable feedback that must be evaluated. Informational bots (codecov) are not worth waiting for.
Wait for actionable review bots: sentry, warden, cursor, bugbot, seer, codeql.
Do not wait for approval, `isDraft`, `REVIEW_REQUIRED`, Codecov, or informational bots.

### 5. Fix CI Failures

**Investigation is mandatory before any fix.** Do not guess, assume, or infer the cause from the check name or a surface-level reading of the error. You must trace the failure to its root cause in the actual code.
### 4. Fix CI Failures

For each failure:
1. read full log: `gh run view <run-id> --log-failed`
2. trace from assertion/exception/lint rule to source
3. state the cause before editing: "fails because X, affected by Y"
4. search related call sites/patterns
5. fix root cause, not symptom
6. add focused test coverage when needed

1. **Read the full log, not just the snippet.** Use `gh run view <run-id> --log-failed` if the snippet is truncated or ambiguous. Identify the exact failing assertion, exception, or lint rule.
2. **Trace backwards from the failure to the cause.** Follow the stack trace or error message into the source code. Read the relevant functions, types, and call sites — not just the line flagged. Do not stop at the first plausible explanation.
3. **Verify your understanding before touching code.** You should be able to state: "This fails because X, which was introduced/affected by Y." If you cannot state that clearly, keep investigating.
4. **Do not assume the feedback is wrong.** If a check flags something that seems incorrect, investigate fully before concluding it's a false positive. Most apparent false positives turn out to be real issues on closer inspection.
5. **Check for related instances.** If a type error, import issue, or logic bug exists at one call site, search for the same pattern in nearby code and related files. Fix all instances.
6. **Fix the root cause with minimal, targeted changes.** Do not paper over the symptom with a workaround.
7. **Extend tests when needed.** If the fix introduces behavior not covered by existing tests, add a test case (not a whole new test file).

### 6. Verify Locally, Then Commit and Push
### 5. Verify Locally, Then Commit and Push

Before committing, verify your fixes locally:
- If you fixed a test failure: re-run that specific test locally
- If you fixed a lint/type error: re-run the linter or type checker on affected files
- For any code fix: run existing tests covering the changed code

If local verification fails, fix before proceeding — do not push known-broken code.
Before commit:
- test fix: rerun specific test
- lint/type fix: rerun affected checker
- code fix: rerun covering tests
- local failure: fix before pushing

```bash
git add <files>
git commit -m "fix: <descriptive message>"
git push
```

### 7. Monitor CI and Address Feedback

Keep monitoring CI status and review feedback in a loop instead of blocking:

1. Run `uv run scripts/fetch_pr_checks.py` to get current CI status
2. If all checks passed, proceed to exit conditions
3. If any checks failed (none pending), return to step 5
4. If checks are still pending:
a. Run `uv run scripts/fetch_pr_feedback.py` for new review feedback
b. Address any new high/medium feedback immediately (same as step 3)
c. If changes were needed, commit and push (this restarts CI), then continue monitoring from the refreshed branch state
d. Sleep 30 seconds (don't increase on subsequent iterations), then repeat from sub-step 1
5. After all checks pass, wait 10 seconds for late-arriving review bot comments, then run `uv run scripts/fetch_pr_feedback.py`. Address any new high/medium feedback — if changes are needed, return to step 6.

If you're in Claude Code, you can replace the sleep-based wait above with `MonitorTool` so the polling happens in the background instead of consuming context. This is a Claude-only optimization, not the default workflow for other agents.

Run the bundled monitor script through `MonitorTool` with `persistent: false`:

```bash
uv run scripts/monitor_pr_checks.py
```

Set `timeout_ms` to match the repository's normal CI duration instead of hardcoding a 15-minute timeout.
### 6. Monitor CI and Address Feedback

After `MonitorTool` reports completion, re-run `uv run scripts/fetch_pr_checks.py`:
- If any checks failed, return to step 5.
- If all checks passed, continue to sub-step 5 above.
Loop:
1. run `uv run scripts/fetch_pr_checks.py`
2. handle table in step 3
3. while `actionable_pending > 0`, run `uv run scripts/fetch_pr_feedback.py`
4. fix new high/medium feedback immediately
5. if changed, verify, commit, push, restart loop
6. otherwise sleep 30 seconds and repeat
7. after checks pass, wait 10 seconds, fetch feedback once more
8. if new high/medium feedback exists, return to step 4

If you pushed new changes while monitoring, start a fresh monitor so it watches the new set of CI runs.

### 8. Repeat

If step 7 required code changes (from new feedback after CI passed), return to step 2 for a fresh cycle. CI failures during monitoring are already handled within step 7's polling loop.
Claude Code optional: run `uv run scripts/monitor_pr_checks.py` through `MonitorTool` with `persistent: false`; set timeout to normal repo CI duration. Restart the monitor after every push.

## Exit Conditions

**Success:** All checks pass, post-CI feedback re-check is clean (no new unaddressed high/medium feedback including review bot findings), user has decided on low-priority items.

**Ask for help:** Same failure after 2 attempts, feedback needs clarification, infrastructure issues.

**Stop:** No PR exists, branch needs rebase.
| Exit | Conditions |
|------|------------|
| Success | actionable CI passed; post-CI feedback clean; low-priority choice handled |
| Ask user | same failure after 2 attempts; feedback unclear; infrastructure issue |
| Stop | no PR; branch needs rebase; no checks; draft no-checks; only human gates remain |

## Fallback

If scripts fail, use `gh` CLI directly:
- `gh pr checks name,state,bucket,link`
- `gh pr view --json number,url,headRefName,isDraft,reviewDecision`
- `gh pr checks --json name,state,bucket,description,link`
- `gh run view <run-id> --log-failed`
- `gh api repos/{owner}/{repo}/pulls/{number}/comments`
93 changes: 93 additions & 0 deletions skills/iterate-pr/SPEC.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
# Iterate PR Specification

## Intent

The `iterate-pr` skill drives a pull request through actionable CI failures and actionable review feedback until the work is locally fixed, pushed, and rechecked.

Its purpose is CI and feedback iteration, not merge readiness. It must not wait indefinitely for human approvals, required review decisions, draft PR state changes, or other gates that an agent cannot resolve by editing code.

## Scope

In scope:

- Identifying the PR for the current branch.
- Fetching and categorizing PR review feedback.
- Fixing high and medium priority review feedback.
- Asking the user which low priority suggestions to address.
- Fetching CI checks, failed logs, and failure snippets.
- Fixing CI failures with local verification before pushing.
- Monitoring checks until they pass, fail, or reach a non-actionable stop state.
- Reporting draft/no-checks and human review/approval gates without polling forever.

Out of scope:

- Waiting for or requesting human approval.
- Marking draft PRs ready for review unless the user explicitly asks.
- Merging PRs.
- Rebasing branches without user direction.
- Treating Codecov, Dependabot, or other informational comments as review feedback.

## Users And Trigger Context

- Primary users: engineers and coding agents iterating on existing pull requests.
- Common user requests: fix CI on this PR, iterate on this PR until checks pass, address PR feedback, keep pushing fixes until green.
- Should not trigger for: creating a PR, writing commits without a PR, reviewing unrelated code, or monitoring merge approval state only.

## Runtime Contract

- Required first actions: resolve the current PR, read `isDraft` and `reviewDecision`, fetch current review feedback, and fetch current CI state before editing.
- Required outputs: concise progress updates, commits and pushes when fixes are made, and a final state that distinguishes passing CI from non-actionable review/draft/approval gates.
- Non-negotiable constraints: investigate failures before editing, verify locally before pushing, do not push known-broken fixes, do not wait for human approval, and do not treat draft PRs with no checks as pending forever.
- Expected bundled files loaded at runtime: `SKILL.md` and, when needed, scripts under `scripts/`.

## Source And Evidence Model

Authoritative sources:

- GitHub CLI PR and checks output.
- Sentry LOGAF review guidance.
- Repository-level agent instructions.
- Bundled script behavior documented in `SKILL.md`.

Useful improvement sources:

- positive examples: PRs where CI failures were fixed and checks passed after the loop.
- negative examples: PRs where the agent waited on draft status, required review, or approval gates.
- issue or PR feedback: reviewer comments about missing fixes, false positives, or feedback categorization.
- validation results: structural skill validation and script syntax checks.

Data that must not be stored:

- secrets
- customer data
- private URLs or identifiers not needed for reproduction
- full CI logs when small failure snippets are enough

## Reference Architecture

- `SKILL.md` contains the runtime workflow, script contracts, feedback handling rules, CI loop, and exit conditions.
- `SPEC.md` contains this maintenance contract.
- `references/` contains no files currently; add focused troubleshooting or evidence references only if runtime guidance becomes noisy.
- `references/evidence/` contains no files currently; use it for durable positive or negative PR-loop examples if regressions recur.
- `scripts/` contains non-interactive helpers for PR checks, PR feedback, check monitoring, and review-thread replies.
- `assets/` contains no files currently.

## Validation

- Lightweight validation: run `uv run skills/skill-writer/scripts/quick_validate.py skills/iterate-pr`.
- Script validation: run `uv run -m py_compile skills/iterate-pr/scripts/*.py` after script changes.
- Holdout examples: include a draft PR with no registered checks, a PR with `reviewDecision: REVIEW_REQUIRED` but passing checks, a PR with an actionable pending CI bot check, and a PR with failed CI logs.
- Acceptance gates: validator passes, scripts compile, draft/no-check states terminate with a report, human review gates are not treated as actionable pending CI, and actionable CI failures still route back to investigation and fixes.

## Known Limitations

- Human-gate detection depends on check names, states, and descriptions exposed by GitHub or CI integrations.
- Some repositories may intentionally model deployment or approval workflows as status checks; this skill reports those as blocked/non-actionable unless the user asks to manage that gate.
- The helper scripts use GitHub CLI output and can drift if `gh pr checks` changes its JSON schema.

## Maintenance Notes

- Update `SKILL.md` when the runtime loop, script contracts, feedback policy, or exit conditions change.
- Update `SPEC.md` when the skill's scope, validation expectations, or non-actionable gate policy changes.
- Add focused reference files only when repeated troubleshooting guidance would make `SKILL.md` hard to scan.
- Keep public inventories pointed at the canonical `skills/iterate-pr` skill, not mirrors or aliases.
Loading
Loading