diff --git a/README.md b/README.md
index ff5258e..87f50ca 100644
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ Works with Claude Code, Cursor, Cline, GitHub Copilot, and other compatible agen
 | [find-bugs](skills/find-bugs/SKILL.md) | Find bugs, security vulnerabilities, and code quality issues in local branch changes. |
 | [gh-review-requests](skills/gh-review-requests/SKILL.md) | Fetch unread GitHub notifications for open PRs where review is requested from a specified team or opened by a team member. |
 | [gha-security-review](skills/gha-security-review/SKILL.md) | GitHub Actions security review for workflow exploitation vulnerabilities. |
-| [iterate-pr](skills/iterate-pr/SKILL.md) | Iterate on a PR until CI passes. |
+| [iterate-pr](skills/iterate-pr/SKILL.md) | Iterate on a PR until CI passes and actionable review feedback is addressed. |
 | [presentation-creator](skills/presentation-creator/SKILL.md) | Create data-driven presentation slides using React, Vite, and Recharts with Sentry branding. |
 | [pr-writer](skills/pr-writer/SKILL.md) | Canonical workflow to create and update pull requests following Sentry conventions. |
 | [prompt-optimizer](skills/prompt-optimizer/SKILL.md) | Optimize prompts with evals, model-family adapters, and exact external context references. |
diff --git a/skills/iterate-pr/SKILL.md b/skills/iterate-pr/SKILL.md
index c526f04..2012042 100644
--- a/skills/iterate-pr/SKILL.md
+++ b/skills/iterate-pr/SKILL.md
@@ -1,143 +1,111 @@
 ---
 name: iterate-pr
-description: Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.
+description: Iterate on a PR until actionable CI passes and high/medium review feedback is addressed. Use for PR CI failures, review feedback, or green-check loops; do not wait for human approval, draft status, or merge gates.
 ---
 
 # Iterate on PR Until CI Passes
 
-Continuously iterate on the current branch until all CI checks pass and review feedback is addressed.
+Goal: fix actionable CI failures and high/medium review feedback. Stop and report human approval, draft-readiness, and merge-readiness gates.
 
-**Requires**: GitHub CLI (`gh`) authenticated.
-
-**Requires**: The `uv` CLI for python package management, install guide at https://docs.astral.sh/uv/getting-started/installation/
-
-**Important**: All scripts must be run from the repository root directory (where `.git` is located), not from the skill directory. Script paths like `scripts/fetch_pr_checks.py` are relative to this skill's root directory (the directory containing this SKILL.md), not relative to the target repository.
+Requires:
+- authenticated `gh`
+- `uv`
+- target repository root as cwd
+- skill-root-relative script paths, for example `scripts/fetch_pr_checks.py`
 
 ## Bundled Scripts
 
-### `scripts/fetch_pr_checks.py`
-
-Fetches CI check status and extracts failure snippets from logs.
-
-```bash
-uv run scripts/fetch_pr_checks.py [--pr NUMBER]
-```
-
-Returns JSON:
-```json
-{
-  "pr": {"number": 123, "branch": "feat/foo"},
-  "summary": {"total": 5, "passed": 3, "failed": 2, "pending": 0},
-  "checks": [
-    {"name": "tests", "status": "fail", "log_snippet": "...", "run_id": 123},
-    {"name": "lint", "status": "pass"}
-  ]
-}
-```
-
-### `scripts/fetch_pr_feedback.py`
-
-Fetches and categorizes PR review feedback using the [LOGAF scale](https://develop.sentry.dev/engineering-practices/code-review/#logaf-scale).
+| Script | Run | Output |
+|--------|-----|--------|
+| `scripts/fetch_pr_checks.py` | `uv run scripts/fetch_pr_checks.py [--pr NUMBER]` | JSON: `pr`, `summary`, `checks`, failure snippets |
+| `scripts/fetch_pr_feedback.py` | `uv run scripts/fetch_pr_feedback.py [--pr NUMBER]` | JSON buckets: `high`, `medium`, `low`, `bot`, `resolved` |
+| `scripts/monitor_pr_checks.py` | `uv run scripts/monitor_pr_checks.py [--pr NUMBER]` | terminal marker plus tab-separated checks |
+| `scripts/reply_to_thread.py` | `uv run scripts/reply_to_thread.py THREAD_ID BODY [...]` | JSON reply results |
 
-```bash
-uv run scripts/fetch_pr_feedback.py [--pr NUMBER]
-```
-
-Returns JSON with feedback categorized as:
-- `high` - Must address before merge (`h:`, blocker, changes requested)
-- `medium` - Should address (`m:`, standard feedback)
-- `low` - Optional (`l:`, nit, style, suggestion)
-- `bot` - Informational automated comments (Codecov, Dependabot, etc.)
-- `resolved` - Already resolved threads
-
-Review bot feedback (from Sentry, Warden, Cursor, Bugbot, CodeQL, etc.) appears in `high`/`medium`/`low` with `review_bot: true` — it is NOT placed in the `bot` bucket.
-
-### `scripts/monitor_pr_checks.py`
-
-Monitors PR checks until they all reach a terminal state. Retries transient `gh` failures, treats `skipping` and `cancel` as terminal states, and waits for checks to register after a fresh push instead of exiting early.
-
-```bash
-uv run scripts/monitor_pr_checks.py [--pr NUMBER]
-```
+Check summary fields include `failed`, `pending`, `actionable_pending`, and `human_gate_pending`.
 
-Prints one terminal marker followed by a tab-separated check summary:
+Monitor markers:
 - `ALL_CHECKS_PASSED`
 - `CHECKS_DONE_WITH_FAILURES`
+- `NO_CHECKS_REGISTERED`
+- `DRAFT_PR_WITH_NO_CHECKS`
+- `CHECKS_BLOCKED_BY_REVIEW_GATE`
 
 ## Workflow
 
 ### 1. Identify PR
 
+Run:
 ```bash
-gh pr view --json number,url,headRefName
+gh pr view --json number,url,headRefName,isDraft,reviewDecision
 ```
 
-Stop if no PR exists for the current branch.
+Stop when:
+- no PR exists
+- draft PR has no checks after monitor grace period: report `DRAFT_PR_WITH_NO_CHECKS`
 
-### 2. Gather Review Feedback
+Draft rule: inspect existing checks/feedback only. Do not mark ready for review unless asked.
 
-Run `scripts/fetch_pr_feedback.py` to get categorized feedback already posted on the PR.
+### 2. Handle Feedback
 
-### 3. Handle Feedback by LOGAF Priority
+Run `uv run scripts/fetch_pr_feedback.py [--pr NUMBER]`.
 
-**Auto-fix (no prompt):**
-- `high` - must address (blockers, security, changes requested)
-- `medium` - should address (standard feedback)
+| Bucket | Action |
+|--------|--------|
+| `high` | fix |
+| `medium` | fix |
+| `low` | ask user which to address |
+| `bot` | skip informational comments |
+| `resolved` | skip |
 
-When fixing feedback:
-- Understand the root cause, not just the surface symptom
-- Check for similar issues in nearby code or related files
-- Fix all instances, not just the one mentioned
+Feedback fix checklist:
+- verify root cause
+- search related code
+- fix all instances
+- for `review_bot: true`: fix real issues, explain false positives
 
-This includes review bot feedback (items with `review_bot: true`). Treat it the same as human feedback:
-- Real issue found → fix it
-- False positive → skip, but explain why
-- Never silently ignore review bot feedback — always verify the finding
-
-**Prompt user for selection:**
-- `low` - present numbered list and ask which to address:
-
-```
+Low-priority prompt format:
+```text
 Found 3 low-priority suggestions:
 1. [l] "Consider renaming this variable" - @reviewer in api.py:42
 2. [nit] "Could use a list comprehension" - @reviewer in utils.py:18
 3. [style] "Add a docstring" - @reviewer in models.py:55
 
-Which would you like to address? (e.g., "1,3" or "all" or "none")
+Which should I address? ("1,3", "all", or "none")
 ```
 
-**Skip silently:**
-- `resolved` threads
-- `bot` comments (informational only — Codecov, Dependabot, etc.)
+### 3. Check CI Status
 
-### 4. Check CI Status
+Run `uv run scripts/fetch_pr_checks.py [--pr NUMBER]`.
 
-Run `scripts/fetch_pr_checks.py` to get structured failure data.
+| State | Action |
+|-------|--------|
+| `failed > 0` and `actionable_pending == 0` | fix failures |
+| `actionable_pending > 0` | wait; poll feedback while waiting |
+| `pending > 0` and `actionable_pending == 0` | report `CHECKS_BLOCKED_BY_REVIEW_GATE` |
+| no checks after grace period | report `NO_CHECKS_REGISTERED` or `DRAFT_PR_WITH_NO_CHECKS` |
+| all actionable checks passed | run post-CI feedback check |
 
-**Wait if pending:** If review bot checks (sentry, warden, cursor, bugbot, seer, codeql) are still running, wait before proceeding—they post actionable feedback that must be evaluated. Informational bots (codecov) are not worth waiting for.
+Wait for actionable review bots: sentry, warden, cursor, bugbot, seer, codeql.
+Do not wait for approval, `isDraft`, `REVIEW_REQUIRED`, Codecov, or informational bots.
 
-### 5. Fix CI Failures
-
-**Investigation is mandatory before any fix.** Do not guess, assume, or infer the cause from the check name or a surface-level reading of the error. You must trace the failure to its root cause in the actual code.
+### 4. Fix CI Failures
 
 For each failure:
+1. read full log: `gh run view --log-failed`
+2. trace from assertion/exception/lint rule to source
+3. state the cause before editing: "fails because X, affected by Y"
+4. search related call sites/patterns
+5. fix root cause, not symptom
+6. add focused test coverage when needed
 
-1. **Read the full log, not just the snippet.** Use `gh run view --log-failed` if the snippet is truncated or ambiguous. Identify the exact failing assertion, exception, or lint rule.
-2. **Trace backwards from the failure to the cause.** Follow the stack trace or error message into the source code. Read the relevant functions, types, and call sites — not just the line flagged. Do not stop at the first plausible explanation.
-3. **Verify your understanding before touching code.** You should be able to state: "This fails because X, which was introduced/affected by Y." If you cannot state that clearly, keep investigating.
-4. **Do not assume the feedback is wrong.** If a check flags something that seems incorrect, investigate fully before concluding it's a false positive. Most apparent false positives turn out to be real issues on closer inspection.
-5. **Check for related instances.** If a type error, import issue, or logic bug exists at one call site, search for the same pattern in nearby code and related files. Fix all instances.
-6. **Fix the root cause with minimal, targeted changes.** Do not paper over the symptom with a workaround.
-7. **Extend tests when needed.** If the fix introduces behavior not covered by existing tests, add a test case (not a whole new test file).
-
-### 6. Verify Locally, Then Commit and Push
+### 5. Verify Locally, Then Commit and Push
 
-Before committing, verify your fixes locally:
-- If you fixed a test failure: re-run that specific test locally
-- If you fixed a lint/type error: re-run the linter or type checker on affected files
-- For any code fix: run existing tests covering the changed code
-
-If local verification fails, fix before proceeding — do not push known-broken code.
+Before commit:
+- test fix: rerun specific test
+- lint/type fix: rerun affected checker
+- code fix: rerun covering tests
+- local failure: fix before pushing
 
 ```bash
 git add
@@ -145,51 +113,32 @@ git commit -m "fix: "
 git push
 ```
 
-### 7. Monitor CI and Address Feedback
-
-Keep monitoring CI status and review feedback in a loop instead of blocking:
-
-1. Run `uv run scripts/fetch_pr_checks.py` to get current CI status
-2. If all checks passed, proceed to exit conditions
-3. If any checks failed (none pending), return to step 5
-4. If checks are still pending:
-   a. Run `uv run scripts/fetch_pr_feedback.py` for new review feedback
-   b. Address any new high/medium feedback immediately (same as step 3)
-   c. If changes were needed, commit and push (this restarts CI), then continue monitoring from the refreshed branch state
-   d. Sleep 30 seconds (don't increase on subsequent iterations), then repeat from sub-step 1
-5. After all checks pass, wait 10 seconds for late-arriving review bot comments, then run `uv run scripts/fetch_pr_feedback.py`. Address any new high/medium feedback — if changes are needed, return to step 6.
-
-If you're in Claude Code, you can replace the sleep-based wait above with `MonitorTool` so the polling happens in the background instead of consuming context. This is a Claude-only optimization, not the default workflow for other agents.
-
-Run the bundled monitor script through `MonitorTool` with `persistent: false`:
-
-```bash
-uv run scripts/monitor_pr_checks.py
-```
-
-Set `timeout_ms` to match the repository's normal CI duration instead of hardcoding a 15-minute timeout.
+### 6. Monitor CI and Address Feedback
 
-After `MonitorTool` reports completion, re-run `uv run scripts/fetch_pr_checks.py`:
-- If any checks failed, return to step 5.
-- If all checks passed, continue to sub-step 5 above.
+Loop:
+1. run `uv run scripts/fetch_pr_checks.py`
+2. handle table in step 3
+3. while `actionable_pending > 0`, run `uv run scripts/fetch_pr_feedback.py`
+4. fix new high/medium feedback immediately
+5. if changed, verify, commit, push, restart loop
+6. otherwise sleep 30 seconds and repeat
+7. after checks pass, wait 10 seconds, fetch feedback once more
+8. if new high/medium feedback exists, return to step 4
 
-If you pushed new changes while monitoring, start a fresh monitor so it watches the new set of CI runs.
-
-### 8. Repeat
-
-If step 7 required code changes (from new feedback after CI passed), return to step 2 for a fresh cycle. CI failures during monitoring are already handled within step 7's polling loop.
+Claude Code optional: run `uv run scripts/monitor_pr_checks.py` through `MonitorTool` with `persistent: false`; set timeout to normal repo CI duration. Restart the monitor after every push.
 
 ## Exit Conditions
 
-**Success:** All checks pass, post-CI feedback re-check is clean (no new unaddressed high/medium feedback including review bot findings), user has decided on low-priority items.
-
-**Ask for help:** Same failure after 2 attempts, feedback needs clarification, infrastructure issues.
-
-**Stop:** No PR exists, branch needs rebase.
+| Exit | Conditions |
+|------|------------|
+| Success | actionable CI passed; post-CI feedback clean; low-priority choice handled |
+| Ask user | same failure after 2 attempts; feedback unclear; infrastructure issue |
+| Stop | no PR; branch needs rebase; no checks; draft no-checks; only human gates remain |
 
 ## Fallback
 
 If scripts fail, use `gh` CLI directly:
-- `gh pr checks name,state,bucket,link`
+- `gh pr view --json number,url,headRefName,isDraft,reviewDecision`
+- `gh pr checks --json name,state,bucket,description,link`
 - `gh run view --log-failed`
 - `gh api repos/{owner}/{repo}/pulls/{number}/comments`
diff --git a/skills/iterate-pr/SPEC.md b/skills/iterate-pr/SPEC.md
new file mode 100644
index 0000000..e4bbb28
--- /dev/null
+++ b/skills/iterate-pr/SPEC.md
@@ -0,0 +1,93 @@
+# Iterate PR Specification
+
+## Intent
+
+The `iterate-pr` skill drives a pull request through actionable CI failures and actionable review feedback until the work is locally fixed, pushed, and rechecked.
+
+Its purpose is CI and feedback iteration, not merge readiness. It must not wait indefinitely for human approvals, required review decisions, draft PR state changes, or other gates that an agent cannot resolve by editing code.
+
+## Scope
+
+In scope:
+
+- Identifying the PR for the current branch.
+- Fetching and categorizing PR review feedback.
+- Fixing high and medium priority review feedback.
+- Asking the user which low priority suggestions to address.
+- Fetching CI checks, failed logs, and failure snippets.
+- Fixing CI failures with local verification before pushing.
+- Monitoring checks until they pass, fail, or reach a non-actionable stop state.
+- Reporting draft/no-checks and human review/approval gates without polling forever.
+
+Out of scope:
+
+- Waiting for or requesting human approval.
+- Marking draft PRs ready for review unless the user explicitly asks.
+- Merging PRs.
+- Rebasing branches without user direction.
+- Treating Codecov, Dependabot, or other informational comments as review feedback.
+
+## Users And Trigger Context
+
+- Primary users: engineers and coding agents iterating on existing pull requests.
+- Common user requests: fix CI on this PR, iterate on this PR until checks pass, address PR feedback, keep pushing fixes until green.
+- Should not trigger for: creating a PR, writing commits without a PR, reviewing unrelated code, or monitoring merge approval state only.
+
+## Runtime Contract
+
+- Required first actions: resolve the current PR, read `isDraft` and `reviewDecision`, fetch current review feedback, and fetch current CI state before editing.
+- Required outputs: concise progress updates, commits and pushes when fixes are made, and a final state that distinguishes passing CI from non-actionable review/draft/approval gates.
+- Non-negotiable constraints: investigate failures before editing, verify locally before pushing, do not push known-broken fixes, do not wait for human approval, and do not treat draft PRs with no checks as pending forever.
+- Expected bundled files loaded at runtime: `SKILL.md` and, when needed, scripts under `scripts/`.
+
+## Source And Evidence Model
+
+Authoritative sources:
+
+- GitHub CLI PR and checks output.
+- Sentry LOGAF review guidance.
+- Repository-level agent instructions.
+- Bundled script behavior documented in `SKILL.md`.
+
+Useful improvement sources:
+
+- positive examples: PRs where CI failures were fixed and checks passed after the loop.
+- negative examples: PRs where the agent waited on draft status, required review, or approval gates.
+- issue or PR feedback: reviewer comments about missing fixes, false positives, or feedback categorization.
+- validation results: structural skill validation and script syntax checks.
+
+Data that must not be stored:
+
+- secrets
+- customer data
+- private URLs or identifiers not needed for reproduction
+- full CI logs when small failure snippets are enough
+
+## Reference Architecture
+
+- `SKILL.md` contains the runtime workflow, script contracts, feedback handling rules, CI loop, and exit conditions.
+- `SPEC.md` contains this maintenance contract.
+- `references/` contains no files currently; add focused troubleshooting or evidence references only if runtime guidance becomes noisy.
+- `references/evidence/` contains no files currently; use it for durable positive or negative PR-loop examples if regressions recur.
+- `scripts/` contains non-interactive helpers for PR checks, PR feedback, check monitoring, and review-thread replies.
+- `assets/` contains no files currently.
+
+## Validation
+
+- Lightweight validation: run `uv run skills/skill-writer/scripts/quick_validate.py skills/iterate-pr`.
+- Script validation: run `uv run -m py_compile skills/iterate-pr/scripts/*.py` after script changes.
+- Holdout examples: include a draft PR with no registered checks, a PR with `reviewDecision: REVIEW_REQUIRED` but passing checks, a PR with an actionable pending CI bot check, and a PR with failed CI logs.
+- Acceptance gates: validator passes, scripts compile, draft/no-check states terminate with a report, human review gates are not treated as actionable pending CI, and actionable CI failures still route back to investigation and fixes.
+
+## Known Limitations
+
+- Human-gate detection depends on check names, states, and descriptions exposed by GitHub or CI integrations.
+- Some repositories may intentionally model deployment or approval workflows as status checks; this skill reports those as blocked/non-actionable unless the user asks to manage that gate.
+- The helper scripts use GitHub CLI output and can drift if `gh pr checks` changes its JSON schema.
+
+## Maintenance Notes
+
+- Update `SKILL.md` when the runtime loop, script contracts, feedback policy, or exit conditions change.
+- Update `SPEC.md` when the skill's scope, validation expectations, or non-actionable gate policy changes.
+- Add focused reference files only when repeated troubleshooting guidance would make `SKILL.md` hard to scan.
+- Keep public inventories pointed at the canonical `skills/iterate-pr` skill, not mirrors or aliases.
diff --git a/skills/iterate-pr/scripts/fetch_pr_checks.py b/skills/iterate-pr/scripts/fetch_pr_checks.py
index 03d1edf..9b19493 100755
--- a/skills/iterate-pr/scripts/fetch_pr_checks.py
+++ b/skills/iterate-pr/scripts/fetch_pr_checks.py
@@ -6,7 +6,7 @@
 Fetch PR CI checks and extract relevant failure snippets.
 
 Usage:
-    python fetch_pr_checks.py [--pr PR_NUMBER]
+    uv run fetch_pr_checks.py [--pr PR_NUMBER]
 
 If --pr is not specified, uses the PR for the current branch.
 
@@ -21,6 +21,17 @@
 import sys
 from typing import Any
 
+HUMAN_GATE_PATTERNS = [
+    r"(?i)review\s+required",
+    r"(?i)required\s+review",
+    r"(?i)requires\s+review",
+    r"(?i)required\s+approving\s+review",
+    r"(?i)approval\s+required",
+    r"(?i)waiting\s+for\s+approval",
+    r"(?i)manual\s+approval",
+    r"(?i)draft\s+(pull\s+request|pr)",
+]
+
 
 def run_gh(args: list[str]) -> dict[str, Any] | list[Any] | None:
     """Run a gh CLI command and return parsed JSON output."""
@@ -41,17 +52,23 @@ def run_gh(args: list[str]) -> dict[str, Any] | list[Any] | None:
 
 def get_pr_info(pr_number: int | None = None) -> dict[str, Any] | None:
     """Get PR info, optionally by number or for current branch."""
-    args = ["pr", "view", "--json", "number,url,headRefName,baseRefName"]
+    args = [
+        "pr",
+        "view",
+        "--json",
+        "number,url,headRefName,baseRefName,isDraft,reviewDecision",
+    ]
     if pr_number:
         args.insert(2, str(pr_number))
     return run_gh(args)
 
 
 def get_checks(pr_number: int | None = None) -> list[dict[str, Any]]:
-    """Get all checks for a PR by parsing tab-separated gh output."""
+    """Get all checks for a PR."""
     args = ["gh", "pr", "checks"]
     if pr_number:
         args.append(str(pr_number))
+    args.extend(["--json", "name,bucket,link,workflow,state,description,event"])
     try:
         result = subprocess.run(
             args,
@@ -60,6 +77,12 @@ def get_checks(pr_number: int | None = None) -> list[dict[str, Any]]:
         )
         if not result.stdout.strip():
             return []
+        try:
+            checks = json.loads(result.stdout)
+            return checks if isinstance(checks, list) else []
+        except json.JSONDecodeError:
+            pass
+
         checks = []
         for line in result.stdout.strip().split("\n"):
             if not line.strip():
@@ -77,6 +100,15 @@ def get_checks(pr_number: int | None = None) -> list[dict[str, Any]]:
         return []
 
 
+def is_human_gate_check(check: dict[str, Any]) -> bool:
+    """Return true when a pending entry is a human review/approval gate."""
+    haystack = " ".join(
+        str(check.get(field, ""))
+        for field in ("name", "state", "description", "workflow")
+    )
+    return any(re.search(pattern, haystack) for pattern in HUMAN_GATE_PATTERNS)
+
+
 def get_failed_runs(branch: str) -> list[dict[str, Any]]:
     """Get recent failed workflow runs for a branch."""
     result = run_gh([
@@ -190,12 +222,20 @@ def main():
     failed_runs = None  # Lazy load
 
     for check in checks:
+        status = check.get("bucket", check.get("state", "unknown"))
+        human_gate = status == "pending" and is_human_gate_check(check)
         processed = {
             "name": check.get("name", "unknown"),
-            "status": check.get("bucket", check.get("state", "unknown")),
+            "status": status,
             "link": check.get("link", ""),
             "workflow": check.get("workflow", ""),
         }
+        if check.get("state"):
+            processed["state"] = check["state"]
+        if check.get("description"):
+            processed["description"] = check["description"]
+        if human_gate:
+            processed["human_gate"] = True
 
         # For failures, try to get log snippet
         if processed["status"] == "fail":
@@ -224,17 +264,40 @@ def main():
             "url": pr_info.get("url", ""),
             "branch": branch,
             "base": pr_info.get("baseRefName", ""),
+            "is_draft": bool(pr_info.get("isDraft")),
+            "review_decision": pr_info.get("reviewDecision", ""),
         },
         "summary": {
             "total": len(processed_checks),
             "passed": sum(1 for c in processed_checks if c["status"] == "pass"),
             "failed": sum(1 for c in processed_checks if c["status"] == "fail"),
             "pending": sum(1 for c in processed_checks if c["status"] == "pending"),
+            "actionable_pending": sum(
+                1
+                for c in processed_checks
+                if c["status"] == "pending" and not c.get("human_gate")
+            ),
+            "human_gate_pending": sum(
+                1
+                for c in processed_checks
+                if c["status"] == "pending" and c.get("human_gate")
+            ),
             "skipped": sum(1 for c in processed_checks if c["status"] in ("skipping", "cancel")),
         },
         "checks": processed_checks,
     }
 
+    if pr_info.get("isDraft") and not processed_checks:
+        output["action_required"] = "Draft PR has no registered checks; do not wait for CI indefinitely"
+    elif not processed_checks:
+        output["action_required"] = "No registered checks; monitor before reporting NO_CHECKS_REGISTERED"
+    elif output["summary"]["actionable_pending"]:
+        output["action_required"] = "Wait for actionable checks to finish; poll feedback while waiting"
+    elif output["summary"]["failed"]:
+        output["action_required"] = "Address failed checks"
+    elif output["summary"]["pending"] and not output["summary"]["actionable_pending"]:
+        output["action_required"] = "Only human review or approval gates remain pending"
+
     print(json.dumps(output, indent=2))
 
 
diff --git a/skills/iterate-pr/scripts/fetch_pr_feedback.py b/skills/iterate-pr/scripts/fetch_pr_feedback.py
index 80517d3..6186e0b 100755
--- a/skills/iterate-pr/scripts/fetch_pr_feedback.py
+++ b/skills/iterate-pr/scripts/fetch_pr_feedback.py
@@ -6,7 +6,7 @@
 Fetch and categorize PR review feedback.
 
 Usage:
-    python fetch_pr_feedback.py [--pr PR_NUMBER]
+    uv run fetch_pr_feedback.py [--pr PR_NUMBER]
 
 If --pr is not specified, uses the PR for the current branch.
 
diff --git a/skills/iterate-pr/scripts/monitor_pr_checks.py b/skills/iterate-pr/scripts/monitor_pr_checks.py
index de2fbce..797e792 100644
--- a/skills/iterate-pr/scripts/monitor_pr_checks.py
+++ b/skills/iterate-pr/scripts/monitor_pr_checks.py
@@ -13,6 +13,9 @@
 Output:
     - Prints `ALL_CHECKS_PASSED` when all checks finish without failures
     - Prints `CHECKS_DONE_WITH_FAILURES` when checks finish with failures
+    - Prints `NO_CHECKS_REGISTERED` when checks do not appear after the grace period
+    - Prints `DRAFT_PR_WITH_NO_CHECKS` when a draft PR has no checks after the grace period
+    - Prints `CHECKS_BLOCKED_BY_REVIEW_GATE` when only human review/approval gates remain
     - Prints a tab-separated check summary after the terminal marker
 
 The script stays quiet while polling so background monitor tools do not emit
@@ -24,15 +27,28 @@
 
 import argparse
 import json
+import re
 import subprocess
 import sys
 import time
 from typing import Any
 
+HUMAN_GATE_PATTERNS = [
+    r"(?i)review\s+required",
+    r"(?i)required\s+review",
+    r"(?i)requires\s+review",
+    r"(?i)required\s+approving\s+review",
+    r"(?i)approval\s+required",
+    r"(?i)waiting\s+for\s+approval",
+    r"(?i)manual\s+approval",
+    r"(?i)draft\s+(pull\s+request|pr)",
+]
+
 
 def run_gh_json(
     args: list[str],
     allowed_returncodes: tuple[int, ...] = (0,),
+    empty_stdout_value: list[dict[str, Any]] | dict[str, Any] | None = None,
 ) -> list[dict[str, Any]] | dict[str, Any] | None:
     """Run a gh command that returns JSON."""
     result = subprocess.run(
@@ -45,7 +61,7 @@ def run_gh_json(
         return None
 
     if not result.stdout.strip():
-        return None
+        return empty_stdout_value
 
     try:
         return json.loads(result.stdout)
@@ -53,17 +69,24 @@ def run_gh_json(
         return None
 
 
-def get_pr_number(pr_number: int | None) -> int | None:
-    """Resolve the PR number to monitor."""
+def get_pr_info(pr_number: int | None) -> dict[str, Any] | None:
+    """Resolve the PR to monitor."""
     if pr_number is not None:
-        return pr_number
+        pr_info = run_gh_json([
+            "pr",
+            "view",
+            str(pr_number),
+            "--json",
+            "number,url,isDraft,reviewDecision",
+        ])
+    else:
+        pr_info = run_gh_json(["pr", "view", "--json", "number,url,isDraft,reviewDecision"])
 
-    pr_info = run_gh_json(["pr", "view", "--json", "number"])
     if not isinstance(pr_info, dict):
         return None
 
     number = pr_info.get("number")
-    return number if isinstance(number, int) else None
+    return pr_info if isinstance(number, int) else None
 
 
 def get_checks(pr_number: int) -> list[dict[str, Any]] | None:
@@ -73,11 +96,20 @@ def get_checks(pr_number: int) -> list[dict[str, Any]] | None:
         "checks",
         str(pr_number),
         "--json",
-        "name,bucket,link",
-    ], allowed_returncodes=(0, 1, 8))
+        "name,bucket,link,workflow,state,description",
+    ], allowed_returncodes=(0, 1, 8, 16), empty_stdout_value=[])
     return checks if isinstance(checks, list) else None
 
 
+def is_human_gate_check(check: dict[str, Any]) -> bool:
+    """Return true when a pending entry is a human review/approval gate."""
+    haystack = " ".join(
+        str(check.get(field, ""))
+        for field in ("name", "state", "description", "workflow")
+    )
+    return any(re.search(pattern, haystack) for pattern in HUMAN_GATE_PATTERNS)
+
+
 def print_check_summary(checks: list[dict[str, Any]], max_lines: int = 20) -> None:
     """Print a concise tab-separated check summary."""
     for check in checks[:max_lines]:
@@ -87,6 +119,17 @@ def print_check_summary(checks: list[dict[str, Any]], max_lines: int = 20) -> No
         print(f"{name}\t{bucket}\t{link}".rstrip(), flush=True)
 
 
+def print_no_checks_summary(pr_info: dict[str, Any]) -> None:
+    number = pr_info.get("number", "unknown")
+    url = pr_info.get("url", "")
+    is_draft = str(bool(pr_info.get("isDraft"))).lower()
+    review_decision = str(pr_info.get("reviewDecision") or "")
+    print(f"PR #{number}\tno_checks\t{url}".rstrip(), flush=True)
+    print(f"is_draft\t{is_draft}", flush=True)
+    if review_decision:
+        print(f"review_decision\t{review_decision}", flush=True)
+
+
 def main() -> int:
     parser = argparse.ArgumentParser(description="Monitor PR checks until they finish")
     parser.add_argument("--pr", type=int, help="PR number (defaults to current branch PR)")
@@ -102,13 +145,22 @@ def main() -> int:
         default=15,
         help="Retry delay when a fresh push has not registered any checks yet",
     )
+    parser.add_argument(
+        "--no-checks-timeout-seconds",
+        type=int,
+        default=180,
+        help="Maximum time to wait for checks to register before reporting no checks",
+    )
     args = parser.parse_args()
 
-    pr_number = get_pr_number(args.pr)
-    if pr_number is None:
+    pr_info = get_pr_info(args.pr)
+    if pr_info is None:
         print("No PR found for current branch", file=sys.stderr)
         return 1
 
+    pr_number = pr_info["number"]
+    no_checks_started_at: float | None = None
+
     while True:
         checks = get_checks(pr_number)
         if checks is None:
@@ -116,20 +168,37 @@ def main() -> int:
             continue
 
         if not checks:
+            now = time.monotonic()
+            if no_checks_started_at is None:
+                no_checks_started_at = now
+            if now - no_checks_started_at >= args.no_checks_timeout_seconds:
+                marker = "DRAFT_PR_WITH_NO_CHECKS" if pr_info.get("isDraft") else "NO_CHECKS_REGISTERED"
+                print(marker, flush=True)
+                print_no_checks_summary(pr_info)
+                return 0
             time.sleep(args.no_checks_seconds)
             continue
 
-        pending = sum(1 for check in checks if check.get("bucket") == "pending")
-        if pending:
-            time.sleep(args.poll_seconds)
-            continue
+        no_checks_started_at = None
+        pending_checks = [check for check in checks if check.get("bucket") == "pending"]
 
         failed = sum(1 for check in checks if check.get("bucket") == "fail")
+        actionable_pending = [
+            check for check in pending_checks if not is_human_gate_check(check)
+        ]
+        if actionable_pending:
+            time.sleep(args.poll_seconds)
+            continue
         if failed:
             print("CHECKS_DONE_WITH_FAILURES", flush=True)
-        else:
-            print("ALL_CHECKS_PASSED", flush=True)
-
+            print_check_summary(checks)
+            return 0
+        if pending_checks:
+            print("CHECKS_BLOCKED_BY_REVIEW_GATE", flush=True)
+            print_check_summary(checks)
+            return 0
+
+        print("ALL_CHECKS_PASSED", flush=True)
         print_check_summary(checks)
         return 0
 
diff --git a/skills/iterate-pr/scripts/reply_to_thread.py b/skills/iterate-pr/scripts/reply_to_thread.py
index 24237ce..51bfa6a 100644
--- a/skills/iterate-pr/scripts/reply_to_thread.py
+++ b/skills/iterate-pr/scripts/reply_to_thread.py
@@ -6,14 +6,14 @@
 Reply to PR review threads.
 
 Usage:
-    python reply_to_thread.py THREAD_ID BODY [THREAD_ID BODY ...]
+    uv run reply_to_thread.py THREAD_ID BODY [THREAD_ID BODY ...]
 
 Accepts one or more (thread_id, body) pairs as positional arguments.
 Batches all replies into a single GraphQL mutation for efficiency.
 
 Example:
-    python reply_to_thread.py PRRT_abc "Fixed the issue.\n\n*— Claude Code*"
-    python reply_to_thread.py PRRT_abc "Fixed." PRRT_def "Also fixed."
+    uv run reply_to_thread.py PRRT_abc "Fixed the issue.\n\n*— Claude Code*"
+    uv run reply_to_thread.py PRRT_abc "Fixed." PRRT_def "Also fixed."
 """
 
 from __future__ import annotations