Skip fork PRs in AI review workflow (closes CodeQL #11, #12)

igerber · claude · igerber · commit f761add3897b · 2026-05-13T20:33:50.000-04:00
GitHub Code Scanning flagged two errors on `.github/workflows/ai_pr_review.yml`:

- #11 actions/untrusted-checkout/high: "Checkout of untrusted code in trusted context"
- #12 actions/untrusted-checkout-toctou/high: "Insufficient protection against execution of untrusted code on a privileged workflow (issue_comment)"

CodeQL flags the structural pattern: the workflow runs `actions/checkout@v6` with the PR head's repo+sha in a context that has access to OPENAI_API_KEY and write permissions on PRs/issues. Today we don't execute the checked-out code (we only read it via `git diff`), but the pattern is risky: a future edit adding "run tests" or "install deps" would turn it into an active RCE on fork PRs. The PR-author-vs-commenter mismatch (#12) is real: a maintainer commenting `/ai-review` on a fork PR passes the author_association check, but the code being checked out is the fork's, which the maintainer didn't author.

Fix: skip fork PRs entirely. Two-layer gate (different mechanics required for the three event types):

1. Job-level `if:` for `pull_request` events: require `github.event.pull_request.head.repo.full_name == github.repository`. For fork PRs opened against this repo, the review job is now skipped.
2. For `issue_comment` and `pull_request_review_comment` events (event payloads don't include head-repo info, so the PR must be fetched via the API), the resolve-pr step now sets a new `is_fork` output. All 7 post-resolve steps are gated on `steps.pr.outputs.is_fork == 'false'`:
   - actions/checkout (closed PR path)
   - actions/checkout (open PR path), the line CodeQL flags
   - Pre-fetch base SHA
   - Fetch previous AI review
   - Build review prompt
   - Run Codex
   - Post PR comment

For comment-triggered events on fork PRs, the resolve-pr step emits a `core.notice()` annotation in the Actions UI explaining the skip; no checkout, no codex, no comment.
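
The fork decision described above can be modeled in a few lines. This is a Python illustration only: the workflow itself makes the decision in JavaScript inside the resolve-pr github-script step, and the names `is_fork_pr` and `is_fork_output` are hypothetical. The sketch assumes a deleted fork shows up as a missing head repo and treats that case as a fork, so the gate fails closed.

```python
from typing import Optional


def is_fork_pr(head_repo_full_name: Optional[str], base_repo_full_name: str) -> bool:
    """Return True unless the PR head lives in the base repository.

    A missing head repo (None, e.g. the fork was deleted) counts as a fork,
    so the check never falls back to the base repo name.
    """
    return head_repo_full_name != base_repo_full_name


def is_fork_output(head_repo_full_name: Optional[str], base_repo_full_name: str) -> str:
    """Render the decision as the string a step output would carry."""
    return "true" if is_fork_pr(head_repo_full_name, base_repo_full_name) else "false"
```

Step outputs are strings, which is why the gates compare against `'false'` rather than a boolean.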

Background: per a prior thread on the same workflow, fork PRs already fail today (secrets aren't passed to fork-PR `pull_request` events; comment triggers on fork PRs would crash because OPENAI_API_KEY is empty). So skipping cleanly just makes the failure honest, with no real loss of behavior.

Tests added (TestWorkflowForkSkip class, 3 contract tests):

- test_workflow_pull_request_if_block_excludes_fork_prs
- test_workflow_resolve_pr_step_sets_is_fork_output
- test_workflow_post_resolve_steps_gated_on_is_fork (asserts ≥7 occurrences of `is_fork == 'false'` so a future workflow refactor adding a new PR-content-touching step must extend the gate)

Out of scope:

- Adding `pull_request_target` for fork PRs (would need much more hardening; not justified for our use case)
- Posting a PR comment explaining the skip (Actions UI notice is enough)

Verification:

- `pytest tests/test_openai_review.py -q` → 214 passed (3 new).
- `python3 -c "import yaml; yaml.safe_load(...)"` → no errors.
- 7 step-level `is_fork == 'false'` gates confirmed.
- CodeQL on this PR should NOT re-flag #11/#12 (early validation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
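
One caveat on the occurrence-count test: the plain substring count also picks up the JS comment in the resolve-pr step, so removing a single gate could still leave the count at 7. A stricter variant, sketched below, counts only `if:` lines. The helper name `count_gated_steps` is hypothetical and not part of this commit, and the pattern assumes each gate sits on a single `if:` line, as in this diff.

```python
import re

# Matches an `if:` line whose condition includes the fork gate.
# Comment lines (JS `//` or YAML `#`) do not start with `if:` and are ignored.
_GATE_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]+)?if:.*steps\.pr\.outputs\.is_fork == 'false'",
    re.MULTILINE,
)


def count_gated_steps(workflow_text: str) -> int:
    """Count `if:` lines that gate on is_fork, ignoring comments and prose."""
    return len(_GATE_LINE.findall(workflow_text))
```

With a helper like this, the contract test could assert an exact count instead of a lower bound, at the cost of needing an update whenever a gated step is added or removed.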
diff --git a/.github/workflows/ai_pr_review.yml b/.github/workflows/ai_pr_review.yml
@@ -22,10 +22,18 @@ jobs:
     runs-on: ubuntu-latest
 
     # Run if:
-    # - PR opened, OR
+    # - PR opened (same-repo only — fork PRs are skipped here at the workflow
+    #   level, since `head.repo.full_name` is available on `pull_request`
+    #   events). For comment-triggered events, the same-repo check happens at
+    #   the step level via `steps.pr.outputs.is_fork` (we have to API-fetch
+    #   the PR there because `issue_comment` event payloads don't include
+    #   head-repo info). Closes CodeQL alerts #11, #12 (untrusted checkout).
     # - Comment "/ai-review" on a PR by a collaborator/member/owner (issue or inline diff comment)
     if: |
-      (github.event_name == 'pull_request') ||
+      (
+        github.event_name == 'pull_request' &&
+        github.event.pull_request.head.repo.full_name == github.repository
+      ) ||
       (
         github.event_name == 'issue_comment' &&
         github.event.issue.pull_request != null &&
@@ -75,6 +83,27 @@ jobs:
             const headRepoFullName =
               pr.data.head.repo?.full_name || `${owner}/${repo}`;
 
+            // Fork detection for comment-triggered events. Workflow-level
+            // `if:` already excludes fork PRs for `pull_request` events
+            // (uses `github.event.pull_request.head.repo.full_name`), but
+            // `issue_comment` / `pull_request_review_comment` payloads
+            // don't include head-repo info — we have to API-fetch.
+            // Subsequent steps gate on `steps.pr.outputs.is_fork == 'false'`.
+            // Uses raw `pr.data.head.repo?.full_name` (NOT the fallback
+            // `headRepoFullName`): a deleted fork should still skip the
+            // workflow because the code IS untrusted from CodeQL's
+            // perspective. `undefined !== "owner/repo"` -> is_fork = true.
+            // Closes CodeQL alerts #11, #12 (untrusted checkout TOCTOU).
+            const isFork =
+              pr.data.head.repo?.full_name !== `${owner}/${repo}`;
+            if (isFork) {
+              core.notice(
+                `AI review skipped: PR head is from fork ` +
+                `${pr.data.head.repo?.full_name || "(deleted)"}. ` +
+                `See CodeQL alerts #11, #12 for the security rationale.`
+              );
+            }
+
             core.setOutput("number", prNumber);
             core.setOutput("title", pr.data.title || "");
             core.setOutput("body", pr.data.body || "");
@@ -84,14 +113,15 @@ jobs:
             core.setOutput("head_ref", pr.data.head.ref);
             core.setOutput("head_repo_full_name", headRepoFullName);
             core.setOutput("state", pr.data.state);
+            core.setOutput("is_fork", isFork ? "true" : "false");
 
       # Closed/merged PR (e.g. `/ai-review` rerun on a merged PR):
       # use the base-repo mirror of the PR head, which GitHub keeps
       # durably even after the fork is deleted or branches removed.
       # The previous workflow used `refs/pull/<N>/merge`, which is
       # garbage-collected on closed PRs — this path replaces that.
       - uses: actions/checkout@v6
-        if: steps.pr.outputs.state != 'open'
+        if: steps.pr.outputs.state != 'open' && steps.pr.outputs.is_fork == 'false'
         with:
           ref: refs/pull/${{ steps.pr.outputs.number }}/head
 
@@ -102,12 +132,13 @@ jobs:
       # (see .claude/commands/submit-pr.md:327-345). head_sha is
       # guaranteed to exist on the head repo for an open PR.
       - uses: actions/checkout@v6
-        if: steps.pr.outputs.state == 'open'
+        if: steps.pr.outputs.state == 'open' && steps.pr.outputs.is_fork == 'false'
         with:
           repository: ${{ steps.pr.outputs.head_repo_full_name }}
           ref: ${{ steps.pr.outputs.head_sha }}
 
       - name: Pre-fetch base SHA
+        if: steps.pr.outputs.is_fork == 'false'
         run: |
           set -euo pipefail
           # base_sha lives on the base repo (github.repository), which differs
@@ -119,6 +150,7 @@ jobs:
 
       - name: Fetch previous AI review (if any)
         id: prev_review
+        if: steps.pr.outputs.is_fork == 'false'
         uses: actions/github-script@v9
         with:
           script: |
@@ -136,6 +168,7 @@ jobs:
             core.setOutput("found", last ? "true" : "false");
 
       - name: Build review prompt with PR context + diff
+        if: steps.pr.outputs.is_fork == 'false'
         env:
           PR_TITLE: ${{ steps.pr.outputs.title }}
           PR_BODY: ${{ steps.pr.outputs.body }}
@@ -415,6 +448,7 @@ jobs:
 
       - name: Run Codex
         id: run_codex
+        if: steps.pr.outputs.is_fork == 'false'
         uses: openai/codex-action@v1
         with:
           openai-api-key: ${{ secrets.OPENAI_API_KEY }}
@@ -426,6 +460,7 @@ jobs:
           effort: xhigh
 
       - name: Post PR comment (new on every event except initial open)
+        if: steps.pr.outputs.is_fork == 'false'
         uses: actions/github-script@v9
         env:
           CODEX_FINAL_MESSAGE: ${{ steps.run_codex.outputs.final-message }}
diff --git a/tests/test_openai_review.py b/tests/test_openai_review.py
@@ -2557,6 +2557,72 @@ def test_notice_caps_output_at_10_files(
         assert "and 15 more" in err
 
 
+class TestWorkflowForkSkip:
+    """The AI review workflow must skip PRs from forks to avoid the
+    untrusted-checkout pattern that CodeQL flagged as alerts #11 and #12.
+    Two-layer skip:
+      1. Workflow-level `if:` gates `pull_request` events on
+         `head.repo.full_name == github.repository`
+      2. The resolve-pr step sets `is_fork` output (via API fetch);
+         all 7 post-resolve steps gate on `is_fork == 'false'`.
+
+    These contract tests pin both layers — without them, a future workflow
+    refactor could drop the gate and re-introduce the CodeQL alerts."""
+
+    @pytest.fixture
+    def workflow_text(self):
+        assert _SCRIPT_PATH is not None
+        repo_root = _SCRIPT_PATH.parent.parent.parent
+        wf = repo_root / ".github" / "workflows" / "ai_pr_review.yml"
+        if not wf.exists():
+            pytest.skip("workflow not found")
+        return wf.read_text()
+
+    def test_workflow_pull_request_if_block_excludes_fork_prs(self, workflow_text):
+        """Layer 1: the workflow `if:` block for `pull_request` events must
+        require head.repo.full_name == github.repository so the review job
+        is skipped for fork PRs."""
+        assert (
+            "github.event.pull_request.head.repo.full_name == github.repository"
+            in workflow_text
+        ), (
+            "workflow `if:` for pull_request events must check that the PR "
+            "head is from the same repo (not a fork) — required to clear "
+            "CodeQL alerts #11/#12 (untrusted checkout)."
+        )
+
+    def test_workflow_resolve_pr_step_sets_is_fork_output(self, workflow_text):
+        """Layer 2: the resolve-pr github-script step must set the `is_fork`
+        output that subsequent steps gate on. Comment-triggered events
+        (`issue_comment`, `pull_request_review_comment`) can't be gated at
+        the workflow `if:` level (event payload doesn't include head-repo
+        info), so the gate happens at the step level via this output."""
+        assert 'core.setOutput("is_fork"' in workflow_text, (
+            "resolve-pr step must set `is_fork` output so post-resolve steps "
+            "can gate on `steps.pr.outputs.is_fork == 'false'`."
+        )
+
+    def test_workflow_post_resolve_steps_gated_on_is_fork(self, workflow_text):
+        """All steps that run AFTER resolve-pr (and could touch untrusted
+        PR contents) must have `if: steps.pr.outputs.is_fork == 'false'`.
+
+        Count derivation: 7 gated steps total — 2 checkouts (closed PR
+        path, open PR path) + Pre-fetch base SHA + Fetch previous AI
+        review + Build review prompt + Run Codex + Post PR comment.
+        Adding a new step that handles PR contents must extend this
+        count.
+
+        Allow ≥7 to also count incidental references in code comments
+        (e.g. the JS doc comment in the resolve-pr step itself)."""
+        count = workflow_text.count("is_fork == 'false'")
+        assert count >= 7, (
+            f"Expected at least 7 occurrences of `is_fork == 'false'` (one "
+            f"per gated post-resolve step); found {count}. A new step that "
+            f"reads PR contents must also be gated to keep CodeQL alerts "
+            f"#11/#12 closed."
+        )
+
+
 class TestExtractResponseText:
     def test_prefers_output_text_field(self, review_mod):
         result = {"output_text": "Direct text.", "output": []}