Merge pull request #436 from igerber/dismiss-codeql-14

igerber · web-flow · commit c56dbcb62eba · 2026-05-15T09:06:52.000-04:00
Dismiss CodeQL #14 (untrusted-checkout-toctou) with guard test
diff --git a/.github/workflows/ai_pr_review.yml b/.github/workflows/ai_pr_review.yml
@@ -55,6 +55,66 @@ jobs:
       )
 
     steps:
+      # ─────────────────────────────────────────────────────────────────
+      # CodeQL alert #14 (untrusted-checkout-toctou/critical) was
+      # dismissed on 2026-05-14 as "won't fix" with the rationale
+      # documented below. The dismissal is valid ONLY while ALL of
+      # the following hold:
+      #   1. The Codex action runs with sandbox: read-only (see the
+      #      Run Codex step below). No step EXECUTES PR-head FILE
+      #      BYTES — no `pip install -e .`, `pytest`, `npm install`,
+      #      `cargo run`, `make`, `./configure`, `maturin develop`,
+      #      or `python3 <pr-file>.py` against checked-out PR-head
+      #      files. Inline `python3 -c '...'` invocations operating
+      #      on sanitized environment variables (PR_TITLE, PR_BODY,
+      #      PREV_REVIEW) are SAFE — the script body comes from base,
+      #      not from PR head.
+      #   2. The fork-skip gate (`is_fork == 'false'`) gates every
+      #      post-resolve step, including both actions/checkout
+      #      invocations.
+      #   3. The author_association check on issue_comment /
+      #      review-comment events restricts triggers to
+      #      OWNER/MEMBER/COLLABORATOR.
+      #   4. head_sha is API-pinned in this resolve-pr step before
+      #      checkout.
+      #
+      # If you are adding a step that VIOLATES any of these
+      # invariants, this dismissal is invalid. Either remove the
+      # offending step OR restructure the workflow to checkout
+      # BASE_SHA only and use `git show` for PR-head reads (degrades
+      # reviewer quality — see plan archive).
+      #
+      # Guard test: tests/test_openai_review.py
+      #   ::TestWorkflowDoesNotExecutePRHeadCode
+      # Catches the common ACCIDENTAL-regression forms: pip / npm /
+      # yarn / cargo / make / maturin / poetry / pdm / uv / tox /
+      # setup.py install/run, python file execution against PR-head
+      # paths, bash -c / sh -c (including compound flags -lc/-ec/-exc)
+      # with python/cp inside, env-var prefixes (`VAR=1 python3 ...`),
+      # wrapper commands (`env`, `nohup`, `exec`, `time`, `command`),
+      # shell negation (`!`), backslash line continuations, single-line
+      # `python3 -c` payloads (literal-allowlisted, currently empty),
+      # subshell `(...)`, brace group `{ ...; }`, and write-overwrites
+      # of allowlisted /tmp paths between staging and execution
+      # (`>`, `>>`, `cp`, `mv`, `tee`, `ln`).
+      #
+      # NOT exhaustive against an adversarial maintainer. Known
+      # unmodeled paths (any of these CAN execute PR-head bytes
+      # without tripping CI; a maintainer reviewing this workflow
+      # MUST refuse changes that introduce them):
+      #   - bash <script> / sh <script> / ./<script> / source <script>
+      #     / . <script>  — direct shell-script execution
+      #   - multi-line `python3 -c` bodies (line-by-line shlex can't
+      #     reassemble across newlines; the workflow's existing 5
+      #     sanitizer bodies are therefore exempt by invisibility)
+      #   - variable expansion: `SCRIPT="$X"; python3 "$SCRIPT"`
+      #   - `eval`, `find -exec`, `xargs -I {}`
+      # The dismissal's primary defense is THIS comment block plus
+      # the `dismissed_comment` field on alert #14. The guard test
+      # is belt-and-suspenders for accidental regressions, not a
+      # complete adversarial parser.
+      # See TODO.md for the long-term tracking of this gap.
+      # ─────────────────────────────────────────────────────────────────
       - name: Resolve PR number + metadata
         id: pr
         uses: actions/github-script@v9
diff --git a/TODO.md b/TODO.md
@@ -141,6 +141,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | HonestDiD `test_m0_short_circuit` uses wall-clock `elapsed < 0.5s` as a proxy for "short-circuit path taken" instead of calling the full optimizer. Replace with a direct correctness signal (mock/spy the optimizer or check a state flag) so the test doesn't depend on CI timing. Not flaky today at 500ms, but load-bearing correctness on a timing proxy is brittle. | `tests/test_methodology_honest_did.py:246` | — | Low |
 | SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
 | AI review CI: pin workflow contract via test (uses `openai/codex-action@v1`, passes `prompt-file`, reads `steps.run_codex.outputs.final-message`, preserves diff-exclude paths and comment markers). Currently only the wrapper-tag and closing-tag-escape strings are asserted. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #416 | Low |
+| `TestWorkflowDoesNotExecutePRHeadCode` (CodeQL #14 dismissal guard) does not model: `bash <script>` / `sh <script>` / `./<script>` / `source <script>` / `. <script>` direct shell-script execution; multi-line `python3 -c` bodies (line-by-line shlex can't reassemble across newlines — the workflow's 5 sanitizer bodies are exempt by invisibility); shell-variable-expansion indirection (`SCRIPT="$X"; python3 "$SCRIPT"`); `eval`; `find -exec`; `xargs -I {}`. Each represents a path by which PR-head bytes COULD execute without the test failing. The guard catches accidental regressions of common forms (16 tests covering pip/npm/cargo/maturin/etc. installs, python file exec, bash -c indirection with compound flags, env-var prefixes, line continuations, subshells/brace groups, single-line python -c, write-overwrites of allowlisted /tmp paths). Closing the residuals would require multi-line shell parsing with command-substitution awareness + script-execution allowlists — significant work for diminishing return given the dismissal's primary defense is the documented threat model on the alert and in `.github/workflows/ai_pr_review.yml` comment block. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #436 | Low |
 | Render `docs/methodology/REPORTING.md` and `docs/methodology/REGISTRY.md` as in-site Sphinx pages so cross-references can use `:doc:` instead of off-site GitHub `blob/main` URLs. Current state (#410 fix-audit-r2) restores navigable links via `blob/main`, but stable-docs readers can land on a different revision than the package version they are reading. Two viable paths: (a) add `myst-parser` to `docs/conf.py` extensions + docs extras and link with `:doc:`, or (b) convert both files to `.rst`. | `docs/conf.py`, `docs/api/business_report.rst`, `docs/api/diagnostic_report.rst`, `docs/tutorials/18_geo_experiments.ipynb`, `docs/tutorials/19_dcdh_marketing_pulse.ipynb` | follow-up | Low |
 
 ---
diff --git a/tests/test_openai_review.py b/tests/test_openai_review.py