
Wire CI AI reviewer to see tutorial notebook prose #414

Open
igerber wants to merge 1 commit into main from ci-workflow-ipynb-markdown-extraction

Conversation

@igerber
Owner

@igerber igerber commented May 10, 2026

Summary

  • Replace the blanket :!docs/tutorials/*.ipynb exclusion in .github/workflows/ai_pr_review.yml with a markdown-extraction step that loops over changed tutorial notebooks and appends prose + code + executed outputs to the reviewer prompt.
  • Drop docs/tutorials/*.ipynb from the DO-NOT list at .github/codex/prompts/pr_review.md and add a pointer to the new "Tutorial notebook prose" prompt block.
  • Reap the temporary T21 review aid at docs/_review/t21_notebook_extract.md and the _review entry in docs/conf.py:exclude_patterns.
  • New tools/notebook_md_extract.py (stdlib-only: no nbformat dependency, no pip install step in the workflow) with a _to_str() helper that coerces nbformat raw JSON's list-or-string source / text fields; a sketch of the helper and the output cap follows this list. text/html-only outputs, image/* data, and raw cells are intentionally dropped (see module docstring).
  • --max-output-chars argument caps individual outputs with a truncation marker. Workflow invocation passes --max-output-chars 20000 to bound prompt growth.
  • Per-notebook extraction is fail-soft (|| echo "(extraction failed ...)") so one malformed notebook degrades to a placeholder line rather than killing the AI review job. Job is best-effort (no merge gating).
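
A minimal sketch of the `_to_str()` coercion and the per-output cap described above (only the helper name and the cap flag come from this PR; the internals and the marker text here are assumed):

```python
from typing import Union

def _to_str(value: Union[str, list]) -> str:
    """Coerce an nbformat raw-JSON field that may be a single string
    or a list of line strings into one string."""
    if isinstance(value, list):
        return "".join(value)
    return value

def _cap_output(text: str, max_output_chars: int) -> str:
    """Truncate one output block to the per-output budget, appending
    a visible truncation marker."""
    if len(text) <= max_output_chars:
        return text
    return text[:max_output_chars] + "\n... [output truncated]"
```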

Methodology references (required if estimator / math changes)

  • Method name(s): N/A — no methodology changes
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): N/A

Validation

  • Tests added/updated: tests/test_notebook_md_extract.py (9 cases: list-vs-string source coercion, _to_str helper, HTML-only / image / raw / error output handling, --max-output-chars truncation, CLI --input/--output, CLI stdout). Inline-fixture pattern with a skip-guard on tools/notebook_md_extract.py existence (see the sketch after this list) so the test runs cleanly in rust-test.yml's isolated-install matrix.
  • Backtest / simulation / notebook evidence (if applicable): Manual extraction on T20 + T21 confirms markdown headers, code fences, and **Output:** blocks render correctly. Workflow YAML validated via yaml.safe_load. Sphinx config still parses with _review removed from exclude_patterns.
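
The skip-guard mentioned above might look like the following sketch, assuming a module-level `pytestmark` (the actual test file's layout may differ):

```python
from pathlib import Path

import pytest

# Path to the extractor relative to this test file; absent in the
# isolated-install matrix, where only the installed package exists.
TOOL = Path(__file__).resolve().parents[1] / "tools" / "notebook_md_extract.py"

pytestmark = pytest.mark.skipif(
    not TOOL.exists(),
    reason="tools/notebook_md_extract.py not present in this checkout",
)
```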

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions

Overall Assessment

Blocker — one unmitigated P0 security issue and one P1 sibling-surface regression.

Executive Summary

  • No estimator, weighting, variance/SE, identification, or methodology code is changed.
  • The new CI workflow executes tools/notebook_md_extract.py from the PR checkout before the later Codex step receives secrets.OPENAI_API_KEY; this is unsafe for pull_request reruns, especially issue_comment reruns on fork PRs.
  • The prompt text changed, but .claude/scripts/openai_review.py still exact-matches the old block containing docs/tutorials/*.ipynb, so local AI review prompt adaptation now fails.
  • I confirmed the local substitution failure with a read-only Python smoke check: it emits Warning: prompt substitution did not match, leaves CI grep instructions in the local prompt, and omits the local-mode note.
  • Pattern-wide inference/NaN greps found no new estimator inference anti-patterns in the changed code.

Methodology

No findings. This PR does not change estimators, math, weighting, variance/SE, identification assumptions, or statistical defaults. docs/methodology/REGISTRY.md was checked for relevance; no method registry update is required.

Code Quality

P1 — Local reviewer prompt adaptation is now stale

  • Location: .github/codex/prompts/pr_review.md:L87-L95, .claude/scripts/openai_review.py:L943-L978, tests/test_openai_review.py:L212-L236
  • Impact: The PR removes the docs/tutorials/*.ipynb bullet and adds the tutorial-prose paragraph, but _adapt_review_criteria() still exact-matches the old CI mandate block. That substitution now fails, so local review keeps CI-only instructions to run shell greps/load files and does not insert the “Local Review” no-tool-access note. The existing tests at tests/test_openai_review.py:L212-L236 should catch this.
  • Concrete fix: Update the exact substitution string in .claude/scripts/openai_review.py to match the new prompt block, or replace the brittle exact-block replacement with marker-based replacement around the Single-Pass mandate.
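
For illustration, a marker-based replacement could anchor on comment markers instead of the full block text; the marker names here are hypothetical and do not exist in pr_review.md today:

```python
import re

# Hypothetical HTML-comment markers bracketing the Single-Pass
# mandate block in pr_review.md.
_MANDATE_RE = re.compile(
    r"<!-- single-pass-mandate:start -->.*?<!-- single-pass-mandate:end -->",
    re.DOTALL,
)

def adapt_mandate(prompt: str, local_block: str) -> str:
    """Swap the CI mandate block for the local-review variant.
    Wording edits inside the block no longer break the substitution,
    and a missing marker fails loudly instead of warning."""
    adapted, count = _MANDATE_RE.subn(local_block, prompt)
    if count != 1:
        raise ValueError("Single-Pass mandate markers not found in prompt")
    return adapted
```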

Performance

No P1/P2 findings. The per-output --max-output-chars 20000 cap is reasonable for the stated goal. A total notebook/prompt cap would be a useful hardening improvement, but not a blocker here.

Maintainability

P3 — Extractor policy is documented

  • Location: tools/notebook_md_extract.py:L1-L24
  • Impact: Dropping HTML-only outputs, images, and raw cells is a deliberate documented limitation. This is acceptable as an implementation policy, not a defect.
  • Concrete fix: No action required.

Tech Debt

No findings. I did not find a relevant TODO.md mitigation for the P0/P1 items, and these are not deferrable test/documentation gaps.

Security

P0 — Workflow executes PR-controlled Python before a secret-bearing step

  • Location: .github/workflows/ai_pr_review.yml:L12-L14, .github/workflows/ai_pr_review.yml:L81-L83, .github/workflows/ai_pr_review.yml:L166-L172, .github/workflows/ai_pr_review.yml:L181-L182
  • Impact: The workflow checks out the PR merge commit, then runs python3 tools/notebook_md_extract.py from that checkout. Future PRs can modify that script and add/change a tutorial notebook, causing arbitrary PR-controlled code to execute in the AI review job. This is especially risky for /ai-review issue_comment reruns on fork PRs, which run in the base repository workflow context, and the same job later invokes Codex with secrets.OPENAI_API_KEY. Background-process or workspace-tampering attacks can target later secret-bearing steps.
  • Concrete fix: Do not execute code from the PR checkout while building the prompt. Use a trusted extractor from the base/default branch or inline the extraction code in the workflow, set actions/checkout persist-credentials: false, and keep any untrusted notebook parsing isolated from the later secret-bearing Codex step.

Documentation/Tests

P1 — Existing local-review tests need to be updated with the prompt change

  • Location: tests/test_openai_review.py:L212-L250
  • Impact: These tests assert that every prompt substitution applies and that local mode strips CI-only audit instructions. Because the prompt block changed without the local substitution mirror, these tests should now fail.
  • Concrete fix: After updating .claude/scripts/openai_review.py, keep or extend these tests so they assert the new “Tutorial notebook prose” wording does not break local prompt adaptation.

Path To Approval

  1. Replace the workflow’s execution of PR-checkout tools/notebook_md_extract.py with trusted extraction code from the base/default branch or an inline workflow script, and prevent untrusted code from running before the Codex step receives secrets.OPENAI_API_KEY.
  2. Add persist-credentials: false to the checkout unless later git operations explicitly require persisted credentials.
  3. Update .claude/scripts/openai_review.py so _adapt_review_criteria() matches the new Single-Pass mandate text or uses a robust marker-based replacement.
  4. Run tests/test_openai_review.py::TestAdaptReviewCriteria and tests/test_notebook_md_extract.py after the fixes.

igerber added a commit that referenced this pull request May 10, 2026
P0 (Security): The previous workflow ran `python3 tools/notebook_md_extract.py`
from the PR checkout, before the secret-bearing Codex step. A malicious PR could
modify the script + add a tutorial notebook to tamper with the prompt or attack
the later step. Particularly risky for `/ai-review` `issue_comment` reruns on
fork PRs (workflow runs in base-repo context).

Fix: stage the trusted extractor from the BASE branch via
`git show "$BASE_SHA:tools/notebook_md_extract.py" > /tmp/notebook_md_extract.py`
in a dedicated step BEFORE prompt building. The PR-controlled version of the
script is no longer executed at all. Bootstrap case (this PR — base does not
yet contain the script): extraction is skipped with a placeholder line; once
this PR merges, all subsequent PRs see the trusted extractor on base.

Defense in depth: add `persist-credentials: false` to `actions/checkout@v6`
so the GITHUB_TOKEN is not embedded in `.git/config` during prompt building
(blocks secondary token exfiltration via `git config` reads). The pre-fetch
step is updated to pass the token explicitly via
`http.extraheader="Authorization: bearer ${GH_TOKEN}"`.

P1 (Substitution drift): `_adapt_review_criteria()` in
`.claude/scripts/openai_review.py` exact-matches the CI Single-Pass mandate
block, including the `docs/tutorials/*.ipynb` line that was removed. The
substitution silently failed on the prior commit, leaving CI-only audit
instructions in the local-review prompt and emitting a stderr warning.

Fix: update the expected text in `_SUBSTITUTIONS` to match the current
`pr_review.md` block (drop the tutorials line; include the new
"Tutorial notebook prose" paragraph). All 6 tests in
`tests/test_openai_review.py::TestAdaptReviewCriteria` now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5ea4cd2cffa763468c275d1554a1b9cdbcaf8510


Overall Assessment

⚠️ Needs changes — previous P0/P1 findings are addressed, but one newly identified P1 review-integrity issue remains.

Executive Summary

  • No estimator, math, weighting, variance/SE, identification, or default behavior changes were introduced.
  • Prior P0 is resolved: the workflow now stages the notebook extractor from BASE_SHA and no longer runs the PR checkout’s extractor before the Codex step.
  • Prior P1 is resolved: local prompt adaptation now matches the updated review prompt block.
  • P1: PR-controlled review content can still alter reviewer instructions: the workflow uses the PR checkout’s pr_review.md, and the new notebook prose block is appended without an untrusted-content boundary.
  • P2: tools/notebook_md_extract.py is tested, but rust-test.yml path filters do not include tools/**, so future tool-only changes can skip CI.

Methodology

No findings.

  • Severity: P3
  • Impact: No methodology-bearing code changed. docs/methodology/REGISTRY.md was checked for relevance, and no registry update is required.
  • Concrete fix: None.

Code Quality

No unmitigated findings.

Prior local-review prompt regression is resolved: .claude/scripts/openai_review.py:L935-L987 mirrors .github/codex/prompts/pr_review.md:L57-L95, and a direct smoke check produced no substitution warnings.

Performance

P2 — Notebook extraction has only per-output caps, not total prompt caps

  • Location: .github/workflows/ai_pr_review.yml:L189-L198, tools/notebook_md_extract.py:L41-L45, tools/notebook_md_extract.py:L71-L76
  • Impact: A PR can add many large markdown/code cells or many outputs each under 20000 chars and still produce a very large prompt, making the review slow, expensive, or truncated.
  • Concrete fix: Add a per-notebook or total extracted-character cap, for example --max-total-chars, and emit a truncation marker when the notebook extract is capped.
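
A minimal sketch of such a total cap, assuming a simple accumulate-and-stop loop over already-extracted chunks (the flag name matches the suggestion above; the marker text is invented):

```python
def cap_total(chunks: list[str], max_total_chars: int) -> list[str]:
    """Keep emitting extracted chunks until the per-notebook budget is
    spent, then emit one truncation marker and stop."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_total_chars:
            kept.append("... [notebook extract truncated at --max-total-chars]")
            break
        kept.append(chunk)
        used += len(chunk)
    return kept
```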

Maintainability

No findings.

The extractor’s omission policy for HTML-only outputs, images, and raw cells is documented in tools/notebook_md_extract.py:L9-L22.

Tech Debt

No mitigating TODO entry found for the P1 below. The P2 CI/path-filter gap is deferrable but currently untracked.

Security

P1 — [Newly identified] PR-controlled prompt/prose can still steer the AI reviewer

  • Location: .github/workflows/ai_pr_review.yml:L146-L203, .github/workflows/ai_pr_review.yml:L206-L211, .github/codex/prompts/pr_review.md:L92-L95, .github/codex/prompts/pr_review.md:L120
  • Impact: The workflow still seeds the Codex prompt from .github/codex/prompts/pr_review.md in the PR checkout, so a PR that modifies the reviewer prompt can rewrite the instructions used to review itself. The new tutorial notebook prose is also appended as plain prompt content without an explicit untrusted wrapper, so a malicious notebook markdown/output cell can contain instructions such as “ignore prior directions” and compete with the reviewer criteria.
  • Concrete fix: Stage the review criteria from the trusted base SHA, not the PR checkout, and include any prompt-file changes only as diff content. Also wrap notebook extracts in explicit untrusted delimiters and add prompt text saying not to follow instructions inside the tutorial notebook prose block.
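
A sketch of that wrapping, assuming a random per-run marker so content inside the block cannot forge the close tag (the tag name matches the follow-up commit below; everything else is assumed):

```python
import secrets

def wrap_untrusted(extract: str) -> str:
    """Wrap PR-controlled notebook prose so the reviewer can be told
    to read it for correctness but never follow instructions inside
    it. The random marker makes the close tag unguessable from within
    the wrapped content."""
    marker = secrets.token_hex(8)
    return (
        f'<untrusted-pr-content marker="{marker}">\n'
        f"{extract}\n"
        f'</untrusted-pr-content marker="{marker}">'
    )
```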

Documentation/Tests

P2 — Tool tests are not triggered by tool-only changes

  • Location: .github/workflows/rust-test.yml:L6-L24, tests/test_notebook_md_extract.py:L23-L30
  • Impact: The new test imports tools/notebook_md_extract.py, but the CI path filters include tests/** and not tools/**. A future PR changing only the extractor can skip the test workflow.
  • Concrete fix: Add tools/** or tools/notebook_md_extract.py to both push and pull-request path filters in rust-test.yml.

Verification note: pytest is not installed in this review environment, so I could not run the pytest targets. I did run direct Python smoke checks for the prompt substitution and extractor helper behavior.

Path To Approval

  1. Change the AI review workflow to load .github/codex/prompts/pr_review.md from BASE_SHA or another trusted source, not from the PR checkout.
  2. Wrap the “Tutorial notebook prose” block as untrusted PR content and update pr_review.md, .claude/scripts/openai_review.py, and prompt-adaptation tests to preserve that instruction.
  3. Add tools/** or tools/notebook_md_extract.py to the relevant rust-test.yml path filters so extractor-only changes run the extractor tests.

igerber added a commit that referenced this pull request May 10, 2026
P1 (Security, [Newly identified]): The R1 fix staged the notebook extractor
from BASE_SHA, but the workflow still ran `cat .github/codex/prompts/pr_review.md`
from the PR checkout — meaning a PR could rewrite the reviewer instructions
used to review itself. The PR-controlled notebook prose was also appended
without an untrusted-content boundary, so a malicious notebook cell could
include "ignore prior directions"-style content competing with the reviewer
criteria.

Fix: stage the reviewer prompt from BASE_SHA via
`git show "$BASE_SHA:.github/codex/prompts/pr_review.md" > /tmp/pr_review.md`
in the same step that stages the extractor (fail-closed if base lacks the
prompt — no bootstrap fallback). The prompt-build step now uses
`/tmp/pr_review.md` instead of the PR-checkout version. Wrap the notebook
extraction loop output in `<untrusted-pr-content marker="...">` tags, and
extend the reviewer prompt's "Tutorial notebook prose" paragraph to instruct
the model to review the prose for correctness but ignore any directive
inside the wrapper. The same rule explicitly extends to
`<previous-ai-review-output>` and PR title/body text.

P2 (Performance): The previous `--max-output-chars 20000` cap was per-output
only — a PR with many small outputs could still bloat the prompt. Add
`--max-total-chars` to the extractor with a single truncation marker
appended after the cap. Workflow invocation passes
`--max-total-chars 200000` for a per-notebook cap; the loop accumulates
across multiple changed notebooks but each individual notebook is bounded.

P2 (Tests/CI): The `rust-test.yml` path filters covered `tests/**` but not
`tools/**`. A future PR that changes only `tools/notebook_md_extract.py`
would skip the test job. Added `tools/**` to both push and pull_request
path filters.

`_SUBSTITUTIONS` in `.claude/scripts/openai_review.py` updated to mirror the
new prompt block. New test `test_max_total_chars_truncates_whole_notebook`
locks the per-notebook truncation behavior. All 16 affected tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented May 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e52cdf2bba2362da1c91b290b0be34f672a1c567


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • No estimator, math, weighting, variance/SE, identification, or default behavior changes were introduced.
  • Prior security finding is resolved: the workflow now stages both pr_review.md and the notebook extractor from BASE_SHA, not the PR checkout.
  • Prior untrusted-content finding is resolved: tutorial notebook prose is wrapped in <untrusted-pr-content> and the reviewer prompt tells the model not to follow instructions inside it.
  • Prior CI coverage gap is resolved: tools/** is included in rust-test.yml path filters.
  • Prior prompt-growth concern is resolved by --max-total-chars 200000.

Methodology

No findings.

  • Severity: P3
  • Impact: No methodology-bearing library code changed. docs/methodology/REGISTRY.md was checked; no registry or paper cross-check is required for this workflow/tooling-only PR.
  • Concrete fix: None.

Code Quality

No findings.

  • Severity: P3
  • Impact: The extractor is small, stdlib-only, and its omission policy is documented in tools/notebook_md_extract.py:L9-L22.
  • Concrete fix: None.

Performance

No findings.

  • Severity: P3
  • Impact: The previous per-output-only cap concern is addressed by --max-total-chars in tools/notebook_md_extract.py:L71-L115 and workflow use at .github/workflows/ai_pr_review.yml:L217-L218.
  • Concrete fix: None.

Maintainability

No findings.

  • Severity: P3
  • Impact: CI/local prompt surfaces are mirrored: .github/codex/prompts/pr_review.md:L92-L100 and .claude/scripts/openai_review.py:L970-L978 both carry the notebook-prose untrusted-content instruction.
  • Concrete fix: None.

Tech Debt

No findings.

  • Severity: P3
  • Impact: The first-run bootstrap behavior, where extraction is skipped if the extractor is absent on base, is explicit at .github/workflows/ai_pr_review.yml:L121-L125 and is not a blocker for this PR because no tutorial notebook is changed.
  • Concrete fix: None.

Security

No findings.

  • Severity: P3
  • Impact: The workflow uses persist-credentials: false, fetches with an explicit http.extraheader, stages trusted prompt/extractor from base, and wraps notebook prose as untrusted content at .github/workflows/ai_pr_review.yml:L81-L125 and .github/workflows/ai_pr_review.yml:L159-L225.
  • Concrete fix: None.

Documentation/Tests

No findings.

  • Severity: P3
  • Impact: tools/** is now in both push and PR path filters at .github/workflows/rust-test.yml:L6-L26; transitive workflow dependencies checked include tests/conftest.py via tests/** and pyproject.toml via the existing path filter.
  • Concrete fix: None.

Verification: pytest is not installed in this review environment, so I could not run tests/test_notebook_md_extract.py. I did run the extractor CLI on an existing tutorial, git diff --check, AST/import checks for the modified Python files, and pattern/security greps for the changed surfaces.

Closes the gap from PR #409 where the CI AI reviewer ran 3+ rounds blind to
tutorial notebook prose because the workflow excluded `docs/tutorials/*.ipynb`
from the diff.

**Extractor** (`tools/notebook_md_extract.py`, +95 LoC): stdlib-only Jupyter
notebook → Markdown converter. `_to_str()` coerces nbformat raw JSON's
list-or-string `source` / `text` fields (88%/100% list-form rates).
`--max-output-chars 20000` caps each text/plain or stream output;
`--max-total-chars 200000` caps the whole notebook. text/html-only outputs,
image/* data, and raw cells are dropped (documented in module docstring +
--help).

**Workflow** (`.github/workflows/ai_pr_review.yml`): three trusted-from-base
sources staged via `git show "$BASE_SHA:..." > /tmp/...` — `pr_review.md`,
`openai_review.py`, and `notebook_md_extract.py`. The trusted invocation
uses `/tmp/openai_review.py --review-criteria /tmp/pr_review.md` so a
malicious PR cannot rewrite reviewer instructions, exfiltrate `OPENAI_API_KEY`
via the script, or inject code before the secret-bearing API call.
`actions/checkout@v6` uses `persist-credentials: false` and the pre-fetch
step passes `GITHUB_TOKEN` via `http.extraheader` (env-scoped) to block
secondary token exfiltration via `.git/config` reads. Notebook prose
extraction runs the trusted extractor on changed tutorials and writes to
`/tmp/notebook-prose.md` (fail-soft per notebook: a malformed notebook
degrades to a placeholder line rather than killing the AI review job).

**Prompt** (`.github/codex/prompts/pr_review.md` + `openai_review.py`):
Section 5 drops the `docs/tutorials/*.ipynb` DO-NOT line and adds a
"Tutorial Notebook Prose" paragraph that directs the reviewer to the new
prompt block. The block is wrapped in `<notebook-prose untrusted="true">`
(mirroring PR #415's `<pr-body>` / `<previous-review-output>` conventions);
the reviewer is instructed to review the prose for correctness but ignore
any directive inside the wrapper. `compile_prompt()` renders the section
after the diff (fresh or delta mode) and before Full Source Files.
`_MANDATE_SUBSTITUTIONS` updated to match (drift-check: 12/12
TestAdaptReviewCriteria cases pass with zero substitution warnings).

**Sanitization**: refactored the three close-tag escapers into a shared
`_sanitize_wrapper_tag(text, tag_name)` helper. `_sanitize_pr_body` is kept
as a backward-compatible thin wrapper. Previous-review-output's inline
regex is replaced with the helper.
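
A plausible shape for that helper, assuming a regex over whitespace-padded close-tag variants (the repository's actual implementation may differ):

```python
import re

def _sanitize_wrapper_tag(text: str, tag_name: str) -> str:
    """Neutralize close-tag lookalikes such as '</pr-body>' or
    '< / pr-body >' so wrapped content cannot terminate its own
    wrapper early."""
    pattern = re.compile(rf"<\s*/\s*{re.escape(tag_name)}\s*>", re.IGNORECASE)
    return pattern.sub(f"&lt;/{tag_name}&gt;", text)

def _sanitize_pr_body(text: str) -> str:
    """Backward-compatible thin wrapper over the shared helper."""
    return _sanitize_wrapper_tag(text, "pr-body")
```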

**Tests** (`tests/test_openai_review.py` +219 LoC,
`tests/test_notebook_md_extract.py` +190 LoC new): inline-fixture extractor
suite with skip-guard on `tools/` existence for the isolated-install matrix;
compile_prompt ordering pins for fresh + delta modes via explicit
`text.index()` assertions; parametrized close-tag-variant parity tests across
all three wrappers; supply-chain workflow-text assertions for the three
`git show "$BASE_SHA:..."` invocations and `persist-credentials: false`;
TestMainCLIPropagation extension for `--notebook-prose`.

**rust-test.yml**: `tools/**` added to push + PR path filters so future
extractor-only changes trigger the test job.

**T21 workaround reap**: `docs/_review/t21_notebook_extract.md` (450 lines,
the one-shot extract from PR #409) and the `_review` entry in
`docs/conf.py:exclude_patterns` were left behind on origin/main; both are
removed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the ci-workflow-ipynb-markdown-extraction branch from e52cdf2 to 6e9bd71 on May 11, 2026 at 22:04
@igerber
Owner Author

igerber commented May 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6e9bd713afa74d9e408a8e3e11fd77b0df428fcc


Overall Assessment

⚠️ Needs changes — one unmitigated P2 test-coverage finding for behavior explicitly claimed in CHANGELOG.md.

Executive Summary

  • No estimator, math, weighting, SE/variance, identification, or default estimator behavior changes are introduced.
  • Prior P1/P2 issues called out in the previous review appear addressed in the visible diff: trusted base staging, untrusted notebook wrapper, token hardening, tools/** path filters, and total prompt cap are implemented.
  • The workflow/tooling design is generally sound: prompt/script/extractor are staged from BASE_SHA, notebook content is wrapped as untrusted, and wrapper close-tags are sanitized through a shared helper.
  • Newly identified P2: the PR explicitly claims workflow-level notebook extraction and prompt-growth caps, but the visible workflow contract tests do not pin the extraction loop / cap flags / --notebook-prose workflow propagation. This is a claim-vs-test gap under the review policy.

Methodology

No methodology defects found.

  • Severity: P3
  • Impact: The PR changes CI/review tooling, prompt assembly, docs config, changelog, and tests. It does not alter any estimator, mathematical formula, weighting, variance/SE, identification assumption, or default estimator behavior.
  • Concrete fix: None required.

Code Quality

No blocking findings.

  • Severity: P3
  • Impact: tools/notebook_md_extract.py is small, stdlib-only, and documents intentional omissions (text/html-only outputs, images, raw cells). _to_str() handles nbformat list/string raw JSON fields, and openai_review.py consolidates wrapper close-tag sanitization through _sanitize_wrapper_tag().
  • Concrete fix: None required.

Performance

No blocking findings.

  • Severity: P3
  • Impact: The extractor supports both per-output and per-notebook caps (--max-output-chars, --max-total-chars), and the workflow passes both caps in the visible implementation at .github/workflows/ai_pr_review.yml in the notebook extraction block.
  • Concrete fix: None required.

Maintainability

No blocking findings.

  • Severity: P3
  • Impact: The CI and local prompt surfaces are kept consistent: both .github/codex/prompts/pr_review.md and .claude/scripts/openai_review.py instruct the reviewer to treat ## Tutorial Notebook Prose as untrusted PR-controlled content.
  • Concrete fix: None required.

Tech Debt

No blocking findings.

  • Severity: P3
  • Impact: The one-shot bootstrap behavior is explicit: if tools/notebook_md_extract.py is absent on BASE_SHA, notebook prose extraction is skipped for that run. This is acceptable for the PR adding the extractor and should self-resolve after merge.
  • Concrete fix: None required.

Security

No blocking findings.

  • Severity: P3
  • Impact: The workflow hardening claims are reflected in implementation:
    • actions/checkout uses persist-credentials: false.
    • token use is env-scoped to the fetch step via http.extraheader.
    • prompt, prompt-builder script, and notebook extractor are staged from BASE_SHA, not the PR checkout.
    • notebook prose is wrapped in <notebook-prose untrusted="true">.
    • close-tag injection is sanitized for pr-body, previous-review-output, and notebook-prose.
  • Concrete fix: None required.

Documentation/Tests

[Newly identified] Missing workflow-contract tests for explicitly shipped notebook extraction behavior

  • Severity: P2
  • Location: .github/workflows/ai_pr_review.yml notebook extraction block; tests/test_openai_review.py::TestWorkflowContract
  • Impact: CHANGELOG.md explicitly claims that the CI AI reviewer now extracts changed tutorial notebooks, applies --max-output-chars 20000 and --max-total-chars 200000, writes /tmp/notebook-prose.md, and passes it to openai_review.py via --notebook-prose. The visible tests cover the extractor itself and openai_review.py prompt rendering, and they pin some workflow hardening invariants, but they do not visibly pin the workflow-level extraction loop or cap/prose propagation flags. A future workflow edit could silently drop the notebook extraction step or one of the cap flags while extractor/unit tests continue to pass.
  • Concrete fix: Add workflow text/contract tests in tests/test_openai_review.py::TestWorkflowContract that assert the workflow contains, at minimum:
    1. python3 /tmp/notebook_md_extract.py --input "$nb"
    2. --max-output-chars 20000
    3. --max-total-chars 200000
    4. output redirection to /tmp/notebook-prose.md
    5. ARGS+=(--notebook-prose /tmp/notebook-prose.md)
    6. the raw .ipynb diff exclusion remains paired with markdown extraction.
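
A minimal sketch of one such contract test, using plain substring assertions over the workflow text (the needle strings are taken verbatim from the list above; the test scaffolding is assumed):

```python
from pathlib import Path

WORKFLOW = Path(".github/workflows/ai_pr_review.yml")

def test_notebook_extraction_contract() -> None:
    """Pin the workflow-level extraction loop and cap/prose flags so a
    future workflow edit cannot drop them while unit tests still pass."""
    text = WORKFLOW.read_text()
    for needle in (
        'python3 /tmp/notebook_md_extract.py --input "$nb"',
        "--max-output-chars 20000",
        "--max-total-chars 200000",
        "/tmp/notebook-prose.md",
        "ARGS+=(--notebook-prose /tmp/notebook-prose.md)",
    ):
        assert needle in text, f"workflow lost contract string: {needle}"
```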

Path to Approval

  1. Add workflow-contract tests pinning the notebook extraction loop and cap/prose propagation flags listed above.
  2. Re-run the relevant test module(s), especially tests/test_openai_review.py and tests/test_notebook_md_extract.py, and update the PR if any workflow text assumptions need adjustment.
