Address CI R1 P3 on PR-C: provenance fail-closed + fixture-3 clarification

igerber · claude · igerber · commit 5b90559d1942 · 2026-05-19T13:33:57.000-04:00
CI R1 codex review verdict: ✅ no P0/P1. Two P3 findings addressed:

- P3 (Maintainability): the R generator hardcoded
  `pretrends_commit = "122731d082"` into the JSON but only verified
  `packageVersion("pretrends") >= "0.1.0"`. A future rerun could
  silently regenerate goldens from a drifted revision while still
  stamping the artifact with the original commit. Fix: replace the
  loose version gate with an exact
  `packageVersion("pretrends") == "0.1.0"` check plus a
  `startsWith(packageDescription("pretrends")$RemoteSha,
  PRETRENDS_COMMIT)` provenance assertion that fails closed with a
  reinstall instruction if the installed revision drifts. Verified
  via positive (RemoteSha = `122731d082a5990e274f57fd9af0968e44977e7a`)
  and negative (synthetic `deadbeef` prefix) checks.

- P3 (Documentation/Tests): the `anticipation_shifted` fixture's
  comment described it as validating anticipation-window filtering,
  but the fixture omits the `t=-1` anticipation window and the parity
  assertions consume prefiltered `Sigma_22` / weights directly — the
  CS/SA-level `_extract_pre_period_params` anticipation filter
  (`if t < _pre_cutoff` in `pretrends.py`) is NOT R-parity-locked by
  this fixture. Fix: reword the comment, the R `cat()` message, and the
  JSON meta.description to say "K=4 shifted-grid", and document the
  non-coverage explicitly in the file-header comment with a forward
  reference to the existing PR-B MC-based and full-VCV coverage in
  `TestPretrendsPropositions` / `TestPretrendsCovarianceSource`,
  plus a deferred follow-up for a CS/SA-level
  `anticipation=1 + R-parity` test (would need a synthetic
  `CallawaySantAnnaResults` with a t=-1 entry that gets filtered
  before reaching `_compute_power_nis`). Test class docstring
  tolerance-rationale prose flipped "K=4 anticipation fixture" →
  "K=4 shifted-grid fixture" to match.
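
For reference, the fail-closed rule from the first finding can be
sketched in Python. This only mirrors the logic of the R check; the
function name `check_provenance` and the exception type are
hypothetical and do not exist in the repo. The pinned version, the
short commit, and the two SHAs used below are the ones quoted above.

```python
from typing import Optional

PINNED_VERSION = "0.1.0"      # version pinned by the generator
PINNED_COMMIT = "122731d082"  # short SHA stamped into the golden JSON


def check_provenance(version: str, remote_sha: Optional[str]) -> str:
    """Return the installed SHA if it matches the pin; raise otherwise.

    Hypothetical helper: a Python rendering of the R-side assertion,
    not code from the repository.
    """
    # Exact version match, not a ">=" gate.
    if version != PINNED_VERSION:
        raise RuntimeError(
            f"version mismatch: expected {PINNED_VERSION}, got {version}"
        )
    # A missing SHA (package not installed from GitHub) counts as a
    # mismatch, so the check fails closed rather than passing silently.
    if remote_sha is None or not remote_sha.startswith(PINNED_COMMIT):
        raise RuntimeError(
            f"provenance mismatch: expected SHA prefix {PINNED_COMMIT}, "
            f"got {remote_sha!r}"
        )
    return remote_sha
```

The positive case passes the full 40-character SHA, which begins with
the pinned prefix; the `deadbeef` case and a missing SHA both raise.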

The fixture's JSON key (`anticipation_shifted`) is unchanged to
preserve the test-side reference; only the prose contract is
clarified.
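
A small sketch of why this fixture cannot lock the anticipation
filter: on the {-5, -4, -3, -2} grid the filter is a no-op, so a
regression in it would not change the goldens. The helper
`filter_pre_periods` is hypothetical, and `pre_cutoff = -anticipation`
is an assumption inferred from the old comment ("pre-cutoff at t<-1"
for anticipation=1); only the `t < _pre_cutoff` comparison comes from
`pretrends.py`.

```python
from typing import List


def filter_pre_periods(relative_times: List[int], anticipation: int = 0) -> List[int]:
    """Keep only event times strictly before the pre-treatment cutoff.

    Hypothetical stand-in for the filter step; the real
    `_extract_pre_period_params` may do more than this.
    """
    pre_cutoff = -anticipation  # assumption: anticipation=1 gives cutoff -1
    return [t for t in relative_times if t < pre_cutoff]


# Fixture 3 grid: every cell already satisfies t < -1, so nothing is dropped.
shifted_grid = [-5, -4, -3, -2]
unchanged = filter_pre_periods(shifted_grid, anticipation=1)

# A grid that includes the t=-1 anticipation cell is what the deferred
# follow-up would need: here the filter actually removes an entry.
with_anticipation_cell = [-5, -4, -3, -2, -1]
filtered = filter_pre_periods(with_anticipation_cell, anticipation=1)
```

Both calls return the same four pre-periods, but only the second one
depends on the filter doing its job.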

All 4 parity tests still pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/benchmarks/R/generate_pretrends_golden.R b/benchmarks/R/generate_pretrends_golden.R
@@ -23,11 +23,11 @@
 #       (R) and scipy.stats.multivariate_normal.cdf (Python) use Genz-Bretz
 #       randomized-lattice rules with different absolute-error defaults
 #       (abseps ~ 1e-3 vs 1e-5). The empirical NIS power gap is bounded by
-#       ~5e-5 on the K=4 anticipation fixture; ~3e-5 on K=3 fixtures; ~2e-5
+#       ~5e-5 on the K=4 shifted-grid fixture; ~3e-5 on K=3 fixtures; ~2e-5
 #       on K=1. atol=1e-4 is the realistic atol without tightening
 #       thresholdTstat.Pretest in R or relaxing the Genz tolerances.
 #   (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular,
-#       anticipation, and K=1 grids, at atol=1e-4. R uniroot defaults to
+#       shifted-grid, and K=1 grids, at atol=1e-4. R uniroot defaults to
 #       tol = .Machine$double.eps^0.25 ~= 1.22e-4 vs Python brentq xtol=2e-12;
 #       the inverse-solver tolerance gap dominates, so 1e-4 is the realistic
 #       atol without tightening either solver.
@@ -42,9 +42,20 @@
 #      never-treated control. Default-case parity baseline.
 #   2. irregular_pre_periods — K=3 with relative_times = [-5, -3, -1].
 #      Exercises the PR-B gamma-unit linear-pattern fix end-to-end.
-#   3. anticipation_shifted — K=4 with anticipation=1 (pre-cutoff at t<-1,
-#      so pre-periods are {-5, -4, -3, -2}). Verifies the pre-period filter
-#      logic in `_extract_pre_period_params`.
+#   3. anticipation_shifted — K=4 shifted-grid case (pre-periods at
+#      {-5, -4, -3, -2}, leaving a notional t=-1 anticipation gap to the
+#      reference period 0). Locks R parity at a larger K than the K=3
+#      fixtures and on a non-adjacent-to-reference grid. NOTE: the R-side
+#      `pretrends` package has no anticipation parameter, so this fixture
+#      does NOT exercise the Python-side `_extract_pre_period_params`
+#      CS/SA anticipation filter (`if t < _pre_cutoff` in pretrends.py)
+#      against R goldens — that filter is exercised by the existing
+#      `TestPretrendsCovarianceSource` suite (PR-B Step 3) and by the
+#      MC-based `TestPretrendsPropositions` tests, but a CS/SA-level
+#      anticipation+R-parity test would need a synthetic
+#      CallawaySantAnnaResults with `anticipation=1` and a t=-1 entry
+#      that gets filtered before reaching `_compute_power_nis`. Deferred
+#      to a follow-up.
 #   4. single_pre_period_closed_form — K=1 with diagonal Sigma = 0.25*I
 #      (Roth Proposition 2 univariate truncated-normal closed form). Locks
 #      the scalar fast-path against R AND against the analytical expression
@@ -58,10 +69,26 @@ suppressPackageStartupMessages({
   library(jsonlite)
 })
 
-stopifnot(packageVersion("pretrends") >= "0.1.0")
-
 PRETRENDS_COMMIT <- "122731d082"
 
+# Provenance fail-closed: refuse to regenerate goldens unless the installed
+# pretrends matches the pinned (version, commit) pair. The JSON stamps the
+# pinned commit string into meta.pretrends_commit, so without this check a
+# future rerun could silently regenerate goldens from a drifted revision
+# while still labeling the artifact with the original commit.
+stopifnot(packageVersion("pretrends") == "0.1.0")
+.installed_sha <- packageDescription("pretrends")$RemoteSha
+if (is.null(.installed_sha) || !startsWith(.installed_sha, PRETRENDS_COMMIT)) {
+  stop(sprintf(
+    "pretrends provenance mismatch: expected RemoteSha to start with '%s' but got '%s'. Reinstall with: remotes::install_github('jonathandroth/pretrends', ref = '%s')",
+    PRETRENDS_COMMIT,
+    if (is.null(.installed_sha)) "<missing — not installed via install_github>" else .installed_sha,
+    PRETRENDS_COMMIT
+  ))
+}
+cat("pretrends provenance verified: version 0.1.0, RemoteSha =",
+    .installed_sha, "\n")
+
 # ---------------------------------------------------------------------------
 # DGP helper: build a synthetic event-study coefficient vector + VCV under a
 # stylized null DGP (beta = 0, Sigma_22 ~ correlated). Mirrors the simulation
@@ -186,10 +213,13 @@ f2 <- build_event_study_fixture(
 )
 fixture_2 <- extract_pretrends(f2, "irregular_pre_periods")
 
-cat("Building fixture 3: anticipation_shifted...\n")
-# K=4 pre-periods with anticipation=1. Real pre-treatment cutoff is t < -1,
-# so the {-5, -4, -3, -2} cells are the genuine pre-periods; t=-1 is the
-# anticipation window. Tests the pre-period filtering logic.
+cat("Building fixture 3: anticipation_shifted (K=4 shifted-grid)...\n")
+# K=4 pre-periods at {-5, -4, -3, -2} — a shifted-grid case (gap at t=-1
+# between the last pre-period and the reference period 0). Locks R parity
+# at a larger K and on a non-adjacent-to-reference grid. Note: does NOT
+# exercise the Python-side `_extract_pre_period_params` CS/SA
+# anticipation-filter path against R goldens — see the file-header
+# comment for the rationale and deferred follow-up.
 f3 <- build_event_study_fixture(
   pre_periods = c(-5L, -4L, -3L, -2L),
   post_periods = c(1L, 2L, 3L),
@@ -230,7 +260,7 @@ out <- list(
       "scipy MVN CDF Genz-Bretz randomized-lattice differences bound the",
       "K=4 NIS power gap at ~5e-5);",
       "(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
-      "irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol",
+      "irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol",
       "vs Python brentq xtol gap dominates);",
       "(3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
       "in Roth's gamma units exactly, matching R's slope_for_power().",
diff --git a/benchmarks/data/r_pretrends_golden.json b/benchmarks/data/r_pretrends_golden.json
@@ -4,7 +4,7 @@
     "pretrends_version": "0.1.0",
     "pretrends_commit": "122731d082",
     "r_version": "R version 4.5.2 (2025-10-31)",
-    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
+    "description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
   },
   "uniform_3_pre_periods_no_anticipation": {
     "panel": {
diff --git a/tests/test_methodology_pretrends.py b/tests/test_methodology_pretrends.py
@@ -1120,7 +1120,7 @@ class TestPretrendsParityR:
     ``scipy.stats.multivariate_normal.cdf`` (Python) use Genz-Bretz
     randomized-lattice rules with different absolute-error defaults
     (abseps ≈ 1e-3 vs 1e-5). Combined, the empirical NIS power gap is
-    bounded by ~5e-5 in the K=4 anticipation fixture (smaller for K∈{1,3}).
+    bounded by ~5e-5 in the K=4 shifted-grid fixture (smaller for K∈{1,3}).
     For the inverse path (γ_p), R's ``slope_for_power`` uses
     ``uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)`` versus Python
     ``brentq(xtol=2e-12)``; the inverse-solver tolerance gap dominates the