Skip to content

Commit 5b90559

Browse files
igerberclaude
andcommitted
Address CI R1 P3 on PR-C: provenance fail-closed + fixture-3 clarification
CI R1 codex review verdict: ✅ no P0/P1. Two P3 findings addressed: - P3 (Maintainability): the R generator hardcoded `pretrends_commit = "122731d082"` into the JSON but only verified `packageVersion("pretrends") >= "0.1.0"`. A future rerun could silently regenerate goldens from a drifted revision while still stamping the artifact with the original commit. Fix: replace the loose version gate with an exact `packageVersion == "0.1.0"` check plus a `startsWith(packageDescription("pretrends")$RemoteSha, PRETRENDS_COMMIT)` provenance assertion that fails closed with a reinstall instruction if the installed revision drifts. Verified via positive (RemoteSha = `122731d082a5990e274f57fd9af0968e44977e7a`) and negative (synthetic `deadbeef` prefix) checks. - P3 (Documentation/Tests): the `anticipation_shifted` fixture's comment described it as validating anticipation-window filtering, but the fixture omits the `t=-1` anticipation window and the parity assertions consume prefiltered `Sigma_22` / weights directly — the CS/SA-level `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py`) is NOT R-parity-locked by this fixture. Fix: rename the comment / R `cat()` print / JSON meta.description to "K=4 shifted-grid case", and document the non-coverage explicitly in the file-header comment with a forward reference to the existing PR-B MC-based and full-VCV coverage in `TestPretrendsPropositions` / `TestPretrendsCovarianceSource`, plus a deferred follow-up for a CS/SA-level `anticipation=1 + R-parity` test (would need a synthetic `CallawaySantAnnaResults` with a t=-1 entry that gets filtered before reaching `_compute_power_nis`). Test class docstring tolerance-rationale prose flipped "K=4 anticipation fixture" → "K=4 shifted-grid fixture" to match. The fixture's JSON key (`anticipation_shifted`) is unchanged to preserve the test-side reference; only the prose contract is clarified. All 4 parity tests still pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 50a0882 commit 5b90559

3 files changed

Lines changed: 44 additions & 14 deletions

File tree

benchmarks/R/generate_pretrends_golden.R

Lines changed: 42 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -23,11 +23,11 @@
2323
# (R) and scipy.stats.multivariate_normal.cdf (Python) use Genz-Bretz
2424
# randomized-lattice rules with different absolute-error defaults
2525
# (abseps ~ 1e-3 vs 1e-5). The empirical NIS power gap is bounded by
26-
# ~5e-5 on the K=4 anticipation fixture; ~3e-5 on K=3 fixtures; ~2e-5
26+
# ~5e-5 on the K=4 shifted-grid fixture; ~3e-5 on K=3 fixtures; ~2e-5
2727
# on K=1. atol=1e-4 is the realistic atol without tightening
2828
# thresholdTstat.Pretest in R or relaxing the Genz tolerances.
2929
# (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular,
30-
# anticipation, and K=1 grids, at atol=1e-4. R uniroot defaults to
30+
# shifted-grid, and K=1 grids, at atol=1e-4. R uniroot defaults to
3131
# tol = .Machine$double.eps^0.25 ~= 1.22e-4 vs Python brentq xtol=2e-12;
3232
# the inverse-solver tolerance gap dominates, so 1e-4 is the realistic
3333
# atol without tightening either solver.
@@ -42,9 +42,20 @@
4242
# never-treated control. Default-case parity baseline.
4343
# 2. irregular_pre_periods — K=3 with relative_times = [-5, -3, -1].
4444
# Exercises the PR-B gamma-unit linear-pattern fix end-to-end.
45-
# 3. anticipation_shifted — K=4 with anticipation=1 (pre-cutoff at t<-1,
46-
# so pre-periods are {-5, -4, -3, -2}). Verifies the pre-period filter
47-
# logic in `_extract_pre_period_params`.
45+
# 3. anticipation_shifted — K=4 shifted-grid case (pre-periods at
46+
# {-5, -4, -3, -2}, leaving a notional t=-1 anticipation gap to the
47+
# reference period 0). Locks R parity at a larger K than the K=3
48+
# fixtures and on a non-adjacent-to-reference grid. NOTE: the R-side
49+
# `pretrends` package has no anticipation parameter, so this fixture
50+
# does NOT exercise the Python-side `_extract_pre_period_params`
51+
# CS/SA anticipation filter (`if t < _pre_cutoff` in pretrends.py)
52+
# against R goldens — that filter is exercised by the existing
53+
# `TestPretrendsCovarianceSource` suite (PR-B Step 3) and by the
54+
# MC-based `TestPretrendsPropositions` tests, but a CS/SA-level
55+
# anticipation+R-parity test would need a synthetic
56+
# CallawaySantAnnaResults with `anticipation=1` and a t=-1 entry
57+
# that gets filtered before reaching `_compute_power_nis`. Deferred
58+
# to a follow-up.
4859
# 4. single_pre_period_closed_form — K=1 with diagonal Sigma = 0.25*I
4960
# (Roth Proposition 2 univariate truncated-normal closed form). Locks
5061
# the scalar fast-path against R AND against the analytical expression
@@ -58,10 +69,26 @@ suppressPackageStartupMessages({
5869
library(jsonlite)
5970
})
6071

61-
stopifnot(packageVersion("pretrends") >= "0.1.0")
62-
6372
PRETRENDS_COMMIT <- "122731d082"
6473

74+
# Provenance fail-closed: refuse to regenerate goldens unless the installed
75+
# pretrends matches the pinned (version, commit) pair. The JSON stamps the
76+
# pinned commit string into meta.pretrends_commit, so without this check a
77+
# future rerun could silently regenerate goldens from a drifted revision
78+
# while still labeling the artifact with the original commit.
79+
stopifnot(packageVersion("pretrends") == "0.1.0")
80+
.installed_sha <- packageDescription("pretrends")$RemoteSha
81+
if (is.null(.installed_sha) || !startsWith(.installed_sha, PRETRENDS_COMMIT)) {
82+
stop(sprintf(
83+
"pretrends provenance mismatch: expected RemoteSha to start with '%s' but got '%s'. Reinstall with: remotes::install_github('jonathandroth/pretrends', ref = '%s')",
84+
PRETRENDS_COMMIT,
85+
if (is.null(.installed_sha)) "<missing — not installed via install_github>" else .installed_sha,
86+
PRETRENDS_COMMIT
87+
))
88+
}
89+
cat("pretrends provenance verified: version 0.1.0, RemoteSha =",
90+
.installed_sha, "\n")
91+
6592
# ---------------------------------------------------------------------------
6693
# DGP helper: build a synthetic event-study coefficient vector + VCV under a
6794
# stylized null DGP (beta = 0, Sigma_22 ~ correlated). Mirrors the simulation
@@ -186,10 +213,13 @@ f2 <- build_event_study_fixture(
186213
)
187214
fixture_2 <- extract_pretrends(f2, "irregular_pre_periods")
188215

189-
cat("Building fixture 3: anticipation_shifted...\n")
190-
# K=4 pre-periods with anticipation=1. Real pre-treatment cutoff is t < -1,
191-
# so the {-5, -4, -3, -2} cells are the genuine pre-periods; t=-1 is the
192-
# anticipation window. Tests the pre-period filtering logic.
216+
cat("Building fixture 3: anticipation_shifted (K=4 shifted-grid)...\n")
217+
# K=4 pre-periods at {-5, -4, -3, -2} — a shifted-grid case (gap at t=-1
218+
# between the last pre-period and the reference period 0). Locks R parity
219+
# at a larger K and on a non-adjacent-to-reference grid. Note: does NOT
220+
# exercise the Python-side `_extract_pre_period_params` CS/SA
221+
# anticipation-filter path against R goldens — see the file-header
222+
# comment for the rationale and deferred follow-up.
193223
f3 <- build_event_study_fixture(
194224
pre_periods = c(-5L, -4L, -3L, -2L),
195225
post_periods = c(1L, 2L, 3L),
@@ -230,7 +260,7 @@ out <- list(
230260
"scipy MVN CDF Genz-Bretz randomized-lattice differences bound the",
231261
"K=4 NIS power gap at ~5e-5);",
232262
"(2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular,",
233-
"irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol",
263+
"irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol",
234264
"vs Python brentq xtol gap dominates);",
235265
"(3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV",
236266
"in Roth's gamma units exactly, matching R's slope_for_power().",

benchmarks/data/r_pretrends_golden.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
"pretrends_version": "0.1.0",
55
"pretrends_commit": "122731d082",
66
"r_version": "R version 4.5.2 (2025-10-31)",
7-
"description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, anticipation, and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
7+
"description": "Roth (2022) PreTrendsPower parity goldens for diff-diff compute_pretrends_power / PreTrendsPower (PR-C). Three-tier parity contract, both numeric tiers at atol=1e-4: (1) NIS box probability at fixed gamma values on all 4 fixtures (atol=1e-4; R hardcodes thresholdTstat=1.96 while Python uses qnorm(0.975) = 1.959963984540054, and mvtnorm::pmvnorm vs scipy MVN CDF Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5); (2) gamma_p MDV (slope at target power 0.5 and 0.8) on regular, irregular, shifted-grid (K=4), and K=1 grids (atol=1e-4; R uniroot tol vs Python brentq xtol gap dominates); (3) gamma-unit MDV invariance: PR-B's skip-L2-norm path produces MDV in Roth's gamma units exactly, matching R's slope_for_power(). See diff-diff/docs/methodology/papers/roth-2022-review.md for the full derivation."
88
},
99
"uniform_3_pre_periods_no_anticipation": {
1010
"panel": {

tests/test_methodology_pretrends.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1120,7 +1120,7 @@ class TestPretrendsParityR:
11201120
``scipy.stats.multivariate_normal.cdf`` (Python) use Genz-Bretz
11211121
randomized-lattice rules with different absolute-error defaults
11221122
(abseps ≈ 1e-3 vs 1e-5). Combined, the empirical NIS power gap is
1123-
bounded by ~5e-5 in the K=4 anticipation fixture (smaller for K∈{1,3}).
1123+
bounded by ~5e-5 in the K=4 shifted-grid fixture (smaller for K∈{1,3}).
11241124
For the inverse path (γ_p), R's ``slope_for_power`` uses
11251125
``uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)`` versus Python
11261126
``brentq(xtol=2e-12)``; the inverse-solver tolerance gap dominates the

0 commit comments

Comments
 (0)