Skip to content

Commit 5fcaad7

Browse files
authored
Merge pull request #466 from igerber/feature/pretrends-implementation-audit
PreTrendsPower implementation audit (Roth 2022 — PR-B)
2 parents 69c3c47 + 853e523 commit 5fcaad7

19 files changed

Lines changed: 3053 additions & 423 deletions

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
Large diffs are not rendered by default.

METHODOLOGY_REVIEW.md

Lines changed: 21 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -80,7 +80,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
8080
|------|--------|-------------|--------|-------------|
8181
| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
8282
| HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
83-
| PreTrendsPower | `pretrends.py` | `pretrends` package | **In Progress** | |
83+
| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** (R parity pending) | 2026-05-18 |
8484
| PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** ||
8585
| PlaceboTests | `diagnostics.py` | (no canonical reference) | **In Progress** ||
8686

@@ -1047,18 +1047,29 @@ and covariate-adjusted specifications.)
10471047
| Module | `pretrends.py` |
10481048
| Primary Reference | Roth (2022), *Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends*, AER:I 4(3), 305-322 |
10491049
| R Reference | `pretrends` package |
1050-
| Status | **In Progress** |
1051-
| Last Review | |
1050+
| Status | **Complete** (R parity pending) |
1051+
| Last Review | 2026-05-18 |
10521052

10531053
**Documentation in place:**
1054-
- REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
1055-
- Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
1056-
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)
1054+
- REGISTRY.md section: `## PreTrendsPower` — NIS-framed audit per Roth (2022) Section II.A-B with full equation blocks for both NIS and Wald forms; paper-supported alternative + γ-unit MDV + full-Σ_22 routing all locked.
1055+
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17 via PR #463).
1056+
- Implementation: `tests/test_pretrends.py` (67 tests — point-estimator, MDV, power curve, sensitivity, plus the PR-A R18 silent-failure regression and the PR-B custom-weight persistence regression) + event-study coverage in `tests/test_pretrends_event_study.py` (27 tests).
1057+
- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes, 30-40 tests covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API).
10571058

1058-
**Outstanding for promotion:**
1059-
- Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
1060-
- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
1061-
- Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
1059+
**Verified Components:**
1060+
- [x] NIS box probability implemented via `scipy.stats.multivariate_normal.cdf` (Roth Section II.A-B primary form)
1061+
- [x] Wald noncentral-χ² form retained as paper-supported alternative (Propositions 1+3+4 all apply — convex ellipsoid acceptance region)
1062+
- [x] Both forms produce form-consistent MDV via doubling + brentq bisection with 1000-cap non-convergence fallback
1063+
- [x] Non-bootstrap CS adapter consumes full `event_study_vcov` sub-block (not diag)
1064+
- [x] Non-bootstrap SA adapter consumes full `event_study_vcov` sub-block (W-matrix construction `event_study_vcov = W @ vcov_cohort @ W.T` added to `SunAbrahamResults`)
1065+
- [x] Bootstrap CS/SA and replicate-weight survey paths fall through to `diag(ses^2)` (analytical VCV cleared to prevent mixing with bootstrap/replicate SE overrides)
1066+
- [x] `_get_violation_weights('linear')` honors actual pre-period relative-time labels via `fit()` threading → reported MDV is in Roth's γ units on irregular and anticipation-shifted grids. For `MultiPeriodDiDResults`, supported label types are numeric (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64`; **genuinely non-numeric labels** (string period IDs, unranked categoricals) emit an explicit `UserWarning` and fall through to the legacy count-based normalized direction (MDV is NOT in γ units in that case — re-fit with numeric labels)
1067+
- [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
1068+
- [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
1069+
- [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
1070+
1071+
**Outstanding for promotion to fully Complete:**
1072+
- R parity fixture against the `pretrends` R package at a **pinned revision** (deferred to PR-C). The generator script `benchmarks/R/generate_pretrends_golden.R` is committed in PR-B with a placeholder commit reference; PR-C will install the package, generate the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate `TestPretrendsParityR` (currently skips when goldens missing), and record the audited R-package revision. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` Gaps section remain provisional.
10621073

10631074
---
10641075

TODO.md

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -94,11 +94,9 @@ Deferred items from PR reviews that were not addressed before merge.
9494
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
9595
| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
9696
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
97-
| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
98-
| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
99-
| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
100-
| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
101-
| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
97+
| PreTrendsPower R parity goldens (PR-C): pin the R `pretrends` package commit/release, run `benchmarks/R/generate_pretrends_golden.R` (committed in PR-B), commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate the `TestPretrendsParityR` class in `tests/test_methodology_pretrends.py` (currently skips when goldens missing), and flip the METHODOLOGY_REVIEW.md `PreTrendsPower` row from `**Complete** (R parity pending)``**Complete**`. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` remain provisional. | `benchmarks/R/generate_pretrends_golden.R`, `benchmarks/data/r_pretrends_golden.json` (new), `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-C (PreTrendsPower R parity) | Low |
98+
<!-- The remaining four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV) were all resolved in PR-B 2026-05-18 — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
99+
102100
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
103101
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
104102
| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gate: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally). The DiD sub-gate and the MultiPeriodDiD sub-gate were both lifted via auto-route to `fixed_effects=` internally (DiD: PR #458, ~1e-10 vs clubSandwich; MPD: this release, ~1e-10 vs sandwich::vcovHC and clubSandwich::vcovCR). TWFE has no equivalent `fixed_effects=` code path (always within-transforms), so the same auto-route surgery is not directly applicable — lifting requires either building the full-dummy design inline or refactoring TWFE to delegate to DiD. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit` | follow-up | Medium |

0 commit comments

Comments
 (0)