Skip to content

Commit 79e0962

Browse files
igerberclaude
andcommitted
linalg: add cluster-aware CR2 Bell-McCaffrey contrast DOF; wire MPD avg_att inference
Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657 previously raised NotImplementedError because _compute_cr2_bm returns per-coefficient Satterthwaite DOF only — the post-period-average ATT (`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound contrast that needed a cluster-aware contrast DOF helper. New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra (`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`, `DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to call the new helper via a private _cr2_bm_dof_inner with `contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the per-coefficient DOFs are preserved (matmul ordering differs slightly from the prior inline loop). MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced in PR igerber#459) to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation length n and are NOT subscripted by the rank-deficient column-drop mask `_kept` (which indexes coefficients, not observations). R parity verified at atol=1e-10 against clubSandwich's Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new mpd_clustered_avg_att_dof fixture in benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the BM Satterthwaite DOF. The pre-flight smoke test against this same R target passed at atol=1e-13 before any source edits. Tests: - TestCR2BMContrastDOF (4 new tests): refactor regression vs library, R-parity for compound contrast, shape validation, cluster-count validation. - test_multi_period_cluster_plus_hc2_bm_rejected flipped to test_multi_period_cluster_plus_hc2_bm_produces_finite_inference (end-to-end MPD wire-through with finite avg_att / period_effects inference assertions). After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb igerber#458, MPD-absorb igerber#459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE absorb (Gate 1), weighted HC2-BM (Gates 4-5). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f6499db commit 79e0962

8 files changed

Lines changed: 389 additions & 55 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11+
- **`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` now supported** (`diff_diff/estimators.py:1657`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for the post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) was not implemented — only the per-coefficient case existed in `_compute_cr2_bm`. New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra; `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the existing per-coefficient parity to clubSandwich's `coef_test$df_Satt` is preserved (refactor regression at atol=1e-10). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask. R parity verified at atol=1e-10 against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` on the new `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Per-coefficient `period_effects[t].p_value` / `conf_int` and `avg_att` `avg_p_value` / `avg_conf_int` now reflect the correct Satterthwaite DOF rather than the n-k fallback under cluster+hc2_bm. Weighted CR2-BM (`survey_design=` paths) remains a separate gate. New tests: `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF` (4 tests: refactor regression, R-parity, shape validation, cluster-count validation); existing `test_multi_period_cluster_plus_hc2_bm_rejected` flipped to behavioral `test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`.
1112
- **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE). **Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `<time>_<X>` dummies for that entry because the design already absorbs the time dimension via the non-reference period dummies; without the skip, the two blocks would collide on dummy names and the `coefficients` dict would silently collapse duplicates under `var_names`-keyed construction, breaking the coefficients-vs-vcov alignment that downstream consumers rely on. This applies to both the new `absorb=` auto-route and the pre-existing `fixed_effects=[<time_col>]` invocation.
1213
- **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. **Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied.
1314
- **`SpilloverDiD` Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D).** Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. **Documented synthesis** of Butts (2021) Section 3.1 (the IF construction for spillover-aware DiD) + Gardner (2022) Section 4 (the two-stage GMM sandwich) + Conley (1999) (the spatial kernel). No reference software combines all three — `did2s` (Butts & Gardner) implements the Gardner correction without rings or Conley; `conleyreg` and `acreg` implement Conley without the two-stage correction. Wave D is the synthesis. Applies unconditionally under `vcov_type ∈ {"hc1", "conley", "cluster"}` for both `event_study=False` AND `event_study=True`. **Formula** (Butts 2021 §3.1 + Gardner 2022 §4): `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` where `gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)` is the stage-1-projection-of-stage-2 cross-moment; meat = `Psi' K Psi` with `K` dispatched by `vcov_type` (identity for HC1, block-indicator for cluster, spatial kernel for Conley); vcov = `(X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}`. **Finite-sample multipliers:** `n/(n-p)` for HC1; `G/(G-1) * (n-1)/(n-p)` for cluster CR1; no multiplier for Conley (preserves `conleyreg` / Wave B convention). **Public surface:** `vcov_type="classical"` now raises `NotImplementedError` upfront (the Wave D synthesis has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`); REGISTRY's "vcov_type restrictions" block updated accordingly. **Point estimates unchanged** (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk` are byte-identical to Wave B/C); SE values shift upward by 1-few percent depending on first-stage residual variance. **Implementation:** new module-level helper `_compute_gmm_corrected_meat` in `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new module-level helper `_build_butts_fe_design_csr` in `diff_diff/spillover.py`; new module-level helper `_compute_conley_meat` in `diff_diff/conley.py` factored out of `_compute_conley_vcov` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`) cases. **No new public API kwarg** — the correction is unconditional. Wave D variance mode dispatch derives from the public contract: `vcov_type="conley"` → `"conley"`; `cluster=<col>` → `"cluster"` (CR1); otherwise `"hc1"`. **Wave B/C SE goldens re-pinned** at `tests/test_spillover.py::TestSpilloverDiDEventStudyBackwardCompat` (constants renamed `_WAVE_B_GOLDEN_*` → `_WAVE_D_GOLDEN_*`; pre-Wave-D references retained as commented baselines for the directional inflation invariant `_WAVE_B_UNCORRECTED_*`). **Tests:** new test classes `TestSpilloverDiDWaveDGmmCorrectedHc1Hand` (hand-derived `Psi` on a 4-unit × 3-period over-identified panel — matches at `atol=1e-12`), `TestSpilloverDiDWaveDGmmCorrectedEventStudy` (vcov shape on event-study path), `TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract` (rank-deficient column propagation), `TestSpilloverDiDWaveDGmmCorrectedValidatorWiring` (Conley validator fires from the new helper), `TestSpilloverDiDWaveDGmmCorrectedFitIdempotence` (clone + repeat-fit bit-identity per `feedback_fit_does_not_mutate_config`), `TestSpilloverDiDWaveDPublicVarianceContract` (end-to-end public `cluster=<col>` CR1 routing, single-cluster rejection, classical NotImplementedError). Closes the Gardner-GMM follow-up row in `TODO.md`.

benchmarks/R/generate_clubsandwich_golden.R

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -172,6 +172,66 @@ output$mpd_absorbed_fe_did <- list(
172172
target_period = 4L
173173
)
174174

175+
# --- MPD clustered avg_att DOF scenario (Gate 6 lift PR) ---------------------
176+
# Pins clubSandwich's compound-contrast Satterthwaite DOF for the post-period-
177+
# average ATT under cluster-robust CR2. Mirrors MultiPeriodDiD(cluster=unit,
178+
# vcov_type='hc2_bm', fixed_effects=['unit']) parameterization. Per-coefficient
179+
# DOFs use coef_test()$df_Satt (the canonical Satterthwaite per-coef API);
180+
# the compound contrast DOF uses Wald_test(constraints=matrix(c_avg, 1),
181+
# test='HTZ')$df_denom — on a 1-row constraint matrix HTZ reduces to a
182+
# Satterthwaite t-test and its df_denom IS the BM Satterthwaite DOF.
183+
184+
d_mpd_cl <- make_mpd_panel(n_total = 15, units_per_cohort = 5, n_periods = 4,
185+
seed = 20260517)
186+
d_mpd_cl$period_f <- relevel(factor(d_mpd_cl$period), ref = "1")
187+
for (p in 2:4) {
188+
d_mpd_cl[[paste0("treated_period_", p)]] <-
189+
d_mpd_cl$treated * (d_mpd_cl$period == p)
190+
}
191+
fit_mpd_cl <- lm(y ~ treated + period_f +
192+
treated_period_2 + treated_period_3 + treated_period_4 +
193+
factor(unit),
194+
data = d_mpd_cl)
195+
vcov_mpd_cr2 <- vcovCR(fit_mpd_cl, cluster = d_mpd_cl$unit, type = "CR2")
196+
# Per-coefficient DOF via coef_test (canonical Satterthwaite API).
197+
ct_mpd_cr2 <- coef_test(fit_mpd_cl, vcov = vcov_mpd_cr2)
198+
# Compound post-period-average contrast: (1/3) * (e_treated_period_2
199+
# + e_treated_period_3 + e_treated_period_4). Build full-width vector
200+
# matching coef(fit) order, with zeros on the NA-dropped column.
201+
all_coef_names <- names(coef(fit_mpd_cl))
202+
n_coef <- length(all_coef_names)
203+
c_avg_vec <- setNames(rep(0, n_coef), all_coef_names)
204+
post_names <- c("treated_period_2", "treated_period_3", "treated_period_4")
205+
c_avg_vec[post_names] <- 1 / length(post_names)
206+
# Wald_test ignores NA-dropped coefficients; subset the constraint vector
207+
# to the non-NA coefficients (clubSandwich's coef_test convention).
208+
finite_mask <- !is.na(coef(fit_mpd_cl))
209+
c_avg_kept <- c_avg_vec[finite_mask]
210+
dof_avg_compound <- Wald_test(
211+
fit_mpd_cl,
212+
constraints = matrix(c_avg_kept, 1),
213+
vcov = vcov_mpd_cr2,
214+
test = "HTZ"
215+
)$df_denom
216+
output$mpd_clustered_avg_att_dof <- list(
217+
unit = d_mpd_cl$unit,
218+
period = d_mpd_cl$period,
219+
treated = d_mpd_cl$treated,
220+
y = d_mpd_cl$y,
221+
cluster = d_mpd_cl$unit,
222+
coef = as.numeric(coef(fit_mpd_cl)),
223+
coef_names = all_coef_names,
224+
finite_coef_names = all_coef_names[finite_mask],
225+
vcov_cr2 = as.numeric(vcov_mpd_cr2),
226+
vcov_cr2_shape = dim(vcov_mpd_cr2),
227+
dof_per_coef = as.numeric(ct_mpd_cr2$df_Satt),
228+
c_avg = as.numeric(c_avg_kept),
229+
dof_avg = unname(dof_avg_compound),
230+
post_interaction_names = post_names,
231+
reference_period = 1L,
232+
n_post_periods = length(post_names)
233+
)
234+
175235
output$meta <- list(
176236
source = "clubSandwich",
177237
clubSandwich_version = as.character(packageVersion("clubSandwich")),

0 commit comments

Comments
 (0)