Commit f65f02e

igerber and claude committed
Add CBWSDID covariate balancing to StackedDiD (Ustyuzhanin 2026)

Implement Covariate-Balanced Weighted Stacked Difference-in-Differences (Ustyuzhanin 2026, arXiv:2604.02293) as a covariate-balancing path on the existing StackedDiD estimator rather than a new class. A new constructor parameter balance="entropy" plus fit(..., covariates=[...]) add a within-sub-experiment design stage: entropy balancing (Hainmueller 2012) reweights the clean controls toward the treated cohort's covariate means (read at the last pre-treatment period t=a-1-anticipation, so design weights use only pre-treatment information), and the resulting nonnegative design weights b_sa compose with the Wing et al. (2024) corrective weights via the EFFECTIVE control mass into the final stacked weights:

    W_sa = b_sa * (N^D_a/N^D_Omega) / (Ntilde^C_a/Ntilde^C_Omega)   (controls)
    W_sa = 1                                                        (treated)

injected at the single composed_weights point so the existing WLS + cluster-robust vcov produce the estimate and conditional-on-weights SEs. This is control-only reweighting, so it estimates untreated trends under conditional parallel trends while preserving the trimmed-aggregate-ATT estimand (it reduces exactly to weighted stacked DID at b_sa=1). A naive b_sa*Q_aggregate multiply is NOT equivalent (it would aggregate control means with the wrong cohort weights), so W_sa is computed directly from the effective control mass.

New module diff_diff/balancing.py provides the entropy-balancing solver (convex-dual damped Newton with L-BFGS fallback).

Scope (v1): balance="entropy" requires weighting="aggregate"; balance + population/sample_share/survey_design= raise NotImplementedError; matching-based balancing and the repeated-treatment extension are out of scope. Infeasible cohorts fail closed with a clear, cohort-named error rather than silently dropping a cohort.

Validation: closed-form DID^CBWSDID_e paper-formula anchor (R-independent, 1e-8), an effective-mass-is-load-bearing test on a varying-control-count panel, the estimand-integrity-under-heterogeneity test (recovers the treated-average ATT where plain StackedDiD is biased), and cross-language parity against the R cbwsdid package (weightit/ebal) at ~3e-7 (golden fixture committed). Docs, REGISTRY, references, guides, and CHANGELOG updated; llms-full signature pinned in test_guides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
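
The weight composition in the commit message can be illustrated with a small sketch. This is not the code in `diff_diff/stacked_did.py`: the function names and toy inputs are hypothetical, and it assumes the effective control mass `Ntilde^C_a` is the sum of `b_sa` over cohort `a`'s clean controls, with `Ntilde^C_Omega` the sum over all cohorts.

```python
import numpy as np


def composed_control_weights(b, cohort, n_treated):
    """W_sa = b_sa * (N^D_a / N^D_Omega) / (Ntilde^C_a / Ntilde^C_Omega) for control rows.

    b         : nonnegative design weight b_sa for each stacked control row
    cohort    : integer sub-experiment label a for each control row
    n_treated : dict mapping cohort label a -> treated count N^D_a
    """
    b = np.asarray(b, dtype=float)
    cohort = np.asarray(cohort)
    n_d_total = sum(n_treated.values())        # N^D_Omega
    mass_total = b.sum()                       # Ntilde^C_Omega (assumed definition)
    w = np.empty_like(b)
    for a in np.unique(cohort):
        rows = cohort == a
        mass_a = b[rows].sum()                 # Ntilde^C_a (assumed definition)
        treated_share = n_treated[a.item()] / n_d_total
        control_share = mass_a / mass_total
        w[rows] = b[rows] * treated_share / control_share
    return w


def cohort_shares(w, cohort):
    """Share of total control weight carried by each cohort."""
    cohort = np.asarray(cohort)
    sums = np.array([w[cohort == a].sum() for a in np.unique(cohort)])
    return sums / w.sum()


# Toy stack: cohort 0 has 3 controls and 2 treated units, cohort 1 has 2 and 3.
cohort = np.array([0, 0, 0, 1, 1])
n_treated = {0: 2, 1: 3}

# With b_sa = 1 the formula collapses to count-based corrective weights.
w_unit = composed_control_weights(np.ones(5), cohort, n_treated)

# With non-uniform design weights, compare against the naive b_sa * Q product.
b = np.array([0.5, 1.5, 1.0, 2.0, 2.0])
w_bal = composed_control_weights(b, cohort, n_treated)
w_naive = b * w_unit

print(cohort_shares(w_bal, cohort))    # control shares under the effective-mass formula
print(cohort_shares(w_naive, cohort))  # control shares under the naive product
```

In this toy stack the effective-mass weights give each cohort's controls the same share as its treated units (2/5 and 3/5), while the naive product shifts the control shares to 1/4 and 3/4, which is the mismatch the commit message warns about.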
1 parent fbdcbb9 commit f65f02e

19 files changed

Lines changed: 2907 additions & 4 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
+- **`StackedDiD` covariate balancing (CBWSDID; Ustyuzhanin 2026, arXiv:2604.02293).** New constructor parameter `balance="entropy"` plus `fit(..., covariates=[...])` add a within-sub-experiment design stage: entropy balancing (Hainmueller 2012) reweights the clean controls toward the treated cohort's covariate means (read at the last pre-treatment period), and the resulting design weights `b_sa` compose with the Wing et al. (2024) corrective weights via the effective control mass into the final stacked weights `W_sa`. This is **control-only reweighting**, so it estimates untreated trends under *conditional* parallel trends while preserving the trimmed-aggregate-ATT estimand (at `b_sa=1` it reduces to the paper's unit-count weighted stacked DID, equal to `StackedDiD(weighting="aggregate")` on balanced event windows). Inference reuses the existing conditional-on-weights cluster-robust path. Scope: requires `weighting="aggregate"` and **balanced event windows** (ragged windows raise; the unit-count vs observation-count convention is unresolved off balanced panels); `population`/`sample_share`/`survey_design=` and matching-based balancing / the repeated-treatment extension are not supported (raise `NotImplementedError`). Infeasible cohorts fail closed with a clear error. New `diff_diff/balancing.py` (entropy-balancing solver). Estimand validated end-to-end against the closed-form CBWSDID formula (`tests/test_methodology_stacked_did.py`).
- **`SyntheticControl` conformal inference (Chernozhukov, Wüthrich & Zhu 2021, *JASA* 116(536)).** Three opt-in `SyntheticControlResults` methods give valid p-values for the post-period effect trajectory and pointwise confidence intervals — what the in-space placebo / Firpo-Possebom test-inversion paths cannot. Unlike the Firpo path (which re-ranks the cross-unit placebo gaps), the conformal layer fits its **own** time-permutation-invariant constrained-LS synthetic-control proxy (CWZ §2.3 eqs 3–4 — simplex weights on raw outcomes over **all** periods under the null, no `V`-matrix, no intercept) and permutes residuals **over time** for the single treated unit (CWZ's exactness theory requires a time-symmetric proxy, which the headline ADH `V`-matrix fit is not). **`conformal_test(effect, q=1, scheme="moving_block", n_iid=10000, seed=None)`** computes the joint sharp-null permutation p-value (eqs 1–2) of `S_q(û) = ((1/√T*)·Σ_{t>T0}|û_t|^q)^{1/q}` (`q ∈ {1, 2, ∞}`); the proxy is fit once and only residuals are permuted (footnote 7). **`conformal_confidence_intervals(alpha=0.1, scheme="moving_block", bounds=None, n_grid=100, seed=None)`** returns pointwise per-period CIs by test inversion (Algorithm 1 — each period `t` uses `Z = (pre-periods, t)` with the other post-periods dropped, a clean `T*=1` test). **`conformal_average_effect(alpha=0.1, scheme="moving_block", bounds=None, n_grid=200, seed=None)`** returns a CI for the average post-period effect by collapsing the panel into non-overlapping `T*`-blocks and permuting the block residuals (Appendix A.1). Permutation schemes: `"moving_block"` (`Π_→` cyclic shifts, valid under serial dependence — the default) and `"iid"` (`Π_all`, sampled, finer p-values); both include the identity so the p-value floor is `1/|Π|` (no extra `+1`). 
Fail-closed handling for `<1` donor / unpickled result / non-finite panel / non-converged grid points (treated as indeterminate, not rejected) / grid-limited / empty / unbounded sets; a single donor and `T*≥T0` warn. Surfaced under `conformal_inference` / `get_conformal_grid_df()` and `DiagnosticReport`'s `estimator_native_diagnostics`; the analytical `se`/`t_stat`/`p_value`/`conf_int`/`is_significant` stay NaN throughout. Core in the new `diff_diff/conformal.py` (reuses the Frank-Wolfe simplex solver). *Deferred:* one-sided variants (§7), covariates folded into the proxy, and the AR/innovation-permutation path (Lemmas 5–7).

### Changed
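
The entropy-balancing design stage described in the new CHANGELOG entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the solver in `diff_diff/balancing.py`: it uses uniform base weights, normalizes the weights to sum to one, implements only the damped Newton iteration on the convex dual (no L-BFGS fallback), and the function name and toy data are hypothetical.

```python
import numpy as np


def entropy_balance(x_control, target, max_iter=100, tol=1e-10):
    """Weights on controls (summing to 1) whose weighted covariate means equal `target`.

    Minimizes the dual f(lam) = log(sum_i exp(lam' z_i)), with z_i = x_i - target.
    At the optimum the weights w_i ∝ exp(lam' z_i) satisfy sum_i w_i z_i = 0.
    """
    x = np.asarray(x_control, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    z = x - np.asarray(target, dtype=float)
    lam = np.zeros(z.shape[1])

    def dual(l):
        s = z @ l
        m = s.max()
        return m + np.log(np.exp(s - m).sum())   # numerically stable log-sum-exp

    def weights(l):
        s = z @ l
        w = np.exp(s - s.max())
        return w / w.sum()

    for _ in range(max_iter):
        w = weights(lam)
        grad = z.T @ w                            # weighted mean of z (the imbalance)
        if np.abs(grad).max() < tol:
            return w
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)   # weighted covariance
        step = np.linalg.solve(hess + 1e-12 * np.eye(len(lam)), grad)
        decrement = grad @ step
        f0 = dual(lam)
        t = 1.0
        # Backtracking (damping): halve the step until the dual decreases enough.
        # The small slack keeps round-off from rejecting steps near convergence.
        while t > 1e-8 and dual(lam - t * step) > f0 - 1e-4 * t * decrement + 1e-12:
            t *= 0.5
        lam = lam - t * step
    raise RuntimeError("entropy balancing did not converge; the target may be infeasible")


# Toy data: 50 controls with two covariates, balanced toward a made-up treated mean.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
target = np.array([0.3, -0.2])
w = entropy_balance(x, target)
print(w @ x)   # weighted control means, to be compared with `target`
```

Raising on non-convergence mirrors the fail-closed behaviour the entry describes for infeasible cohorts, although the library's actual error type and message are not shown in this diff.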

README.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
- [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html) - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-d̲ or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release - repeated cross-sections rejected by the validator. Alias `HAD`.
-- [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html) - Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
+- [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html) - Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments; optional covariate balancing (Ustyuzhanin 2026)
- [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html) - Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
- [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html) - Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered.html#staggeredtripledifference) - Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT

TODO.md

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ Deferred items from PR reviews that were not addressed before merge.

| Issue | Location | PR | Priority |
|-------|----------|----|----------|
+| CBWSDID covariate balancing (`StackedDiD(balance="entropy")`) v1 supports only balanced event windows + `weighting="aggregate"`; unbalanced/ragged panels fail closed (the unit-count vs observation-count corrector convention is unresolved off balanced panels). Matching-based balancing and the repeated `0→1`/`1→0` episode extension are also deferred (out-of-scope guards raise). Documented in REGISTRY.md StackedDiD "Covariate balancing (CBWSDID)" Notes. | `stacked_did.py`, `balancing.py`, `docs/methodology/REGISTRY.md` | follow-up | Low |
| `SyntheticControl` cv: `in_space_placebo()` / `leave_one_out()` report a cv refit excluded for STRUCTURAL infeasibility (donor-indistinguishable re-aggregated window) with the generic `status="failed"` — same machine-readable status as a genuine inner-solver non-convergence. The failure warnings now distinguish the two causes (and the correct remediation) under cv, and `in_time_placebo()` already splits structural→`"infeasible"` vs `"failed"`, but in-space/LOO do not yet emit a separate machine-readable status/reason-code. Thread a reason code from `_outer_solve_V_cv()`/`_placebo_fit_unit()` and add an `"infeasible"` status + count to the in-space/LOO outputs (mirror the in-time split). | `synthetic_control.py`, `synthetic_control_results.py` | follow-up | Low |
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |

benchmarks/R/generate_cbwsdid_golden.R

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+#!/usr/bin/env Rscript
+# Generate the cross-language golden fixture for StackedDiD's covariate-balancing
+# (CBWSDID) path against the reference R package `cbwsdid` (Ustyuzhanin 2026).
+#
+# Unlike generate_stacked_did_golden.R (which operates on a PRE-stacked CSV so the
+# R side is independent of Python stacking logic), `cbwsdid` does its OWN stacking
+# + balancing, so this harness hands it the raw panel and dumps the dynamic
+# event-study ATTs. The Python side (StackedDiD(balance="entropy", ...)) reproduces
+# them via its independent entropy-balancing solver + effective-mass W_sa.
+#
+# Refinement: refinement.method="weightit", method="ebal" = entropy balancing
+# (Hainmueller 2012) on covs.formula=~x, matching StackedDiD(balance="entropy",
+# covariates=["x"]). Install: remotes::install_github("vadvu/cbwsdid").
+#
+# Usage: Rscript benchmarks/R/generate_cbwsdid_golden.R
+
+suppressMessages({
+  library(cbwsdid)
+  library(jsonlite)
+})
+
+# Run from the repository root: Rscript benchmarks/R/generate_cbwsdid_golden.R
+panel_csv <- "benchmarks/data/cbwsdid_balance_panel.csv"
+out_json <- "benchmarks/data/cbwsdid_golden.json"
+
+df <- read.csv(panel_csv)
+
+m <- cbwsdid(
+  data = df, y = "y", d = "d", id = c("unit", "time"),
+  kappa = c(-2, 2), design = "absorbing", post_path = "stable",
+  refinement.method = "weightit", covs.formula = ~x,
+  refinement.args = list(method = "ebal"), pooled = TRUE
+)
+qoi <- cbwsdid_qoi(m, type = "dynamic")
+
+golden <- list(
+  meta = list(
+    package = "cbwsdid",
+    R_version = R.version.string,
+    panel = "benchmarks/data/cbwsdid_balance_panel.csv",
+    estimator = "cbwsdid(design='absorbing', refinement.method='weightit', method='ebal', covs.formula=~x)",
+    kappa = c(-2L, 2L)
+  ),
+  dynamic = list(
+    event_time = as.integer(qoi$et),
+    estimate = as.numeric(qoi$estimate),
+    std_error = as.numeric(qoi$std.error)
+  )
+)
+write_json(golden, out_json, auto_unbox = TRUE, digits = 15, pretty = TRUE)
+cat("wrote", out_json, "\n")
+print(data.frame(et = qoi$et, estimate = qoi$estimate, se = qoi$std.error))
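
On the Python side, a parity check against the fixture written above could look roughly like the sketch below. The JSON field names (`dynamic`, `event_time`, `estimate`, `std_error`) follow the R script. The helper name, the inline fixture text, and every number in it are invented for illustration, and the `3e-7` threshold is only borrowed from the parity figure quoted in the commit message. The repository's actual test is not shown in this diff.

```python
import json

PARITY_TOL = 3e-7  # assumed threshold, taken from the "~3e-7" parity quoted in the commit message


def max_abs_gap(golden, python_estimates):
    """Largest absolute gap between R and Python ATTs over the fixture's event times.

    golden           : parsed fixture with golden["dynamic"]["event_time" / "estimate"]
    python_estimates : dict mapping event time -> ATT estimated on the Python side
    """
    dynamic = golden["dynamic"]
    r_by_event = dict(zip(dynamic["event_time"], dynamic["estimate"]))
    missing = sorted(set(r_by_event) - set(python_estimates))
    if missing:
        raise KeyError(f"no Python estimate for event times {missing}")
    return max(abs(python_estimates[e] - r) for e, r in r_by_event.items())


# Stand-in for the contents of benchmarks/data/cbwsdid_golden.json (made-up values).
golden_text = """
{"dynamic": {"event_time": [-2, 0, 1],
             "estimate":   [0.01, 1.5, 2.0],
             "std_error":  [0.1, 0.1, 0.1]}}
"""
golden = json.loads(golden_text)

# Stand-in for StackedDiD(balance="entropy") event-study output (made-up values).
python_estimates = {-2: 0.01 + 1e-8, 0: 1.5, 1: 2.0 - 2e-8}

gap = max_abs_gap(golden, python_estimates)
print(gap <= PARITY_TOL)
```

Keying the comparison on event time rather than list position keeps the check valid if the two sides order or omit periods differently. A missing event time raises instead of being skipped.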
