Add CBWSDID covariate balancing to StackedDiD (Ustyuzhanin 2026) #534
Overall Assessment
Executive Summary
Methodology
Finding M1 (P1). Severity: P1
Finding M2 (P3 Informational). Severity: P3
Code Quality
Finding C1 (P2). Severity: P2
Performance
No unmitigated findings. The per-cohort entropy balancing loop is expected for this v1 design.
Maintainability
No unmitigated findings. The new balancing code is isolated and the estimator parameter is threaded through
Tech Debt
Documented CBWSDID deferrals are properly tracked in
Security
No findings. The added benchmark files and R script do not show secrets or credential material.
Documentation/Tests
Finding T1 (P2). Severity: P2
Path to Approval
82722ff to f65f02e
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Finding M1 (P3 Informational). Severity: P3
Prior Finding M1 (Resolved). Severity: Resolved prior P1
Code Quality
Prior Finding C1 (Resolved). Severity: Resolved prior P2
Performance
No findings. The per-cohort entropy-balancing loop is expected for this v1 implementation.
Maintainability
No findings. The new
Tech Debt
Finding TD1 (P3 Informational). Severity: P3
Security
No findings. The new benchmark R script and data fixtures do not expose secrets or credential material.
Documentation/Tests
Prior Finding T1 (Resolved). Severity: Resolved prior P2
Verification Note (P3 Informational). Severity: P3
Implement Covariate-Balanced Weighted Stacked Difference-in-Differences
(Ustyuzhanin 2026, arXiv:2604.02293) as a covariate-balancing path on the
existing StackedDiD estimator rather than a new class. A new constructor
parameter balance="entropy" plus fit(..., covariates=[...]) add a within-
sub-experiment design stage: entropy balancing (Hainmueller 2012) reweights
the clean controls toward the treated cohort's covariate means (read at the
last pre-treatment period t=a-1-anticipation, so design weights use only
pre-treatment information), and the resulting nonnegative design weights b_sa
compose with the Wing et al. (2024) corrective weights via the EFFECTIVE
control mass into the final stacked weights:
W_sa = b_sa * (N^D_a/N^D_Omega) / (Ntilde^C_a/Ntilde^C_Omega) (controls)
W_sa = 1 (treated)
injected at the single composed_weights point so the existing WLS +
cluster-robust vcov produce the estimate and conditional-on-weights SEs. This
is control-only reweighting, so it estimates untreated trends under conditional
parallel trends while preserving the trimmed-aggregate-ATT estimand (it reduces
exactly to weighted stacked DID at b_sa=1). A naive b_sa*Q_aggregate multiply
is NOT equivalent — it would aggregate control means with the wrong cohort
weights — so W_sa is computed directly from the effective control mass.
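The composition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in this PR: the function name `compose_stacked_weights` is hypothetical, it works on one row per unit, and it assumes the effective control mass Ntilde^C_a is the sum of the design weights b_sa over the clean controls of sub-experiment a.

```python
import numpy as np

def compose_stacked_weights(sub_exp, treated, b):
    """Hypothetical sketch of W_sa: 1 for treated rows, and
    b_sa * (N^D_a / N^D_Omega) / (Ntilde^C_a / Ntilde^C_Omega) for controls.

    Assumption: Ntilde^C_a is the sum of b over the controls of sub-experiment a.
    """
    sub_exp = np.asarray(sub_exp)
    treated = np.asarray(treated, dtype=bool)
    b = np.asarray(b, dtype=float)

    w = np.ones_like(b)                      # treated rows keep weight 1
    n_treated_total = treated.sum()          # N^D_Omega
    control_mass_total = b[~treated].sum()   # Ntilde^C_Omega (assumed definition)

    for a in np.unique(sub_exp):
        in_a = sub_exp == a
        ctrl = in_a & ~treated
        treated_share = (in_a & treated).sum() / n_treated_total
        control_share = b[ctrl].sum() / control_mass_total
        w[ctrl] = b[ctrl] * treated_share / control_share
    return w

# Two sub-experiments: a=0 has 1 treated and 2 controls, a=1 has 2 treated and 3 controls.
sub_exp = np.array([0, 0, 0, 1, 1, 1, 1, 1])
treated = np.array([1, 0, 0, 1, 1, 0, 0, 0], dtype=bool)
b = np.array([1.0, 0.2, 1.8, 1.0, 1.0, 0.5, 0.5, 1.0])

w = compose_stacked_weights(sub_exp, treated, b)

# Naive alternative: multiply b by the count-based corrective weight Q_a.
q = compose_stacked_weights(sub_exp, treated, np.ones_like(b))
naive = np.where(treated, 1.0, b * q)
print(w[~treated], naive[~treated])
```

With these weights each sub-experiment's share of the total control weight equals its share of treated units, which is the property the count-based corrective weight has at b_sa=1. The naive product only keeps that property when the design weights sum to the raw control count in every sub-experiment.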
New module diff_diff/balancing.py provides the entropy-balancing solver
(convex-dual damped Newton with L-BFGS fallback). Scope (v1): balance="entropy"
requires weighting="aggregate"; balance + population/sample_share/survey_design=
raise NotImplementedError; matching-based balancing and the repeated-treatment
extension are out of scope. Infeasible cohorts fail closed with a clear,
cohort-named error rather than silently dropping a cohort.
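For readers unfamiliar with entropy balancing, the following is a minimal sketch of a damped Newton iteration on the convex dual, with a fail-closed error when the moments cannot be matched. It is not the solver in diff_diff/balancing.py: the name `entropy_balance` and its signature are hypothetical, the weights are normalized to sum to one by assumption, and the L-BFGS fallback mentioned above is omitted.

```python
import numpy as np

def entropy_balance(X, target, base=None, tol=1e-8, max_iter=200):
    """Hypothetical sketch: weights w >= 0, sum(w) = 1, with X.T @ w == target.

    Minimizes the dual log(sum_i q_i * exp(lam . (x_i - target))) by Newton
    steps with backtracking (damping). Raises ValueError if balance fails.
    """
    Z = np.asarray(X, dtype=float) - np.asarray(target, dtype=float)
    n, k = Z.shape
    q = np.full(n, 1.0 / n) if base is None else np.asarray(base, float) / np.sum(base)
    lam = np.zeros(k)

    def dual(l):
        s = Z @ l
        m = s.max()                      # shift for numerical stability
        return m + np.log(np.sum(q * np.exp(s - m)))

    def weights(l):
        s = Z @ l
        w = q * np.exp(s - s.max())
        return w / w.sum()

    for _ in range(max_iter):
        w = weights(lam)
        grad = Z.T @ w                   # weighted mean of centered covariates
        if np.max(np.abs(grad)) < tol:
            return w                     # moments matched
        Zc = Z - grad
        hess = (Zc * w[:, None]).T @ Zc  # weighted covariance
        step = np.linalg.lstsq(hess, -grad, rcond=None)[0]
        if not np.all(np.isfinite(lam + step)) or not np.any(step):
            break                        # stalled or diverging: treat as infeasible
        t, f0 = 1.0, dual(lam)
        while dual(lam + t * step) > f0 + 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5                     # backtracking line search
        lam = lam + t * step
    raise ValueError("entropy balancing failed: target may lie outside the control hull")

# Controls with covariate values 0..3, reweighted toward a treated mean of 2.0.
controls = np.array([[0.0], [1.0], [2.0], [3.0]])
w = entropy_balance(controls, target=[2.0])
print(w, controls[:, 0] @ w)
```

A target outside the convex hull of the control covariates (for example 5.0 here) has no nonnegative solution, so the dual is unbounded and the sketch raises instead of returning unbalanced weights, in the spirit of the fail-closed behavior described above.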
Validation: closed-form DID^CBWSDID_e paper-formula anchor (R-independent, 1e-8),
an effective-mass-is-load-bearing test on a varying-control-count panel, the
estimand-integrity-under-heterogeneity test (recovers the treated-average ATT
where plain StackedDiD is biased), and cross-language parity against the R
cbwsdid package (weightit/ebal) at ~3e-7 (golden fixture committed). Docs,
REGISTRY, references, guides, and CHANGELOG updated; llms-full signature pinned
in test_guides.
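As a generic illustration of why a closed-form anchor is a useful check, the sketch below compares a weighted difference-in-differences of cell means with the interaction coefficient from a saturated WLS fit on a simplified two-group, two-period layout. It is not the test in this PR and does not reproduce the paper's DID^CBWSDID_e formula; the function names and the simulated data are hypothetical.

```python
import numpy as np

def weighted_did(y, treated, post, w):
    """Closed form: difference of weighted mean changes, treated minus control."""
    def wmean(mask):
        return np.average(y[mask], weights=w[mask])
    return (wmean(treated & post) - wmean(treated & ~post)) - (
        wmean(~treated & post) - wmean(~treated & ~post)
    )

def wls_interaction(y, treated, post, w):
    """Interaction coefficient from WLS of y on 1, treated, post, treated*post."""
    X = np.column_stack([np.ones_like(y), treated, post, treated & post]).astype(float)
    sw = np.sqrt(w)  # WLS via OLS on rows scaled by sqrt(weight)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta[3]

# Simulated data (hypothetical): arbitrary positive weights, true effect 1.5.
rng = np.random.default_rng(0)
n = 200
treated = rng.random(n) < 0.4
post = rng.random(n) < 0.5
w = rng.uniform(0.2, 3.0, n)
y = rng.normal(size=n) + 1.5 * (treated & post)

gap = abs(weighted_did(y, treated, post, w) - wls_interaction(y, treated, post, w))
print(gap)
```

Because the regression is saturated in the four group-by-period cells, the two numbers agree up to floating-point error, so a hand-computed formula can pin down a regression-based estimate without relying on another package.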
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f65f02e to 7a90da5
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Finding M1 (P3 Informational). Severity: P3
Prior Finding M1 (Resolved). Severity: Resolved prior P1
Code Quality
Prior Finding C1 (Resolved). Severity: Resolved prior P2
Performance
No findings. The per-sub-experiment entropy-balancing loop is expected for the documented v1 scope.
Maintainability
No findings.
Tech Debt
Finding TD1 (P3 Informational). Severity: P3
Security
No findings. The new benchmark data/script do not expose credentials or sensitive material.
Documentation/Tests
Prior Finding T1 (Resolved). Severity: Resolved prior P2
Finding T2 (P3 Informational). Severity: P3
Summary
- Implements CBWSDID (Ustyuzhanin 2026) as a covariate-balancing path on the existing `StackedDiD`, not a new estimator class. New constructor parameter `balance="entropy"` plus `fit(..., covariates=[...])` add a within-sub-experiment design stage: entropy balancing (Hainmueller 2012) reweights the clean controls toward the treated cohort's covariate means (read at the last pre-treatment period `t=a-1-anticipation`, so design weights use only pre-treatment information). The resulting nonnegative design weights `b_sa` compose with the Wing et al. (2024) corrective weights via the effective control mass into the final stacked weights `W_sa = b_sa·(N^D_a/N^D_Ω)/(Ñ^C_a/Ñ^C_Ω)`, injected at the single `composed_weights` point so the existing WLS + cluster-robust vcov produce the estimate and conditional-on-weights SEs.
- A naive `b_sa·Q_aggregate` multiply is not equivalent and would bias the estimand under non-uniform weights, so `W_sa` is computed directly from the effective control mass; a dedicated test guards this.
- New module `diff_diff/balancing.py` provides the entropy-balancing solver (convex-dual damped Newton with an L-BFGS fallback).
- Scope (v1): `balance="entropy"` requires `weighting="aggregate"` and balanced event windows; `population`/`sample_share`/`survey_design=`, ragged/unbalanced windows, matching-based balancing, and the repeated 0→1/1→0 episode extension are out of scope and fail closed (`NotImplementedError`/`ValueError`). Infeasible cohorts (treated covariate profile outside the clean-control hull) fail closed with a clear, cohort-named error rather than silently dropping a cohort (which would shift the estimand). Deferrals tracked in `TODO.md`.
Methodology references (required if estimator / math changes)
- `StackedDiD` (entropy balancing within sub-experiments, composed with Wing-et-al. corrective stacked weights)
- `docs/methodology/papers/ustyuzhanin-2026-review.md`); Hainmueller (2012) Political Analysis 20(1) 25–46; Wing, Freedman & Hollingsworth (2024) NBER WP 32054
- `docs/methodology/REGISTRY.md` under the StackedDiD "Covariate balancing (CBWSDID)" subsection with `- **Note:**` labels, and the unit-count vs observation-count corrector convention off balanced panels is the documented (deferred) choice.
Validation
- `tests/test_balancing.py` (entropy solver: exact moment balance, base weights, collinearity, infeasibility); `tests/test_stacked_did.py::TestStackedDiDCovariateBalance` (guards, cross-validation, constraint satisfaction, constant-covariate reduction, covariate-scale invariance, default-aggregate mode, ragged-window rejection, diagnostics); `tests/test_methodology_stacked_did.py` (closed-form `DID^CBWSDID_e` paper-formula anchor; effective-mass-is-load-bearing on a varying-control-count panel; estimand-integrity under heterogeneity; cross-language parity vs the R `cbwsdid` package); `tests/test_guides.py::TestLLMsFullStackedDiDCoverage` (signature pin).
- Cross-language parity against the R `cbwsdid` package (`refinement.method="weightit"`, `method="ebal"`) at ~3e-7 on estimates and ~0.3% on SEs (golden fixture `benchmarks/data/cbwsdid_golden.json`, regenerate via `benchmarks/R/generate_cbwsdid_golden.R`). Full test suite: 7658 passed.
Security / privacy
🤖 Generated with Claude Code