Extend SDID coverage MC with stratified-survey DGP; regenerate artifact
Capstone of PR #352. Validates the new weighted-FW + Rao-Wu bootstrap
composition and propagates the landed capability across the
documentation surfaces.
Coverage MC harness (benchmarks/python/coverage_sdid.py):
- Add ``stratified_survey`` as a 4th DGP in ``ALL_DGPS``. Uses
``generate_survey_did_data`` to produce an N=40 (strata=2, PSU=2/
stratum) null-treatment panel with moderate weight variation and
modest ICC (``psu_re_sd=1.5``). Cohort 7 → post = 7..11 (5 post
periods). Converts per-observation ``treated`` to a unit-level
ever-treated indicator (SDID's block-treatment requirement).
- Extend ``DGPSpec`` with an optional ``survey_design_factory``
callable that returns ``(SurveyDesign, supported_methods_tuple)``.
For ``stratified_survey``: bootstrap only — placebo / jackknife
reject strata/PSU/FPC at fit-time, so the harness skips them
rather than catching the NotImplementedError inside ``_fit_one``.
- ``_fit_one`` gains an optional ``survey_design`` kwarg routed
through ``SyntheticDiD.fit(survey_design=)``. ``_run_dgp`` calls
the factory once per seed (DataFrame contents don't affect
columns) and gates methods on the supported set.
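
The harness changes above can be sketched roughly as follows. This is a minimal illustration, not the code in ``coverage_sdid.py``: ``DGPSpecSketch``, ``ever_treated``, ``methods_to_run`` and the ``unit`` / ``stratum`` / ``psu`` names are hypothetical, and the survey design is represented by a plain dict rather than a real ``SurveyDesign``.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Variance methods the harness iterates over (names taken from the commit text).
ALL_METHODS = ("placebo", "bootstrap", "jackknife")


@dataclass(frozen=True)
class DGPSpecSketch:
    """Hypothetical stand-in for the harness's DGPSpec."""

    name: str
    # Returns (survey_design, supported_methods_tuple); None means no survey design.
    survey_design_factory: Optional[Callable[[], tuple]] = None


def ever_treated(rows: Sequence[dict]) -> dict:
    """Collapse per-observation `treated` flags into one ever-treated flag per unit."""
    flags: dict = {}
    for row in rows:
        unit = row["unit"]
        flags[unit] = max(flags.get(unit, 0), int(row["treated"]))
    return flags


def methods_to_run(spec: DGPSpecSketch) -> tuple:
    """Return (survey_design, methods), dropping unsupported methods before any fit."""
    if spec.survey_design_factory is None:
        return None, ALL_METHODS
    design, supported = spec.survey_design_factory()
    return design, tuple(m for m in ALL_METHODS if m in supported)


# Usage: a survey DGP that only supports the bootstrap.
spec = DGPSpecSketch(
    "stratified_survey",
    lambda: ({"strata": "stratum", "psu": "psu"}, ("bootstrap",)),
)
design, methods = methods_to_run(spec)
```

Gating on the supported set up front means a skipped method shows up as zero attempted fits, and the fit routine never has to catch ``NotImplementedError``.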
Regenerated ``benchmarks/data/sdid_coverage.json`` via
``python benchmarks/python/coverage_sdid.py --n-seeds 500 --n-bootstrap
200``. Total wall-clock 2421 s (~40 min on M-series Mac, Rust backend);
aer63 remains the long tail at 2237 s, and stratified_survey adds only
33 s.
Calibration gate (plan §2.7): ``stratified_survey × bootstrap`` has a
rejection rate of 0.042 at α=0.05 (500 seeds × B=200), inside the
calibration band [0.02, 0.10]. ``mean SE / true SD = 1.25`` indicates
the bootstrap is somewhat conservative (it overestimates the empirical
sampling SD by ~25%), which is the safer direction under Rao-Wu
rescaling with only 4 PSUs total. This validates the weighted-FW +
Rao-Wu composition end-to-end.
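
The two diagnostics quoted above can be computed along these lines. This is a sketch under stated assumptions: it uses a two-sided normal critical value for the rejection rule and the sample SD of the per-seed estimates as the "true SD"; the harness may define either differently. ``calibration_summary`` and ``mc_standard_error`` are hypothetical names.

```python
import math
from statistics import NormalDist, mean, stdev


def calibration_summary(estimates, std_errors, alpha=0.05, band=(0.02, 0.10)):
    """Summarise null-panel Monte Carlo fits, where the true effect is zero."""
    # Assumption: reject when |estimate / SE| exceeds the normal critical value.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = [abs(est / se) > z_crit for est, se in zip(estimates, std_errors)]
    rejection_rate = sum(rejections) / len(rejections)
    # Ratio > 1 means the reported SEs exceed the empirical sampling SD (conservative).
    se_ratio = mean(std_errors) / stdev(estimates)
    return {
        "rejection_rate": rejection_rate,
        "mean_se_over_true_sd": se_ratio,
        "inside_band": band[0] <= rejection_rate <= band[1],
    }


def mc_standard_error(rate, n_seeds):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_seeds)
```

With 500 seeds, ``mc_standard_error(0.05, 500)`` is about 0.0097, so an observed rate of 0.042 sits within one Monte Carlo standard error of the nominal 0.05.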
REGISTRY.md §SyntheticDiD:
- Add ``stratified_survey`` row to the coverage MC table and a
paragraph under it documenting the calibration verdict, the
conservatism direction, and why placebo/jackknife rows are NaN.
- Replace the survey-support bullet with a truth-table matrix (PR
#352 shape); add a ``Note (survey + bootstrap composition)``
documenting the weighted-FW objective (unit and time forms), the
ω_eff composition, the argmin-set caveat, the per-draw rw
dispatch (pweight-only vs Rao-Wu), and the single-PSU
short-circuit.
- Update the ``Note (default variance_method deviation from R)`` to
drop the "bootstrap rejects surveys" framing (no longer accurate).
- Update the ``Note (coverage Monte Carlo calibration)`` header to
say "4 representative null-panel DGPs" and flag stratified_survey
as bootstrap-only.
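
For orientation, the weighted-FW objective and the ω_eff composition named in the Note can be sketched in pure Python. This is a generic Frank-Wolfe iteration with exact line search over the simplex, written for illustration only. It is not the Rust kernel, it omits the two-pass sparsify-refit step, and every function name below is hypothetical. The objective is ``||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²``, and the composition is ``ω_eff = rw·ω/Σ(rw·ω)``.

```python
def scale_columns(A, rw):
    """Column-scale A by rw, i.e. A·diag(rw) (the unit-weights form of the loss)."""
    return [[a * r for a, r in zip(row, rw)] for row in A]


def objective(A, b, omega, zeta, reg_w):
    """||A·omega - b||^2 + zeta^2 * sum_j reg_w[j] * omega[j]^2."""
    resid = [sum(a * w for a, w in zip(row, omega)) - b_i for row, b_i in zip(A, b)]
    penalty = zeta ** 2 * sum(r * w * w for r, w in zip(reg_w, omega))
    return sum(r * r for r in resid) + penalty


def weighted_fw(A, b, zeta, reg_w, n_iter=200):
    """Minimise the weighted objective over the simplex, starting from uniform weights."""
    n = len(A[0])
    omega = [1.0 / n] * n
    for _ in range(n_iter):
        resid = [sum(a * w for a, w in zip(row, omega)) - b_i for row, b_i in zip(A, b)]
        grad = [
            2.0 * sum(row[j] * r for row, r in zip(A, resid))
            + 2.0 * zeta ** 2 * reg_w[j] * omega[j]
            for j in range(n)
        ]
        k = min(range(n), key=grad.__getitem__)  # best simplex vertex
        d = [(1.0 if j == k else 0.0) - omega[j] for j in range(n)]
        slope = sum(g * dj for g, dj in zip(grad, d))
        if slope >= 0.0:  # zero Frank-Wolfe gap: no descent direction left
            break
        Ad = [sum(a * dj for a, dj in zip(row, d)) for row in A]
        curvature = 2.0 * (
            sum(v * v for v in Ad)
            + zeta ** 2 * sum(r * dj * dj for r, dj in zip(reg_w, d))
        )
        # Exact line search for a quadratic, clipped to stay inside the simplex.
        gamma = 1.0 if curvature <= 0.0 else min(1.0, -slope / curvature)
        omega = [w + gamma * dj for w, dj in zip(omega, d)]
    return omega


def compose_effective_weights(rw, omega):
    """omega_eff = rw * omega / sum(rw * omega)."""
    products = [r * w for r, w in zip(rw, omega)]
    total = sum(products)
    if total <= 0.0:
        raise ValueError("rw * omega sums to zero; the draw is degenerate")
    return [p / total for p in products]
```

Under these assumptions a single bootstrap draw would call ``weighted_fw(scale_columns(A, rw), b, zeta, reg_w=rw)`` and then ``compose_effective_weights(rw, omega)``, with ``rw`` set to the Rao-Wu rescaled weights for full designs or to the constant control weights for pweight-only fits.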
User-facing docs:
- ``docs/methodology/survey-theory.md``: restore SDID in the Rao-Wu
Rescaled Bootstrap list; describe the weighted-FW composition.
- ``docs/survey-roadmap.md``: Phase 5 SDID row updated to reflect
full-design bootstrap support via PR #352; Phase 6 Rao-Wu bullet
restores SDID.
- ``docs/tutorials/16_survey_did.ipynb`` cell-35: support matrix
table row for SyntheticDiD switches from "pweight only (placebo/
jackknife)" to "bootstrap only (PR #352) for strata/PSU/FPC";
"Note on SyntheticDiD" block rewritten for the landed contract.
- ``diff_diff/synthetic_did.py`` ``__init__`` docstring: bootstrap
bullet now describes survey support and the ω_eff composition.
- ``diff_diff/guides/llms-full.txt``: survey-aware bootstrap bullet
includes SDID in the Rao-Wu list with the weighted-FW formula.
CHANGELOG.md:
- Retain the PR #351 regression Changed entry but annotate it as
"restored in PR #352"; add new Added/Changed PR #352 entries
documenting the weighted-FW kernel, survey helpers, _bootstrap_se
Rao-Wu composition, and the new coverage MC row.
TODO.md:
- Row 103 (SDID + survey designs) → closed by PR #352; replaced
with a narrower follow-up for placebo/jackknife + strata/PSU/FPC
(Low priority, no concrete sketch yet).
Tests:
- ``TestCoverageMCArtifact`` extended: 4 DGPs asserted (including
``stratified_survey``); new explicit assertions that the
stratified_survey bootstrap row has ≥100 successful fits and
α=0.05 rejection ∈ [0.02, 0.10], and that the placebo/jackknife
rows have n_successful_fits == 0 (strata/PSU/FPC rejection contract).
Verified: TestCoverageMCArtifact passes against the regenerated
artifact.
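
The new artifact assertions amount to checks like the ones below. This is a hedged sketch: the row layout and the ``rejection_rate_0.05`` key are assumptions about the JSON artifact, not its documented schema; only ``n_successful_fits`` and the thresholds come from the description above.

```python
def check_stratified_survey_rows(rows, rejection_key="rejection_rate_0.05"):
    """Check the stratified_survey rows, given as {method: summary dict} (hypothetical layout)."""
    boot = rows["bootstrap"]
    assert boot["n_successful_fits"] >= 100, "too few successful bootstrap fits"
    assert 0.02 <= boot[rejection_key] <= 0.10, "bootstrap rejection rate outside band"
    # Placebo and jackknife are skipped for strata/PSU/FPC designs, so no fits succeed.
    for method in ("placebo", "jackknife"):
        assert rows[method]["n_successful_fits"] == 0, f"{method} should be skipped"
    return True


# Example fragment. The 0.042 rate is the value reported above; the fit count
# is illustrative only, since the commit does not state it.
example_rows = {
    "bootstrap": {"n_successful_fits": 500, "rejection_rate_0.05": 0.042},
    "placebo": {"n_successful_fits": 0},
    "jackknife": {"n_successful_fits": 0},
}
ok = check_stratified_survey_rows(example_rows)
```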
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG.md: 7 additions & 1 deletion
@@ -18,7 +18,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- SyntheticDiD bootstrap now retries degenerate resamples (all-control or all-treated, or non-finite `τ_b`) until exactly `n_bootstrap` valid replicates are accumulated, matching R's `synthdid::bootstrap_sample` and Arkhangelsky et al. (2021) Algorithm 2. Previously the Python path counted attempts (with degenerate draws silently dropped), producing fewer valid replicates than requested. A bounded-attempt guard (`20 × n_bootstrap`) prevents pathological-input hangs.
### Changed
-- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the new paper-faithful refit bootstrap rejects all survey designs (including pweight-only) with `NotImplementedError`. Pweight-only users can switch to `variance_method="placebo"` or `"jackknife"`. Strata/PSU/FPC users have no SDID variance option on this release. Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation requires a separate derivation (weighted FW solver); sketch and reusable scaffolding pointers are in `docs/methodology/REGISTRY.md` §SyntheticDiD and `TODO.md`.
+- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression in PR #351, **restored in PR #352** — see Added/Changed entries directly below). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the PR #351 paper-faithful refit bootstrap initially rejected all survey designs (including pweight-only) with `NotImplementedError`. PR #352 restores the capability via a weighted-FW + Rao-Wu composition; the lock-out window applies only to the v3.2.x line that ships PR #351 alone (without PR #352). Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation: see `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)`.
+
+### Added (PR #352)
+- **SDID `variance_method="bootstrap"` survey support restored** via weighted Frank-Wolfe + Rao-Wu rescaling. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` Rao-Wu branch composes Rao-Wu rescaled weights per draw (or constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion).
+
+### Changed (PR #352)
+- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).
|`HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. |`diff_diff/had.py::_validate_had_panel_event_study`| Phase 2b | Low |
|`HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. |`diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference`| Phase 2a | Medium |
-| **SDID + survey designs** (capability regression in this release; both pweight-only AND strata/PSU/FPC). The previous release's fixed-weight bootstrap accepted strata/PSU/FPC via Rao-Wu rescaled bootstrap; the new paper-faithful refit bootstrap rejects all survey designs because Rao-Wu composed with Frank-Wolfe re-estimation requires its own derivation. The follow-up needs a **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in the loss and regularization (`Σ rw_i ω_i Y_i,pre` / `ζ² Σ rw_i ω_i²`), threaded through `compute_sdid_unit_weights` / `compute_time_weights`. Reusable scaffolding (`generate_rao_wu_weights`, split into `rw_control` / `rw_treated`, degenerate-retry, treated-mean weighting) is recoverable from the pre-rewrite `_bootstrap_se` body via `git show 91082e5:diff_diff/synthetic_did.py` (PR #351 "Replace SDID fixed-weight bootstrap with paper-faithful refit"). Compose-after-unweighted-FW does not work — silently reproduces the fixed-weight Rao-Wu behavior we removed. Validation: re-use the coverage MC harness with a stratified DGP, confirm near-nominal rejection rates against placebo-SE tracking. See REGISTRY.md §SyntheticDiD `Note (deferred survey + bootstrap composition)` for the sketch. | `synthetic_did.py::fit`, `synthetic_did.py::_bootstrap_se`, `utils.py::_sc_weight_fw` | follow-up | Medium |
+|**SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. |`synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se`| follow-up | Low |
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. |`benchmarks/R/`, `benchmarks/julia/`, `tests/`| follow-up | Low |
- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
-- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, TROP). SyntheticDiD bootstrap is non-survey only: the paper-faithful refit path re-estimates weights via Frank-Wolfe per draw, and Rao-Wu + refit composition is not yet implemented (tracked in TODO.md)
+- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #352): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo and jackknife paths still reject strata/PSU/FPC (separate methodology gap, tracked in TODO.md)