Commit 6d64ec1

igerber and claude committed
Extend SDID coverage MC with stratified-survey DGP; regenerate artifact
Capstone of PR #352. Validates the new weighted-FW + Rao-Wu bootstrap composition and propagates the landed capability across the documentation surfaces.

Coverage MC harness (benchmarks/python/coverage_sdid.py):
- Add ``stratified_survey`` as a 4th DGP in ``ALL_DGPS``. Uses ``generate_survey_did_data`` to produce an N=40 (strata=2, PSU=2/stratum) null-treatment panel with moderate weight variation and modest ICC (``psu_re_sd=1.5``). Cohort 7 → post = 7..11 (5 post periods). Converts per-observation ``treated`` to a unit-level ever-treated indicator (SDID's block-treatment requirement).
- Extend ``DGPSpec`` with an optional ``survey_design_factory`` callable that returns ``(SurveyDesign, supported_methods_tuple)``. For ``stratified_survey``: bootstrap only. Placebo and jackknife reject strata/PSU/FPC at fit-time, so the harness skips them rather than catching the NotImplementedError inside ``_fit_one``.
- ``_fit_one`` gains an optional ``survey_design`` kwarg routed through ``SyntheticDiD.fit(survey_design=)``. ``_run_dgp`` calls the factory once per seed (DataFrame contents don't affect columns) and gates methods on the supported set.

Regenerated ``benchmarks/data/sdid_coverage.json`` via ``python benchmarks/python/coverage_sdid.py --n-seeds 500 --n-bootstrap 200``. Total wall-clock 2421 s (~40 min on M-series Mac, Rust backend); aer63 remains the long tail at 2237 s, stratified_survey adds only 33 s.

Calibration gate (plan §2.7): ``stratified_survey × bootstrap`` at α=0.05 returns 0.042 (500 seeds × B=200), inside the calibration band [0.02, 0.10]. ``mean SE / true SD = 1.25`` indicates the bootstrap is slightly conservative (overestimates empirical sampling SD by ~25%), which is the safer direction under Rao-Wu rescaling with only 4 PSUs total. Validates the weighted-FW + Rao-Wu composition end-to-end.

REGISTRY.md §SyntheticDiD:
- Add ``stratified_survey`` row to the coverage MC table and a paragraph under it documenting the calibration verdict, the conservatism direction, and why placebo/jackknife rows are NaN.
- Replace the survey-support bullet with a truth-table matrix (PR #352 shape); add a ``Note (survey + bootstrap composition)`` documenting the weighted-FW objective (unit and time forms), the ω_eff composition, the argmin-set caveat, the per-draw rw dispatch (pweight-only vs Rao-Wu), and the single-PSU short-circuit.
- Update the ``Note (default variance_method deviation from R)`` to drop the "bootstrap rejects surveys" framing (no longer accurate).
- Update the ``Note (coverage Monte Carlo calibration)`` header to say "4 representative null-panel DGPs" and flag stratified_survey as bootstrap-only.

User-facing docs:
- ``docs/methodology/survey-theory.md``: restore SDID in the Rao-Wu Rescaled Bootstrap list; describe the weighted-FW composition.
- ``docs/survey-roadmap.md``: Phase 5 SDID row updated to reflect full-design bootstrap support via PR #352; Phase 6 Rao-Wu bullet restores SDID.
- ``docs/tutorials/16_survey_did.ipynb`` cell-35: support matrix table row for SyntheticDiD switches from "pweight only (placebo/jackknife)" to "bootstrap only (PR #352) for strata/PSU/FPC"; "Note on SyntheticDiD" block rewritten for the landed contract.
- ``diff_diff/synthetic_did.py`` ``__init__`` docstring: bootstrap bullet now describes survey support and the ω_eff composition.
- ``diff_diff/guides/llms-full.txt``: survey-aware bootstrap bullet includes SDID in the Rao-Wu list with the weighted-FW formula.

CHANGELOG.md:
- Retain the PR #351 regression Changed entry but annotate it as "restored in PR #352"; add new Added/Changed PR #352 entries documenting the weighted-FW kernel, survey helpers, _bootstrap_se Rao-Wu composition, and the new coverage MC row.

TODO.md:
- Row 103 (SDID + survey designs) → closed by PR #352; replaced with a narrower follow-up for placebo/jackknife + strata/PSU/FPC (Low priority, no concrete sketch yet).

Tests:
- ``TestCoverageMCArtifact`` extended: 4 DGPs asserted (including ``stratified_survey``); new explicit assertions that the stratified_survey bootstrap row has ≥100 successful fits and α=0.05 rejection ∈ [0.02, 0.10]; placebo/jackknife rows n_successful_fits == 0 (strata/PSU/FPC rejection contract).

Verified: TestCoverageMCArtifact passes against the regenerated artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4c4a81b commit 6d64ec1

11 files changed

Lines changed: 247 additions & 56 deletions

CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -18,7 +18,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1818
- SyntheticDiD bootstrap now retries degenerate resamples (all-control or all-treated, or non-finite `τ_b`) until exactly `n_bootstrap` valid replicates are accumulated, matching R's `synthdid::bootstrap_sample` and Arkhangelsky et al. (2021) Algorithm 2. Previously the Python path counted attempts (with degenerate draws silently dropped), producing fewer valid replicates than requested. A bounded-attempt guard (`20 × n_bootstrap`) prevents pathological-input hangs.
1919

2020
### Changed
21-
- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the new paper-faithful refit bootstrap rejects all survey designs (including pweight-only) with `NotImplementedError`. Pweight-only users can switch to `variance_method="placebo"` or `"jackknife"`. Strata/PSU/FPC users have no SDID variance option on this release. Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation requires a separate derivation (weighted FW solver); sketch and reusable scaffolding pointers are in `docs/methodology/REGISTRY.md` §SyntheticDiD and `TODO.md`.
21+
- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression in PR #351, **restored in PR #352** — see Added/Changed entries directly below). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the PR #351 paper-faithful refit bootstrap initially rejected all survey designs (including pweight-only) with `NotImplementedError`. PR #352 restores the capability via a weighted-FW + Rao-Wu composition; the lock-out window applies only to the v3.2.x line that ships PR #351 alone (without PR #352). Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation: see `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)`.
22+
23+
### Added (PR #352)
24+
- **SDID `variance_method="bootstrap"` survey support restored** via weighted Frank-Wolfe + Rao-Wu rescaling. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` Rao-Wu branch composes Rao-Wu rescaled weights per draw (or constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion).
25+
26+
### Changed (PR #352)
27+
- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).
2228

2329
## [3.2.0] - 2026-04-19
2430
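The retry-until-valid behaviour described in the first changelog entry of this hunk (degenerate draws are redrawn until `n_bootstrap` valid replicates exist, with a `20 × n_bootstrap` attempt cap) can be sketched roughly as below. This is an illustrative toy, not the library's `_bootstrap_se`: the function name `bootstrap_with_retry`, the `estimate` callback, and raising `RuntimeError` when the cap is hit are all assumptions made for the example.

```python
import numpy as np


def bootstrap_with_retry(is_treated, estimate, n_bootstrap, seed=0, max_attempt_factor=20):
    """Collect exactly n_bootstrap valid replicates (hypothetical helper).

    A draw is degenerate if the resampled units are all-control or
    all-treated, or if the replicate estimate is non-finite.
    """
    rng = np.random.default_rng(seed)
    n_units = len(is_treated)
    max_attempts = max_attempt_factor * n_bootstrap  # bounded-attempt guard
    taus, attempts = [], 0
    while len(taus) < n_bootstrap and attempts < max_attempts:
        attempts += 1
        idx = rng.integers(0, n_units, size=n_units)  # resample units with replacement
        drawn = is_treated[idx]
        if drawn.all() or not drawn.any():
            continue  # all-treated or all-control: redraw
        tau_b = estimate(idx)
        if not np.isfinite(tau_b):
            continue  # non-finite replicate: redraw
        taus.append(tau_b)
    if len(taus) < n_bootstrap:
        # Assumption: the real guard's failure behaviour is not stated in the text.
        raise RuntimeError("attempt cap reached before enough valid replicates")
    return np.asarray(taus), attempts


# Toy data: 4 controls, 1 treated unit, so all-control draws are common.
y = np.array([1.0, 1.2, 0.9, 1.1, 2.0])
is_treated = np.array([False, False, False, False, True])


def diff_in_means(idx):
    t = is_treated[idx]
    return y[idx][t].mean() - y[idx][~t].mean()


taus, attempts = bootstrap_with_retry(is_treated, diff_in_means, n_bootstrap=50, seed=1)
```

Counting valid replicates rather than attempts means the replicate count behind the SE is always the requested `n_bootstrap`; the cap only matters for inputs where nearly every draw is degenerate.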

TODO.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ Deferred items from PR reviews that were not addressed before merge.
104104
| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
105105
| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
106106
| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
107-
| **SDID + survey designs** (capability regression in this release; both pweight-only AND strata/PSU/FPC). The previous release's fixed-weight bootstrap accepted strata/PSU/FPC via Rao-Wu rescaled bootstrap; the new paper-faithful refit bootstrap rejects all survey designs because Rao-Wu composed with Frank-Wolfe re-estimation requires its own derivation. The follow-up needs a **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in the loss and regularization (`Σ rw_i ω_i Y_i,pre` / `ζ² Σ rw_i ω_i²`), threaded through `compute_sdid_unit_weights` / `compute_time_weights`. Reusable scaffolding (`generate_rao_wu_weights`, split into `rw_control` / `rw_treated`, degenerate-retry, treated-mean weighting) is recoverable from the pre-rewrite `_bootstrap_se` body via `git show 91082e5:diff_diff/synthetic_did.py` (PR #351 "Replace SDID fixed-weight bootstrap with paper-faithful refit"). Compose-after-unweighted-FW does not work — silently reproduces the fixed-weight Rao-Wu behavior we removed. Validation: re-use the coverage MC harness with a stratified DGP, confirm near-nominal rejection rates against placebo-SE tracking. See REGISTRY.md §SyntheticDiD `Note (deferred survey + bootstrap composition)` for the sketch. | `synthetic_did.py::fit`, `synthetic_did.py::_bootstrap_se`, `utils.py::_sc_weight_fw` | follow-up | Medium |
107+
| **SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. | `synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se` | follow-up | Low |
108108
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
109109

110110
#### Performance

benchmarks/data/sdid_coverage.json

Lines changed: 49 additions & 12 deletions
@@ -4,8 +4,8 @@
44
"n_bootstrap": 200,
55
"library_version": "3.2.0",
66
"backend": "rust",
7-
"generated_at": "2026-04-22T20:48:18.361220+00:00",
8-
"total_elapsed_sec": 2424.92,
7+
"generated_at": "2026-04-24T00:58:22.180577+00:00",
8+
"total_elapsed_sec": 2420.61,
99
"methods": [
1010
"placebo",
1111
"bootstrap",
@@ -20,7 +20,8 @@
2020
"dgps": {
2121
"balanced": "Balanced / exchangeable: N_co=20, N_tr=3, T_pre=8, T_post=4",
2222
"unbalanced": "Unbalanced: N_co=30, N_tr=8, heterogeneous unit-FE variance",
23-
"aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2"
23+
"aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2",
24+
"stratified_survey": "BRFSS-style: N=40, strata=2, PSU=2/stratum, psu_re_sd=1.5 (PR #352)"
2425
},
2526
"per_dgp": {
2627
"balanced": {
@@ -42,9 +43,9 @@
4243
"0.05": 0.078,
4344
"0.10": 0.116
4445
},
45-
"mean_se": 0.21962976414466187,
46+
"mean_se": 0.2195984748876297,
4647
"true_sd_tau_hat": 0.2093529148687405,
47-
"se_over_truesd": 1.0490886371578094
48+
"se_over_truesd": 1.0489391801652868
4849
},
4950
"jackknife": {
5051
"n_successful_fits": 500,
@@ -57,7 +58,7 @@
5758
"true_sd_tau_hat": 0.2093529148687405,
5859
"se_over_truesd": 1.0756639338270981
5960
},
60-
"_elapsed_sec": 78.62
61+
"_elapsed_sec": 71.24
6162
},
6263
"unbalanced": {
6364
"placebo": {
@@ -78,9 +79,9 @@
7879
"0.05": 0.038,
7980
"0.10": 0.08
8081
},
81-
"mean_se": 0.15072674925763238,
82+
"mean_se": 0.15070173940119225,
8283
"true_sd_tau_hat": 0.135562270427217,
83-
"se_over_truesd": 1.1118635648593473
84+
"se_over_truesd": 1.1116790750572711
8485
},
8586
"jackknife": {
8687
"n_successful_fits": 500,
@@ -93,7 +94,7 @@
9394
"true_sd_tau_hat": 0.135562270427217,
9495
"se_over_truesd": 0.990639682456852
9596
},
96-
"_elapsed_sec": 90.61
97+
"_elapsed_sec": 78.91
9798
},
9899
"aer63": {
99100
"placebo": {
@@ -114,9 +115,9 @@
114115
"0.05": 0.04,
115116
"0.10": 0.078
116117
},
117-
"mean_se": 0.28291769703671454,
118+
"mean_se": 0.28265726432861016,
118119
"true_sd_tau_hat": 0.2696262336703088,
119-
"se_over_truesd": 1.0492958833622181
120+
"se_over_truesd": 1.0483299806584672
120121
},
121122
"jackknife": {
122123
"n_successful_fits": 500,
@@ -129,7 +130,43 @@
129130
"true_sd_tau_hat": 0.2696262336703088,
130131
"se_over_truesd": 0.9015870263136688
131132
},
132-
"_elapsed_sec": 2255.69
133+
"_elapsed_sec": 2237.29
134+
},
135+
"stratified_survey": {
136+
"placebo": {
137+
"n_successful_fits": 0,
138+
"rejection_rate": {
139+
"0.01": NaN,
140+
"0.05": NaN,
141+
"0.10": NaN
142+
},
143+
"mean_se": NaN,
144+
"true_sd_tau_hat": NaN,
145+
"se_over_truesd": NaN
146+
},
147+
"bootstrap": {
148+
"n_successful_fits": 500,
149+
"rejection_rate": {
150+
"0.01": 0.014,
151+
"0.05": 0.042,
152+
"0.10": 0.088
153+
},
154+
"mean_se": 0.5689806467245018,
155+
"true_sd_tau_hat": 0.45569672831386343,
156+
"se_over_truesd": 1.2485949785722699
157+
},
158+
"jackknife": {
159+
"n_successful_fits": 0,
160+
"rejection_rate": {
161+
"0.01": NaN,
162+
"0.05": NaN,
163+
"0.10": NaN
164+
},
165+
"mean_se": NaN,
166+
"true_sd_tau_hat": NaN,
167+
"se_over_truesd": NaN
168+
},
169+
"_elapsed_sec": 33.17
133170
}
134171
}
135172
}
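As a rough illustration of how the `stratified_survey` rows above can be read against the calibration gate quoted in the commit message (at least 100 successful fits and an α=0.05 rejection rate in [0.02, 0.10]), the sketch below parses a hand-copied fragment of the artifact. The helper name `passes_gate` is hypothetical, and the fragment is trimmed to the fields the check needs. Python's `json` module accepts the bare `NaN` tokens by default, although they are not strict JSON.

```python
import json
import math

# Fragment copied from the stratified_survey block of the artifact above.
fragment = """
{
  "placebo":   {"n_successful_fits": 0,
                "rejection_rate": {"0.05": NaN},
                "mean_se": NaN, "true_sd_tau_hat": NaN},
  "bootstrap": {"n_successful_fits": 500,
                "rejection_rate": {"0.05": 0.042},
                "mean_se": 0.5689806467245018,
                "true_sd_tau_hat": 0.45569672831386343}
}
"""
rows = json.loads(fragment)  # NaN is parsed to float('nan')


def passes_gate(row, alpha="0.05", band=(0.02, 0.10), min_fits=100):
    """Hypothetical check mirroring the gate described in the commit message."""
    rate = row["rejection_rate"][alpha]
    # Comparisons against NaN are False, so skipped rows fail the gate.
    return row["n_successful_fits"] >= min_fits and band[0] <= rate <= band[1]


boot = rows["bootstrap"]
se_ratio = boot["mean_se"] / boot["true_sd_tau_hat"]  # conservatism ratio

# Binomial Monte Carlo standard error of a nominal 5% rejection rate at 500 seeds.
mc_se = math.sqrt(0.05 * 0.95 / boot["n_successful_fits"])

print(passes_gate(boot), passes_gate(rows["placebo"]), round(se_ratio, 3), round(mc_se, 4))
```

With 500 seeds the Monte Carlo standard error of a 5% rejection rate is just under one percentage point, so the observed 0.042 sits within one standard error of nominal.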

benchmarks/python/coverage_sdid.py

Lines changed: 95 additions & 10 deletions
@@ -64,7 +64,7 @@ def _get_backend_from_args() -> str:
6464
from diff_diff import HAS_RUST_BACKEND, SyntheticDiD # noqa: E402
6565

6666
ALL_METHODS = ("placebo", "bootstrap", "jackknife")
67-
ALL_DGPS = ("balanced", "unbalanced", "aer63")
67+
ALL_DGPS = ("balanced", "unbalanced", "aer63", "stratified_survey")
6868
ALPHAS = (0.01, 0.05, 0.10)
6969

7070

@@ -74,6 +74,12 @@ class DGPSpec:
7474
description: str
7575
# generator returns (DataFrame, post_periods)
7676
generator: Callable[[int], Tuple[pd.DataFrame, List[int]]]
77+
# Optional factory: returns (SurveyDesign, supported_methods_tuple) given a
78+
# DataFrame. None for non-survey DGPs. The supported-methods tuple lets the
79+
# harness skip the methods that raise NotImplementedError on this design
80+
# (e.g., placebo/jackknife under strata/PSU/FPC). For non-survey DGPs all
81+
# methods are supported.
82+
survey_design_factory: Optional[Callable[[pd.DataFrame], Tuple[Any, Tuple[str, ...]]]] = None
7783

7884

7985
def _balanced_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]:
@@ -169,6 +175,53 @@ def _aer63_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]:
169175
return pd.DataFrame(rows), list(range(n_pre, n_pre + n_post))
170176

171177

178+
def _stratified_survey_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]:
179+
"""BRFSS/ACS-style stratified survey panel, null treatment (PR #352).
180+
181+
N=40 (10 per PSU × 4 PSUs across 2 strata), T=12 (post = 7..11, i.e. 5 post periods),
182+
moderate weight variation (Kish DEFF ≈ 1.4), psu_re_sd=1.5 (modest
183+
ICC). Each unit is a respondent with constant per-unit survey
184+
weight, stratum, and PSU columns. Used to validate the SDID
185+
survey-bootstrap calibration: the bootstrap row should land near
186+
nominal at α=0.05 (PR #352 §3c calibration gate, [0.02, 0.10]).
187+
"""
188+
from diff_diff.prep_dgp import generate_survey_did_data
189+
df = generate_survey_did_data(
190+
n_units=40,
191+
n_periods=12,
192+
cohort_periods=[7],
193+
never_treated_frac=0.2,
194+
treatment_effect=0.0, # null for coverage MC
195+
n_strata=2,
196+
psu_per_stratum=2,
197+
fpc_per_stratum=200.0,
198+
weight_variation="moderate",
199+
psu_re_sd=1.5,
200+
psu_period_factor=0.5,
201+
seed=seed,
202+
)
203+
# generate_survey_did_data emits per-observation 'treated' (post-only
204+
# for treated units); SDID requires a unit-level ever-treated indicator
205+
# (constant across time). Derive from 'first_treat' (cohort, 0 for
206+
# never-treated). Block-treatment cohort is 7 → post = 7..11.
207+
df = df.copy()
208+
df["treated"] = (df["first_treat"] > 0).astype(int)
209+
return df, list(range(7, 12))
210+
211+
212+
def _stratified_survey_design(df: pd.DataFrame) -> Tuple[Any, Tuple[str, ...]]:
213+
"""Build the SurveyDesign for the stratified_survey DGP.
214+
215+
Methods supported: bootstrap only — placebo / jackknife reject
216+
strata/PSU/FPC at fit-time (separate methodology gap).
217+
"""
218+
from diff_diff import SurveyDesign
219+
return (
220+
SurveyDesign(weights="weight", strata="stratum", psu="psu", fpc="fpc"),
221+
("bootstrap",),
222+
)
223+
224+
172225
DGPS: Dict[str, DGPSpec] = {
173226
"balanced": DGPSpec(
174227
"balanced",
@@ -185,6 +238,12 @@ def _aer63_dgp(seed: int) -> Tuple[pd.DataFrame, List[int]]:
185238
"Arkhangelsky et al. (2021) AER §6.3: N=100, N1=20, T=120, T1=5, rank=2, σ=2",
186239
_aer63_dgp,
187240
),
241+
"stratified_survey": DGPSpec(
242+
"stratified_survey",
243+
"BRFSS-style: N=40, strata=2, PSU=2/stratum, psu_re_sd=1.5 (PR #352)",
244+
_stratified_survey_dgp,
245+
survey_design_factory=_stratified_survey_design,
246+
),
188247
}
189248

190249

@@ -194,23 +253,33 @@ def _fit_one(
194253
post_periods: List[int],
195254
n_bootstrap: int,
196255
seed: int,
256+
survey_design: Optional[Any] = None,
197257
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
198-
"""Fit SDID and return (att, se, p_value); (None, None, None) on failure."""
258+
"""Fit SDID and return (att, se, p_value); (None, None, None) on failure.
259+
260+
For survey DGPs the harness passes a SurveyDesign via ``survey_design``;
261+
fit() routes it through the bootstrap survey path (PR #352) when
262+
method=='bootstrap'. The DGP's ``survey_design_factory`` declares which
263+
methods are supported, so the caller skips unsupported methods entirely
264+
rather than catching the resulting NotImplementedError here.
265+
"""
199266
try:
200267
with warnings.catch_warnings():
201268
warnings.simplefilter("ignore")
202-
r = SyntheticDiD(
203-
variance_method=method,
204-
n_bootstrap=n_bootstrap,
205-
seed=seed,
206-
).fit(
207-
df,
269+
fit_kwargs = dict(
208270
outcome="outcome",
209271
treatment="treated",
210272
unit="unit",
211273
time="period",
212274
post_periods=post_periods,
213275
)
276+
if survey_design is not None:
277+
fit_kwargs["survey_design"] = survey_design
278+
r = SyntheticDiD(
279+
variance_method=method,
280+
n_bootstrap=n_bootstrap,
281+
seed=seed,
282+
).fit(df, **fit_kwargs)
214283
att = float(r.att) if np.isfinite(r.att) else None
215284
se = float(r.se) if np.isfinite(r.se) else None
216285
p_value = float(r.p_value) if np.isfinite(r.p_value) else None
@@ -258,7 +327,14 @@ def _run_dgp(
258327
n_bootstrap: int,
259328
methods: Tuple[str, ...],
260329
) -> Dict[str, Any]:
261-
"""Run all methods × n_seeds for one DGP. Returns summary dict."""
330+
"""Run all methods × n_seeds for one DGP. Returns summary dict.
331+
332+
For survey DGPs (``spec.survey_design_factory is not None``) the harness
333+
constructs the SurveyDesign once per seed (it depends only on the column
334+
names, not the DataFrame contents) and skips methods not in
335+
``supported_methods`` — those rows in the artifact have
336+
``n_successful_fits=0``.
337+
"""
262338
print(f"\n=== DGP: {name} ({spec.description}) ===", flush=True)
263339

264340
# Preallocate per-method arrays
@@ -269,8 +345,17 @@ def _run_dgp(
269345
start = time.time()
270346
for seed in range(n_seeds):
271347
df, post = spec.generator(seed)
348+
if spec.survey_design_factory is not None:
349+
survey_design, supported_methods = spec.survey_design_factory(df)
350+
else:
351+
survey_design = None
352+
supported_methods = methods
272353
for method in methods:
273-
att, se, p = _fit_one(method, df, post, n_bootstrap, seed)
354+
if method not in supported_methods:
355+
# The method-specific guard would raise here (e.g., placebo + strata), so skip the fit.
356+
# Leave NaN; the summary will report n_successful_fits=0.
357+
continue
358+
att, se, p = _fit_one(method, df, post, n_bootstrap, seed, survey_design)
274359
if att is not None:
275360
atts[method][seed] = att
276361
if se is not None:
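The gating pattern in this hunk (the factory declares which methods a design supports, and unsupported methods are skipped so their result slots stay NaN) can be reproduced in miniature without the library. In the sketch below `ToySpec`, `run_toy`, and `fake_fit` are hypothetical stand-ins for `DGPSpec`, `_run_dgp`, and `_fit_one`; only the control flow is meant to match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

import numpy as np


@dataclass
class ToySpec:
    name: str
    # Returns (design, supported_methods); None means "no survey design".
    design_factory: Optional[Callable[[], Tuple[str, Tuple[str, ...]]]] = None


def run_toy(spec: ToySpec, methods: Tuple[str, ...], n_seeds: int,
            fit: Callable[[str, int, Optional[str]], float]) -> Dict[str, int]:
    """Count successful fits per method, skipping unsupported methods."""
    ses = {m: np.full(n_seeds, np.nan) for m in methods}  # preallocate as NaN
    for seed in range(n_seeds):
        if spec.design_factory is not None:
            design, supported = spec.design_factory()
        else:
            design, supported = None, methods
        for method in methods:
            if method not in supported:
                continue  # slot stays NaN, so the row reports zero successful fits
            ses[method][seed] = fit(method, seed, design)
    return {m: int(np.isfinite(v).sum()) for m, v in ses.items()}


def fake_fit(method: str, seed: int, design: Optional[str]) -> float:
    # Stand-in for a real fit; returns an arbitrary finite "SE".
    return 1.0 + 0.01 * seed


methods = ("placebo", "bootstrap", "jackknife")
survey_spec = ToySpec("stratified_survey", lambda: ("toy-design", ("bootstrap",)))
plain_spec = ToySpec("balanced")

survey_counts = run_toy(survey_spec, methods, n_seeds=5, fit=fake_fit)
plain_counts = run_toy(plain_spec, methods, n_seeds=5, fit=fake_fit)
```

Skipping up front, instead of letting each unsupported fit raise and be caught, keeps expected rejections distinguishable from genuine fit failures in the summary.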

diff_diff/guides/llms-full.txt

Lines changed: 1 addition & 1 deletion
@@ -1673,7 +1673,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
16731673
**Key features:**
16741674
- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
16751675
- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
1676-
- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, TROP). SyntheticDiD bootstrap is non-survey only: the paper-faithful refit path re-estimates weights via Frank-Wolfe per draw, and Rao-Wu + refit composition is not yet implemented (tracked in TODO.md)
1676+
- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #352): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo and jackknife paths still reject strata/PSU/FPC (separate methodology gap, tracked in TODO.md)
16771677
- DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
16781678
- Repeated cross-sections: `CallawaySantAnna(panel=False)`
16791679
- Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section
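For readers who want the per-draw formulas from the survey-aware bootstrap bullet in concrete form, here is a minimal NumPy sketch of the quoted objective `||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and the composition `ω_eff = rw·ω/Σ(rw·ω)`. It only evaluates the two expressions and does not run Frank-Wolfe. The function names are hypothetical, and any normalization constants or intercept handling in the real `_sc_weight_fw` solver are deliberately ignored.

```python
import numpy as np


def weighted_fw_objective(A, b, omega, rw, zeta):
    """Evaluate the quoted objective (illustrative; no normalization or intercept)."""
    resid = (A * rw) @ omega - b  # A·diag(rw): scale column i of A by rw[i]
    penalty = zeta**2 * np.sum(rw * omega**2)  # per-coordinate ridge weights
    return float(resid @ resid + penalty)


def compose_effective_weights(omega, rw):
    """omega_eff = rw * omega / sum(rw * omega), as quoted in the text."""
    scaled = rw * omega
    total = scaled.sum()
    if total <= 0:
        raise ValueError("rw * omega must have positive mass")
    return scaled / total


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))  # toy pre-period outcomes: 6 periods x 4 controls
b = rng.normal(size=6)  # toy treated-mean target
omega = np.full(4, 0.25)  # uniform solver weights on the simplex
rw = np.array([1.0, 2.0, 0.5, 1.5])  # toy rescaled survey weights for one draw

omega_eff = compose_effective_weights(omega, rw)
obj_weighted = weighted_fw_objective(A, b, omega, rw, zeta=0.5)
obj_unit = weighted_fw_objective(A, b, omega, np.ones(4), zeta=0.5)
```

With uniform `omega` the effective weights are simply `rw` renormalized to sum to one, and with `rw` all ones the objective reduces to the unweighted ridge-penalized form.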
