igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 7 additions & 1 deletion
diff --git a/TODO.md b/TODO.md
Lines changed: 1 addition & 1 deletion
diff --git a/benchmarks/data/sdid_coverage.json b/benchmarks/data/sdid_coverage.json
Lines changed: 49 additions & 12 deletions
@@ -23,7 +23,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 - **`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path.
-- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the new paper-faithful refit bootstrap rejects all survey designs (including pweight-only) with `NotImplementedError`. Pweight-only users can switch to `variance_method="placebo"` or `"jackknife"`. Strata/PSU/FPC users have no SDID variance option on this release. Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation requires a separate derivation (weighted FW solver); sketch and reusable scaffolding pointers are in `docs/methodology/REGISTRY.md` §SyntheticDiD and `TODO.md`.
+- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression in PR #351, **restored in PR #352** — see Added/Changed entries directly below). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the PR #351 paper-faithful refit bootstrap initially rejected all survey designs (including pweight-only) with `NotImplementedError`. PR #352 restores the capability via a weighted-FW + Rao-Wu composition; the lock-out window applies only to the v3.2.x line that ships PR #351 alone (without PR #352). Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation: see `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)`.
+
+### Added (PR #352)
+- **SDID `variance_method="bootstrap"` survey support restored** via a hybrid pairs-bootstrap + Rao-Wu rescaling composed with a weighted Frank-Wolfe kernel. Each bootstrap draw first performs the unit-level pairs-bootstrap resampling specified by Arkhangelsky et al. (2021) Algorithm 2 (`boot_idx = rng.choice(n_total)`), and *then* applies Rao-Wu rescaled per-unit weights (Rao & Wu 1988) sliced over the resampled units — NOT a standalone Rao-Wu bootstrap. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; weighted column-centering + row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` survey branch composes the per-draw `rw` (Rao-Wu rescaling for full designs, constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests, zero-treated-mass retry, deterministic Rao-Wu × boot_idx slice) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion). See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full objective and the argmin-set caveat.
+
+### Changed (PR #352)
+- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).
 
 ## [3.2.0] - 2026-04-19
 
 
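The weighted objective and the `ω_eff` composition quoted in the PR #352 changelog entry can be sketched in NumPy. This is only an illustration of the two formulas as written there, not the library's implementation: the function names `weighted_fw_objective` and `compose_effective_weights` are hypothetical, and the actual solver is the Rust kernel named in the entry.

```python
import numpy as np


def weighted_fw_objective(A, b, omega, zeta, reg_w):
    """Evaluate ||A @ omega - b||^2 + zeta^2 * sum_j reg_w[j] * omega[j]^2.

    Hypothetical helper: it only evaluates the objective from the changelog,
    it does not run Frank-Wolfe.
    """
    resid = A @ omega - b
    penalty = zeta**2 * np.sum(reg_w * omega**2)
    return float(resid @ resid + penalty)


def compose_effective_weights(rw, omega):
    """Return omega_eff = rw * omega / sum(rw * omega).

    Raises if the rescaled mass is zero, in which case the normalization
    is undefined.
    """
    mass = rw * omega
    total = mass.sum()
    if total <= 0:
        raise ValueError("rescaled weight mass is zero; cannot normalize")
    return mass / total


# Toy inputs (made up for illustration; not taken from the benchmark).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])
omega = np.array([0.5, 0.5])
rw = np.array([1.0, 3.0])  # stand-in for per-control rescaled weights

objective = weighted_fw_objective(A, b, omega, zeta=1.0, reg_w=rw)
omega_eff = compose_effective_weights(rw, omega)
```

With `reg_w` set to all ones the penalty term reduces to the unweighted ridge penalty `ζ²·Σ_j ω[j]²`, so the sketch collapses to the non-survey objective in that case.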
@@ -104,7 +104,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
 | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
 | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
-| **SDID + survey designs** (capability regression in this release; both pweight-only AND strata/PSU/FPC). The previous release's fixed-weight bootstrap accepted strata/PSU/FPC via Rao-Wu rescaled bootstrap; the new paper-faithful refit bootstrap rejects all survey designs because Rao-Wu composed with Frank-Wolfe re-estimation requires its own derivation. The follow-up needs a **weighted Frank-Wolfe** variant of `_sc_weight_fw` accepting per-unit weights in the loss and regularization (`Σ rw_i ω_i Y_i,pre` / `ζ² Σ rw_i ω_i²`), threaded through `compute_sdid_unit_weights` / `compute_time_weights`. Reusable scaffolding (`generate_rao_wu_weights`, split into `rw_control` / `rw_treated`, degenerate-retry, treated-mean weighting) is recoverable from the pre-rewrite `_bootstrap_se` body via `git show 91082e5:diff_diff/synthetic_did.py` (PR #351 "Replace SDID fixed-weight bootstrap with paper-faithful refit"). Compose-after-unweighted-FW does not work — silently reproduces the fixed-weight Rao-Wu behavior we removed. Validation: re-use the coverage MC harness with a stratified DGP, confirm near-nominal rejection rates against placebo-SE tracking. See REGISTRY.md §SyntheticDiD `Note (deferred survey + bootstrap composition)` for the sketch. | `synthetic_did.py::fit`, `synthetic_did.py::_bootstrap_se`, `utils.py::_sc_weight_fw` | follow-up | Medium |
+| **SDID + placebo/jackknife + strata/PSU/FPC** (capability gap remaining after PR #352). PR #352 restored survey-bootstrap support via weighted Frank-Wolfe + Rao-Wu composition; the same composition for `placebo` (which permutes control indices) and `jackknife` (which leaves out one unit at a time) requires its own derivations: placebo's allocator needs a weighted permutation distribution that respects PSU clustering; jackknife needs PSU-level LOO + stratum aggregation. Both reuse the weighted-FW kernel from PR #352 (`_sc_weight_fw(reg_weights=)`); the genuinely new work is the per-method allocator. Tracked but no concrete sketch yet — defer until user demand surfaces. | `synthetic_did.py::_placebo_variance_se`, `synthetic_did.py::_jackknife_se` | follow-up | Low |
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
 
 #### Performance
 
@@ -4,8 +4,8 @@
     "n_bootstrap": 200,
     "library_version": "3.2.0",
     "backend": "rust",
-    "generated_at": "2026-04-22T20:48:18.361220+00:00",
-    "total_elapsed_sec": 2424.92,
+    "generated_at": "2026-04-24T13:01:54.876774+00:00",
+    "total_elapsed_sec": 2420.61,
     "methods": [
       "placebo",
       "bootstrap",
@@ -20,7 +20,8 @@
   "dgps": {
     "balanced": "Balanced / exchangeable: N_co=20, N_tr=3, T_pre=8, T_post=4",
     "unbalanced": "Unbalanced: N_co=30, N_tr=8, heterogeneous unit-FE variance",
-    "aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2"
+    "aer63": "Arkhangelsky et al. (2021) AER \u00a76.3: N=100, N1=20, T=120, T1=5, rank=2, \u03c3=2",
+    "stratified_survey": "BRFSS-style: N=40, strata=2, PSU=2/stratum, psu_re_sd=1.5 (PR #352)"
   },
   "per_dgp": {
     "balanced": {
@@ -42,9 +43,9 @@
           "0.05": 0.078,
           "0.10": 0.116
         },
-        "mean_se": 0.21962976414466187,
+        "mean_se": 0.2195984748876297,
         "true_sd_tau_hat": 0.2093529148687405,
-        "se_over_truesd": 1.0490886371578094
+        "se_over_truesd": 1.0489391801652868
       },
       "jackknife": {
         "n_successful_fits": 500,
@@ -57,7 +58,7 @@
         "true_sd_tau_hat": 0.2093529148687405,
         "se_over_truesd": 1.0756639338270981
       },
-      "_elapsed_sec": 78.62
+      "_elapsed_sec": 71.24
     },
     "unbalanced": {
       "placebo": {
@@ -78,9 +79,9 @@
           "0.05": 0.038,
           "0.10": 0.08
         },
-        "mean_se": 0.15072674925763238,
+        "mean_se": 0.15070173940119225,
         "true_sd_tau_hat": 0.135562270427217,
-        "se_over_truesd": 1.1118635648593473
+        "se_over_truesd": 1.1116790750572711
       },
       "jackknife": {
         "n_successful_fits": 500,
@@ -93,7 +94,7 @@
         "true_sd_tau_hat": 0.135562270427217,
         "se_over_truesd": 0.990639682456852
       },
-      "_elapsed_sec": 90.61
+      "_elapsed_sec": 78.91
     },
     "aer63": {
       "placebo": {
@@ -114,9 +115,9 @@
           "0.05": 0.04,
           "0.10": 0.078
         },
-        "mean_se": 0.28291769703671454,
+        "mean_se": 0.28265726432861016,
         "true_sd_tau_hat": 0.2696262336703088,
-        "se_over_truesd": 1.0492958833622181
+        "se_over_truesd": 1.0483299806584672
       },
       "jackknife": {
         "n_successful_fits": 500,
@@ -129,7 +130,43 @@
         "true_sd_tau_hat": 0.2696262336703088,
         "se_over_truesd": 0.9015870263136688
       },
-      "_elapsed_sec": 2255.69
+      "_elapsed_sec": 2237.29
+    },
+    "stratified_survey": {
+      "placebo": {
+        "n_successful_fits": 0,
+        "rejection_rate": {
+          "0.01": null,
+          "0.05": null,
+          "0.10": null
+        },
+        "mean_se": null,
+        "true_sd_tau_hat": null,
+        "se_over_truesd": null
+      },
+      "bootstrap": {
+        "n_successful_fits": 500,
+        "rejection_rate": {
+          "0.01": 0.024,
+          "0.05": 0.058,
+          "0.10": 0.094
+        },
+        "mean_se": 0.5097482138251239,
+        "true_sd_tau_hat": 0.4512243070193919,
+        "se_over_truesd": 1.1297002530566618
+      },
+      "jackknife": {
+        "n_successful_fits": 0,
+        "rejection_rate": {
+          "0.01": null,
+          "0.05": null,
+          "0.10": null
+        },
+        "mean_se": null,
+        "true_sd_tau_hat": null,
+        "se_over_truesd": null
+      },
+      "_elapsed_sec": 16.48
     }
   }
 }
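The validation gate stated in the changelog (bootstrap rejection rate in [0.02, 0.10] at α=0.05) can be checked against the `stratified_survey` rows of this artifact. The sketch below inlines the values from the diff above; the `passes_gate` helper is hypothetical and not part of the benchmark harness. It also assumes that `se_over_truesd` is the ratio `mean_se / true_sd_tau_hat`, which the field name and the reported numbers suggest.

```python
# Values copied from the "stratified_survey" block of sdid_coverage.json.
bootstrap_row = {
    "n_successful_fits": 500,
    "rejection_rate": {"0.01": 0.024, "0.05": 0.058, "0.10": 0.094},
    "mean_se": 0.5097482138251239,
    "true_sd_tau_hat": 0.4512243070193919,
    "se_over_truesd": 1.1297002530566618,
}
# Placebo and jackknife record no successful fits on this DGP (JSON null).
placebo_row = {
    "n_successful_fits": 0,
    "rejection_rate": {"0.01": None, "0.05": None, "0.10": None},
}


def passes_gate(row, alpha="0.05", lo=0.02, hi=0.10):
    """Hypothetical check: rejection rate at `alpha` lies within [lo, hi]."""
    rate = row["rejection_rate"][alpha]
    return rate is not None and lo <= rate <= hi


bootstrap_ok = passes_gate(bootstrap_row)
placebo_ok = passes_gate(placebo_row)  # null rate, so the gate cannot pass
ratio = bootstrap_row["mean_se"] / bootstrap_row["true_sd_tau_hat"]
```

Treating a null rejection rate as a failed gate, rather than skipping the row, keeps the unsupported placebo and jackknife rows from being mistaken for calibrated results.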