Skip to content

Commit 2bf3f93

Browse files
igerberclaude
andcommitted
Address PR #355 R3 P3: clarify hybrid bootstrap docs + pin boot_idx slice
Two P3s from R3; PR was already ✅ Looks good — these are close-out polish. P3 docs/tests — secondary surfaces described the full-design path as "Rao-Wu rescaled bootstrap" but only REGISTRY.md surfaced the material caveat that SDID still uses unit-level pairs-bootstrap resampling (``boot_idx = rng.choice(n_total)``) and then applies Rao-Wu rescaled weights on top — a hybrid composition, not a standalone Rao-Wu bootstrap. Update survey-theory.md (splits SunAbraham/TROP's standalone Rao-Wu bullet from SDID's hybrid bullet) and CHANGELOG.md's PR #352 Added entry to use the hybrid-composition wording mirroring REGISTRY. P3 tests — the methodology-critical ``boot_idx`` × ``generate_rao_wu_weights`` interaction was only guarded by the slow coverage MC. Add ``test_bootstrap_full_design_rao_wu_boot_idx_slice`` (in ``TestBootstrapSE``) which monkeypatches ``generate_rao_wu_weights`` to return a known vector of distinct per-unit values (``arange(1, n_total+1)``), captures the ``rw_control_draw`` vectors fed into the weighted FW helper via a capturing wrapper on ``compute_sdid_unit_weights_survey``, and asserts every captured vector lies within ``known_rw[:n_control]`` (positions 1..n_control). This catches two bug classes: - slice-order regression: if someone swaps rw-then-slice for slice-then-rw, the captured vectors would include values from the treated slice ``known_rw[n_control:]`` and the assertion fires. - rw-drift regression: if the Rao-Wu call site bypasses ``generate_rao_wu_weights`` (e.g., a refactor silently uses the pweight-only branch for full-design fits), the captured vector would be the user's w_control (all 1.0 in this test) instead of the known Rao-Wu output. Verified: 294 targeted tests pass across test_methodology_sdid / test_survey_phase5 / test_weighted_fw / test_guides / test_rust_backend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2b2e8f0 commit 2bf3f93

3 files changed

Lines changed: 97 additions & 12 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2121
- **SyntheticDiD bootstrap no longer supports survey designs** (capability regression in PR #351, **restored in PR #352** — see Added/Changed entries directly below). The removed fixed-weight bootstrap path was the only SDID variance method that supported strata/PSU/FPC (via Rao-Wu rescaled bootstrap); the PR #351 paper-faithful refit bootstrap initially rejected all survey designs (including pweight-only) with `NotImplementedError`. PR #352 restores the capability via a weighted-FW + Rao-Wu composition; the lock-out window applies only to the v3.2.x line that ships PR #351 alone (without PR #352). Composing Rao-Wu rescaled weights with Frank-Wolfe re-estimation: see `docs/methodology/REGISTRY.md` §SyntheticDiD `Note (survey + bootstrap composition)`.
2222

2323
### Added (PR #352)
24-
- **SDID `variance_method="bootstrap"` survey support restored** via weighted Frank-Wolfe + Rao-Wu rescaling. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` Rao-Wu branch composes Rao-Wu rescaled weights per draw (or constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion).
24+
- **SDID `variance_method="bootstrap"` survey support restored** via a hybrid pairs-bootstrap + Rao-Wu rescaling composed with a weighted Frank-Wolfe kernel. Each bootstrap draw first performs the unit-level pairs-bootstrap resampling specified by Arkhangelsky et al. (2021) Algorithm 2 (`boot_idx = rng.choice(n_total)`), and *then* applies Rao-Wu rescaled per-unit weights (Rao & Wu 1988) sliced over the resampled units — NOT a standalone Rao-Wu bootstrap. New Rust kernel `sc_weight_fw_weighted` (and `_with_convergence` sibling) accepts a per-coordinate `reg_weights` argument so the FW objective becomes `min ||A·ω - b||² + ζ²·Σ_j reg_w[j]·ω[j]²`. New Python helpers `compute_sdid_unit_weights_survey` and `compute_time_weights_survey` thread per-control survey weights through the two-pass sparsify-refit dispatcher (column-scaling Y by `rw` for the loss, `reg_weights=rw` for the penalty on the unit-weights side; weighted column-centering + row-scaling Y by `sqrt(rw)` for the loss with uniform reg on the time-weights side). `_bootstrap_se` survey branch composes the per-draw `rw` (Rao-Wu rescaling for full designs, constant `w_control` for pweight-only fits) with the weighted-FW helpers, then composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Coverage MC artifact extended with a `stratified_survey` DGP (BRFSS-style: N=40, strata=2, PSU=2/stratum); the bootstrap row's near-nominal calibration is the validation gate (target rejection ∈ [0.02, 0.10] at α=0.05). New regression tests across `test_methodology_sdid.py::TestBootstrapSE` (single-PSU short-circuit, full-design and pweight-only succeeds-tests, zero-treated-mass retry, deterministic Rao-Wu × boot_idx slice) and `test_survey_phase5.py::TestSyntheticDiDSurvey` (full-design ↔ pweight-only SE differs assertion). See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)`` for the full objective and the argmin-set caveat.
2525

2626
### Changed (PR #352)
2727
- **SDID bootstrap SE values under survey fits now differ numerically from the v3.2.x line that shipped PR #351 alone**: the fit no longer raises `NotImplementedError`, and instead returns the weighted-FW + Rao-Wu SE. Non-survey fits are unaffected (the bootstrap dispatcher routes only the survey branch through the new `_survey` helpers; non-survey fits continue to call the existing `compute_sdid_unit_weights` / `compute_time_weights` and stay bit-identical at rel=1e-14 on the `_BASELINE["bootstrap"]` regression). SDID's `placebo` and `jackknife` paths still reject `strata/PSU/FPC` (separate methodology gap; tracked in TODO.md as a follow-up PR).

docs/methodology/survey-theory.md

Lines changed: 16 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -722,18 +722,23 @@ Two bootstrap strategies interact with survey designs:
722722
Generates multiplier weights at the PSU level within strata, with FPC
723723
scaling. Each bootstrap draw reweights the IF values.
724724

725-
- **Rao-Wu rescaled bootstrap** (SunAbraham, SyntheticDiD, TROP): Draws PSUs
725+
- **Rao-Wu rescaled bootstrap** (SunAbraham, TROP): Draws PSUs
726726
with replacement within strata and rescales observation weights. Each draw
727-
re-runs the full estimator on the resampled data. *SyntheticDiD composes
728-
the Rao-Wu rescaled per-draw weights with the* **weighted Frank-Wolfe**
729-
*kernel (PR #352)*: each draw solves
730-
``min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²`` and composes
731-
``ω_eff = rw·ω / Σ(rw·ω)`` for the SDID estimator. See REGISTRY.md
732-
§SyntheticDiD ``Note (survey + bootstrap composition)`` for the full
733-
objective and the argmin-set caveat. SDID's `placebo` and `jackknife`
734-
methods still reject strata/PSU/FPC (the placebo permutation allocator
735-
and jackknife LOO mass need their own weighted derivations; tracked in
736-
TODO.md as a follow-up).
727+
re-runs the full estimator on the resampled data.
728+
- **Hybrid pairs-bootstrap + Rao-Wu rescaling** (SyntheticDiD, PR #352):
729+
SDID's full-design bootstrap is NOT a standalone Rao-Wu bootstrap. Each
730+
draw first performs the unit-level pairs-bootstrap resampling that
731+
Arkhangelsky et al. (2021) Algorithm 2 specifies (``boot_idx = rng.choice(n_total)``),
732+
and *then* applies the Rao-Wu rescaled per-unit weights sliced over the
733+
resampled units (``rw_control = rao_wu_rw[:n_control][boot_idx_control]``).
734+
The weighted-Frank-Wolfe kernel then solves
735+
``min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²`` on the resampled panel,
736+
and ``ω_eff = rw·ω / Σ(rw·ω)`` is composed for the SDID estimator.
737+
See REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap composition)``
738+
for the full objective and the argmin-set caveat. SDID's `placebo` and
739+
`jackknife` methods still reject strata/PSU/FPC (the placebo permutation
740+
allocator and jackknife LOO mass need their own weighted derivations;
741+
tracked in TODO.md as a follow-up).
737742

738743
---
739744

tests/test_methodology_sdid.py

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -740,6 +740,86 @@ def test_bootstrap_full_design_without_explicit_weights(self):
740740
assert result.survey_metadata.n_strata is not None
741741
assert result.survey_metadata.n_psu is not None
742742

743+
def test_bootstrap_full_design_rao_wu_boot_idx_slice(self, monkeypatch):
744+
"""Full-design bootstrap slices Rao-Wu weights by ``boot_idx``.
745+
746+
Documented in REGISTRY.md §SyntheticDiD ``Note (survey + bootstrap
747+
composition)``: the hybrid path first performs unit-level pairs-
748+
bootstrap (``boot_idx = rng.choice(n_total)``) and THEN slices
749+
the Rao-Wu rescaled weights over the resampled units. Monkeypatch
750+
``generate_rao_wu_weights`` to return a known vector and capture
751+
the ``rw_control`` fed into ``compute_sdid_unit_weights_survey``;
752+
assert the captured vector matches the expected slice
753+
``known_rw[:n_control][boot_idx[boot_is_control]]``.
754+
755+
Regression against a subtle class of bug where either the slice
756+
index arithmetic or the Rao-Wu call site could drift (e.g.,
757+
someone refactors ``resolved_survey_unit`` indexing to skip the
758+
boot_idx slicing, or the rw-then-slice order gets swapped to
759+
slice-then-rw). Both would silently produce wrong bootstrap SE.
760+
"""
761+
from diff_diff import utils as dd_utils
762+
from diff_diff import synthetic_did as sdid_mod
763+
from diff_diff.survey import SurveyDesign
764+
765+
df = _make_panel(n_control=15, n_treated=3, seed=42)
766+
df["wt"] = 1.0
767+
df["stratum"] = df["unit"] % 2
768+
df["psu"] = df["unit"]
769+
770+
# Known Rao-Wu weight vector. Length = n_total = 18; distinct
771+
# values per unit so a slice of the first n_control=15 positions
772+
# by boot_idx[boot_is_control] is identifiable.
773+
n_total = 18
774+
known_rw = np.arange(1, n_total + 1, dtype=np.float64)
775+
776+
def fake_rao_wu(resolved_survey, rng):
777+
return known_rw.copy()
778+
779+
monkeypatch.setattr(sdid_mod, "generate_rao_wu_weights", fake_rao_wu)
780+
781+
captured: list = []
782+
783+
real_helper = dd_utils.compute_sdid_unit_weights_survey
784+
785+
def capturing_helper(Y_pre_c, Y_pre_t_mean, rw, *args, **kwargs):
786+
captured.append(np.array(rw, copy=True))
787+
return real_helper(Y_pre_c, Y_pre_t_mean, rw, *args, **kwargs)
788+
789+
monkeypatch.setattr(
790+
sdid_mod, "compute_sdid_unit_weights_survey", capturing_helper
791+
)
792+
793+
SyntheticDiD(variance_method="bootstrap", n_bootstrap=10, seed=1).fit(
794+
df, outcome="outcome", treatment="treated",
795+
unit="unit", time="period",
796+
post_periods=[5, 6, 7],
797+
survey_design=SurveyDesign(weights="wt", strata="stratum", psu="psu"),
798+
)
799+
800+
# For each captured rw vector: its values must all come from the
801+
# first n_control=15 positions of known_rw (never from the
802+
# treated slice [15:18]). Values may repeat across the vector
803+
# (bootstrap picks with replacement) but every element must be
804+
# ≤ n_control (positions 1..15, since we built known_rw as
805+
# arange(1, 19)). Catches either a slice-order bug (would mix in
806+
# treated-slice values 16..18) or a rw-drift bug (would produce
807+
# values outside [1, 15]).
808+
assert len(captured) >= 1, "no FW calls captured — survey dispatch broken"
809+
n_control = 15
810+
control_slice_max = float(known_rw[:n_control].max()) # = 15.0
811+
for i, rw_captured in enumerate(captured):
812+
assert rw_captured.shape[0] > 0, f"draw {i}: empty rw"
813+
assert rw_captured.max() <= control_slice_max, (
814+
f"draw {i}: captured rw max = {rw_captured.max()} exceeds "
815+
f"control-slice max ({control_slice_max}); slice order "
816+
"regressed — Rao-Wu weights mixed with treated slice."
817+
)
818+
assert rw_captured.min() >= 1.0, (
819+
f"draw {i}: captured rw min = {rw_captured.min()} below "
820+
"known_rw[0]=1; weights drifted outside the Rao-Wu output."
821+
)
822+
743823
def test_bootstrap_single_psu_returns_nan(self):
744824
"""Unstratified single-PSU survey design returns NaN SE (PR #352).
745825

0 commit comments

Comments
 (0)