Skip to content

Commit dc2045f

Browse files
igerberclaude
andcommitted
Address PR #351 R9 P3: unify SDID bootstrap slowdown wording
Single actionable P3 from R9 CI review: user-facing runtime wording for refit bootstrap had diverged across surfaces, giving conflicting expectations about the cost of the new bootstrap path: - CHANGELOG.md and diff_diff/synthetic_did.py said ~5-30x slower. - diff_diff/power.py said ~10-100x slower (two sites). - docs/choosing_estimator.rst said ~10-100x slower. - docs/performance-scenarios.md said ~10-100x slower. - docs/methodology/REGISTRY.md coverage-MC block said ~10-100x slower. - docs/tutorials/03_synthetic_did.ipynb and docs/tutorials/18_geo_experiments.ipynb said ~10-100x slower. - benchmarks/python/coverage_sdid.py said the 500-seed MC run takes ~2-4 hours, while REGISTRY.md said ~15-40 min (the actually-observed wall-clock; aer63 is ~37 min, balanced + unbalanced ~2 min combined). Unify on "~5-30x slower than placebo (panel-size dependent)" for the per-fit slowdown (the warm-start plumbing closed the gap vs the pre- warm-start cold-start estimate of 10-100x) and on "~15-40 min" for the coverage MC wall-clock. The CHANGELOG entry already notes the 10-100x figure as a historical "prior estimate" — left as-is so the release notes continue to explain the revision. Also fix two tutorial surfaces that still called placebo "R's default" (tutorial 03, sections 7 and 10). R's default is bootstrap; placebo is the library default per the REGISTRY Note added in 710f966. Reword to describe placebo as the library default with the rationale pointer. Verified: 353 tests pass across test_methodology_sdid, test_power, test_guides (UTF-8 fingerprint preserved). Tutorial-18 nbmake drift guards unaffected because the change is markdown-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 710f966 commit dc2045f

7 files changed

Lines changed: 14 additions & 21 deletions

File tree

benchmarks/python/coverage_sdid.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,9 @@
1111
``docs/methodology/REGISTRY.md`` §SyntheticDiD.
1212
1313
Usage:
14-
# Full run (~2–4 hours, AER §6.3 refit is the long tail)
14+
# Full run (~15–40 min on M-series Mac with Rust backend; AER §6.3 refit
15+
# is the long tail at ~37 min. Matches the wall-clock wording in
16+
# REGISTRY.md §SyntheticDiD coverage MC note.)
1517
python benchmarks/python/coverage_sdid.py \\
1618
--n-seeds 500 --n-bootstrap 200 \\
1719
--output benchmarks/data/sdid_coverage.json

diff_diff/power.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -718,7 +718,7 @@ def _check_sdid_placebo_data(
718718
f"n_treated={n_treated}. Either adjust your data_generator so that "
719719
f"n_control > n_treated, or use "
720720
f"SyntheticDiD(variance_method='bootstrap') (paper-faithful refit; "
721-
f"~10-100x slower than placebo) or SyntheticDiD(variance_method='jackknife')."
721+
f"~5-30x slower than placebo) or SyntheticDiD(variance_method='jackknife')."
722722
)
723723

724724

@@ -2047,7 +2047,7 @@ def simulate_power(
20472047
f"n_treated={effective_n_treated}). Either lower "
20482048
f"treatment_fraction so that n_control > n_treated, or use "
20492049
f"SyntheticDiD(variance_method='bootstrap') (paper-faithful refit; "
2050-
f"~10-100x slower than placebo) or "
2050+
f"~5-30x slower than placebo) or "
20512051
f"SyntheticDiD(variance_method='jackknife')."
20522052
)
20532053

docs/choosing_estimator.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -611,7 +611,7 @@ differences helps interpret results and choose appropriate inference.
611611
- Uses influence-function SEs with WIF adjustment by default. Set ``n_bootstrap=999`` for multiplier bootstrap inference (weight types: ``rademacher``, ``mammen``, ``webb``).
612612
* - ``SyntheticDiD``
613613
- Placebo, paper-faithful refit bootstrap, or jackknife
614-
- Default uses placebo-based variance (``variance_method="placebo"``). Set ``variance_method="bootstrap"`` for paper-faithful Algorithm 2 bootstrap (re-estimates ω and λ via Frank-Wolfe per draw; ~10–100× slower than placebo). Both methods use ``n_bootstrap`` replications (default 200). ``variance_method="jackknife"`` is also available.
614+
- Default uses placebo-based variance (``variance_method="placebo"``). Set ``variance_method="bootstrap"`` for paper-faithful Algorithm 2 bootstrap (re-estimates ω and λ via Frank-Wolfe per draw; ~5–30× slower than placebo, panel-size dependent). Both methods use ``n_bootstrap`` replications (default 200). ``variance_method="jackknife"`` is also available.
615615
* - ``ContinuousDiD``
616616
- Analytical (influence function)
617617
- Uses influence-function-based SEs by default. Use ``n_bootstrap=199`` (or higher) for multiplier bootstrap inference with proper CIs.

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1505,7 +1505,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
15051505

15061506
R-parity rationale: `synthdid_estimate()` (synthdid.R) stores `update.omega = TRUE` in `attr(estimate, "opts")`, and `vcov.R::bootstrap_sample` rebinds those `opts` inside its `do.call` back into `synthdid_estimate`, so the renormalized ω passed via `weights$omega` is used as Frank-Wolfe initialization (the `sum_normalize` helper in R's source explicitly says so). The Python path threads the same warm-start via `compute_sdid_unit_weights(..., init_weights=...)` and `compute_time_weights(..., init_weights=...)`. The FW objective is strictly convex on the simplex (quadratic loss + ζ² ridge on simplex), so warm- and cold-start converge to the same global minimum given enough iterations; warm-start matters in practice because the 100-iter first pass then sparsification is path-dependent on draws where the pre-sparsify budget is tight. Cross-language SE parity at bit tolerance is not claimed — different BLAS / RNG paths — but the procedure matches R's default bootstrap shape at the algorithm level, and Python-only bit-identity on non-survey data is asserted via `TestScaleEquivariance::test_baseline_parity_small_scale[bootstrap]` at `rel=1e-14`.
15071507

1508-
Expected wall-clock ~10–100× slower per fit than a fixed-weight shortcut would be (panel-size dependent; Frank-Wolfe second-pass can hit its 10K-iter cap on larger panels). Per-draw Frank-Wolfe non-convergence UserWarnings are suppressed inside the loop and aggregated into a single summary warning emitted after the loop when the share of valid bootstrap draws with any non-convergence event (counted once per draw — each draw runs Frank-Wolfe once for ω and once for λ, and any of those calls firing a non-convergence warning trips the draw) exceeds 5% of `n_successful`. Composed with any survey design (including pweight-only) this path raises `NotImplementedError` in the current release — see the survey-regression Note below for scope and the deferred-composition sketch.
1508+
Expected wall-clock ~5–30× slower per fit than placebo (panel-size dependent; Frank-Wolfe second-pass can hit its 10K-iter cap on larger panels; warm-start plumbing closes the gap vs cold-start, which would be closer to 10–30× on these DGPs). Per-draw Frank-Wolfe non-convergence UserWarnings are suppressed inside the loop and aggregated into a single summary warning emitted after the loop when the share of valid bootstrap draws with any non-convergence event (counted once per draw — each draw runs Frank-Wolfe once for ω and once for λ, and any of those calls firing a non-convergence warning trips the draw) exceeds 5% of `n_successful`. Composed with any survey design (including pweight-only) this path raises `NotImplementedError` in the current release — see the survey-regression Note below for scope and the deferred-composition sketch.
15091509

15101510
- Alternative: Jackknife variance (matching R's `synthdid::vcov(method="jackknife")`)
15111511
Implements Algorithm 3 from Arkhangelsky et al. (2021):

docs/performance-scenarios.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -235,8 +235,9 @@ serves a different purpose: R-parity accuracy). They complement it.
235235
SyntheticDiD(variance_method="jackknife", n_bootstrap=0).fit(...)
236236
# then also variance_method="bootstrap", n_bootstrap=200 for comparison
237237
# NOTE: bootstrap is now paper-faithful refit (re-estimates ω and λ via
238-
# Frank-Wolfe per draw); ~10–100× slower than placebo or the previous
239-
# release's fixed-weight bootstrap. Plan accordingly when timing.
238+
# Frank-Wolfe per draw); ~5–30× slower than placebo (panel-size dependent;
239+
# warm-start plumbing reduces the slowdown relative to cold-start). Plan
240+
# accordingly when timing.
240241
```
241242
- **Operation chain.** (1) SDiD fit with `variance_method="jackknife"` -
242243
exercises the leave-one-out refit loop; (2) SDiD fit with

docs/tutorials/03_synthetic_did.ipynb

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -398,7 +398,7 @@
398398
{
399399
"cell_type": "markdown",
400400
"metadata": {},
401-
"source": "## 7. Inference Methods\n\nSDID supports three inference methods:\n\n1. **Placebo** (`variance_method=\"placebo\"`, default): Placebo-based variance using Algorithm 4 from Arkhangelsky et al. (2021). This matches R's default.\n2. **Bootstrap** (`variance_method=\"bootstrap\"`): Paper-faithful pairs bootstrap (Algorithm 2 step 2) — re-estimates ω and λ via Frank-Wolfe on each draw. Also matches R's default `synthdid::vcov(method=\"bootstrap\")` behavior. Expect ~10–100× slower per fit than placebo.\n3. **Jackknife** (`variance_method=\"jackknife\"`): Algorithm 3 — fixed-weight leave-one-out. Deterministic; no bootstrap replications."
401+
"source": "## 7. Inference Methods\n\nSDID supports three inference methods:\n\n1. **Placebo** (`variance_method=\"placebo\"`, default): Placebo-based variance using Algorithm 4 from Arkhangelsky et al. (2021). Library default (R's default is bootstrap — we deviate because placebo is unconditionally available on pweight-only survey designs and sidesteps the refit bootstrap slowdown).\n2. **Bootstrap** (`variance_method=\"bootstrap\"`): Paper-faithful pairs bootstrap (Algorithm 2 step 2) — re-estimates ω and λ via Frank-Wolfe on each draw. Matches R's default `synthdid::vcov(method=\"bootstrap\")` behavior. Expect ~5–30× slower per fit than placebo (panel-size dependent).\n3. **Jackknife** (`variance_method=\"jackknife\"`): Algorithm 3 — fixed-weight leave-one-out. Deterministic; no bootstrap replications."
402402
},
403403
{
404404
"cell_type": "code",
@@ -599,7 +599,7 @@
599599
{
600600
"cell_type": "markdown",
601601
"metadata": {},
602-
"source": "## Summary\n\nKey takeaways for Synthetic DiD:\n\n1. **Best use cases**: Few treated units, many controls, long pre-period\n2. **Unit weights**: Identify which controls are most similar to treated (Frank-Wolfe with sparsification)\n3. **Time weights**: Determine which pre-periods are most informative (Frank-Wolfe on collapsed form)\n4. **Pre-treatment fit**: Lower RMSE indicates better synthetic match\n5. **Inference options**:\n - Placebo (`variance_method=\"placebo\"`, default): Placebo-based variance from controls (matches R)\n - Bootstrap (`variance_method=\"bootstrap\"`): Paper-faithful pairs bootstrap re-estimating ω and λ via Frank-Wolfe per draw (Algorithm 2 step 2; matches R's default `vcov`). ~10–100× slower than placebo.\n - Jackknife (`variance_method=\"jackknife\"`): Algorithm 3 — fixed-weight leave-one-out.\n6. **Regularization**: Auto-computed from data noise level by default. Override with `zeta_omega`/`zeta_lambda`.\n\nReference:\n- Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088-4118."
602+
"source": "## Summary\n\nKey takeaways for Synthetic DiD:\n\n1. **Best use cases**: Few treated units, many controls, long pre-period\n2. **Unit weights**: Identify which controls are most similar to treated (Frank-Wolfe with sparsification)\n3. **Time weights**: Determine which pre-periods are most informative (Frank-Wolfe on collapsed form)\n4. **Pre-treatment fit**: Lower RMSE indicates better synthetic match\n5. **Inference options**:\n - Placebo (`variance_method=\"placebo\"`, default): Placebo-based variance from controls. Library default (R's default is bootstrap; we deviate for survey availability + perf).\n - Bootstrap (`variance_method=\"bootstrap\"`): Paper-faithful pairs bootstrap re-estimating ω and λ via Frank-Wolfe per draw (Algorithm 2 step 2; matches R's default `vcov`). ~5–30× slower than placebo.\n - Jackknife (`variance_method=\"jackknife\"`): Algorithm 3 — fixed-weight leave-one-out.\n6. **Regularization**: Auto-computed from data noise level by default. Override with `zeta_omega`/`zeta_lambda`.\n\nReference:\n- Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088-4118."
603603
}
604604
],
605605
"metadata": {

docs/tutorials/18_geo_experiments.ipynb

Lines changed: 2 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -854,17 +854,7 @@
854854
"cell_type": "markdown",
855855
"id": "t18-cell-028",
856856
"metadata": {},
857-
"source": [
858-
"diff-diff's `SyntheticDiD` supports three standard error methods, and the difference between the two paper-based ones is *what gets resampled* per replication:\n",
859-
"\n",
860-
"- **Placebo SE** (default): permutes which control units are pretended to be \"treated\", then **re-estimates both the unit weights and the time weights** (Frank-Wolfe) on each permutation and recomputes SDiD. The standard deviation of those placebo effects is the SE. This is Algorithm 4 in Arkhangelsky et al. (2021) and matches R's `synthdid::vcov(method=\"placebo\")`.\n",
861-
"- **Bootstrap SE**: pairs-bootstrap resampling of all units with replacement, then **re-estimates both the unit weights and the time weights** via Frank-Wolfe on each resampled panel and recomputes SDiD. This is Algorithm 2 step 2 in Arkhangelsky et al. (2021) and matches R's default `synthdid::vcov(method=\"bootstrap\")` behavior (which rebinds `attr(estimate, \"opts\")` so the renormalized ω is only Frank-Wolfe initialization). Expect ~10–100× slower per fit than placebo.\n",
862-
"- **Jackknife SE**: deterministic Algorithm 3 — fixed-weight leave-one-out across all units. Faster than bootstrap; mildly anti-conservative on smaller panels.\n",
863-
"\n",
864-
"Both bootstrap and placebo re-estimate the weights per replication, so each reflects the full uncertainty in the weighting procedure. They differ in *how* they resample: placebo permutes the control-vs-treated assignment, bootstrap draws with replacement. On exchangeable DGPs the two SEs typically track each other; on small panels with non-exchangeable factor structure (like the marketing geo-experiment here), they can differ in magnitude while still agreeing on significance and CI direction.\n",
865-
"\n",
866-
"All three methods are configured on the `SyntheticDiD` *constructor*, not on `.fit()`. Use placebo by default (it's the published method); switch to bootstrap if you want a cross-check from a different resampling protocol; switch to jackknife if you need a deterministic, fast alternative."
867-
]
857+
"source": "diff-diff's `SyntheticDiD` supports three standard error methods, and the difference between the two paper-based ones is *what gets resampled* per replication:\n\n- **Placebo SE** (default): permutes which control units are pretended to be \"treated\", then **re-estimates both the unit weights and the time weights** (Frank-Wolfe) on each permutation and recomputes SDiD. The standard deviation of those placebo effects is the SE. This is Algorithm 4 in Arkhangelsky et al. (2021) and matches R's `synthdid::vcov(method=\"placebo\")`.\n- **Bootstrap SE**: pairs-bootstrap resampling of all units with replacement, then **re-estimates both the unit weights and the time weights** via Frank-Wolfe on each resampled panel and recomputes SDiD. This is Algorithm 2 step 2 in Arkhangelsky et al. (2021) and matches R's default `synthdid::vcov(method=\"bootstrap\")` behavior (which rebinds `attr(estimate, \"opts\")` so the renormalized ω is only Frank-Wolfe initialization). Expect ~5–30× slower per fit than placebo (panel-size dependent).\n- **Jackknife SE**: deterministic Algorithm 3 — fixed-weight leave-one-out across all units. Faster than bootstrap; mildly anti-conservative on smaller panels.\n\nBoth bootstrap and placebo re-estimate the weights per replication, so each reflects the full uncertainty in the weighting procedure. They differ in *how* they resample: placebo permutes the control-vs-treated assignment, bootstrap draws with replacement. On exchangeable DGPs the two SEs typically track each other; on small panels with non-exchangeable factor structure (like the marketing geo-experiment here), they can differ in magnitude while still agreeing on significance and CI direction.\n\nAll three methods are configured on the `SyntheticDiD` *constructor*, not on `.fit()`. Use placebo by default (it's the library default; R's default is bootstrap); switch to bootstrap if you want a cross-check from a different resampling protocol; switch to jackknife if you need a deterministic, fast alternative."
868858
},
869859
{
870860
"cell_type": "code",
@@ -1095,4 +1085,4 @@
10951085
},
10961086
"nbformat": 4,
10971087
"nbformat_minor": 5
1098-
}
1088+
}

0 commit comments

Comments
 (0)