Skip to content

Commit 5eadcb6

Browse files
igerberclaude
andcommitted
Address R10 P2 + P3: FW non-conv denominator; survey docs SDID Rao-Wu
R10 CI review found two items on top of the previous ✅ Looks good. P2 Code Quality — aggregate Frank-Wolfe non-convergence warning numerator/denominator mismatch. In ``_bootstrap_se``, ``fw_nonconvergence_count`` was incremented before the draw cleared the ``np.isfinite(tau)`` gate. A draw that failed FW convergence AND then produced non-finite τ would count toward the warning numerator while the denominator is ``n_successful`` (draws that cleared the finite-τ gate). That does not affect the reported SE, but it can overstate the documented "share of valid bootstrap draws" warning contract and cause the warning to over-trigger. Fix: move the increment inside the ``if np.isfinite(tau)`` block so the numerator only counts draws that also contribute to the SE. A draw failing the finite-τ gate is retried upstream and should not inflate the non-convergence rate. P3 Documentation (previously unresolved) — two survey-cross-reference docs still advertised SyntheticDiD Rao-Wu bootstrap support, which the estimator now rejects at fit-time with NotImplementedError: - ``docs/methodology/survey-theory.md:725`` — rewrite the Rao-Wu bullet to exclude SDID explicitly, with a pointer to the REGISTRY sketch for the deferred weighted-FW composition and to pweight-only placebo/jackknife as the available SDID variance alternatives. - ``docs/tutorials/16_survey_did.ipynb`` cell-35-f1ef376c — update the support-matrix table so SDID's row reads "pweight only (placebo / jackknife)" with bootstrap struck out, and add a "Note on SyntheticDiD" below explaining which methods accept pweight-only and why bootstrap rejects all survey designs (weighted- FW derivation tracked in TODO.md). Test coverage unchanged: TestBootstrapSE ran the full 7 under Rust with 48-of-50 non-convergence warning still firing on the regression test, confirming the warning numerator still tallies correctly after the gate-order change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b123c2b commit 5eadcb6

3 files changed

Lines changed: 33 additions & 56 deletions

File tree

diff_diff/synthetic_did.py

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -975,13 +975,6 @@ def _bootstrap_se(
975975
init_weights=boot_lambda_init,
976976
return_convergence=True,
977977
)
978-
# Count draws with ANY non-convergence (boolean per draw),
979-
# not raw solver warnings — a single draw can emit up to
980-
# three non-convergence events (ω pre-sparsify, ω main, λ).
981-
# The registry text describes the rate per valid draw.
982-
if not (omega_converged and lambda_converged):
983-
fw_nonconvergence_count += 1
984-
985978
tau = compute_sdid_estimator(
986979
Y_boot_pre_c,
987980
Y_boot_post_c,
@@ -992,6 +985,17 @@ def _bootstrap_se(
992985
)
993986
if np.isfinite(tau):
994987
bootstrap_estimates.append(float(tau))
988+
# Count draws with ANY non-convergence (boolean per
989+
# draw), not raw solver warnings — a single draw can
990+
# emit up to three non-convergence events (ω
991+
# pre-sparsify, ω main, λ). Increment the counter only
992+
# after the finite-τ gate so the registry's "share of
993+
# valid bootstrap draws" denominator matches the
994+
# numerator (draws that failed the finite-τ gate are
995+
# retried, so they shouldn't inflate the non-
996+
# convergence rate).
997+
if not (omega_converged and lambda_converged):
998+
fw_nonconvergence_count += 1
995999

9961000
except (ValueError, LinAlgError):
9971001
continue

docs/methodology/survey-theory.md

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -722,9 +722,17 @@ Two bootstrap strategies interact with survey designs:
722722
Generates multiplier weights at the PSU level within strata, with FPC
723723
scaling. Each bootstrap draw reweights the IF values.
724724

725-
- **Rao-Wu rescaled bootstrap** (SunAbraham, SyntheticDiD, TROP): Draws PSUs
725+
- **Rao-Wu rescaled bootstrap** (SunAbraham, TROP): Draws PSUs
726726
with replacement within strata and rescales observation weights. Each draw
727-
re-runs the full estimator on the resampled data.
727+
re-runs the full estimator on the resampled data. *SyntheticDiD is
728+
intentionally excluded in this release:* the paper-faithful refit
729+
bootstrap rejects every survey design because composing Rao-Wu rescaled
730+
weights with Frank-Wolfe re-estimation requires a weighted-FW derivation
731+
that is not yet implemented. Pweight-only SDID users should use
732+
``variance_method="placebo"`` or ``"jackknife"``; strata/PSU/FPC users
733+
have no SDID variance option. See TODO.md and
734+
``docs/methodology/REGISTRY.md`` §SyntheticDiD for the deferred-
735+
composition sketch.
728736

729737
---
730738

docs/tutorials/16_survey_did.ipynb

Lines changed: 12 additions & 47 deletions
Original file line numberDiff line numberDiff line change
@@ -195,7 +195,7 @@
195195
"id": "cell-05-90139e87",
196196
"metadata": {},
197197
"source": [
198-
"**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. The library rescales them internally so that weighted estimators are numerically stable. This is standard practice (Lumley 2004, \u00a72.2). The warning confirms rescaling occurred; it is not an error."
198+
"**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. The library rescales them internally so that weighted estimators are numerically stable. This is standard practice (Lumley 2004, §2.2). The warning confirms rescaling occurred; it is not an error."
199199
]
200200
},
201201
{
@@ -1087,42 +1087,7 @@
10871087
"cell_type": "markdown",
10881088
"id": "cell-35-f1ef376c",
10891089
"metadata": {},
1090-
"source": [
1091-
"## 9. Which Estimators Support Survey Design?\n",
1092-
"\n",
1093-
"`diff-diff` supports survey design across all estimators, though the level of support varies:\n",
1094-
"\n",
1095-
"| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n",
1096-
"|-----------|---------|---------------------|-------------------|------------------------|\n",
1097-
"| **DifferenceInDifferences** | Full | Full | -- | -- |\n",
1098-
"| **TwoWayFixedEffects** | Full | Full | -- | -- |\n",
1099-
"| **MultiPeriodDiD** | Full | Full | -- | -- |\n",
1100-
"| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n",
1101-
"| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n",
1102-
"| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n",
1103-
"| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n",
1104-
"| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n",
1105-
"| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n",
1106-
"| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n",
1107-
"| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n",
1108-
"| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n",
1109-
"| **SyntheticDiD** | pweight only | -- | -- | Rao-Wu rescaled |\n",
1110-
"| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n",
1111-
"| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n",
1112-
"\n",
1113-
"**Legend:**\n",
1114-
"- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n",
1115-
"- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n",
1116-
"- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n",
1117-
"- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n",
1118-
"- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n",
1119-
"- **Diagnostic**: Weighted descriptive statistics only (no inference)\n",
1120-
"- **--**: Not supported\n",
1121-
"\n",
1122-
"**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n",
1123-
"\n",
1124-
"For full details, see `docs/survey-roadmap.md`."
1125-
]
1090+
"source": "## 9. Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only (placebo / jackknife) | -- | -- | -- |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Diagnostic**: Weighted descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note on SyntheticDiD:** `variance_method=\"placebo\"` and `variance_method=\"jackknife\"` support pweight-only survey designs. `variance_method=\"bootstrap\"` rejects every survey design (including pweight-only) because the paper-faithful refit bootstrap composed with Rao-Wu rescaled weights requires a weighted-Frank-Wolfe derivation that is not yet implemented. Strata/PSU/FPC are not supported by any SDID variance method in this release. The weighted-FW + Rao-Wu composition follow-up is tracked in `TODO.md`; see `docs/methodology/REGISTRY.md` §SyntheticDiD for the deferred-composition sketch.\n\n**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n\nFor full details, see `docs/survey-roadmap.md`."
11261091
},
11271092
{
11281093
"cell_type": "markdown",
@@ -1137,15 +1102,15 @@
11371102
"\n",
11381103
"**Policy background.** The Affordable Care Act's dependent coverage provision, effective\n",
11391104
"September 2010, allowed young adults to remain on their parents' health insurance until age 26.\n",
1140-
"This created a natural experiment: adults aged 19\u201325 gained coverage access (treatment group),\n",
1141-
"while adults aged 27\u201334 \u2014 similar demographics but ineligible \u2014 serve as controls. This is one\n",
1105+
"This created a natural experiment: adults aged 19–25 gained coverage access (treatment group),\n",
1106+
"while adults aged 27–34 — similar demographics but ineligible serve as controls. This is one\n",
11421107
"of the most widely studied DiD natural experiments in health economics.\n",
11431108
"\n",
11441109
"> Antwi, Y.A., Moriya, A.S. & Simon, K. (2013). \"Effects of Federal Policy to Insure Young\n",
11451110
"> Adults: Evidence from the 2010 Affordable Care Act's Dependent-Coverage Mandate.\"\n",
1146-
"> *American Economic Journal: Economic Policy* 5(4): 1\u201328.\n",
1111+
"> *American Economic Journal: Economic Policy* 5(4): 1–28.\n",
11471112
"\n",
1148-
"We use two NHANES cycles: **2007\u20132008** (pre-ACA) and **2015\u20132016** (post-ACA), with health\n",
1113+
"We use two NHANES cycles: **2007–2008** (pre-ACA) and **2015–2016** (post-ACA), with health\n",
11491114
"insurance coverage as the outcome (binary, modeled as a linear probability model). NHANES uses\n",
11501115
"a complex multi-stage probability sampling design with **masked pseudo-strata** (`SDMVSTRA`),\n",
11511116
"**masked pseudo-PSUs** (`SDMVPSU`), and **exam weights** (`WTMEC2YR`). Because PSU IDs are\n",
@@ -1351,21 +1316,21 @@
13511316
"source": [
13521317
"With real survey data, **both the point estimate and standard error change** when we account\n",
13531318
"for the survey design. The ATT shifts from 0.065 (unweighted) to 0.097 (weighted) because\n",
1354-
"NHANES uses unequal selection probabilities \u2014 the weighted estimate is population-representative,\n",
1319+
"NHANES uses unequal selection probabilities the weighted estimate is population-representative,\n",
13551320
"while the unweighted one over- or under-represents certain demographic groups. The SE also\n",
13561321
"increases because NHANES clusters individuals within PSUs, and people from the same geographic\n",
13571322
"area have correlated insurance status.\n",
13581323
"\n",
13591324
"The survey-corrected estimate of **~9.7 percentage points** suggests that the ACA provisions\n",
13601325
"meaningfully increased insurance coverage among young adults. This is consistent with the\n",
13611326
"published literature: studies measuring only the 2010 dependent coverage mandate find\n",
1362-
"3\u20136 pp effects (Antwi et al., 2013; Sommers, 2012), while studies spanning the full\n",
1363-
"ACA implementation \u2014 including the 2014 marketplace, individual mandate, and Medicaid\n",
1364-
"expansion \u2014 find 8\u201313 pp (Kaestner et al., 2017; Courtemanche et al., 2017). Our\n",
1365-
"2007\u201308 vs. 2015\u201316 window captures all of these provisions, placing the 9.7 pp\n",
1327+
"3–6 pp effects (Antwi et al., 2013; Sommers, 2012), while studies spanning the full\n",
1328+
"ACA implementation including the 2014 marketplace, individual mandate, and Medicaid\n",
1329+
"expansion find 8–13 pp (Kaestner et al., 2017; Courtemanche et al., 2017). Our\n",
1330+
"2007–08 vs. 2015–16 window captures all of these provisions, placing the 9.7 pp\n",
13661331
"estimate squarely in the expected range.\n",
13671332
"\n",
1368-
"The survey degrees of freedom (31 = n_PSU \u2212 n_strata) reflect the actual number\n",
1333+
"The survey degrees of freedom (31 = n_PSU n_strata) reflect the actual number\n",
13691334
"of independent sampling units, not the number of individuals. This is why the\n",
13701335
"confidence interval [0.006, 0.187] is wide despite nearly 3,000 observations.\n",
13711336
"\n",

0 commit comments

Comments
 (0)