Address R10 P2 + P3: FW non-conv denominator; survey docs SDID Rao-Wu

igerber · claude · igerber · commit 5eadcb6dbd1c · 2026-04-23T17:25:27.000-04:00
The R10 CI review raised two items on top of the previous "✅ Looks good" verdict.

P2 Code Quality — aggregate Frank-Wolfe non-convergence warning
numerator/denominator mismatch. In ``_bootstrap_se``,
``fw_nonconvergence_count`` was incremented before the draw cleared
the ``np.isfinite(tau)`` gate. A draw that failed FW convergence AND
then produced non-finite τ would count toward the warning numerator
while the denominator is ``n_successful`` (draws that cleared the
finite-τ gate). The reported SE is unaffected, but the warning could
overstate the non-convergence rate relative to its documented "share
of valid bootstrap draws" contract and fire when it should not.

Fix: move the increment inside the ``if np.isfinite(tau)`` block so
the numerator only counts draws that also contribute to the SE. A
draw failing the finite-τ gate is retried upstream and should not
inflate the non-convergence rate.
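The ordering issue can be illustrated with a minimal sketch. It assumes each draw is reduced to a ``(converged, tau)`` pair; ``nonconvergence_rate`` and its inputs are hypothetical stand-ins, not the actual ``_bootstrap_se`` loop. Counting before the gate can push the reported share above 100%:

```python
import math
from typing import List, Tuple


def nonconvergence_rate(
    draws: List[Tuple[bool, float]], count_after_gate: bool = True
) -> float:
    """Share of finite-tau bootstrap draws that hit any FW non-convergence.

    Each draw is a hypothetical (converged, tau) pair; ``converged`` stands
    in for ``omega_converged and lambda_converged``.
    """
    n_successful = 0  # denominator: draws that clear the finite-tau gate
    fw_nonconvergence_count = 0  # numerator
    for converged, tau in draws:
        if not count_after_gate and not converged:
            # Old ordering: counted before the finite-tau gate.
            fw_nonconvergence_count += 1
        if math.isfinite(tau):
            n_successful += 1
            if count_after_gate and not converged:
                # New ordering: only draws that contribute to the SE count.
                fw_nonconvergence_count += 1
    if n_successful == 0:
        return float("nan")
    return fw_nonconvergence_count / n_successful


# One non-converged draw with a finite tau, one with a NaN tau.
draws = [(False, 0.5), (False, float("nan"))]
print(nonconvergence_rate(draws, count_after_gate=False))  # 2.0
print(nonconvergence_rate(draws))  # 1.0
```

With the counter inside the gate, numerator and denominator describe the same set of draws, so the rate can never exceed 1.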

P3 Documentation (previously unresolved) — two survey cross-reference
docs still advertised Rao-Wu bootstrap support for SyntheticDiD, which
the estimator now rejects at fit time with ``NotImplementedError``:

- ``docs/methodology/survey-theory.md:725`` — rewrite the Rao-Wu bullet
  to exclude SDID explicitly, with a pointer to the REGISTRY sketch
  for the deferred weighted-FW composition and to pweight-only
  placebo/jackknife as the available SDID variance alternatives.
- ``docs/tutorials/16_survey_did.ipynb`` cell-35-f1ef376c — update the
  support-matrix table so SDID's row reads
  "pweight only (placebo / jackknife)" with bootstrap struck out, and
  add a "Note on SyntheticDiD" below explaining which variance methods
  accept pweight-only designs and why bootstrap rejects all survey
  designs (the weighted-FW derivation is tracked in TODO.md).
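The resulting SDID support rule can be summarised as a small guard. This is a hedged sketch: ``check_sdid_survey_support`` and its ``weight_type``/``has_design_elements`` parameters are hypothetical, not the estimator's real validation code, and the ``ValueError`` for non-pweight weights is an assumed exception type.

```python
from typing import Optional


def check_sdid_survey_support(
    variance_method: str,
    weight_type: Optional[str] = None,
    has_design_elements: bool = False,
) -> None:
    """Hypothetical fit-time guard mirroring the documented SDID rules.

    weight_type: None when no survey design is passed, else e.g. "pweight".
    has_design_elements: True when strata, PSU, or FPC are specified.
    """
    if weight_type is None and not has_design_elements:
        return  # no survey design: every variance method is allowed
    if variance_method == "bootstrap":
        # Bootstrap rejects every survey design, including pweight-only.
        raise NotImplementedError(
            "SyntheticDiD bootstrap + survey design requires a weighted "
            "Frank-Wolfe derivation that is not yet implemented"
        )
    if has_design_elements:
        # No SDID variance method supports strata/PSU/FPC in this release.
        raise NotImplementedError("SyntheticDiD does not support strata/PSU/FPC")
    if weight_type != "pweight":
        # Assumption: the exception type for fweight/aweight is illustrative.
        raise ValueError(f"SyntheticDiD accepts only pweight, got {weight_type!r}")
    # placebo / jackknife with a pweight-only design falls through as supported.
```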

Test coverage is unchanged: all 7 TestBootstrapSE tests ran under Rust,
and the regression test still fires the 48-of-50 non-convergence
warning, confirming that the numerator still tallies correctly after
the gate-order change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
@@ -975,13 +975,6 @@ def _bootstrap_se(
                     init_weights=boot_lambda_init,
                     return_convergence=True,
                 )
-                # Count draws with ANY non-convergence (boolean per draw),
-                # not raw solver warnings — a single draw can emit up to
-                # three non-convergence events (ω pre-sparsify, ω main, λ).
-                # The registry text describes the rate per valid draw.
-                if not (omega_converged and lambda_converged):
-                    fw_nonconvergence_count += 1
-
                 tau = compute_sdid_estimator(
                     Y_boot_pre_c,
                     Y_boot_post_c,
@@ -992,6 +985,17 @@ def _bootstrap_se(
                 )
                 if np.isfinite(tau):
                     bootstrap_estimates.append(float(tau))
+                    # Count draws with ANY non-convergence (boolean per
+                    # draw), not raw solver warnings — a single draw can
+                    # emit up to three non-convergence events (ω
+                    # pre-sparsify, ω main, λ). Increment the counter only
+                    # after the finite-τ gate so the registry's "share of
+                    # valid bootstrap draws" denominator matches the
+                    # numerator (draws that failed the finite-τ gate are
+                    # retried, so they shouldn't inflate the non-
+                    # convergence rate).
+                    if not (omega_converged and lambda_converged):
+                        fw_nonconvergence_count += 1
 
             except (ValueError, LinAlgError):
                 continue
diff --git a/docs/methodology/survey-theory.md b/docs/methodology/survey-theory.md
@@ -722,9 +722,17 @@ Two bootstrap strategies interact with survey designs:
   Generates multiplier weights at the PSU level within strata, with FPC
   scaling. Each bootstrap draw reweights the IF values.
 
-- **Rao-Wu rescaled bootstrap** (SunAbraham, SyntheticDiD, TROP): Draws PSUs
+- **Rao-Wu rescaled bootstrap** (SunAbraham, TROP): Draws PSUs
   with replacement within strata and rescales observation weights. Each draw
-  re-runs the full estimator on the resampled data.
+  re-runs the full estimator on the resampled data. *SyntheticDiD is
+  intentionally excluded in this release:* the paper-faithful refit
+  bootstrap rejects every survey design because composing Rao-Wu rescaled
+  weights with Frank-Wolfe re-estimation requires a weighted-FW derivation
+  that is not yet implemented. Pweight-only SDID users should use
+  ``variance_method="placebo"`` or ``"jackknife"``; strata/PSU/FPC users
+  have no SDID variance option. See TODO.md and
+  ``docs/methodology/REGISTRY.md`` §SyntheticDiD for the deferred-
+  composition sketch.
 
 ---
 
diff --git a/docs/tutorials/16_survey_did.ipynb b/docs/tutorials/16_survey_did.ipynb
@@ -195,7 +195,7 @@
    "id": "cell-05-90139e87",
    "metadata": {},
    "source": [
-    "**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. The library rescales them internally so that weighted estimators are numerically stable. This is standard practice (Lumley 2004, \u00a72.2). The warning confirms rescaling occurred; it is not an error."
+    "**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. The library rescales them internally so that weighted estimators are numerically stable. This is standard practice (Lumley 2004, §2.2). The warning confirms rescaling occurred; it is not an error."
    ]
   },
   {
@@ -1087,42 +1087,7 @@
    "cell_type": "markdown",
    "id": "cell-35-f1ef376c",
    "metadata": {},
-   "source": [
-    "## 9. Which Estimators Support Survey Design?\n",
-    "\n",
-    "`diff-diff` supports survey design across all estimators, though the level of support varies:\n",
-    "\n",
-    "| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n",
-    "|-----------|---------|---------------------|-------------------|------------------------|\n",
-    "| **DifferenceInDifferences** | Full | Full | -- | -- |\n",
-    "| **TwoWayFixedEffects** | Full | Full | -- | -- |\n",
-    "| **MultiPeriodDiD** | Full | Full | -- | -- |\n",
-    "| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n",
-    "| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n",
-    "| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n",
-    "| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n",
-    "| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n",
-    "| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n",
-    "| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n",
-    "| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n",
-    "| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n",
-    "| **SyntheticDiD** | pweight only | -- | -- | Rao-Wu rescaled |\n",
-    "| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n",
-    "| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n",
-    "\n",
-    "**Legend:**\n",
-    "- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n",
-    "- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n",
-    "- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n",
-    "- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n",
-    "- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n",
-    "- **Diagnostic**: Weighted descriptive statistics only (no inference)\n",
-    "- **--**: Not supported\n",
-    "\n",
-    "**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n",
-    "\n",
-    "For full details, see `docs/survey-roadmap.md`."
-   ]
+   "source": "## 9. Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only (placebo / jackknife) | -- | -- | -- |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Diagnostic**: Weighted descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note on SyntheticDiD:** `variance_method=\"placebo\"` and `variance_method=\"jackknife\"` support pweight-only survey designs. `variance_method=\"bootstrap\"` rejects every survey design (including pweight-only) because the paper-faithful refit bootstrap composed with Rao-Wu rescaled weights requires a weighted-Frank-Wolfe derivation that is not yet implemented. Strata/PSU/FPC are not supported by any SDID variance method in this release. The weighted-FW + Rao-Wu composition follow-up is tracked in `TODO.md`; see `docs/methodology/REGISTRY.md` §SyntheticDiD for the deferred-composition sketch.\n\n**Note:** `EfficientDiD` supports `covariates` and `survey_design` simultaneously. The doubly-robust (DR) path threads survey weights through WLS outcome regression, weighted sieve propensity ratios, and survey-weighted kernel smoothing.\n\nFor full details, see `docs/survey-roadmap.md`."
   },
   {
    "cell_type": "markdown",
@@ -1137,15 +1102,15 @@
     "\n",
     "**Policy background.** The Affordable Care Act's dependent coverage provision, effective\n",
     "September 2010, allowed young adults to remain on their parents' health insurance until age 26.\n",
-    "This created a natural experiment: adults aged 19\u201325 gained coverage access (treatment group),\n",
-    "while adults aged 27\u201334 \u2014 similar demographics but ineligible \u2014 serve as controls. This is one\n",
+    "This created a natural experiment: adults aged 19–25 gained coverage access (treatment group),\n",
+    "while adults aged 27–34 — similar demographics but ineligible — serve as controls. This is one\n",
     "of the most widely studied DiD natural experiments in health economics.\n",
     "\n",
     "> Antwi, Y.A., Moriya, A.S. & Simon, K. (2013). \"Effects of Federal Policy to Insure Young\n",
     "> Adults: Evidence from the 2010 Affordable Care Act's Dependent-Coverage Mandate.\"\n",
-    "> *American Economic Journal: Economic Policy* 5(4): 1\u201328.\n",
+    "> *American Economic Journal: Economic Policy* 5(4): 1–28.\n",
     "\n",
-    "We use two NHANES cycles: **2007\u20132008** (pre-ACA) and **2015\u20132016** (post-ACA), with health\n",
+    "We use two NHANES cycles: **2007–2008** (pre-ACA) and **2015–2016** (post-ACA), with health\n",
     "insurance coverage as the outcome (binary, modeled as a linear probability model). NHANES uses\n",
     "a complex multi-stage probability sampling design with **masked pseudo-strata** (`SDMVSTRA`),\n",
     "**masked pseudo-PSUs** (`SDMVPSU`), and **exam weights** (`WTMEC2YR`). Because PSU IDs are\n",
@@ -1351,21 +1316,21 @@
    "source": [
     "With real survey data, **both the point estimate and standard error change** when we account\n",
     "for the survey design. The ATT shifts from 0.065 (unweighted) to 0.097 (weighted) because\n",
-    "NHANES uses unequal selection probabilities \u2014 the weighted estimate is population-representative,\n",
+    "NHANES uses unequal selection probabilities — the weighted estimate is population-representative,\n",
     "while the unweighted one over- or under-represents certain demographic groups. The SE also\n",
     "increases because NHANES clusters individuals within PSUs, and people from the same geographic\n",
     "area have correlated insurance status.\n",
     "\n",
     "The survey-corrected estimate of **~9.7 percentage points** suggests that the ACA provisions\n",
     "meaningfully increased insurance coverage among young adults. This is consistent with the\n",
     "published literature: studies measuring only the 2010 dependent coverage mandate find\n",
-    "3\u20136 pp effects (Antwi et al., 2013; Sommers, 2012), while studies spanning the full\n",
-    "ACA implementation \u2014 including the 2014 marketplace, individual mandate, and Medicaid\n",
-    "expansion \u2014 find 8\u201313 pp (Kaestner et al., 2017; Courtemanche et al., 2017). Our\n",
-    "2007\u201308 vs. 2015\u201316 window captures all of these provisions, placing the 9.7 pp\n",
+    "3–6 pp effects (Antwi et al., 2013; Sommers, 2012), while studies spanning the full\n",
+    "ACA implementation — including the 2014 marketplace, individual mandate, and Medicaid\n",
+    "expansion — find 8–13 pp (Kaestner et al., 2017; Courtemanche et al., 2017). Our\n",
+    "2007–08 vs. 2015–16 window captures all of these provisions, placing the 9.7 pp\n",
     "estimate squarely in the expected range.\n",
     "\n",
-    "The survey degrees of freedom (31 = n_PSU \u2212 n_strata) reflect the actual number\n",
+    "The survey degrees of freedom (31 = n_PSU − n_strata) reflect the actual number\n",
     "of independent sampling units, not the number of individuals. This is why the\n",
     "confidence interval [0.006, 0.187] is wide despite nearly 3,000 observations.\n",
     "\n",