T22 consolidation pass: tighten prose, preserve methodology

igerber · claude · igerber · commit 57753eeb32c1 · 2026-05-15T09:51:59.000-04:00
Eight rounds of CI-review iteration tightened methodology
precision but left the notebook prose denser than necessary —
implementation detail and version bookkeeping had crept into §3,
§4, §5, and §7 alongside the pedagogical arc. This pass prunes
those without regressing on any methodology contract:

- §3 setup paragraph: dropped the file:line dump
  (`had.py:3747-3760`, `:3803-3808`) and the redundant
  weighted-vs-unweighted point-by-point enumeration. The
  three-point weight-consumption claim (`tau_bc`, weighted ΔY
  mean, weighted denominator) is preserved in compact form.
- §3 Assumption 5/6 note: trimmed from 15 lines to 11. Kept all
  load-bearing content (Assumption 6 / Assumption 5; not testable
  from data; §6 diagnostics necessary but not sufficient; domain
  knowledge justification; paired-with-QUG-deferral framing).
- §4 opener: restructured to lead with intuition (few states near
  d_lower → small lever for PSU correlation), with the formal
  `WAS_{d̲}` definition pushed into a `**Formal definition.**`
  callout. Both halves are preserved — the formal definition is
  unchanged in content, just demoted from the lead.
- §5: dropped the "(Phase 4.5 B composition; ...)" parenthetical
  (internal version bookkeeping, not user-facing).
- §7 methodologist block: tightened from a numbered list with two
  verbatim verdict quotes to a compact two-clause description of
  the two paths, with the shared verdict suffix quoted once. The
  callout that `report.yatchew` and `report.stute` are `None` on
  the event-study path is preserved, as is the SE-inflation-is-modest
  explanation (with its §4 cross-link).
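
The three weighted moments named in the §3 bullet can be sketched in
numpy as follows. This is a minimal illustration under stated
assumptions, not the library's code. `local_linear_intercept` and
`was_d_lower` are hypothetical helpers, and `d_lower` is taken as known.
The boundary fit uses a triangular kernel with a fixed bandwidth `h`,
and it omits the bias correction behind the library's `tau_bc`.
Passing `w` switches the boundary fit, the ΔY mean, and the
`E[D - d_lower]` denominator to weighted form together.

```python
import numpy as np


def local_linear_intercept(d, y, d_lower, h, w=None):
    """Hypothetical helper: local-linear fit of y on (d - d_lower), returning
    the intercept at d_lower. Triangular kernel, fixed bandwidth h, no bias
    correction. Optional sampling weights w multiply the kernel weights."""
    x = d - d_lower
    k = np.clip(1.0 - np.abs(x) / h, 0.0, None)  # zero weight beyond distance h
    if w is not None:
        k = k * w  # survey weights enter the boundary fit
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(k)
    # Weighted least squares via row rescaling by sqrt(weight).
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]


def was_d_lower(d, dy, d_lower, h, w=None):
    """Plug-in for (E[dY] - lim_{d -> d_lower} E[dY | D <= d]) / E[D - d_lower].
    With w=None every moment is unweighted (naive fit); with w every
    moment is weighted (survey-aware point estimate)."""
    boundary = local_linear_intercept(d, dy, d_lower, h, w)  # boundary-limit term
    numerator = np.average(dy, weights=w) - boundary         # (weighted) dY mean minus limit
    denominator = np.average(d - d_lower, weights=w)         # (weighted) E[D - d_lower]
    return numerator / denominator


# Illustrative DGP (assumed, not the tutorial's): 60 states, dose ~ U[5, 50],
# linear dose-response with slope 0.3, lognormal weights with CV of roughly 0.3.
rng = np.random.default_rng(0)
n, d_lower = 60, 5.0
d = rng.uniform(d_lower, 50.0, size=n)
dy = 2.0 + 0.3 * (d - d_lower) + rng.normal(0.0, 0.5, size=n)
w = rng.lognormal(0.0, 0.3, size=n)

naive = was_d_lower(d, dy, d_lower, h=10.0)
weighted = was_d_lower(d, dy, d_lower, h=10.0, w=w)
print(f"naive WAS = {naive:.3f}, weighted WAS = {weighted:.3f}")
```

This sketch models only the point estimate. The SE gap between the
naive and survey-aware fits comes from PSU clustering and FPC in the
variance, which the sketch does not attempt.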

Methodology preservation verified against 14 load-bearing anchors:
estimand definition, Assumption 5/6 caveat, non-testability,
QUG-under-survey deferral, Phase 4.5 C0 label, Stute + Yatchew
surfaces, joint pretrends + homogeneity surfaces, ES-path
`yatchew/stute is None`, Binder TSL composition, local-linear
boundary fit description, PSU x period shock mechanism. All 14
still present in the rendered prose.

31/31 drift tests still pass (the drift suite anchors load-bearing
claims via the runtime API contract, not the notebook prose, so
prose tightening is structurally safe).

Diff: +57/-79 (net 22-line reduction in tutorial body).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/docs/tutorials/22_had_survey_design.ipynb b/docs/tutorials/22_had_survey_design.ipynb
@@ -295,26 +295,19 @@
    "source": [
     "## 3. Naive vs survey-aware headline fit\n",
     "\n",
-    "T20's headline path collapses to two periods (pre-mean vs post-mean per\n",
-    "state) and fits HAD with `design=\"auto\"` - the heuristic lands on\n",
-    "`continuous_near_d_lower` (Design 1) on this dose support, with the\n",
-    "target estimand `WAS_d_lower` (Weighted Average Slope at the lower\n",
-    "boundary). T22 fits the same configuration twice: once naive (no\n",
-    "`survey_design` argument), once survey-aware\n",
-    "(`survey_design=sd`). Both fits use the same local-linear estimator family at d_lower,\n",
-    "but the moment computations in `_fit_continuous` switch to weighted\n",
-    "form only when weights are present (`had.py:3747-3760` for the\n",
-    "denominator; `:3803-3808` for `dy_mean`). The naive fit uses the\n",
-    "unweighted local-linear `tau_bc`, the unweighted `dy_mean`, and\n",
-    "the unweighted denominator `E[D - d_lower]`. The survey-aware fit\n",
-    "uses the WEIGHTED `tau_bc` (via `bias_corrected_local_linear(...,\n",
-    "weights=weights_arr)`), the weighted `np.average(dy, weights=...)`,\n",
-    "and the weighted `np.average(d - d_lower, weights=...)`. On\n",
-    "this DGP the weight CV (~0.30) and the dose-distribution shape do\n",
-    "not co-vary strongly enough to shift the boundary slope materially,\n",
-    "so the two ATTs are numerically close on this DGP. The SE\n",
-    "and CI differ because the survey path additionally folds the PSU\n",
-    "clustering and FPC into the variance via the Binder TSL composition.\n"
+    "T20's headline path collapses to two periods (pre-mean vs post-mean\n",
+    "per state) and fits HAD with `design=\"auto\"` — the heuristic lands\n",
+    "on `continuous_near_d_lower` (Design 1), with the target estimand\n",
+    "`WAS_d_lower`. T22 fits the same configuration twice: once naive\n",
+    "(no `survey_design` argument), once survey-aware. Both fits share\n",
+    "the same local-linear machinery at d_lower; the survey path\n",
+    "additionally consumes the weights in the local-linear `tau_bc`\n",
+    "boundary fit, in the weighted ΔY mean, and in the weighted\n",
+    "denominator `E_w[D - d_lower]`. On this DGP the weight CV (~0.30)\n",
+    "and dose distribution do not co-vary strongly enough to shift the\n",
+    "boundary slope materially, so the two ATTs land close. The SE and\n",
+    "CI differ because the survey path folds PSU clustering and FPC\n",
+    "into the variance via Binder TSL.\n"
    ]
   },
   {
@@ -413,23 +406,19 @@
     "estimand, not a bug. We unpack it in the next section.\n",
     "\n",
     "\n",
-    "**A note on the non-testable identifying assumption.** Design 1\n",
-    "(`continuous_near_d_lower`) requires **Assumption 6** from de\n",
-    "Chaisemartin et al. (2026) for point identification of\n",
-    "`WAS_d_lower`, or **Assumption 5** for sign identification only.\n",
-    "Both are about local linearity of the dose-response near `d_lower`\n",
-    "and are **not testable from data** — the linearity diagnostics\n",
-    "exercised in §6 (Stute, Yatchew, joint pretrends, joint\n",
-    "homogeneity) are necessary but not sufficient. Justify Assumption\n",
-    "6 from domain knowledge: is there reason to believe the marginal\n",
-    "effect of the next $1K of supplemental spend is roughly constant\n",
-    "in the $5K-$50K range? On this DGP it is, by construction; in a\n",
-    "real analysis, this is the load-bearing methodology caveat\n",
-    "alongside the QUG-under-survey deferral §6 calls out. The library\n",
-    "fires a `UserWarning` flagging this on every Design 1 fit; we\n",
-    "let it surface in the cell above for the headline fit and\n",
-    "narrowly filter it on subsequent fits to keep the cell output\n",
-    "focused."
+    "**A non-testable identifying assumption.** Design 1 requires\n",
+    "**Assumption 6** for point identification of `WAS_d_lower` (or\n",
+    "**Assumption 5** for sign identification only) — both are about\n",
+    "local linearity of the dose-response near `d_lower` and are **not\n",
+    "testable from data**. The §6 linearity diagnostics (Stute,\n",
+    "Yatchew, joint pretrends/homogeneity) are necessary but not\n",
+    "sufficient. Assumption 6 must be justified from domain\n",
+    "knowledge (is the marginal effect of the next $1K of supplemental\n",
+    "spend roughly constant in the $5K-$50K range?). The library fires\n",
+    "a `UserWarning` on every Design 1 fit; the headline cell above\n",
+    "lets it surface and subsequent cells filter it as redundant. This is\n",
+    "the load-bearing methodology caveat alongside the QUG-under-survey\n",
+    "deferral (§6)."
    ]
   },
   {
@@ -439,21 +428,23 @@
    "source": [
     "## 4. Why the SE inflation is modest for HAD\n",
     "\n",
-    "The HAD `WAS_d_lower` estimand is the **average slope above d_lower**:\n",
-    "`WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2 ≤ d]) / E[D_2 - d̲]`\n",
-    "(REGISTRY § HeterogeneousAdoptionDiD; `had.py:21-31`). The estimator\n",
-    "uses a **local-linear boundary fit** to estimate the\n",
-    "`lim_{d↓d_lower} E[ΔY | D_2 ≤ d]` term — the only component of the\n",
-    "estimand that requires nonparametric identification. The leading-\n",
-    "order variance is therefore dominated by the influence functions of\n",
-    "units near `d_lower`, NOT by the full panel. With dose ~ Uniform[5, 50] and\n",
-    "60 states, only a handful of states sit close to d_lower ~ 5 - and\n",
-    "those are the units whose IFs dominate `Var(WAS_d_lower)`. The\n",
-    "PSU-level cluster correlation can amplify the variance only as much\n",
-    "as those few units are correlated with PSU-mates. With 2 states/PSU\n",
-    "and only a small share of states near the boundary, the within-PSU\n",
+    "**The intuition.** `WAS_d_lower` is the average slope above `d_lower`,\n",
+    "but its leading-order variance is driven by a local-linear boundary\n",
+    "fit at `d_lower`, and that fit only puts weight on units near the\n",
+    "boundary. With dose ~ Uniform[5, 50] and 60 states, only a handful\n",
+    "of states sit close to d_lower ~ 5, and those are the units whose\n",
+    "influence functions dominate `Var(WAS_d_lower)`. The PSU-level\n",
+    "cluster correlation can amplify the variance only as much as those\n",
+    "few units are correlated with PSU-mates. With 2 states/PSU and\n",
+    "only a small share of states near the boundary, the within-PSU\n",
     "correlation has a small lever to act on.\n",
     "\n",
+    "**Formal definition.**\n",
+    "`WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2 ≤ d]) / E[D_2 - d̲]`\n",
+    "(REGISTRY § HeterogeneousAdoptionDiD; `had.py:21-31`). The estimator\n",
+    "uses a local-linear boundary fit at `d_lower` to estimate the\n",
+    "`lim_{d↓d̲} E[ΔY | D_2 ≤ d]` term, the only component requiring nonparametric identification.\n",
+    "\n",
     "Contrast with the event-study path: each event-time horizon is a\n",
     "**separate** local-linear fit on that horizon's first differences\n",
     "`ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` against the common regressor\n",
@@ -545,9 +536,8 @@
     "Refit with `aggregate=\"event_study\"` and `cband=True` to get\n",
     "per-horizon ATT estimates plus a sup-t confidence band that adjusts\n",
     "for the multiple-horizon comparison. The cband is computed via a\n",
-    "multiplier bootstrap that aggregates per-PSU IF tensor under the\n",
-    "survey design (Phase 4.5 B composition; the Phase 4.5 C work\n",
-    "covered the survey-aware Stute pretests demonstrated in §6).\n"
+    "multiplier bootstrap that aggregates the per-PSU IF tensor under\n",
+    "the survey design.\n"
    ]
   },
   {
@@ -785,34 +775,22 @@
     "> rollout.\n",
     "\n",
     "> **For the methodologist.** The HAD pretest workflow runs two\n",
-    "> diagnostic passes; the QUG step is deferred under survey/weights\n",
-    "> per Phase 4.5 C0 (the load-bearing caveat we owe the audience).\n",
-    ">\n",
-    "> 1. **Overall (two-period) path:** `Stute` (CvM linearity test on\n",
-    ">    residuals) + `Yatchew-HR` (closed-form weighted-OLS sandwich,\n",
-    ">    `null=\"linearity\"` only - T22 does not exercise the\n",
-    ">    `mean_independence` mode). Both fail-to-reject; verdict reads\n",
-    ">    `\"Stute and Yatchew linearity diagnostics fail-to-reject\n",
-    ">    (linearity-conditional verdict; QUG-under-survey deferred per\n",
-    ">    Phase 4.5 C0)\"`.\n",
-    ">\n",
-    "> 2. **Event-study path:** `joint pre-trends` (joint-Stute over the\n",
-    ">    three pre-launch placebo horizons) + `joint homogeneity`\n",
-    ">    (joint-Stute over the four post-launch horizons). Both\n",
-    ">    fail-to-reject; verdict reads `\"joint pre-trends and joint\n",
-    ">    linearity diagnostics fail-to-reject (linearity-conditional\n",
-    ">    verdict; QUG-under-survey deferred per Phase 4.5 C0)\"`.\n",
-    ">    `report.yatchew is None` and `report.stute is None` on this\n",
-    ">    path - those single-horizon tests are overall-only.\n",
+    "> diagnostic passes — overall (`Stute` CvM + `Yatchew-HR`\n",
+    "> closed-form) on the two-period collapse, and event-study\n",
+    "> (`joint pre-trends` + `joint homogeneity`, both joint-Stute)\n",
+    "> on the full panel. Both passes fail-to-reject on this DGP. Both\n",
+    "> verdicts end in `(linearity-conditional verdict; QUG-under-survey\n",
+    "> deferred per Phase 4.5 C0)` — the load-bearing C0 caveat. On the\n",
+    "> event-study path `report.yatchew` and `report.stute` are `None`;\n",
+    "> those single-horizon tests are overall-only.\n",
     ">\n",
-    "> Both paths share the QUG-under-survey deferral suffix. The\n",
-    "> design-based SE on the headline fit is ~10% larger than the naive\n",
-    "> SE - smaller than the inflation a CallawaySantAnna or\n",
+    "> The design-based SE on the headline fit is ~10% larger than the\n",
+    "> naive SE — smaller than the inflation a CallawaySantAnna or\n",
     "> LinearRegression coefficient would see at this PSU correlation,\n",
-    "> because HAD uses a local-linear boundary fit at d_lower to\n",
-    "> estimate the boundary-limit term in the `WAS-d_lower` formula\n",
-    "> (variance is dominated by the few states near the boundary, not\n",
-    "> by the full panel; see section 4).\n"
+    "> because HAD uses a local-linear boundary fit at `d_lower` to\n",
+    "> estimate the boundary-limit term in the `WAS_d_lower` formula\n",
+    "> (variance is dominated by the few states near the boundary; see\n",
+    "> §4).\n"
    ]
   },
   {