Commit 57753ee

igerber and claude committed
T22 consolidation pass: tighten prose, preserve methodology

Eight rounds of CI-review iteration tightened methodology precision but left the notebook prose denser than necessary: implementation detail and version bookkeeping had crept into §3, §4, §5, and §7 alongside the pedagogical arc. This pass prunes those without regressing on any methodology contract:

- §3 setup paragraph: dropped the file:line dump (`had.py:3747-3760`, `:3803-3808`) and the redundant weighted-vs-unweighted point-by-point enumeration. The three-point weight-consumption claim (`tau_bc`, weighted ΔY mean, weighted denominator) is preserved in compact form.
- §3 Assumption 5/6 note: trimmed from 15 lines to 11. Kept all load-bearing content (Assumption 6 / Assumption 5; not testable from data; §6 diagnostics necessary but not sufficient; domain knowledge justification; paired-with-QUG-deferral framing).
- §4 opener: restructured to lead with intuition (few states near d_lower → small lever for PSU correlation), with the formal `WAS_{d̲}` definition pushed into a `**Formal definition.**` callout. Both halves are preserved; the formal definition is unchanged in content, just demoted from the lead.
- §5: dropped the "(Phase 4.5 B composition; ...)" parenthetical (internal version bookkeeping, not user-facing).
- §7 methodologist block: tightened from a numbered list with two verbatim verdict quotes to a compact two-clause description of the two paths plus the shared verdict suffix quoted once. `report.yatchew` / `report.stute = None` callout on the event-study path preserved. The SE-inflation-is-modest explanation (with section 4 cross-link) preserved.

Methodology preservation verified against 14 load-bearing anchors: estimand definition, Assumption 5/6 caveat, non-testability, QUG-under-survey deferral, Phase 4.5 C0 label, Stute + Yatchew surfaces, joint pretrends + homogeneity surfaces, ES-path `yatchew/stute is None`, Binder TSL composition, local-linear boundary fit description, PSU x period shock mechanism. All 14 still present in the rendered prose.

31/31 drift tests still pass (the drift suite anchors load-bearing claims via the runtime API contract, not the notebook prose, so prose tightening is structurally safe).

Diff: +57/-79 (net 22-line reduction in tutorial body).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent e2a809d commit 57753ee

1 file changed

Lines changed: 57 additions & 79 deletions

docs/tutorials/22_had_survey_design.ipynb

@@ -295,26 +295,19 @@
 "source": [
 "## 3. Naive vs survey-aware headline fit\n",
 "\n",
-"T20's headline path collapses to two periods (pre-mean vs post-mean per\n",
-"state) and fits HAD with `design=\"auto\"` - the heuristic lands on\n",
-"`continuous_near_d_lower` (Design 1) on this dose support, with the\n",
-"target estimand `WAS_d_lower` (Weighted Average Slope at the lower\n",
-"boundary). T22 fits the same configuration twice: once naive (no\n",
-"`survey_design` argument), once survey-aware\n",
-"(`survey_design=sd`). Both fits use the same local-linear estimator family at d_lower,\n",
-"but the moment computations in `_fit_continuous` switch to weighted\n",
-"form only when weights are present (`had.py:3747-3760` for the\n",
-"denominator; `:3803-3808` for `dy_mean`). The naive fit uses the\n",
-"unweighted local-linear `tau_bc`, the unweighted `dy_mean`, and\n",
-"the unweighted denominator `E[D - d_lower]`. The survey-aware fit\n",
-"uses the WEIGHTED `tau_bc` (via `bias_corrected_local_linear(...,\n",
-"weights=weights_arr)`), the weighted `np.average(dy, weights=...)`,\n",
-"and the weighted `np.average(d - d_lower, weights=...)`. On\n",
-"this DGP the weight CV (~0.30) and the dose-distribution shape do\n",
-"not co-vary strongly enough to shift the boundary slope materially,\n",
-"so the two ATTs are numerically close on this DGP. The SE\n",
-"and CI differ because the survey path additionally folds the PSU\n",
-"clustering and FPC into the variance via the Binder TSL composition.\n"
+"T20's headline path collapses to two periods (pre-mean vs post-mean\n",
+"per state) and fits HAD with `design=\"auto\"` — the heuristic lands\n",
+"on `continuous_near_d_lower` (Design 1), with the target estimand\n",
+"`WAS_d_lower`. T22 fits the same configuration twice: once naive\n",
+"(no `survey_design` argument), once survey-aware. Both fits share\n",
+"the same local-linear machinery at d_lower; the survey path\n",
+"additionally consumes the weights in the local-linear `tau_bc`\n",
+"boundary fit, in the weighted ΔY mean, and in the weighted\n",
+"denominator `E_w[D - d_lower]`. On this DGP the weight CV (~0.30)\n",
+"and dose distribution do not co-vary strongly enough to shift the\n",
+"boundary slope materially, so the two ATTs land close. The SE and\n",
+"CI differ because the survey path folds PSU clustering and FPC\n",
+"into the variance via Binder TSL.\n"
 ]
 },
 {
@@ -413,23 +406,19 @@
 "estimand, not a bug. We unpack it in the next section.\n",
 "\n",
 "\n",
-"**A note on the non-testable identifying assumption.** Design 1\n",
-"(`continuous_near_d_lower`) requires **Assumption 6** from de\n",
-"Chaisemartin et al. (2026) for point identification of\n",
-"`WAS_d_lower`, or **Assumption 5** for sign identification only.\n",
-"Both are about local linearity of the dose-response near `d_lower`\n",
-"and are **not testable from data** — the linearity diagnostics\n",
-"exercised in §6 (Stute, Yatchew, joint pretrends, joint\n",
-"homogeneity) are necessary but not sufficient. Justify Assumption\n",
-"6 from domain knowledge: is there reason to believe the marginal\n",
-"effect of the next $1K of supplemental spend is roughly constant\n",
-"in the $5K-$50K range? On this DGP it is, by construction; in a\n",
-"real analysis, this is the load-bearing methodology caveat\n",
-"alongside the QUG-under-survey deferral §6 calls out. The library\n",
-"fires a `UserWarning` flagging this on every Design 1 fit; we\n",
-"let it surface in the cell above for the headline fit and\n",
-"narrowly filter it on subsequent fits to keep the cell output\n",
-"focused."
+"**A non-testable identifying assumption.** Design 1 requires\n",
+"**Assumption 6** for point identification of `WAS_d_lower` (or\n",
+"**Assumption 5** for sign identification only) — both are about\n",
+"local linearity of the dose-response near `d_lower` and are **not\n",
+"testable from data**. The §6 linearity diagnostics (Stute,\n",
+"Yatchew, joint pretrends/homogeneity) are necessary but not\n",
+"sufficient. Assumption 6 itself is justified from domain\n",
+"knowledge (is the marginal effect of the next $1K of supplemental\n",
+"spend roughly constant in the $5K-$50K range?). The library fires\n",
+"a `UserWarning` on every Design 1 fit; the headline cell above\n",
+"lets it surface, subsequent cells filter it as redundant. This is\n",
+"the load-bearing methodology caveat alongside the QUG-under-survey\n",
+"deferral (§6)."
 ]
 },
 {
@@ -439,21 +428,23 @@
 "source": [
 "## 4. Why the SE inflation is modest for HAD\n",
 "\n",
-"The HAD `WAS_d_lower` estimand is the **average slope above d_lower**:\n",
-"`WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2 ≤ d]) / E[D_2 - d̲]`\n",
-"(REGISTRY § HeterogeneousAdoptionDiD; `had.py:21-31`). The estimator\n",
-"uses a **local-linear boundary fit** to estimate the\n",
-"`lim_{d↓d_lower} E[ΔY | D_2 ≤ d]` term — the only component of the\n",
-"estimand that requires nonparametric identification. The leading-\n",
-"order variance is therefore dominated by the influence functions of\n",
-"units near `d_lower`, NOT by the full panel. With dose ~ Uniform[5, 50] and\n",
-"60 states, only a handful of states sit close to d_lower ~ 5 - and\n",
-"those are the units whose IFs dominate `Var(WAS_d_lower)`. The\n",
-"PSU-level cluster correlation can amplify the variance only as much\n",
-"as those few units are correlated with PSU-mates. With 2 states/PSU\n",
-"and only a small share of states near the boundary, the within-PSU\n",
+"**The intuition.** `WAS_d_lower` is the average slope above d_lower,\n",
+"but its leading-order variance reads off a local-linear boundary\n",
+"fit at `d_lower` — and that fit only weights units near the\n",
+"boundary. With dose ~ Uniform[5, 50] and 60 states, only a handful\n",
+"of states sit close to d_lower ~ 5, and those are the units whose\n",
+"influence functions dominate `Var(WAS_d_lower)`. The PSU-level\n",
+"cluster correlation can amplify the variance only as much as those\n",
+"few units are correlated with PSU-mates. With 2 states/PSU and\n",
+"only a small share of states near the boundary, the within-PSU\n",
 "correlation has a small lever to act on.\n",
 "\n",
+"**Formal definition.** `WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2\n",
+"≤ d]) / E[D_2 - d̲]` (REGISTRY § HeterogeneousAdoptionDiD;\n",
+"`had.py:21-31`). The estimator uses a local-linear boundary fit at\n",
+"`d_lower` to estimate the `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` term — the\n",
+"only component requiring nonparametric identification.\n",
+"\n",
 "Contrast with the event-study path: each event-time horizon is a\n",
 "**separate** local-linear fit on that horizon's first differences\n",
 "`ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` against the common regressor\n",
@@ -545,9 +536,8 @@
 "Refit with `aggregate=\"event_study\"` and `cband=True` to get\n",
 "per-horizon ATT estimates plus a sup-t confidence band that adjusts\n",
 "for the multiple-horizon comparison. The cband is computed via a\n",
-"multiplier bootstrap that aggregates per-PSU IF tensor under the\n",
-"survey design (Phase 4.5 B composition; the Phase 4.5 C work\n",
-"covered the survey-aware Stute pretests demonstrated in §6).\n"
+"multiplier bootstrap that aggregates the per-PSU IF tensor under\n",
+"the survey design.\n"
 ]
 },
 {
@@ -785,34 +775,22 @@
 "> rollout.\n",
 "\n",
 "> **For the methodologist.** The HAD pretest workflow runs two\n",
-"> diagnostic passes; the QUG step is deferred under survey/weights\n",
-"> per Phase 4.5 C0 (the load-bearing caveat we owe the audience).\n",
-">\n",
-"> 1. **Overall (two-period) path:** `Stute` (CvM linearity test on\n",
-"> residuals) + `Yatchew-HR` (closed-form weighted-OLS sandwich,\n",
-"> `null=\"linearity\"` only - T22 does not exercise the\n",
-"> `mean_independence` mode). Both fail-to-reject; verdict reads\n",
-"> `\"Stute and Yatchew linearity diagnostics fail-to-reject\n",
-"> (linearity-conditional verdict; QUG-under-survey deferred per\n",
-"> Phase 4.5 C0)\"`.\n",
-">\n",
-"> 2. **Event-study path:** `joint pre-trends` (joint-Stute over the\n",
-"> three pre-launch placebo horizons) + `joint homogeneity`\n",
-"> (joint-Stute over the four post-launch horizons). Both\n",
-"> fail-to-reject; verdict reads `\"joint pre-trends and joint\n",
-"> linearity diagnostics fail-to-reject (linearity-conditional\n",
-"> verdict; QUG-under-survey deferred per Phase 4.5 C0)\"`.\n",
-"> `report.yatchew is None` and `report.stute is None` on this\n",
-"> path - those single-horizon tests are overall-only.\n",
+"> diagnostic passes — overall (`Stute` CvM + `Yatchew-HR`\n",
+"> closed-form) on the two-period collapse, and event-study\n",
+"> (`joint pre-trends` + `joint homogeneity`, both joint-Stute)\n",
+"> on the full panel. Both passes fail-to-reject on this DGP. Both\n",
+"> verdicts end in `(linearity-conditional verdict; QUG-under-survey\n",
+"> deferred per Phase 4.5 C0)` — the load-bearing C0 caveat. On the\n",
+"> event-study path `report.yatchew` and `report.stute` are `None`;\n",
+"> those single-horizon tests are overall-only.\n",
 ">\n",
-"> Both paths share the QUG-under-survey deferral suffix. The\n",
-"> design-based SE on the headline fit is ~10% larger than the naive\n",
-"> SE - smaller than the inflation a CallawaySantAnna or\n",
+"> The design-based SE on the headline fit is ~10% larger than the\n",
+"> naive SE — smaller than the inflation a CallawaySantAnna or\n",
 "> LinearRegression coefficient would see at this PSU correlation,\n",
-"> because HAD uses a local-linear boundary fit at d_lower to\n",
-"> estimate the boundary-limit term in the `WAS-d_lower` formula\n",
-"> (variance is dominated by the few states near the boundary, not\n",
-"> by the full panel; see section 4).\n"
+"> because HAD uses a local-linear boundary fit at `d_lower` to\n",
+"> estimate the boundary-limit term in the `WAS_d_lower` formula\n",
+"> (variance is dominated by the few states near the boundary; see\n",
+"> §4).\n"
 ]
 },
 {
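
Note on the §3 hunk: the removed prose spelled out the weighted moments as `np.average(dy, weights=...)` and `np.average(d - d_lower, weights=...)`, which the new prose compresses to "weighted ΔY mean" and "weighted denominator". The following is a minimal NumPy sketch of how those two moments could enter the `WAS_d_lower` ratio. It is not the library's code. The data are synthetic, the `was` helper and the slope of 2.0 are hypothetical, and the boundary-limit term (which the notebook says is estimated by a local-linear boundary fit) is replaced here by a fixed placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states = 60          # matches the "60 states" in the tutorial prose
d_lower = 5.0          # lower edge of the dose support
true_slope = 2.0       # hypothetical constant marginal effect

# Synthetic data (assumption): dose ~ Uniform[5, 50], linear dose-response.
d = rng.uniform(5.0, 50.0, size=n_states)
dy = true_slope * (d - d_lower) + rng.normal(0.0, 1.0, size=n_states)

# Survey weights independent of dose; lognormal(sigma=0.3) has CV of roughly 0.3.
w = rng.lognormal(mean=0.0, sigma=0.3, size=n_states)

# Placeholder for lim_{d -> d_lower} E[dY | D <= d]. It is exactly 0 in this
# synthetic DGP; the real estimator fits it nonparametrically.
boundary_limit = 0.0


def was(dy, d, d_lower, boundary_limit, weights=None):
    """Ratio form of the estimand: (E[dY] - boundary limit) / E[D - d_lower].

    With weights=None, np.average reduces to the plain (naive) mean.
    """
    numerator = np.average(dy, weights=weights) - boundary_limit
    denominator = np.average(d - d_lower, weights=weights)
    return numerator / denominator


was_naive = was(dy, d, d_lower, boundary_limit)
was_survey = was(dy, d, d_lower, boundary_limit, weights=w)

print(f"naive:        {was_naive:.3f}")
print(f"survey-aware: {was_survey:.3f}")
```

Because the weights in this draw are generated independently of dose, both ratios should land near the hypothetical slope, which is the same mechanism the notebook cites for the two ATTs landing close. The sketch says nothing about the SE: the PSU clustering and FPC that the survey path folds into the variance are not modeled here.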
