igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 3 additions & 2 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 1 addition & 1 deletion
diff --git a/diff_diff/guides/llms-practitioner.txt b/diff_diff/guides/llms-practitioner.txt
Lines changed: 4 additions & 1 deletion
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
Lines changed: 1 addition & 0 deletions
diff --git a/docs/api/had.rst b/docs/api/had.rst
Lines changed: 9 additions & 0 deletions
diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml
Lines changed: 6 additions & 0 deletions
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst
Lines changed: 11 additions & 2 deletions
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
Lines changed: 29 additions & 0 deletions
diff --git a/docs/tutorials/20_had_brand_campaign.ipynb b/docs/tutorials/20_had_brand_campaign.ipynb
Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` Phase 3 R-parity: Phase 3 ships coverage-rate validation on synthetic DGPs (not tight point parity against `chaisemartin::stute_test` / `yatchew_test`). Tight numerical parity requires aligning bootstrap seed semantics and `B` across numpy/R and is deferred. | `tests/test_had_pretests.py` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 3 nprobust bandwidth for Stute: some Stute variants on continuous regressors use nprobust-style optimal bandwidth selection. Phase 3 uses OLS residuals from a 2-parameter linear fit (no bandwidth selection). nprobust integration is a future enhancement; not in paper scope. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low |
-| `HeterogeneousAdoptionDiD` Phase 5 follow-up tutorial (T22 weighted/survey HAD tutorial). T21 HAD pretest workflow notebook landed in PR #409; `practitioner_next_steps()` HAD handlers + `llms-full.txt` HeterogeneousAdoptionDiD section + Choosing-an-Estimator row landed in Phase 5 wave 1 (PR #402). | `tutorials/`, `tests/test_t22_*_drift.py` | Phase 2a | Low |
+| `HeterogeneousAdoptionDiD` Phase 5 follow-up tutorial — SHIPPED. T22 (`docs/tutorials/22_had_survey_design.ipynb` + `tests/test_t22_had_survey_design_drift.py`) landed as the follow-up to PR #432; demonstrates the now-supported `SurveyDesign(strata=...)` path through HAD + `did_had_pretest_workflow` end-to-end on a BRFSS-shape household-panel design. T20 HAD brand-campaign (PR #394), T21 HAD pretest workflow (PR #409), and `practitioner_next_steps()` HAD handlers + `llms-full.txt` HeterogeneousAdoptionDiD section + Choosing-an-Estimator row (Phase 5 wave 1, PR #402) landed earlier. | `tutorials/`, `tests/test_t22_*_drift.py` | Phase 2a (shipped) | Done |
 | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
 | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
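The TODO rows above note that Phase 3's Stute test works on OLS residuals from a 2-parameter linear fit, with no bandwidth selection, and reports a bootstrap p-value. A minimal NumPy sketch of that idea follows. It is not `diff_diff.had_pretests.stute_test`; the helper names, the Rademacher multipliers, the `n_boot`/`seed` defaults, and the tie handling are all assumptions made for illustration.

```python
import numpy as np


def _ols_fit(dose, y):
    """Fit y = a + b * dose by OLS; return (fitted values, residuals)."""
    X = np.column_stack([np.ones_like(dose), dose])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted


def _cvm_stat(dose, resid):
    """Stute-style Cramer-von Mises statistic: n**-2 * sum_j S_j**2, where S_j is
    the running sum of residuals ordered by dose (ties broken by input order)."""
    order = np.argsort(dose, kind="stable")
    cusum = np.cumsum(resid[order])
    n = len(resid)
    return float(np.sum(cusum ** 2) / n ** 2)


def stute_cvm_pvalue(dose, y, n_boot=499, seed=0):
    """Hypothetical sketch: wild-bootstrap test of H0 that E[y | dose] is linear."""
    dose = np.asarray(dose, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted, resid = _ols_fit(dose, y)
    stat = _cvm_stat(dose, resid)
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        eta = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher multipliers
        y_star = fitted + resid * eta                # bootstrap outcome under H0
        _, resid_star = _ols_fit(dose, y_star)       # refit the linear model
        boot[b] = _cvm_stat(dose, resid_star)
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, float(p_value)
```

An outcome whose conditional mean is clearly curved in the dose should give a small p-value; the `(1 + #exceedances) / (1 + n_boot)` form keeps the p-value strictly positive.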
 
@@ -193,7 +193,10 @@ see REGISTRY HeterogeneousAdoptionDiD edge cases):
 |        filters to the last cohort + never-treated (Appendix B.2)
 |        and the estimand becomes last-cohort-only WAS — use
 |        ChaisemartinDHaultfoeuille if full multi-cohort staggered
-|        support under continuous treatment is required.
+|        support under continuous treatment is required. For the
+|        survey-weighted HAD workflow on a stratified-PSU design
+|        (BRFSS / CPS / NHANES shape), see Tutorial 22:
+|        docs/tutorials/22_had_survey_design.ipynb.
 |
 Is treatment adoption staggered (multiple cohorts, different timing)?
 |-- YES: Do NOT use plain TWFE. Use one of:
 
@@ -97,6 +97,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [15 Efficient DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/15_efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD — optimal weighting, PT-All vs PT-Post, efficiency gains
 - [16 Survey DiD](https://diff-diff.readthedocs.io/en/stable/tutorials/16_survey_did.html): Survey-weighted DiD — SurveyDesign, strata/PSU/FPC, replicate weights, subpopulation analysis, DEFF diagnostics
 - [16 Wooldridge ETWFE](https://diff-diff.readthedocs.io/en/stable/tutorials/16_wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson (ASF-based ATT), aggregation types
+- [22 HAD Survey-Weighted Workflow](https://diff-diff.readthedocs.io/en/stable/tutorials/22_had_survey_design.html): HeterogeneousAdoptionDiD + did_had_pretest_workflow under SurveyDesign(strata, psu, weights, fpc), covering a BRFSS-shape panel, an explanation of the modest SE inflation, and the Phase 4.5 C0 QUG-deferred verdict
 
 ## Survey Support
 
 
@@ -116,6 +116,15 @@ Unit Remains Untreated" (arXiv:2405.04465v6), which:
      weighted-CR1 per-horizon), or drop ``cluster=`` (keeps
      weighted-HC1 sup-t).
 
+.. tip::
+
+   For an end-to-end walkthrough of the survey-aware HAD workflow on a
+   BRFSS-shape stratified household-survey panel, including the
+   now-supported ``SurveyDesign(strata=...)`` path through the Stute
+   pretest family (support added in PR #432, 2026-05), see
+   `Tutorial 22: Survey-Weighted HAD
+   <../tutorials/22_had_survey_design.ipynb>`_.
+
 HeterogeneousAdoptionDiD
 ------------------------
 
 
@@ -391,6 +391,9 @@ sources:
       - path: docs/tutorials/21_had_pretest_workflow.ipynb
         type: tutorial
         note: "Drift-locks `HAD(design=\"auto\")` resolution to `continuous_at_zero` on T21's panel via `tests/test_t21_had_pretest_workflow_drift.py::test_had_design_auto_lands_on_continuous_at_zero`; changes to `_detect_design()` heuristic should re-validate T21"
+      - path: docs/tutorials/22_had_survey_design.ipynb
+        type: tutorial
+        note: "Survey-aware HAD walkthrough; drift-locked at `tests/test_t22_had_survey_design_drift.py`. Drift-locks `HAD(design=\"auto\")` resolution to `continuous_near_d_lower` on T22's panel and the `survey_design=` path's SE/CI behavior."
 
   diff_diff/had_pretests.py:
     drift_risk: medium
@@ -410,6 +413,9 @@ sources:
       - path: docs/tutorials/21_had_pretest_workflow.ipynb
         type: tutorial
         note: "Composite pre-test workflow walkthrough; drift-locked at tests/test_t21_had_pretest_workflow_drift.py"
+      - path: docs/tutorials/22_had_survey_design.ipynb
+        type: tutorial
+        note: "Survey-aware pretest workflow walkthrough (overall + event-study under SurveyDesign(strata=...)); drift-locks `_QUG_DEFERRED_SUFFIX`, the event-study summary QUG-skip note, and joint pretrends/homogeneity horizon labels under stratified-clustered Stute bootstrap (PR #432). Drift-locked at tests/test_t22_had_survey_design_drift.py"
 
   diff_diff/local_linear.py:
     drift_risk: low
 
@@ -2526,7 +2526,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 - **Note:** Horizon labels in `StuteJointResult.horizon_labels` are `str(t)` verbatim and carry STRING IDENTITY ONLY — NOT a chronological ordering key. Callers who need chronological order must preserve the original period values alongside (e.g. from the `pre_periods` / `post_periods` argument).
 - **Note:** NaN propagation is explicit: when any horizon has NaN in residuals, `cvm_stat_joint=NaN`, `p_value=NaN`, `reject=False`, AND `per_horizon_stats={label: np.nan for every horizon}` (full dict preserved with NaN values — not empty, not partial).
 
-**Phase 3 follow-up delivery:** `stute_joint_pretest()`, `joint_pretrends_test()`, `joint_homogeneity_test()`, `StuteJointResult`, and `did_had_pretest_workflow(aggregate="event_study")` shipped together in PR #353 (2026-04). The `practitioner_next_steps()` HAD handlers landed in Phase 5 wave 1 (PR #402); the T21 HAD pretest workflow tutorial landed in PR #409 (Phase 5 wave 2 first slice). T22 weighted/survey HAD tutorial remains queued.
+**Phase 3 follow-up delivery:** `stute_joint_pretest()`, `joint_pretrends_test()`, `joint_homogeneity_test()`, `StuteJointResult`, and `did_had_pretest_workflow(aggregate="event_study")` shipped together in PR #353 (2026-04). The `practitioner_next_steps()` HAD handlers landed in Phase 5 wave 1 (PR #402); the T21 HAD pretest workflow tutorial landed in PR #409 (Phase 5 wave 2 first slice). The T22 survey-weighted HAD tutorial (`docs/tutorials/22_had_survey_design.ipynb`) shipped as the follow-up to PR #432 (2026-05).
 
 **Reference implementation(s):**
 - R: `did_had` (de Chaisemartin, Ciccia, D'Haultfœuille, Knau 2024a); `stute_test` (2024c); `yatchew_test` (Online Appendix, Table 3).
@@ -2574,7 +2574,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 - [x] Phase 5 (wave 1, PR #402): `llms-full.txt` HeterogeneousAdoptionDiD section + result-class blocks + `## HAD Pretests` index + Choosing-an-Estimator row landed; constructor / fit() parameter names are regression-locked against `inspect.signature(HeterogeneousAdoptionDiD.__init__)` and `HeterogeneousAdoptionDiD.fit` for parameter-name presence (parameter defaults and the non-return parameter type annotations remain unpinned; the `fit()` return-type union is locked BOTH at the source-code level AND at the test level by `TestFitReturnAnnotation`); result-class field tables enumerate every public dataclass field (regression-tested via `dataclasses.fields()`); `llms-practitioner.txt` Step 4 decision tree distinguishes ContinuousDiD (per-dose ATT(d), needs never-treated) from HeterogeneousAdoptionDiD (WAS, universal-rollout-compatible).
 - [x] Phase 5 (partial): README catalog one-liner, bundled `llms.txt` `## Estimators` entry, `docs/api/had.rst` (autoclass for the three classes), and `docs/references.rst` citation landed in PR #372 docs refresh.
 - [x] Phase 5 (wave 2 first slice, PR #409): T21 HAD pretest workflow tutorial (`docs/tutorials/21_had_pretest_workflow.ipynb`) — composite pre-test walkthrough for `did_had_pretest_workflow`. Uses a `Uniform[$0.01K, $50K]` dose-distribution variant of T20's brand-campaign panel (true support strictly positive but near-zero, chosen so QUG fails-to-reject `H0: d_lower = 0` in finite sample). Walks through `aggregate="overall"` (Steps 1 + 3 only, verdict explicitly flags Step 2 deferral) and upgrades to `aggregate="event_study"` (joint pre-trends Stute + joint homogeneity Stute close the gap). Side panel exercises both `yatchew_hr_test` null modes (`linearity` vs `mean_independence`). Companion drift-test file `tests/test_t21_had_pretest_workflow_drift.py` (16 tests pinning panel composition, both verdict pivots, structural anchors, deterministic stats, bootstrap p-value tolerance bands per backend, and `HAD(design="auto")` resolution to `continuous_at_zero` on this panel).
-- [ ] Phase 5 (remaining): T22 weighted/survey HAD tutorial - tracked in `TODO.md`.
+- [x] Phase 5 (wave 2 second slice): T22 weighted/survey HAD tutorial (`docs/tutorials/22_had_survey_design.ipynb`) - shipped as the follow-up to PR #432. End-to-end walkthrough of `HeterogeneousAdoptionDiD` + `did_had_pretest_workflow` under `SurveyDesign(weights, strata, psu, fpc)` on a BRFSS-shape state-rollout panel (5 strata x 6 PSUs/stratum x 2 states/PSU = 60 states; post-stratification raking weights with CV ~ 0.30; FPC = 30 PSUs/stratum). Companion drift-test file `tests/test_t22_had_survey_design_drift.py` (25 tests pinning panel composition, naive-vs-survey SE inflation direction, design auto-detection, event-study cband-vs-pointwise width ordering, `_QUG_DEFERRED_SUFFIX` substring on `report.verdict` for both overall and event-study paths, the distinct `report.summary()` QUG-skip note on the event-study path, deterministic Yatchew sigma2_*, and bootstrap p-value tolerance bands per `feedback_strata_bootstrap_path_divergence` (>= 0.25 abs)).
 - [ ] Documentation of non-testability of Assumptions 5 and 6.
 - [ ] Warnings for staggered treatment timing (redirect to `ChaisemartinDHaultfoeuille`).
 - [ ] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work).
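The T22 panel described in the checklist above has a simple nested layout: 5 strata x 6 PSUs per stratum x 2 states per PSU = 60 states, with an FPC of 30 population PSUs per stratum. The sampling fraction is therefore 6/30 = 0.2, and the usual without-replacement variance factor is 1 - 0.2 = 0.8. The sketch below builds those design columns with NumPy. The column names and dict layout are hypothetical (this is not the `SurveyDesign` constructor), and the lognormal weights are only a stand-in for the tutorial's post-stratification raking weights, tuned to the same target CV of about 0.30.

```python
import numpy as np

N_STRATA = 5                 # strata
PSUS_PER_STRATUM = 6         # sampled PSUs per stratum
STATES_PER_PSU = 2           # states (panel units) nested in each PSU
POP_PSUS_PER_STRATUM = 30    # population PSUs per stratum, used as the FPC
TARGET_WEIGHT_CV = 0.30      # coefficient of variation of the weights


def build_design_columns(seed=0):
    """Hypothetical sketch: one row per state with stratum, PSU, FPC and weight."""
    n_units = N_STRATA * PSUS_PER_STRATUM * STATES_PER_PSU   # 5 * 6 * 2 = 60
    unit = np.arange(n_units)
    stratum = unit // (PSUS_PER_STRATUM * STATES_PER_PSU)   # 12 states per stratum
    psu = unit // STATES_PER_PSU                             # globally unique PSU ids 0..29
    fpc = np.full(n_units, POP_PSUS_PER_STRATUM)
    # Lognormal stand-in for raking weights: a lognormal with log-sd s has
    # CV = sqrt(exp(s**2) - 1), so s = sqrt(log(1 + CV**2)) hits the target CV.
    s = np.sqrt(np.log1p(TARGET_WEIGHT_CV ** 2))
    rng = np.random.default_rng(seed)
    weights = rng.lognormal(mean=0.0, sigma=s, size=n_units)
    return {"unit": unit, "stratum": stratum, "psu": psu, "fpc": fpc, "weight": weights}


def fpc_variance_factor(n_sampled=PSUS_PER_STRATUM, n_population=POP_PSUS_PER_STRATUM):
    """Without-replacement correction (1 - n_h / N_h) applied to stratum variances."""
    return 1.0 - n_sampled / n_population
```

Because PSU ids are globally unique and nested inside strata, each stratum contributes exactly 6 PSUs to any between-PSU variance computation.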
 
@@ -315,7 +315,13 @@ identification rests on stronger structural assumptions (Design 1).
    For a full walkthrough including data setup, the design auto-detection
    diagnostic, the multi-week event study, and a stakeholder communication
    template, see `Tutorial 20: HAD for National Brand Campaign with Regional
-   Spend Intensity <tutorials/20_had_brand_campaign.ipynb>`_.
+   Spend Intensity <tutorials/20_had_brand_campaign.ipynb>`_. For the
+   composite pre-test diagnostic walkthrough on top of HAD, see
+   `Tutorial 21: HAD Pre-test Workflow
+   <tutorials/21_had_pretest_workflow.ipynb>`_. For the same workflow under
+   stratified survey weights (BRFSS-shape design), see
+   `Tutorial 22: Survey-Weighted HAD
+   <tutorials/22_had_survey_design.ipynb>`_.
 
 
 .. _section-few-markets:
@@ -412,7 +418,10 @@ See :doc:`practitioner_getting_started` for an end-to-end example.
 
    For a full walkthrough with brand funnel metrics and staggered rollouts, see
    `Tutorial 17: Brand Awareness Survey
-   <tutorials/17_brand_awareness_survey.ipynb>`_.
+   <tutorials/17_brand_awareness_survey.ipynb>`_. For the survey-design path
+   through HAD (universal-rollout, continuous dose, stratified survey weights),
+   see `Tutorial 22: Survey-Weighted HAD
+   <tutorials/22_had_survey_design.ipynb>`_.
 
 
 At a Glance
 
@@ -140,6 +140,35 @@ Supports both TSL and replicate-weight variance.
 See `docs/api/prep.rst` for the API reference and `docs/methodology/REGISTRY.md`
 for the methodology entry.
 
+### Phase 4.5 C: HAD Stute Survey Workflow ✅ Shipped
+
+The HeterogeneousAdoptionDiD pretest family (`stute_test`,
+`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`,
+and the composite `did_had_pretest_workflow`) gained end-to-end
+support for `SurveyDesign(strata=..., psu=..., weights=..., fpc=...)`
+in PR #432 (2026-05). The Stute CvM bootstrap on stratified survey
+designs combines documented clustered-wild-bootstrap ingredients
+(Cameron-Gelbach-Miller 2008 cluster-level multipliers;
+Davidson-Flachaire 2008 wild-bootstrap centering; Wu 1986 / Liu 1988
+Bessel small-sample correction; Djogbenou-MacKinnon-Nielsen 2019
+cluster-wild consistency for nonlinear functionals): PSU multipliers
+are demeaned within each stratum and rescaled by `sqrt(n_h/(n_h-1))`
+(`n_h` = PSUs in stratum `h`) BEFORE the per-obs broadcast in the
+wild-residual loop. The shared helper `bootstrap_utils.apply_stratum_centering`
+backs both the new Stute path and the existing HAD sup-t event-study
+cband bootstrap. The QUG step remains permanently deferred under
+survey/weights (Phase 4.5 C0); the workflow signals this by setting
+`report.qug=None` and including the `_QUG_DEFERRED_SUFFIX` substring
+in `report.verdict`. Tutorial 22 (`docs/tutorials/22_had_survey_design.ipynb`)
+walks through the workflow end-to-end on a BRFSS-shape state-rollout panel.
+
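The within-stratum centering of PSU multipliers described above can be sketched in NumPy. This is an illustrative stand-in, not `bootstrap_utils.apply_stratum_centering` (whose signature is not shown here); the function names and the choice to raise on singleton strata are assumptions, the latter mirroring the still-deferred `lonely_psu='adjust'` case. The order matters: demeaning happens on the PSU multipliers, so every PSU in a stratum counts once, and only then is each multiplier copied to the observations in its PSU.

```python
import numpy as np


def center_psu_multipliers(eta, psu_stratum):
    """Hypothetical sketch of within-stratum centering of PSU-level multipliers.

    eta         : (n_psu,) raw multipliers, one per PSU (e.g. Rademacher draws)
    psu_stratum : (n_psu,) stratum label of each PSU
    """
    eta = np.asarray(eta, dtype=float)
    psu_stratum = np.asarray(psu_stratum)
    out = np.empty_like(eta)
    for h in np.unique(psu_stratum):
        idx = psu_stratum == h
        n_h = int(idx.sum())                 # number of PSUs in stratum h
        if n_h < 2:
            # Singleton strata are left undefined here (assumption of this sketch).
            raise ValueError(f"stratum {h!r} has a single PSU")
        centered = eta[idx] - eta[idx].mean()
        # For iid unit-variance eta, E[centered_i**2] = (n_h - 1) / n_h, so this
        # rescale restores unit variance per PSU on average.
        out[idx] = centered * np.sqrt(n_h / (n_h - 1))
    return out


def broadcast_to_obs(psu_multipliers, obs_psu):
    """Copy each PSU's (already centered) multiplier to every observation in it."""
    return np.asarray(psu_multipliers)[np.asarray(obs_psu)]
```

One wild-residual draw is then `resid * broadcast_to_obs(center_psu_multipliers(eta, psu_stratum), obs_psu)`, where `obs_psu` holds an integer PSU index (0 to n_psu - 1) for each observation, another assumption of the sketch.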
+Remaining HAD survey-path deferrals (separate follow-up PRs):
+- `lonely_psu='adjust'` with singleton strata: the pseudo-stratum
+  centering transform is not yet derived for the Stute functional (same
+  gap as the HAD sup-t deviation at REGISTRY:2382).
+- Replicate-weight designs (BRR / Fay / JK1 / JKn / SDR): these need a
+  separate Rao-Wu / JKn bootstrap composition.
+
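The REGISTRY checklist notes that T22's drift tests pin the event-study cband-vs-pointwise width ordering, and the roadmap text above says the sup-t cband bootstrap shares the stratum-centering helper. If the band is a sup-t band computed from the same bootstrap draws as the pointwise intervals, the ordering holds by construction: in each draw the largest studentized deviation across horizons is at least as large as any single horizon's, so its quantile is too. A NumPy sketch under that assumption (the helper names are hypothetical, not the library's cband code):

```python
import numpy as np


def sup_t_critical_value(boot_dev, se, alpha=0.05):
    """Hypothetical sketch: one critical value shared by all event-study horizons.

    boot_dev : (B, H) bootstrap estimates minus the point estimate, per horizon
    se       : (H,) standard error of each horizon's estimate
    """
    t_abs = np.abs(np.asarray(boot_dev) / np.asarray(se))  # studentized deviations
    max_t = t_abs.max(axis=1)           # worst horizon in each bootstrap draw
    return float(np.quantile(max_t, 1 - alpha))


def pointwise_critical_values(boot_dev, se, alpha=0.05):
    """Per-horizon critical values from the same draws (no multiplicity control)."""
    t_abs = np.abs(np.asarray(boot_dev) / np.asarray(se))
    return np.quantile(t_abs, 1 - alpha, axis=0)
```

The band for horizon `h` is then `estimate[h] +/- crit * se[h]`; using the single sup-t `crit` for every horizon is what buys simultaneous coverage at the cost of wider intervals.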
 ---
 
 ## Phase 10: Academic Grounding (History)
 
@@ -388,7 +388,7 @@
     "\n",
     "This tutorial covered HAD's headline workflow: the overall WAS_d_lower fit and the multi-week event study. The library also supports several extensions we did not demonstrate here.\n",
     "\n",
-    "- **Population-weighted (survey-aware) inference**: when some markets or regions carry more weight than others - e.g., DMAs weighted by population - HAD accepts a `weights=` array or a `SurveyDesign` object on the same `fit()` interface.\n",
+    "- **Population-weighted (survey-aware) inference**: when some markets or regions carry more weight than others - e.g., DMAs weighted by population - HAD accepts a `SurveyDesign` object via `survey_design=` on the same `fit()` interface (the deprecated `weights=` and `survey=` kwarg aliases will be removed in the next minor release). [Tutorial 22](22_had_survey_design.ipynb) walks through the BRFSS-shape survey-design path end-to-end, including the pretest workflow.\n",
     "- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG support-infimum test (`H0: d_lower = 0`, which adjudicates between the `continuous_at_zero` and `continuous_near_d_lower` design paths) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests; see [Tutorial 21](21_had_pretest_workflow.ipynb) for an end-to-end pretest walkthrough.\n",
     "- **`continuous_at_zero` design path**: if the lightest-touch DMA had no regional add-on (spend exactly $0), HAD switches to the Design 1' identification path with target `WAS` instead of `WAS_d_lower`. The auto-detection picks it up.\n",
     "- **Mass-point design path**: if a meaningful chunk of DMAs sit at exactly the same minimum spend (rather than spread continuously near the boundary), HAD switches to a 2SLS estimator with matching identification logic. Auto-detected as well.\n",