Address PR #394 R2 review (4 P2/P3 polish items)

igerber · claude · igerber · commit d60f9c18af39 · 2026-04-26T10:32:22.000-04:00
- Delete the stale T20 CHANGELOG bullet that was orphaned under
  [3.3.1] when the new [Unreleased] entry was added in R1; only the
  current Design 1 / WAS_d_lower description survives.
- Update the README index bullet from "WAS" to "WAS_d_lower" and
  spell out the boundary-anchored interpretation, matching the
  tutorial body.
- Make the practitioner decision-tree code block runnable: switch to
  n_periods=2 / cohort_periods=[2] so the panel directly satisfies
  HAD's overall-mode 2-period contract; show the explicit pre-period
  zeroing instead of "..." hand-waving.
- Correct the QUG description in the notebook extensions cell: it is
  the support-infimum test (`H0: d_lower = 0`, adjudicates between
  `continuous_at_zero` and `continuous_near_d_lower`), not a
  constant-per-period-effect test.
- Drift test pinning: add a sample-median assertion to lock the
  README/template "median ~$25K" prose; tighten the pre-launch
  placebo magnitude check from `< 0.5` to `< 0.1` so the notebook's
  "within ±0.06" claim cannot drift unnoticed.

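The drift-pinning pattern from the last bullet, as a minimal stdlib-only
sketch. The helper names (`assert_pinned`, `assert_within`) and the sample
numbers are hypothetical, chosen for illustration; the real assertions
live in `tests/test_t20_had_brand_campaign_drift.py` and run against the
tutorial panel.

```python
from statistics import median


def assert_pinned(value, expected, ndigits=1):
    """Fail if a rounded statistic no longer matches the number quoted in prose."""
    rounded = round(value, ndigits)
    assert rounded == expected, f"expected {expected}, got {value!r}"


def assert_within(estimates, bound):
    """Fail if any estimate's magnitude reaches the bound."""
    for label, est in estimates.items():
        assert abs(est) < bound, (label, est)


# Hypothetical values for illustration only; not the tutorial's data.
post_doses = [5.0, 12.5, 24.8, 37.0, 50.0]
placebos = {-2: 0.03, -3: -0.05, -4: 0.01}

assert_pinned(median(post_doses), 24.8)  # locks a "median ~$25K" style claim
assert_within(placebos, 0.1)             # locks a "within ±0.06" claim with slack
```

The bound sits slightly above the quoted magnitude so that ordinary
numerical noise passes while a real change to the DGP fails.
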
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -24,7 +24,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
-- **Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity** (`docs/tutorials/20_had_brand_campaign.ipynb`) — end-to-end practitioner walkthrough for `HeterogeneousAdoptionDiD` on a 60-DMA panel where every market got the campaign at varying intensity and no untreated comparison group exists. Covers the headline Weighted Average Slope (WAS) on a 2-period collapse with `design="auto"` resolving to `continuous_at_zero`, the multi-week event study with per-week pointwise CIs and pre-launch placebos, and a stakeholder communication template. Companion drift-test file `tests/test_t20_had_brand_campaign_drift.py` (12 tests pinning panel composition, design auto-detection, overall WAS, CI endpoints, and per-horizon coverage). Cross-link added from `docs/practitioner_decision_tree.rst` § "Varying Spending Levels" pointing to T20 when the no-untreated-controls regime applies. New `had.py` entry in `docs/doc-deps.yaml` so future HAD source changes flag the methodology, tutorial, and decision-tree dependents via `/docs-impact`.
 
 ## [3.3.0] - 2026-04-25
 
diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst
@@ -263,18 +263,19 @@ a Weighted Average Slope ``WAS_d_lower`` on the dose scale.
 
    from diff_diff import HAD, generate_continuous_did_data
 
-   # Markets where every unit gets some treatment dose - regional spend
-   # ranges from a $5K floor (lightest-touch DMA) to $50K. No DMA at $0;
+   # Two-period example: every unit gets some treatment dose; regional
+   # spend ranges from $5K (lightest-touch DMA) to $50K. No DMA at $0;
    # HAD anchors at the lightest-touch DMA's spend (d_lower) instead.
    data = generate_continuous_did_data(
-       n_units=60, n_periods=8, cohort_periods=[5],
+       n_units=60, n_periods=2, cohort_periods=[2],
        never_treated_frac=0.0, dose_distribution="uniform",
        dose_params={"low": 5.0, "high": 50.0},
        att_function="linear", att_slope=100.0, seed=87,
    )
-   # ... zero out pre-treatment doses to satisfy HAD's D=0 in pre contract ...
+   # HAD requires D=0 in the pre-launch period for every unit.
+   data.loc[data["period"] < data["first_treat"], "dose"] = 0.0
 
-   had = HAD(design="auto")
+   had = HAD(design="auto")  # aggregate="overall" is the default
    results = had.fit(
        data, outcome_col="outcome", dose_col="dose",
        time_col="period", unit_col="unit",
diff --git a/docs/tutorials/20_had_brand_campaign.ipynb b/docs/tutorials/20_had_brand_campaign.ipynb
@@ -389,7 +389,7 @@
     "This tutorial covered HAD's headline workflow: the overall WAS_d_lower fit and the multi-week event study. The library also supports several extensions we did not demonstrate here.\n",
     "\n",
     "- **Population-weighted (survey-aware) inference**: when some markets or regions carry more weight than others - e.g., DMAs weighted by population - HAD accepts a `weights=` array or a `SurveyDesign` object on the same `fit()` interface.\n",
-    "- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG test (constant-per-period effect) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests.\n",
+    "- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG support-infimum test (`H0: d_lower = 0`, which adjudicates between the `continuous_at_zero` and `continuous_near_d_lower` design paths) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests.\n",
     "- **`continuous_at_zero` design path**: if the lightest-touch DMA had no regional add-on (spend exactly $0), HAD switches to the Design 1' identification path with target `WAS` instead of `WAS_d_lower`. The auto-detection picks it up.\n",
     "- **Mass-point design path**: if a meaningful chunk of DMAs sit at exactly the same minimum spend (rather than spread continuously near the boundary), HAD switches to a 2SLS estimator with matching identification logic. Auto-detected as well.\n",
     "\n",
diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md
@@ -98,7 +98,7 @@ Practitioner walkthrough for measuring lift from on/off promotional pulses acros
 ### 20. HAD for National Brand Campaign with Regional Spend Intensity (`20_had_brand_campaign.ipynb`)
 Practitioner walkthrough for measuring per-dollar lift when every market got the campaign at varying intensity and no untreated comparison group exists:
 - The measurement problem framed for heterogeneous-adoption (no-untreated-control) panels
-- `HAD` overall fit on a 2-period collapse for the headline Weighted Average Slope (WAS)
+- `HAD` overall fit on a 2-period collapse, with `design="auto"` resolving to `continuous_near_d_lower` (Design 1) and target `WAS_d_lower` (per-$1K marginal effect above the lightest-touch DMA's spend)
 - Multi-week event study showing per-week dynamics with pre-launch placebos
 - Stakeholder communication template
 - Companion drift-test file (`tests/test_t20_had_brand_campaign_drift.py`)
diff --git a/tests/test_t20_had_brand_campaign_drift.py b/tests/test_t20_had_brand_campaign_drift.py
@@ -123,7 +123,9 @@ def event_study_result(panel):
 def test_panel_composition(panel):
     """Section 2 narrative quotes 60 DMAs over 8 weeks, with regional
     spend ranging from a $5K floor to $50K and every DMA participating
-    (no DMA at $0). If the DGP drifts, this surfaces."""
+    (no DMA at $0). The Section 5 stakeholder template additionally
+    quotes 'median ~$25K' for the spend distribution. If the DGP
+    drifts, this surfaces."""
     assert panel["dma_id"].nunique() == N_UNITS
     assert panel["week"].nunique() == N_PERIODS
     post_doses = (
@@ -136,6 +138,9 @@ def test_panel_composition(panel):
         "got some regional spend'. If a DMA appears at zero the "
         "Section 1/3 narrative is wrong."
     )
+    # Pin the sample median so the README/template "median ~$25K" prose
+    # cannot drift unnoticed (PR #394 R2 P3 fix).
+    assert round(post_doses.median(), 1) == 24.8, post_doses.median()
 
 
 def test_overall_design_auto_detection(overall_result):
@@ -250,16 +255,18 @@ def test_event_study_post_atts_close_to_truth(event_study_result):
 
 def test_event_study_pre_placebos_cover_zero(event_study_result):
     """Section 4 narrative claims pre-launch placebos (e=-2,-3,-4) sit
-    at essentially zero with 95% CIs comfortably bracketing zero.
-    Presence of these horizons is verified separately by
-    `test_event_study_horizons_complete` so we can reach into the
+    at essentially zero (within ±0.06) with 95% CIs comfortably
+    bracketing zero. Presence of these horizons is verified separately
+    by `test_event_study_horizons_complete` so we can reach into the
     arrays without an `if e in event_times` guard that would silently
-    skip a missing horizon (PR #394 R1 P2 fix)."""
+    skip a missing horizon (PR #394 R1 P2 fix). Magnitude pinned to
+    < 0.1 to lock the prose claim of 'within ±0.06' with light slack
+    (PR #394 R2 P3 fix)."""
     event_times = list(event_study_result.event_times)
     atts = list(event_study_result.att)
     ci_lows = list(event_study_result.conf_int_low)
     ci_highs = list(event_study_result.conf_int_high)
     for e in (-2, -3, -4):
         i = event_times.index(e)
-        assert abs(atts[i]) < 0.5, (e, atts[i])
+        assert abs(atts[i]) < 0.1, (e, atts[i])
         assert ci_lows[i] <= 0.0 <= ci_highs[i], (e, ci_lows[i], ci_highs[i])