
Commit d60f9c1

igerber and claude committed
Address PR #394 R2 review (4 P2/P3 polish items)
- Delete the stale T20 CHANGELOG bullet that was orphaned under [3.3.1] when the new [Unreleased] entry was added in R1; only the current Design 1 / WAS_d_lower description survives.
- Update the README index bullet from "WAS" to "WAS_d_lower" and spell out the boundary-anchored interpretation, matching the tutorial body.
- Make the practitioner decision-tree code block runnable: switch to n_periods=2 / cohort_periods=[2] so the panel directly satisfies HAD's overall-mode 2-period contract; show the explicit pre-period zeroing instead of "..." hand-waving.
- Correct the QUG description in the notebook extensions cell: it is the support-infimum test (`H0: d_lower = 0`, adjudicates between `continuous_at_zero` and `continuous_near_d_lower`), not a constant-per-period-effect test.
- Drift test pinning: add a sample-median assertion to lock the README/template "median ~$25K" prose; tighten the pre-launch placebo magnitude check from `< 0.5` to `< 0.1` so the notebook's "within ±0.06" claim cannot drift unnoticed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6bd3c4e commit d60f9c1

5 files changed

Lines changed: 21 additions & 14 deletions


CHANGELOG.md

Lines changed: 0 additions & 1 deletion
@@ -24,7 +24,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
-- **Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity** (`docs/tutorials/20_had_brand_campaign.ipynb`) — end-to-end practitioner walkthrough for `HeterogeneousAdoptionDiD` on a 60-DMA panel where every market got the campaign at varying intensity and no untreated comparison group exists. Covers the headline Weighted Average Slope (WAS) on a 2-period collapse with `design="auto"` resolving to `continuous_at_zero`, the multi-week event study with per-week pointwise CIs and pre-launch placebos, and a stakeholder communication template. Companion drift-test file `tests/test_t20_had_brand_campaign_drift.py` (12 tests pinning panel composition, design auto-detection, overall WAS, CI endpoints, and per-horizon coverage). Cross-link added from `docs/practitioner_decision_tree.rst` § "Varying Spending Levels" pointing to T20 when the no-untreated-controls regime applies. New `had.py` entry in `docs/doc-deps.yaml` so future HAD source changes flag the methodology, tutorial, and decision-tree dependents via `/docs-impact`.

 ## [3.3.0] - 2026-04-25

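The per-path band construction in the first CHANGELOG bullet can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `per_path_sup_t_band` is hypothetical, Rademacher weights stand in for whichever `bootstrap_weights` scheme is configured, and the `influence` input (per-unit contributions scaled so that a weighted column sum has the estimator's variance) is an assumed representation. The `ses` are taken as given, in line with the entry's note that per-path SEs come from a separate, earlier bootstrap block. The sketch shows one weight matrix shared across a path's horizons, the critical value `c_p = quantile(max_l |t_l|, 1 - α)`, the symmetric bands `effect_l ± c_p · se_l`, and the two gates.

```python
import numpy as np


def per_path_sup_t_band(effects, ses, influence, alpha=0.05, n_bootstrap=999, seed=0):
    """Hypothetical sketch of a per-path joint sup-t band (not library code).

    effects, ses : arrays of shape (H,) for horizons 1..H of a single path.
    influence    : array of shape (n_eligible, H); assumed scaled so that a
                   Rademacher-weighted column sum has the estimator's variance.
    Returns None when either gate fails, else (c_p, lower, upper).
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    valid = np.isfinite(ses) & (ses > 0)
    if valid.sum() < 2:  # gate 1: at least two valid horizons
        return None

    rng = np.random.default_rng(seed)
    n_eligible = influence.shape[0]
    # One shared (n_bootstrap, n_eligible) Rademacher matrix for the whole path,
    # so every horizon is perturbed by the same weights (correlated draws).
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_eligible))
    draws = weights @ influence[:, valid]  # shape (n_bootstrap, n_valid)
    sup_t = np.max(np.abs(draws / ses[valid]), axis=1)

    finite = np.isfinite(sup_t)
    if finite.mean() <= 0.5:  # gate 2: strict majority of finite sup-t draws
        return None

    c_p = np.quantile(sup_t[finite], 1 - alpha)
    # Invalid horizons get NaN bounds.
    lower = np.where(valid, effects - c_p * ses, np.nan)
    upper = np.where(valid, effects + c_p * ses, np.nan)
    return c_p, lower, upper


# Synthetic path with three horizons and 200 eligible units.
rng = np.random.default_rng(1)
influence = rng.normal(size=(200, 3)) / 200
ses = np.sqrt((influence ** 2).sum(axis=0))
c_p, lower, upper = per_path_sup_t_band([0.1, 0.2, 0.3], ses, influence)
```

With three roughly independent horizons, `c_p` should come out noticeably above the pointwise 1.96. That gap is the cost of covering every horizon of the path jointly.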
docs/practitioner_decision_tree.rst

Lines changed: 6 additions & 5 deletions
@@ -263,18 +263,19 @@ a Weighted Average Slope ``WAS_d_lower`` on the dose scale.

     from diff_diff import HAD, generate_continuous_did_data

-    # Markets where every unit gets some treatment dose - regional spend
-    # ranges from a $5K floor (lightest-touch DMA) to $50K. No DMA at $0;
+    # Two-period example: every unit gets some treatment dose; regional
+    # spend ranges from $5K (lightest-touch DMA) to $50K. No DMA at $0;
     # HAD anchors at the lightest-touch DMA's spend (d_lower) instead.
     data = generate_continuous_did_data(
-        n_units=60, n_periods=8, cohort_periods=[5],
+        n_units=60, n_periods=2, cohort_periods=[2],
         never_treated_frac=0.0, dose_distribution="uniform",
         dose_params={"low": 5.0, "high": 50.0},
         att_function="linear", att_slope=100.0, seed=87,
     )
-    # ... zero out pre-treatment doses to satisfy HAD's D=0 in pre contract ...
+    # HAD requires D=0 in the pre-launch period for every unit.
+    data.loc[data["period"] < data["first_treat"], "dose"] = 0.0

-    had = HAD(design="auto")
+    had = HAD(design="auto") # aggregate="overall" is the default
     results = had.fit(
         data, outcome_col="outcome", dose_col="dose",
         time_col="period", unit_col="unit",

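The masking line added in this hunk zeroes every dose observed before a unit's `first_treat` period. A minimal pandas sketch on a made-up three-unit, two-period panel shows the effect. The column names mirror the decision-tree snippet, but the values are invented and nothing here calls `diff_diff`; in particular, whether `generate_continuous_did_data` repeats the dose across periods before zeroing is an assumption of this toy.

```python
import pandas as pd

# Made-up two-period panel; columns follow the decision-tree snippet.
panel = pd.DataFrame({
    "unit": [1, 1, 2, 2, 3, 3],
    "period": [1, 2, 1, 2, 1, 2],
    "first_treat": [2, 2, 2, 2, 2, 2],
    "dose": [12.0, 12.0, 5.0, 5.0, 48.0, 48.0],
})

# Zero every dose recorded before the unit's first treated period.
pre_mask = panel["period"] < panel["first_treat"]
panel.loc[pre_mask, "dose"] = 0.0

print(panel.loc[pre_mask, "dose"].tolist())  # [0.0, 0.0, 0.0]
d_lower = panel.loc[~pre_mask, "dose"].min()  # lightest-touch post-period spend
print(d_lower)  # 5.0
```

After zeroing, the smallest post-period dose (5.0 in the toy) is the lightest-touch unit's spend, the quantity the snippet's comment says HAD anchors on as `d_lower`.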
docs/tutorials/20_had_brand_campaign.ipynb

Lines changed: 1 addition & 1 deletion
@@ -389,7 +389,7 @@
 "This tutorial covered HAD's headline workflow: the overall WAS_d_lower fit and the multi-week event study. The library also supports several extensions we did not demonstrate here.\n",
 "\n",
 "- **Population-weighted (survey-aware) inference**: when some markets or regions carry more weight than others - e.g., DMAs weighted by population - HAD accepts a `weights=` array or a `SurveyDesign` object on the same `fit()` interface.\n",
-"- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG test (constant-per-period effect) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests.\n",
+"- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG support-infimum test (`H0: d_lower = 0`, which adjudicates between the `continuous_at_zero` and `continuous_near_d_lower` design paths) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests.\n",
 "- **`continuous_at_zero` design path**: if the lightest-touch DMA had no regional add-on (spend exactly $0), HAD switches to the Design 1' identification path with target `WAS` instead of `WAS_d_lower`. The auto-detection picks it up.\n",
 "- **Mass-point design path**: if a meaningful chunk of DMAs sit at exactly the same minimum spend (rather than spread continuously near the boundary), HAD switches to a 2SLS estimator with matching identification logic. Auto-detected as well.\n",
 "\n",

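The corrected notebook bullet describes QUG as a support-infimum test of `H0: d_lower = 0`. The sketch below shows one order-statistic construction of such a test. It is an illustration under the assumptions in its docstring, not a transcription of `did_had_pretest_workflow`, and the function name `qug_statistic` is hypothetical. Rejecting points toward the `continuous_near_d_lower` path; failing to reject is consistent with `continuous_at_zero`.

```python
def qug_statistic(doses, alpha=0.05):
    """Hypothetical sketch of an order-statistic test of H0: d_lower = 0.

    Uses T = D(1) / (D(2) - D(1)) built from the two smallest doses. If the
    doses are nonnegative with a density that is positive and continuous at 0,
    then under H0 the scaled spacings n*D(1) and n*(D(2) - D(1)) are
    asymptotically i.i.d. exponential, so P(T > t) -> 1 / (1 + t) and
    rejecting when T > 1/alpha - 1 has asymptotic size alpha.
    """
    d1, d2 = sorted(doses)[:2]
    if d2 <= d1:
        raise ValueError("the two smallest doses tie; T is undefined")
    t_stat = d1 / (d2 - d1)
    critical_value = 1.0 / alpha - 1.0
    return t_stat, critical_value, t_stat > critical_value


# Doses bunched just above a positive floor: evidence that d_lower > 0.
print(qug_statistic([5.0, 5.1, 9.0, 20.0, 50.0])[2])  # True
# Doses trailing off toward zero: no evidence against d_lower = 0.
print(qug_statistic([0.2, 0.5, 3.0, 40.0])[2])  # False
```

When the floor is positive, T grows roughly in proportion to n times d_lower, so small panels may lack power against a modest positive floor.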
docs/tutorials/README.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ Practitioner walkthrough for measuring lift from on/off promotional pulses acros
 ### 20. HAD for National Brand Campaign with Regional Spend Intensity (`20_had_brand_campaign.ipynb`)
 Practitioner walkthrough for measuring per-dollar lift when every market got the campaign at varying intensity and no untreated comparison group exists:
 - The measurement problem framed for heterogeneous-adoption (no-untreated-control) panels
-- `HAD` overall fit on a 2-period collapse for the headline Weighted Average Slope (WAS)
+- `HAD` overall fit on a 2-period collapse, with `design="auto"` resolving to `continuous_near_d_lower` (Design 1) and target `WAS_d_lower` (per-$1K marginal effect above the lightest-touch DMA's spend)
 - Multi-week event study showing per-week dynamics with pre-launch placebos
 - Stakeholder communication template
 - Companion drift-test file (`tests/test_t20_had_brand_campaign_drift.py`)

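The revised README bullet reads `WAS_d_lower` as a per-$1K marginal effect above the lightest-touch DMA's spend. A rough NumPy illustration follows, assuming the target can be written as (E[dY] - E[dY | D = d_lower]) / E[D - d_lower], where dY is the pre-to-post outcome change, and assuming a linear dose response (as with `att_function="linear"` in the decision-tree example). This is not HAD's estimator: a global OLS fit stands in for a boundary estimate at `d_lower`, and `was_d_lower_linear` is a hypothetical name.

```python
import numpy as np


def was_d_lower_linear(delta_y, dose):
    """Hypothetical plug-in for WAS_d_lower under a linear dose response.

    Assumed target: (E[dY] - E[dY | D = d_lower]) / E[D - d_lower], with
    d_lower taken as the smallest observed dose. The boundary term comes
    from a global OLS line, which is only justified when E[dY | D] is linear.
    """
    delta_y = np.asarray(delta_y, dtype=float)
    dose = np.asarray(dose, dtype=float)
    d_lower = dose.min()
    slope, intercept = np.polyfit(dose, delta_y, 1)
    boundary = intercept + slope * d_lower  # fitted E[dY | D = d_lower]
    return (delta_y.mean() - boundary) / (dose - d_lower).mean()


rng = np.random.default_rng(0)
dose = rng.uniform(5.0, 50.0, size=60)  # spend in $K, floor above 0
delta_y = 2.0 + 0.1 * dose + rng.normal(0.0, 0.01, 60)  # true slope 0.1 per $1K
print(round(was_d_lower_linear(delta_y, dose), 3))  # close to 0.1
```

Because an OLS line with an intercept passes through the sample means, this plug-in collapses to the fitted slope. That is the point of the illustration: with a linear dose response, the average effect per $1K above the floor is simply the common slope.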
tests/test_t20_had_brand_campaign_drift.py

Lines changed: 13 additions & 6 deletions
@@ -123,7 +123,9 @@ def event_study_result(panel):
 def test_panel_composition(panel):
     """Section 2 narrative quotes 60 DMAs over 8 weeks, with regional
     spend ranging from a $5K floor to $50K and every DMA participating
-    (no DMA at $0). If the DGP drifts, this surfaces."""
+    (no DMA at $0). The Section 5 stakeholder template additionally
+    quotes 'median ~$25K' for the spend distribution. If the DGP
+    drifts, this surfaces."""
     assert panel["dma_id"].nunique() == N_UNITS
     assert panel["week"].nunique() == N_PERIODS
     post_doses = (
@@ -136,6 +138,9 @@ def test_panel_composition(panel):
         "got some regional spend'. If a DMA appears at zero the "
         "Section 1/3 narrative is wrong."
     )
+    # Pin the sample median so the README/template "median ~$25K" prose
+    # cannot drift unnoticed (PR #394 R2 P3 fix).
+    assert round(post_doses.median(), 1) == 24.8, post_doses.median()


 def test_overall_design_auto_detection(overall_result):
@@ -250,16 +255,18 @@ def test_event_study_post_atts_close_to_truth(event_study_result):

 def test_event_study_pre_placebos_cover_zero(event_study_result):
     """Section 4 narrative claims pre-launch placebos (e=-2,-3,-4) sit
-    at essentially zero with 95% CIs comfortably bracketing zero.
-    Presence of these horizons is verified separately by
-    `test_event_study_horizons_complete` so we can reach into the
+    at essentially zero (within ±0.06) with 95% CIs comfortably
+    bracketing zero. Presence of these horizons is verified separately
+    by `test_event_study_horizons_complete` so we can reach into the
     arrays without an `if e in event_times` guard that would silently
-    skip a missing horizon (PR #394 R1 P2 fix)."""
+    skip a missing horizon (PR #394 R1 P2 fix). Magnitude pinned to
+    < 0.1 to lock the prose claim of 'within ±0.06' with light slack
+    (PR #394 R2 P3 fix)."""
     event_times = list(event_study_result.event_times)
     atts = list(event_study_result.att)
     ci_lows = list(event_study_result.conf_int_low)
     ci_highs = list(event_study_result.conf_int_high)
     for e in (-2, -3, -4):
         i = event_times.index(e)
-        assert abs(atts[i]) < 0.5, (e, atts[i])
+        assert abs(atts[i]) < 0.1, (e, atts[i])
         assert ci_lows[i] <= 0.0 <= ci_highs[i], (e, ci_lows[i], ci_highs[i])

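The updated docstring explains why the loop indexes with `event_times.index(e)` instead of guarding with `if e in event_times`. A toy run with horizon -4 missing makes the difference concrete. The helper names `check_pre_placebos` and `check_pre_placebos_guarded` are hypothetical and the numbers are invented; the first helper mirrors the test's loop, including the tightened `< 0.1` bound.

```python
def check_pre_placebos(event_times, atts, ci_lows, ci_highs,
                       horizons=(-2, -3, -4), bound=0.1):
    """Mirror of the test loop: a missing horizon raises ValueError."""
    for e in horizons:
        i = event_times.index(e)  # list.index raises if e is absent
        assert abs(atts[i]) < bound, (e, atts[i])
        assert ci_lows[i] <= 0.0 <= ci_highs[i], (e, ci_lows[i], ci_highs[i])


def check_pre_placebos_guarded(event_times, atts, ci_lows, ci_highs,
                               horizons=(-2, -3, -4), bound=0.1):
    """Anti-pattern: the membership guard silently skips missing horizons."""
    for e in horizons:
        if e in event_times:
            i = event_times.index(e)
            assert abs(atts[i]) < bound, (e, atts[i])
            assert ci_lows[i] <= 0.0 <= ci_highs[i], (e, ci_lows[i], ci_highs[i])


# Made-up event study in which horizon -4 is absent.
times = [-3, -2, 1, 2]
atts = [0.02, -0.04, 1.5, 1.7]
lows = [-0.10, -0.12, 1.1, 1.2]
highs = [0.14, 0.04, 1.9, 2.2]

check_pre_placebos_guarded(times, atts, lows, highs)  # passes despite the gap
try:
    check_pre_placebos(times, atts, lows, highs)
    caught = False
except ValueError:
    caught = True
print(caught)  # True
```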