Address PR #350 CI review round 1: P0 constant-dose + P1 staggered-detection

igerber · claude · igerber · commit 1bfec37582c1 · 2026-04-22T07:30:13.000-04:00
**P0 (blocker):** `_aggregate_multi_period_first_differences` reuses
`D_{g, F}` as the single regressor for every event-time horizon. Without
validation, panels where a unit's dose varies across post-treatment
periods silently misattribute later-horizon effects to the period-F dose.

Fix: `_validate_had_panel_event_study` now rejects panels where any unit
has time-varying dose across post-periods (within-unit spread beyond
float tolerance), with a `ValueError` redirecting to
ChaisemartinDHaultfoeuille for genuinely time-varying regimes.
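
A minimal standalone sketch of the within-unit spread check described above, for illustration only. The helper name `find_time_varying_units` and its `first_post_period` argument are hypothetical and do not appear in `diff_diff/had.py`; the relative tolerance mirrors the `1e-12 * max(1.0, abs_max_dose)` rule used by the validator in this commit.

```python
import pandas as pd


def find_time_varying_units(panel, unit_col, time_col, dose_col, first_post_period):
    """Return the dose spread of units whose post-period dose is not constant.

    Hypothetical helper for illustration; not part of the library.
    """
    # Keep only post-treatment rows (t >= F).
    post = panel.loc[panel[time_col] >= first_post_period]
    grouped = post.groupby(unit_col)[dose_col]
    # Within-unit spread: max dose minus min dose over the post-periods.
    spread = grouped.max() - grouped.min()
    # Relative float tolerance scaled by the largest absolute post-period dose.
    abs_max = float(post[dose_col].abs().max()) if len(post) else 0.0
    tol = 1e-12 * max(1.0, abs_max)
    return spread[spread > tol]


# Unit 0 keeps dose 0.25 after F=3; unit 1 moves from 0.25 to 0.75.
panel = pd.DataFrame(
    {
        "unit": [0] * 4 + [1] * 4,
        "period": [1, 2, 3, 4] * 2,
        "dose": [0.0, 0.0, 0.25, 0.25, 0.0, 0.0, 0.25, 0.75],
    }
)
offenders = find_time_varying_units(panel, "unit", "period", "dose", first_post_period=3)
```

Here only unit 1 is flagged, with a spread of 0.5. The validator raises a `ValueError` as soon as any unit is flagged.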

**P1:** Staggered-timing auto-filter previously only ran inside
`if first_treat_col is not None`. Multi-cohort panels without cohort
metadata slipped through, treating later-cohort units as zero-dose
"controls" at the inferred F, violating Appendix B.2's last-cohort-only
contract.

Fix: When `first_treat_col is None`, the validator computes each unit's
first-positive-dose period from the dose path. If multiple distinct
cohorts are detected, it raises a `ValueError` directing users to
pass `first_treat_col` (which activates the last-cohort auto-filter)
or to use ChaisemartinDHaultfoeuille for full staggered support.
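
A rough sketch of how first-positive-dose cohorts can be recovered from the dose path alone, assuming a long-format panel with one row per unit and period. The helper name `detect_cohorts` is hypothetical; the validator in this commit performs the equivalent step inline and raises when more than one cohort is found.

```python
import pandas as pd


def detect_cohorts(panel, unit_col, time_col, dose_col):
    """Return the sorted distinct first-positive-dose periods across units.

    Hypothetical helper for illustration; not part of the library.
    """
    # Rows with a strictly positive dose; never-treated units drop out here.
    positive = panel.loc[panel[dose_col] > 0]
    # Earliest positive-dose period per unit defines that unit's cohort.
    first_pos = positive.groupby(unit_col)[time_col].min()
    return sorted(first_pos.unique().tolist())


rows = []
# Units 0 and 1 adopt at t=3, unit 2 adopts at t=5, unit 3 is never treated.
for unit, first_treat in [(0, 3), (1, 3), (2, 5), (3, None)]:
    for t in range(1, 7):
        treated = first_treat is not None and t >= first_treat
        rows.append({"unit": unit, "period": t, "dose": 0.5 if treated else 0.0})
panel = pd.DataFrame(rows)

cohorts = detect_cohorts(panel, "unit", "period", "dose")
is_staggered = len(cohorts) > 1
```

With two cohorts (periods 3 and 5) the panel counts as staggered. A panel with a single cohort plus never-treated units yields one label and passes through.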

**P2 (docs):** Reconciled contradictory REGISTRY guidance between the
legacy edge-case note (line ~2251) and the new Phase 2b last-cohort
filter note. Both now describe the auto-filter + front-door rejection
of un-annotated staggered panels consistently.

**P2 (tests):** Added regression tests for both blockers:
- `test_time_varying_post_F_dose_rejected`
- `test_staggered_without_first_treat_col_rejected`

Also added a **Note (Phase 2b constant-dose requirement)** block to
REGISTRY documenting the new validator guard. TODO.md entry updated
to reflect front-door rejection of time-varying doses (not silent
reuse as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/TODO.md b/TODO.md
@@ -98,7 +98,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` Phase 3: `qug_test()`, `stute_test()`, `yatchew_hr_test()` pre-test diagnostics (paper Section 3.3). Composite helper `did_had_pretest_workflow()`. Not part of Phase 2a scope. | `diff_diff/had.py`, new module | Phase 2a | Medium |
 | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low |
 | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
-| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b uses `D_{g, F}` (first-treatment-period dose) as the single regressor for ALL event-time horizons (paper convention assumes "once treated, stay treated with same dose"). Panels where `D_{g,t}` varies for `t >= F` get the period-F dose used throughout — correct under the constant-dose interpretation but lossy under time-varying regimes. Paper Section 2 scope. | `diff_diff/had.py::_aggregate_multi_period_first_differences` | Phase 2b | Low |
+| `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
 | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
 
 #### Performance
diff --git a/diff_diff/had.py b/diff_diff/had.py
@@ -1116,6 +1116,73 @@ def _validate_had_panel_event_study(
             f"column."
         )
 
+    # Staggered-without-``first_treat_col`` detection. When cohort metadata
+    # is not supplied, the dose-invariant period classification still
+    # declares t=F=min-post-period based on "any unit has nonzero dose".
+    # That silently accepts staggered panels where units have DIFFERENT
+    # first-positive-dose periods: the later-treated cohorts enter
+    # ``d_arr`` as zero-dose "controls" at the inferred F, violating
+    # paper Appendix B.2's last-cohort-only contract. Compute per-unit
+    # first-positive-dose period directly from the dose path and raise
+    # if multiple cohorts are present, directing users to pass
+    # ``first_treat_col`` (which activates the last-cohort auto-filter)
+    # or to use ChaisemartinDHaultfoeuille for full staggered support.
+    if first_treat_col is None:
+        df_sorted = data_filtered.sort_values([unit_col, time_col])
+        # For each unit, the first period at which dose > 0.
+        pos_mask_global = df_sorted[dose_col] > 0
+        first_pos_per_unit = df_sorted.loc[pos_mask_global].groupby(unit_col)[time_col].first()
+        cohort_labels = list(first_pos_per_unit.unique())
+        if len(cohort_labels) > 1:
+            try:
+                distinct_cohorts = sorted(cohort_labels, key=lambda x: (x is None, x))
+            except TypeError:
+                distinct_cohorts = list(cohort_labels)
+            raise ValueError(
+                f"Staggered-timing panel detected (first_treat_col is "
+                f"None): {len(distinct_cohorts)} distinct first-positive-"
+                f"dose periods {distinct_cohorts!r} across units. HAD's "
+                f"last-cohort auto-filter (paper Appendix B.2) only runs "
+                f"when first_treat_col is supplied so the estimator can "
+                f"identify cohorts. Pass first_treat_col=<column> to "
+                f"enable the auto-filter to the last cohort, or use "
+                f"ChaisemartinDHaultfoeuille (did_multiplegt_dyn) for "
+                f"full staggered support."
+            )
+
+    # Constant post-period dose check. Paper Appendix B.2 assumes
+    # "once treated, stay treated with the same dose"; the event-study
+    # aggregation uses ``D_{g, F}`` as the single regressor for every
+    # event-time horizon. Panels where a unit's dose varies across
+    # post-periods (e.g., phased adoption, dose changes after F) would
+    # silently misattribute later-horizon effects to the period-F dose.
+    # Reject front-door with a redirect to ChaisemartinDHaultfoeuille
+    # for genuinely time-varying post-treatment doses.
+    if len(t_post_list) > 1:
+        post_data = data_filtered.loc[post_mask]
+        dose_spread_per_unit = post_data.groupby(unit_col)[dose_col].agg(
+            lambda x: float(x.max() - x.min())
+        )
+        abs_max_dose = float(np.max(np.abs(post_doses))) if post_doses.size else 0.0
+        tol = 1e-12 * max(1.0, abs_max_dose)
+        bad_mask = dose_spread_per_unit > tol
+        if bool(bad_mask.any()):
+            n_bad = int(bad_mask.sum())
+            max_spread = float(dose_spread_per_unit.max())
+            raise ValueError(
+                f"HAD event-study requires constant dose within unit for "
+                f"all post-treatment periods t >= F={F!r}. {n_bad} unit(s) "
+                f"have time-varying doses across post-periods "
+                f"{t_post_list!r} (max within-unit spread={max_spread!r}, "
+                f"tolerance={tol!r}). The aggregation uses D_{{g, F}} as "
+                f"the single regressor for every event-time horizon "
+                f"(paper Appendix B.2 constant-dose convention), so "
+                f"silently accepting time-varying post-treatment doses "
+                f"would misattribute later-horizon effects. For genuinely "
+                f"time-varying post-treatment doses use "
+                f"ChaisemartinDHaultfoeuille (did_multiplegt_dyn)."
+            )
+
     return F, t_pre_list, t_post_list, data_filtered, filter_info
 
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2248,7 +2248,7 @@ so `δ_0` is recovered by OLS of `ΔY` on `X` and `D_2 * X`; Average Slope is `(
 - **Extensive-margin effects**: ruled out by Assumption 3. If a jump `Y_2(0) ≠ Y_2(0+)` is suspected, the target parameter and estimator are not appropriate.
 - **Partial identification of WAS_{d̲}**: only identified up to a positive constant offset `≤ ε` by the bound in Equation 22 (Jensen inequality argument in Appendix C.3).
 - **Density at boundary**: Assumption 4 requires `f_{D_2}(0) > 0`. This is a non-trivial assumption since 0 is on the boundary of `Supp(D_2)`.
-- **Variation in treatment timing**: Appendix B.2 - "in designs with variation in treatment timing, there must be an untreated group, at least till the period where the last cohort gets treated." The implementation errors (hard fail, not warning) on this configuration and redirects users to `ChaisemartinDHaultfoeuille`.
+- **Variation in treatment timing**: Appendix B.2 - "in designs with variation in treatment timing, there must be an untreated group, at least till the period where the last cohort gets treated." In Phase 2b (`aggregate="event_study"`) the implementation auto-filters to the last-treatment cohort plus never-treated units with a `UserWarning` when `first_treat_col` is supplied (see Phase 2b last-cohort filter note below); when `first_treat_col` is omitted the estimator detects multiple first-positive-dose cohorts from the dose path and raises a front-door `ValueError` directing users to pass `first_treat_col` or use `ChaisemartinDHaultfoeuille`.
 - **Mechanical zero at reference period under linear trends (Footnote 13, main text p. 31)**: with industry/unit-specific linear trends, the pre-trends estimator is mechanically zero in the second-to-last pre-period (the slope anchor year). Practical consequence: that year is not an informative placebo check.
 
 *Algorithm (Design 1' nonparametric - summarized from Section 3.1.3-3.1.4 and Equations 7-8):*
@@ -2330,7 +2330,8 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step
     - **Note (Phase 2a/2b scope):** Phase 2a ships the single-period `aggregate="overall"` path; Phase 2b lifts `aggregate="event_study"` (Appendix B.2 multi-period extension) which returns a `HeterogeneousAdoptionDiDEventStudyResults` with per-event-time WAS estimates and pointwise CIs. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR.
     - **Note (panel-only):** The paper (Section 2) defines HAD on *panel or repeated cross-section* data, but both the overall and event-study paths ship a panel-only implementation: `HeterogeneousAdoptionDiD.fit()` requires a balanced panel with a unit identifier so that unit-level first differences `ΔY_{g,t} = Y_{g,t} - Y_{g,t_anchor}` can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator. RCS support is queued for a follow-up PR (tracked in `TODO.md`); it will need a separate identification path based on pre/post cell means rather than unit-level differences.
 - [x] Phase 2b: Multi-period event-study extension (Appendix B.2). `aggregate="event_study"` produces per-event-time WAS estimates using a uniform `F-1` baseline (`ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` for every horizon), reusing the three Phase 2a design paths on per-horizon first differences. Pre-period placebos included for `e <= -2` (the anchor `e = -1` is skipped since `ΔY = 0` trivially). Post-period estimates for `e >= 0`. The joint Stute test (Equation 18) across pre-periods is a SEPARATE diagnostic deferred to Phase 3 (pre-test diagnostics).
-    - **Note (Phase 2b last-cohort filter):** When `first_treat_col` indicates more than one nonzero cohort, the panel is auto-filtered to the last-treatment cohort (`F_last = max(cohorts)`) **plus never-treated units** (`first_treat = 0`), with a `UserWarning` naming kept/dropped unit counts and dropped cohort labels. Paper Appendix B.2 is explicit that HAD "may be used only for the LAST treatment cohort in a staggered design"; the auto-filter implements this prescription, retaining never-treated units per the paper's "there must be an untreated group, at least till the period where the last cohort gets treated" requirement. Only earlier-cohort units (with `first_treat > 0` and `< F_last`) are dropped — never-treated units satisfy the dose invariant at every period (`D = 0` throughout) and preserve Design 1' identifiability (boundary at `0`) when last-cohort doses are uniformly positive. Panels without `first_treat_col` with >2 periods infer `F` from the dose invariant (all-zero-dose periods → pre; any-nonzero period → post) and require dose contiguity (pre-periods < post-periods in natural ordering). Non-contiguous dose sequences (e.g., reverse treatment) raise with a pointer to `ChaisemartinDHaultfoeuille`.
+    - **Note (Phase 2b last-cohort filter):** When `first_treat_col` indicates more than one nonzero cohort, the panel is auto-filtered to the last-treatment cohort (`F_last = max(cohorts)`) **plus never-treated units** (`first_treat = 0`), with a `UserWarning` naming kept/dropped unit counts and dropped cohort labels. Paper Appendix B.2 is explicit that HAD "may be used only for the LAST treatment cohort in a staggered design"; the auto-filter implements this prescription, retaining never-treated units per the paper's "there must be an untreated group, at least till the period where the last cohort gets treated" requirement. Only earlier-cohort units (with `first_treat > 0` and `< F_last`) are dropped — never-treated units satisfy the dose invariant at every period (`D = 0` throughout) and preserve Design 1' identifiability (boundary at `0`) when last-cohort doses are uniformly positive. When `first_treat_col` is omitted on a >2-period panel, the validator infers each unit's first-positive-dose period from the dose path; if multiple distinct first-positive-dose cohorts are detected, the estimator raises a front-door `ValueError` directing users to pass `first_treat_col` (which activates the auto-filter) or use `ChaisemartinDHaultfoeuille` for full staggered support — there is no silent acceptance of staggered panels without cohort metadata. Common-adoption panels (single first-positive-dose cohort, or only never-treated + one cohort) pass through unchanged with `F` inferred from the dose invariant, and require dose contiguity (pre-periods < post-periods in natural ordering). Non-contiguous dose sequences (e.g., reverse treatment) raise with a pointer to `ChaisemartinDHaultfoeuille`.
+    - **Note (Phase 2b constant-dose requirement):** The event-study aggregation uses `D_{g, F}` (first-treatment-period dose) as the single regressor for every event-time horizon, per paper Appendix B.2's "once treated, stay treated with the same dose" convention. The validator REJECTS panels where a unit has time-varying dose across post-treatment periods (`D_{g, t} != D_{g, F}` for any `t >= F` within-unit, beyond float tolerance) with a front-door `ValueError`, directing users with genuinely time-varying post-treatment doses to `ChaisemartinDHaultfoeuille` (`did_multiplegt_dyn`). Silent acceptance would misattribute later-horizon treatment-effect heterogeneity to the period-F dose. A follow-up PR could implement a time-varying-dose estimator; tracked in `TODO.md`.
     - **Note (Phase 2b per-horizon SE):** Each event-time horizon uses an INDEPENDENT sandwich computed on that horizon's first differences: continuous paths use the CCT-2014 robust SE from Phase 1c divided by `|den|`; mass-point path uses the structural-residual 2SLS sandwich from Phase 2a. This produces pointwise CIs per horizon, matching the paper's Pierce-Schott application (Section 5.2, Figure 2: "nonparametric pointwise CIs"). Joint cross-horizon covariance (IF-based stacking or block bootstrap) is NOT computed — the paper does not derive it and all reported CIs are pointwise. Follow-up PRs may add joint covariance for cross-horizon hypothesis tests; current tracking in `TODO.md`.
     - **Note (Phase 2b baseline convention):** All event-time horizons use a uniform `F-1` anchor: `ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` for every `t`. This is consistent with the paper's Garrett-et-al. application (Section 5.1: "outcome `Y_{g,t} - Y_{g,2001}`" where `F = 2002`), simplifies event-time indexing (`e = t - F` so `e = -1` is the anchor, skipped), and keeps the implementation symmetric for pre- and post-period horizons. The paper review text's asymmetric "`Y_{g,t} - Y_{g,1}` for pre" / "`Y_{g,t} - Y_{g,F-1}` for post" phrasing is covered by the uniform convention since both give the same placebo interpretation under parallel trends (the paper's own applications use the uniform anchor).
     - **Note (Phase 2b result class):** `aggregate="event_study"` returns a new `HeterogeneousAdoptionDiDEventStudyResults` dataclass (distinct from the single-period `HeterogeneousAdoptionDiDResults`) with per-horizon arrays (`event_times`, `att`, `se`, `t_stat`, `p_value`, `conf_int_low`, `conf_int_high`, `n_obs_per_horizon`) and shared metadata. `to_dataframe()` returns a tidy per-horizon DataFrame; `to_dict()` returns a dict with list-of-per-horizon fields. The static return-type annotation on `fit()` is `HeterogeneousAdoptionDiDResults` (the common case); callers passing `aggregate="event_study"` should annotate their variable as `HeterogeneousAdoptionDiDEventStudyResults` for type checkers.
diff --git a/tests/test_had.py b/tests/test_had.py
@@ -2690,6 +2690,74 @@ def test_no_pre_period_rejected(self):
                 panel, "outcome", "dose", "period", "unit", aggregate="event_study"
             )
 
+    def test_time_varying_post_F_dose_rejected(self):
+        """Within-unit dose variation across post-periods raises.
+
+        Paper Appendix B.2 assumes "once treated, stay treated with the
+        same dose"; the aggregation uses ``D_{g, F}`` as the single
+        regressor for every horizon. Silent acceptance of time-varying
+        post-treatment doses would misattribute later-horizon effects.
+        Covers CI reviewer round 1 P0: `_aggregate_multi_period_first_differences`
+        would otherwise use period-F dose for all horizons.
+        """
+        rng = np.random.default_rng(0)
+        G = 50
+        rows = []
+        for g in range(G):
+            d_F = float(rng.uniform(0.1, 0.5))
+            d_F_plus_1 = d_F + 0.3  # time-varying: dose changes after F
+            for t in range(1, 6):
+                if t < 3:
+                    dose = 0.0
+                elif t == 3:
+                    dose = d_F
+                else:
+                    dose = d_F_plus_1  # different from d_F
+                rows.append(
+                    {
+                        "unit": g,
+                        "period": t,
+                        "dose": dose,
+                        "outcome": rng.standard_normal(),
+                    }
+                )
+        panel = pd.DataFrame(rows)
+        with pytest.raises(ValueError, match="constant dose|time-varying"):
+            HeterogeneousAdoptionDiD(design="auto").fit(
+                panel, "outcome", "dose", "period", "unit", aggregate="event_study"
+            )
+
+    def test_staggered_without_first_treat_col_rejected(self):
+        """Multi-cohort panel without first_treat_col raises (not silent).
+
+        Without cohort metadata, the dose-invariant period classification
+        would silently treat later-cohort units as zero-dose "controls"
+        at the inferred F, violating Appendix B.2's last-cohort-only
+        contract. Covers CI reviewer round 1 P1.
+        """
+        rng = np.random.default_rng(0)
+        G = 100
+        rows = []
+        for g in range(G):
+            # Assign cohort: half treat at t=3, half at t=5.
+            F_g = 3 if g < G // 2 else 5
+            d_g = float(rng.uniform(0.1, 1.0))
+            for t in range(1, 7):
+                dose = d_g if t >= F_g else 0.0
+                rows.append(
+                    {
+                        "unit": g,
+                        "period": t,
+                        "dose": dose,
+                        "outcome": rng.standard_normal(),
+                    }
+                )
+        panel = pd.DataFrame(rows)
+        with pytest.raises(ValueError, match="Staggered-timing|first_treat_col"):
+            HeterogeneousAdoptionDiD(design="auto").fit(
+                panel, "outcome", "dose", "period", "unit", aggregate="event_study"
+            )
+
 
 class TestEventStudyGuardsPreserved:
     """Phase 2a policy guards fire on the event-study path too."""