Address PR #392 R2 review (2 P1)

igerber · claude · igerber · commit acbf69952359 · 2026-04-26T09:57:13.000-04:00
P1 (non-terminal-base methodology guard):
joint_pretrends_test(trends_lin=True) and
joint_homogeneity_test(trends_lin=True) silently accepted base_period
values other than the last validated pre-period (= F-1). Paper Eq 17 /
Eq 18 anchor the detrending at F-1 with slope Y[F-1] - Y[F-2]; a
non-terminal anchor computes a different slope at a different anchor,
silently moving the methodology away from the documented Eq 17/18 /
R DIDHAD::did_had(trends_lin=TRUE) construction.
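
A minimal single-unit sketch of why the anchor matters. It assumes
periods are integer positions and that horizon t is detrended as
(Y[t] - Y[base]) - (t - base) * slope, with slope = Y[base] - Y[base-1].
The helper name and the toy outcome path are illustrative, not the
had_pretests internals:

```python
import numpy as np


def detrended_change(y: np.ndarray, base: int, t: int) -> float:
    """Hypothetical single-unit detrending helper (illustration only).

    slope is the unit's last pre-period step Y[base] - Y[base - 1];
    the change Y[t] - Y[base] is reported net of that linear trend
    extrapolated over (t - base) periods.
    """
    slope = y[base] - y[base - 1]
    return float((y[t] - y[base]) - (t - base) * slope)


# Toy outcome path for one unit at positions 0..4. Treatment starts at
# position 3 (F), so F-1 = 2 and F-2 = 1.
y = np.array([0.0, 1.0, 3.0, 6.0, 10.0])

# Anchor at F-1 (slope = 3 - 1 = 2).
print(detrended_change(y, base=2, t=0))  # 1.0, earlier placebo
print(detrended_change(y, base=2, t=1))  # 0.0, consumed placebo
print(detrended_change(y, base=2, t=3))  # 1.0, first post-period
# Non-terminal anchor at F-2 (slope = 1 - 0 = 1).
print(detrended_change(y, base=1, t=3))  # 3.0, same data, other answer
```

On the same data, anchoring at F-2 instead of F-1 halves the slope
(2 to 1) and triples the detrended first post-period change (1.0 to 3.0).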

Fix: under trends_lin=True, require base_period == t_pre_list[-1]
inside the validator block. The workflow and HAD.fit always pass
F-1, so they're unaffected. Direct callers passing a non-terminal
base get a clear ValueError pointing at t_pre_list[-1].
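
The guard reduces to a single comparison. A hedged stand-alone sketch,
where require_terminal_base is a hypothetical name and t_pre_list is
assumed to be the validator's ordered list of observed pre-periods:

```python
from typing import Any, Sequence


def require_terminal_base(
    base_period: Any, t_pre_list: Sequence[Any], trends_lin: bool
) -> None:
    """Hypothetical stand-alone version of the trends_lin anchor guard.

    t_pre_list[-1] is F-1, the only anchor consistent with Eq 17 / Eq 18.
    """
    if trends_lin and base_period != t_pre_list[-1]:
        raise ValueError(
            f"trends_lin=True requires base_period to equal the last "
            f"validated pre-period ({t_pre_list[-1]!r}, F-1). "
            f"Got base_period={base_period!r}."
        )


t_pre_list = [1, 2, 3]  # F = 4, so F-1 = 3
require_terminal_base(3, t_pre_list, trends_lin=True)   # F-1: accepted
require_terminal_base(2, t_pre_list, trends_lin=False)  # guard inactive
try:
    require_terminal_base(2, t_pre_list, trends_lin=True)  # F-2: rejected
except ValueError as exc:
    print(exc)
```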

Regression tests (2):
  - joint_pretrends_test(base=F-2, trends_lin=True) raises
  - joint_homogeneity_test(base=F-2, trends_lin=True) raises

P1 (ordered-categorical observed-period base-1 lookup):
The previous base_period - 1 lookup searched period_rank (= the full
ordered-categorical level list) for the level whose rank is one below
base_rank. On panels with unused intermediate categorical levels (e.g.
dtype levels [t1, t2, t_unused, t3, t4] with only t1..t4 observed),
base=t3 under trends_lin would resolve base-1 to t_unused (rank 2),
and the slope-pivot lookup would raise KeyError because t_unused is
not in the data.
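
A small pandas reproduction of the old lookup, assuming period_rank
maps every dtype level (used or not) to its position. rank_predecessor
is a hypothetical helper that mirrors the removed loop:

```python
import pandas as pd

# Ordered-categorical time column whose dtype carries an unused level.
levels = ["t1", "t2", "t_unused", "t3", "t4"]
time = pd.Series(
    pd.Categorical(["t1", "t2", "t3", "t4"] * 2, categories=levels, ordered=True)
)

# Assumed pre-fix ranking: position in the full dtype level list.
period_rank = {lvl: i for i, lvl in enumerate(time.cat.categories)}
observed = set(time.unique())


def rank_predecessor(base, period_rank):
    """Hypothetical mirror of the removed loop: level at rank base_rank - 1."""
    target = period_rank[base] - 1
    for p, r in period_rank.items():
        if r == target:
            return p
    return None


prev = rank_predecessor("t3", period_rank)
print(prev, prev in observed)  # t_unused False
```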

Fix: derive base_minus_1_period from the validator's t_pre_list[-2].
The validator builds t_pre_list from observed contiguous pre-periods
only, so it skips unused categorical levels. Both joint wrappers and
the workflow now use the same observed-period predecessor, which
eliminates the failure mode.
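
A sketch of the observed-period predecessor on the same dtype.
observed_pre_periods is a hypothetical stand-in for the validator: it
keeps only levels present in the data, in dtype order, up to the first
treated period F, and it omits the validator's contiguity and balance
checks:

```python
import pandas as pd

levels = ["t1", "t2", "t_unused", "t3", "t4"]
time = pd.Series(
    pd.Categorical(["t1", "t2", "t3", "t4"] * 2, categories=levels, ordered=True)
)


def observed_pre_periods(time: pd.Series, first_treated) -> list:
    """Hypothetical stand-in for the validator's t_pre_list.

    Keeps only levels that actually occur in the data, in dtype order,
    and stops before the first treated period F.
    """
    observed = set(time.dropna().unique())
    ordered = [lvl for lvl in time.cat.categories if lvl in observed]
    return ordered[: ordered.index(first_treated)]


t_pre_list = observed_pre_periods(time, "t4")
base_period, base_minus_1_period = t_pre_list[-1], t_pre_list[-2]
print(t_pre_list, base_period, base_minus_1_period)
# ['t1', 't2', 't3'] t3 t2
```

Because unused levels never enter t_pre_list, t_pre_list[-2] is
always a period that exists in the data.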

Restructure: in joint_pretrends_test, moved the trends_lin
consumed-placebo drop from before the validator block into it (so
t_pre_list is in scope). Added an early `n_periods < 3` gate for
trends_lin in both joint wrappers so 2-period panels get a clean error
instead of silently failing downstream.
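
A hedged sketch of the early gate (require_three_periods is a
hypothetical name). Series.nunique() counts distinct observed values,
so unused categorical levels do not inflate n_periods:

```python
import pandas as pd


def require_three_periods(data: pd.DataFrame, time_col: str, trends_lin: bool) -> None:
    """Hypothetical sketch: reject trends_lin on panels with < 3 periods."""
    n_periods = int(data[time_col].nunique())  # observed values only
    if trends_lin and n_periods < 3:
        raise ValueError(
            "trends_lin=True requires at least 3 distinct time periods so "
            "the per-group slope Y[g, base] - Y[g, base - 1] is identified. "
            f"Got n_periods={n_periods}."
        )


two = pd.DataFrame(
    {"unit": [1, 1, 2, 2], "time": [1, 2, 1, 2], "y": [0.0, 1.0, 0.5, 1.5]}
)
require_three_periods(two, "time", trends_lin=False)  # gate only applies under trends_lin
try:
    require_three_periods(two, "time", trends_lin=True)
except ValueError as exc:
    print(exc)
```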

Regression test:
  - joint_pretrends_test on an ordered-categorical panel with an
    unused intermediate level succeeds (no KeyError on slope pivot)

Stats: 538 tests pass (535 prior + 3 new R3 P1 regressions), 0
regressions. Parity tolerances unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/had_pretests.py b/diff_diff/had_pretests.py
@@ -3516,55 +3516,14 @@ def joint_pretrends_test(
             f"(base_period={base_period!r})."
         )
 
-    # ---- trends_lin: identify the consumed placebo (base_period - 1)
-    # and drop it from the test set BEFORE aggregation.
-    # The F-2 → F-1 evolution is "consumed" by the per-group slope
-    # estimator: at t = base_period - 1 the detrended dy_t = dy_t -
-    # (-1) × slope = (Y_{base-1} - Y_base) + (Y_base - Y_{base-1}) = 0
-    # for every unit. Feeding that all-zero residual into
-    # `stute_joint_pretest` would trip the exact-linear short-circuit
-    # and report a mechanical p_value=1.0 — a confidently-non-rejecting
-    # placebo that is actually no placebo at all. Drop it explicitly,
-    # mirroring R's "max placebo lag reduces by 1" convention and our
-    # HAD.fit `e=-2` drop. Emits UserWarning when this filter fires
-    # so the caller knows their `pre_periods` was modified.
+    # ---- trends_lin: defer the consumed-placebo drop and base-1
+    # identification until AFTER the validator block runs (so we can
+    # use t_pre_list to enforce the non-terminal-base guard and the
+    # observed-period predecessor consistently). On 2-period panels
+    # the validator does not run and trends_lin needs F-2, which is
+    # impossible, so reject those up front once n_periods is known.
     base_minus_1_period: Any = None
     pre_periods_effective = list(pre_periods)
-    if trends_lin:
-        for p, r in period_rank.items():
-            if r == base_rank - 1:
-                base_minus_1_period = p
-                break
-        if base_minus_1_period is None:
-            raise ValueError(
-                f"joint_pretrends_test(trends_lin=True) requires the "
-                f"period immediately before base_period={base_period!r} "
-                f"to exist in the panel (rank {base_rank - 1}). The "
-                f"per-group linear-trend slope Y[g, base] - Y[g, base-1] "
-                f"is not identified without it. Available periods: "
-                f"{sorted(period_rank.keys(), key=lambda t: period_rank[t])!r}."
-            )
-        if base_minus_1_period in pre_periods_effective:
-            warnings.warn(
-                f"joint_pretrends_test(trends_lin=True): dropping "
-                f"period {base_minus_1_period!r} from pre_periods — it "
-                f"is the 'consumed' placebo (the F-2 → F-1 evolution "
-                f"used by the per-group slope estimator), so under "
-                f"trends_lin its detrended residual is mechanically "
-                f"zero. R's `did_had(trends_lin=TRUE)` reduces max "
-                f"placebo lag by 1 with the same effect.",
-                UserWarning,
-                stacklevel=2,
-            )
-            pre_periods_effective = [t for t in pre_periods_effective if t != base_minus_1_period]
-        if len(pre_periods_effective) == 0:
-            raise ValueError(
-                f"joint_pretrends_test(trends_lin=True): no testable "
-                f"placebo horizons remain after dropping the consumed "
-                f"placebo at base_period - 1 = {base_minus_1_period!r}. "
-                f"Pass at least one earlier pre-period (rank < "
-                f"{base_rank - 1}) when using trends_lin=True."
-            )
 
     # Event-study validation contract (paper Appendix B.2):
     # When the panel has >= 3 distinct periods, always route through
@@ -3577,6 +3536,13 @@ def joint_pretrends_test(
     # panels the validator does not apply; skip and fall through to the
     # simpler balance/invariant guards in `_aggregate_for_joint_test`.
     n_periods = int(data[time_col].nunique())
+    if trends_lin and n_periods < 3:
+        raise ValueError(
+            f"joint_pretrends_test(trends_lin=True) requires a panel "
+            f"with at least 3 distinct time periods so the per-group "
+            f"slope Y[g, base] - Y[g, base - 1] is identified. Got "
+            f"n_periods={n_periods}."
+        )
     data_filtered: pd.DataFrame = data
     if n_periods >= 3:
         F_val, t_pre_list, _t_post_list, data_filtered, _filter_info = (
@@ -3610,6 +3576,68 @@ def joint_pretrends_test(
                 f"periods. Not-pre entries: {not_pre!r}. Validator's "
                 f"pre-period set: {list(t_pre_list)!r}."
             )
+        # PR #392 R3 P1 (non-terminal base guard): paper Eq 17 / Eq 18
+        # and R `DIDHAD::did_had(..., trends_lin=TRUE)` anchor the
+        # detrending at F-1 (the last validated pre-period) and use
+        # Y[F-1] - Y[F-2] as the slope. A direct caller passing
+        # base_period < F-1 (e.g. F-2) would compute a different slope
+        # at a different anchor, silently changing the methodology
+        # away from the documented Eq 17/18 construction. Reject
+        # explicitly. Workflow + HAD.fit always pass F-1; this check
+        # only fires on direct user calls with non-terminal bases.
+        if trends_lin and base_period != t_pre_list[-1]:
+            raise ValueError(
+                f"joint_pretrends_test(trends_lin=True) requires "
+                f"base_period to equal the last validated pre-period "
+                f"({t_pre_list[-1]!r}, the canonical Eq 17 anchor "
+                f"F-1). Got base_period={base_period!r}. Anchoring at "
+                f"any other pre-period would compute a different "
+                f"slope and detrending that does not match paper "
+                f"Eq 17 / Eq 18 or R DIDHAD::did_had(trends_lin=TRUE)."
+            )
+        # PR #392 R3 P1 (observed-period base-1 lookup) + R1 P0
+        # (consumed-placebo drop) consolidated:
+        # base_minus_1_period = t_pre_list[-2] (= F-2, the validated
+        # observed pre-period immediately before F-1). Using
+        # t_pre_list ensures correctness on ordered-categorical panels
+        # with unused intermediate levels (the validator's t_pre_list
+        # is built from observed contiguous pre-periods, not from the
+        # full dtype's category list). Then drop t_pre_list[-2] from
+        # pre_periods if present (the consumed placebo whose detrended
+        # residual is mechanically zero).
+        if trends_lin:
+            if len(t_pre_list) < 2:
+                raise ValueError(
+                    f"joint_pretrends_test(trends_lin=True) requires "
+                    f"at least 2 validated pre-periods so the per-"
+                    f"group slope Y[g, F-1] - Y[g, F-2] is identified. "
+                    f"Got t_pre_list={list(t_pre_list)!r}."
+                )
+            base_minus_1_period = t_pre_list[-2]
+            if base_minus_1_period in pre_periods_effective:
+                warnings.warn(
+                    f"joint_pretrends_test(trends_lin=True): dropping "
+                    f"period {base_minus_1_period!r} from pre_periods "
+                    f"— it is the 'consumed' placebo (the F-2 → F-1 "
+                    f"evolution used by the per-group slope "
+                    f"estimator), so under trends_lin its detrended "
+                    f"residual is mechanically zero. R's "
+                    f"`did_had(trends_lin=TRUE)` reduces max placebo "
+                    f"lag by 1 with the same effect.",
+                    UserWarning,
+                    stacklevel=2,
+                )
+                pre_periods_effective = [
+                    t for t in pre_periods_effective if t != base_minus_1_period
+                ]
+            if len(pre_periods_effective) == 0:
+                raise ValueError(
+                    f"joint_pretrends_test(trends_lin=True): no testable "
+                    f"placebo horizons remain after dropping the consumed "
+                    f"placebo at base_period - 1 = {base_minus_1_period!r}. "
+                    f"Pass at least one earlier observed pre-period when "
+                    f"using trends_lin=True."
+                )
 
     d_arr, dy_by_horizon, _ = _aggregate_for_joint_test(
         data_filtered,
@@ -3915,6 +3943,14 @@ def joint_homogeneity_test(
     # time-varying post-dose would make the per-horizon refit on
     # `[1, D_g]` misspecify the regressor.
     n_periods = int(data[time_col].nunique())
+    if trends_lin and n_periods < 3:
+        raise ValueError(
+            f"joint_homogeneity_test(trends_lin=True) requires a "
+            f"panel with at least 3 distinct time periods so the "
+            f"per-group slope Y[g, base] - Y[g, base - 1] is "
+            f"identified. Got n_periods={n_periods}."
+        )
+    base_minus_1_period_validated: Any = None  # set inside validator block under trends_lin
     data_filtered: pd.DataFrame = data
     if n_periods >= 3:
         F_val, t_pre_list, t_post_list, data_filtered, _filter_info = (
@@ -3946,6 +3982,30 @@ def joint_homogeneity_test(
                 f"periods. Not-post entries: {not_post!r}. Validator's "
                 f"post-period set: {list(t_post_list)!r}."
             )
+        # PR #392 R3 P1 (non-terminal base guard + observed-period
+        # base-1 lookup, twin of joint_pretrends_test). Eq 17 anchors
+        # at F-1 and uses Y[F-1] - Y[F-2] as slope; require base ==
+        # t_pre_list[-1] AND derive base-1 from t_pre_list[-2].
+        if trends_lin and base_period != t_pre_list[-1]:
+            raise ValueError(
+                f"joint_homogeneity_test(trends_lin=True) requires "
+                f"base_period to equal the last validated pre-period "
+                f"({t_pre_list[-1]!r}, the canonical Eq 17 anchor "
+                f"F-1). Got base_period={base_period!r}. Anchoring at "
+                f"any other pre-period would compute a different "
+                f"slope and detrending that does not match paper "
+                f"Eq 17 / page 32 or R DIDHAD::did_had(trends_lin=TRUE)."
+            )
+        if trends_lin and len(t_pre_list) < 2:
+            raise ValueError(
+                f"joint_homogeneity_test(trends_lin=True) requires "
+                f"at least 2 validated pre-periods so the per-group "
+                f"slope Y[g, F-1] - Y[g, F-2] is identified. Got "
+                f"t_pre_list={list(t_pre_list)!r}."
+            )
+        # Capture the validator's predecessor for downstream use.
+        if trends_lin:
+            base_minus_1_period_validated = t_pre_list[-2]
 
     d_arr, dy_by_horizon, _ = _aggregate_for_joint_test(
         data_filtered,
@@ -3988,20 +4048,13 @@ def joint_homogeneity_test(
     # dy_t. The post-period delta = t_rank - base_rank > 0, so the
     # subtraction extrapolates the linear trend FORWARD into post-periods.
     if trends_lin:
-        base_minus_1_period_h: Any = None
-        for p, r in period_rank.items():
-            if r == base_rank - 1:
-                base_minus_1_period_h = p
-                break
-        if base_minus_1_period_h is None:
-            raise ValueError(
-                f"joint_homogeneity_test(trends_lin=True) requires the "
-                f"period immediately before base_period={base_period!r} "
-                f"to exist in the panel (rank {base_rank - 1}). The "
-                f"per-group linear-trend slope Y[g, base] - Y[g, base-1] "
-                f"is not identified without it. Available periods: "
-                f"{sorted(period_rank.keys(), key=lambda t: period_rank[t])!r}."
-            )
+        # PR #392 R3 P1: use the validator's t_pre_list[-2] as the
+        # predecessor (captured above as base_minus_1_period_validated).
+        # This is robust to ordered-categorical panels with unused
+        # intermediate levels because the validator builds t_pre_list
+        # from observed contiguous pre-periods, not the full dtype
+        # category list.
+        base_minus_1_period_h = base_minus_1_period_validated
         slope_subset_h = data_filtered[
             data_filtered[time_col].isin([base_period, base_minus_1_period_h])
         ]
diff --git a/tests/test_had_pretests.py b/tests/test_had_pretests.py
@@ -4539,7 +4539,7 @@ def test_homogeneity_trends_lin_missing_base_minus_1_raises(self):
         # before treatment), post=[4, 5].
         df = self._panel(rng_seed=14, F=4, T=5)
         df_trim = df[df["time"] >= 3].copy()
-        with pytest.raises(ValueError, match="period immediately before base_period"):
+        with pytest.raises(ValueError, match=r"at least 2 (validated )?pre-periods"):
             joint_homogeneity_test(
                 df_trim,
                 "y",
@@ -4716,6 +4716,102 @@ def test_workflow_trends_lin_minimal_panel_skips_step2_gracefully(self):
         ), "expected homogeneity_joint to still run after step 2 skip"
         assert np.isfinite(report.homogeneity_joint.p_value)
 
+    def test_pretrends_trends_lin_nonterminal_base_raises(self):
+        """Direct caller passing base_period < t_pre_list[-1] under
+        trends_lin=True must raise — Eq 17 anchors at F-1.
+        Regression for PR #392 R3 P1 (methodology guard)."""
+        df = self._panel(rng_seed=40)
+        # Panel periods 1..5, F=4. t_pre_list = [1, 2, 3], F-1 = 3.
+        # Pass base_period=2 (non-terminal pre-period).
+        with pytest.raises(ValueError, match="last validated pre-period"):
+            joint_pretrends_test(
+                df,
+                "y",
+                "d",
+                "time",
+                "unit",
+                pre_periods=[1],
+                base_period=2,
+                n_bootstrap=99,
+                seed=42,
+                trends_lin=True,
+            )
+
+    def test_homogeneity_trends_lin_nonterminal_base_raises(self):
+        """Twin guard for joint_homogeneity_test."""
+        df = self._panel(rng_seed=41)
+        with pytest.raises(ValueError, match="last validated pre-period"):
+            joint_homogeneity_test(
+                df,
+                "y",
+                "d",
+                "time",
+                "unit",
+                post_periods=[4, 5],
+                base_period=2,
+                n_bootstrap=99,
+                seed=42,
+                trends_lin=True,
+            )
+
+    def test_pretrends_trends_lin_unused_categorical_observed_only(self):
+        """Ordered categorical time column with an unused intermediate
+        level: trends_lin must resolve base_period - 1 to the previous
+        OBSERVED period, not to the unused level (which would KeyError
+        on the slope-pivot lookup). Regression for PR #392 R3 P1."""
+        df_int = self._panel(rng_seed=42)
+        # Convert time to ordered categorical with an unused intermediate
+        # level inserted between observed levels t3 and t4.
+        cat_levels = ["t1", "t2", "t3", "t_unused", "t4", "t5"]
+        time_map = {1: "t1", 2: "t2", 3: "t3", 4: "t4", 5: "t5"}
+        df = df_int.copy()
+        df["time"] = pd.Categorical(
+            df["time"].map(time_map),
+            categories=cat_levels,
+            ordered=True,
+        )
+        # Sanity: t_unused is in the dtype but absent from data.
+        assert "t_unused" in df["time"].cat.categories
+        assert "t_unused" not in set(df["time"].dropna().unique())
+        # F=4 → t_pre_list = [t1, t2, t3], so under trends_lin the
+        # base must equal t3. With cat_levels above the dtype ranks
+        # are t1=0, t2=1, t3=2, t_unused=3, t4=4, t5=5. For base=t3
+        # (rank 2) the old rank-based lookup gives base-1 = rank 1 =
+        # t2, which is already correct: an unused level placed AFTER
+        # the base cannot trigger the bug. Passing base=t4 instead
+        # would not help, because t4 is a post-period and other
+        # guards reject it. The failure mode requires the unused
+        # level to sit BEFORE the base in chronology, so the setup
+        # below places t_unused between t2 and t3 and keeps
+        # base=t3.
+        cat_levels2 = ["t1", "t2", "t_unused", "t3", "t4", "t5"]
+        df2 = df_int.copy()
+        df2["time"] = pd.Categorical(
+            df2["time"].map(time_map),
+            categories=cat_levels2,
+            ordered=True,
+        )
+        # Now base=t3 (rank 3); base-1 by rank = t_unused (rank 2,
+        # not in data). Old code would KeyError on the slope pivot;
+        # new observed-only lookup resolves to t2 (the previous
+        # observed period). Verify the call SUCCEEDS.
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            r = joint_pretrends_test(
+                df2,
+                "y",
+                "d",
+                "time",
+                "unit",
+                pre_periods=["t1"],
+                base_period="t3",
+                n_bootstrap=99,
+                seed=42,
+                trends_lin=True,
+            )
+        assert np.isfinite(r.cvm_stat_joint)
+        assert np.isfinite(r.p_value)
+
     def test_workflow_trends_lin_with_overall_aggregate_raises(self):
         """trends_lin=True only valid on event_study aggregate."""
         df = self._panel(rng_seed=34)