Skip to content

Commit acbf699

Browse files
igerberclaude
andcommitted
Address PR #392 R2 review (2 P1)
P1 (non-terminal-base methodology guard): joint_pretrends_test(trends_lin=True) and joint_homogeneity_test( trends_lin=True) silently accepted base_period values other than the last validated pre-period (= F-1). Paper Eq 17 / Eq 18 specifically anchor the detrending at F-1 with slope Y[F-1] - Y[F-2]; non- terminal anchors compute a different slope at a different anchor — silently changing the methodology away from the documented Eq 17/18 / R DIDHAD::did_had(trends_lin=TRUE) construction. Fix: under trends_lin=True, require base_period == t_pre_list[-1] inside the validator block. The workflow and HAD.fit always pass F-1, so they're unaffected. Direct callers passing a non-terminal base get a clear ValueError pointing at t_pre_list[-1]. Regression tests (2): - joint_pretrends_test(base=F-2, trends_lin=True) raises - joint_homogeneity_test(base=F-2, trends_lin=True) raises P1 (ordered-categorical observed-period base-1 lookup): The previous base_period - 1 lookup walked period_rank (= the full ordered-categorical level list) decrementing by rank. On panels with unused intermediate categorical levels (e.g. dtype levels [t1, t2, t_unused, t3, t4] with only t1..t4 observed), base=t3 under trends_lin would resolve base-1 to t_unused (rank 2), and the slope-pivot lookup would KeyError because t_unused is not in the data. Fix: derive base_minus_1 from the validator's t_pre_list[-2] — the validator builds t_pre_list from observed contiguous pre- periods only, so it correctly skips unused categorical levels. Both joint wrappers and the workflow now use the same observed- period predecessor, eliminating the failure mode. Restructure: moved the trends_lin consumed-placebo drop from before the validator block into it (so t_pre_list is in scope). Added an early `n_periods < 3` gate for trends_lin so 2-period panels get a clean error instead of silently failing downstream. Regression test: - joint_pretrends_test on an ordered-categorical panel with an unused intermediate level succeeds (no KeyError on slope pivot) Stats: 538 tests pass (535 prior + 3 new R3 P1 regressions), 0 regressions. Parity tolerances unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1abd5b5 commit acbf699

2 files changed

Lines changed: 211 additions & 62 deletions

File tree

diff_diff/had_pretests.py

Lines changed: 114 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -3516,55 +3516,14 @@ def joint_pretrends_test(
35163516
f"(base_period={base_period!r})."
35173517
)
35183518

3519-
# ---- trends_lin: identify the consumed placebo (base_period - 1)
3520-
# and drop it from the test set BEFORE aggregation.
3521-
# The F-2 → F-1 evolution is "consumed" by the per-group slope
3522-
# estimator: at t = base_period - 1 the detrended dy_t = dy_t -
3523-
# (-1) × slope = (Y_{base-1} - Y_base) + (Y_base - Y_{base-1}) = 0
3524-
# for every unit. Feeding that all-zero residual into
3525-
# `stute_joint_pretest` would trip the exact-linear short-circuit
3526-
# and report a mechanical p_value=1.0 — a confidently-non-rejecting
3527-
# placebo that is actually no placebo at all. Drop it explicitly,
3528-
# mirroring R's "max placebo lag reduces by 1" convention and our
3529-
# HAD.fit `e=-2` drop. Emits UserWarning when this filter fires
3530-
# so the caller knows their `pre_periods` was modified.
3519+
# ---- trends_lin: defer the consumed-placebo drop and base-1
3520+
# identification until AFTER the validator block runs (so we can
3521+
# use t_pre_list to enforce the non-terminal-base guard and the
3522+
# observed-period predecessor consistently). On 2-period panels
3523+
# the validator does not run and trends_lin needs F-2, which is
3524+
# impossible — front-door-reject here.
35313525
base_minus_1_period: Any = None
35323526
pre_periods_effective = list(pre_periods)
3533-
if trends_lin:
3534-
for p, r in period_rank.items():
3535-
if r == base_rank - 1:
3536-
base_minus_1_period = p
3537-
break
3538-
if base_minus_1_period is None:
3539-
raise ValueError(
3540-
f"joint_pretrends_test(trends_lin=True) requires the "
3541-
f"period immediately before base_period={base_period!r} "
3542-
f"to exist in the panel (rank {base_rank - 1}). The "
3543-
f"per-group linear-trend slope Y[g, base] - Y[g, base-1] "
3544-
f"is not identified without it. Available periods: "
3545-
f"{sorted(period_rank.keys(), key=lambda t: period_rank[t])!r}."
3546-
)
3547-
if base_minus_1_period in pre_periods_effective:
3548-
warnings.warn(
3549-
f"joint_pretrends_test(trends_lin=True): dropping "
3550-
f"period {base_minus_1_period!r} from pre_periods — it "
3551-
f"is the 'consumed' placebo (the F-2 → F-1 evolution "
3552-
f"used by the per-group slope estimator), so under "
3553-
f"trends_lin its detrended residual is mechanically "
3554-
f"zero. R's `did_had(trends_lin=TRUE)` reduces max "
3555-
f"placebo lag by 1 with the same effect.",
3556-
UserWarning,
3557-
stacklevel=2,
3558-
)
3559-
pre_periods_effective = [t for t in pre_periods_effective if t != base_minus_1_period]
3560-
if len(pre_periods_effective) == 0:
3561-
raise ValueError(
3562-
f"joint_pretrends_test(trends_lin=True): no testable "
3563-
f"placebo horizons remain after dropping the consumed "
3564-
f"placebo at base_period - 1 = {base_minus_1_period!r}. "
3565-
f"Pass at least one earlier pre-period (rank < "
3566-
f"{base_rank - 1}) when using trends_lin=True."
3567-
)
35683527

35693528
# Event-study validation contract (paper Appendix B.2):
35703529
# When the panel has >= 3 distinct periods, always route through
@@ -3577,6 +3536,13 @@ def joint_pretrends_test(
35773536
# panels the validator does not apply; skip and fall through to the
35783537
# simpler balance/invariant guards in `_aggregate_for_joint_test`.
35793538
n_periods = int(data[time_col].nunique())
3539+
if trends_lin and n_periods < 3:
3540+
raise ValueError(
3541+
f"joint_pretrends_test(trends_lin=True) requires a panel "
3542+
f"with at least 3 distinct time periods so the per-group "
3543+
f"slope Y[g, base] - Y[g, base - 1] is identified. Got "
3544+
f"n_periods={n_periods}."
3545+
)
35803546
data_filtered: pd.DataFrame = data
35813547
if n_periods >= 3:
35823548
F_val, t_pre_list, _t_post_list, data_filtered, _filter_info = (
@@ -3610,6 +3576,68 @@ def joint_pretrends_test(
36103576
f"periods. Not-pre entries: {not_pre!r}. Validator's "
36113577
f"pre-period set: {list(t_pre_list)!r}."
36123578
)
3579+
# PR #392 R3 P1 (non-terminal base guard): paper Eq 17 / Eq 18
3580+
# and R `DIDHAD::did_had(..., trends_lin=TRUE)` anchor the
3581+
# detrending at F-1 (the last validated pre-period) and use
3582+
# Y[F-1] - Y[F-2] as the slope. A direct caller passing
3583+
# base_period < F-1 (e.g. F-2) would compute a different slope
3584+
# at a different anchor, silently changing the methodology
3585+
# away from the documented Eq 17/18 construction. Reject
3586+
# explicitly. Workflow + HAD.fit always pass F-1; this check
3587+
# only fires on direct user calls with non-terminal bases.
3588+
if trends_lin and base_period != t_pre_list[-1]:
3589+
raise ValueError(
3590+
f"joint_pretrends_test(trends_lin=True) requires "
3591+
f"base_period to equal the last validated pre-period "
3592+
f"({t_pre_list[-1]!r}, the canonical Eq 17 anchor "
3593+
f"F-1). Got base_period={base_period!r}. Anchoring at "
3594+
f"any other pre-period would compute a different "
3595+
f"slope and detrending that does not match paper "
3596+
f"Eq 17 / Eq 18 or R DIDHAD::did_had(trends_lin=TRUE)."
3597+
)
3598+
# PR #392 R3 P1 (observed-period base-1 lookup) + R1 P0
3599+
# (consumed-placebo drop) consolidated:
3600+
# base_minus_1_period = t_pre_list[-2] (= F-2, the validated
3601+
# observed pre-period immediately before F-1). Using
3602+
# t_pre_list ensures correctness on ordered-categorical panels
3603+
# with unused intermediate levels (the validator's t_pre_list
3604+
# is built from observed contiguous pre-periods, not from the
3605+
# full dtype's category list). Then drop t_pre_list[-2] from
3606+
# pre_periods if present (the consumed placebo whose detrended
3607+
# residual is mechanically zero).
3608+
if trends_lin:
3609+
if len(t_pre_list) < 2:
3610+
raise ValueError(
3611+
f"joint_pretrends_test(trends_lin=True) requires "
3612+
f"at least 2 validated pre-periods so the per-"
3613+
f"group slope Y[g, F-1] - Y[g, F-2] is identified. "
3614+
f"Got t_pre_list={list(t_pre_list)!r}."
3615+
)
3616+
base_minus_1_period = t_pre_list[-2]
3617+
if base_minus_1_period in pre_periods_effective:
3618+
warnings.warn(
3619+
f"joint_pretrends_test(trends_lin=True): dropping "
3620+
f"period {base_minus_1_period!r} from pre_periods "
3621+
f"— it is the 'consumed' placebo (the F-2 → F-1 "
3622+
f"evolution used by the per-group slope "
3623+
f"estimator), so under trends_lin its detrended "
3624+
f"residual is mechanically zero. R's "
3625+
f"`did_had(trends_lin=TRUE)` reduces max placebo "
3626+
f"lag by 1 with the same effect.",
3627+
UserWarning,
3628+
stacklevel=2,
3629+
)
3630+
pre_periods_effective = [
3631+
t for t in pre_periods_effective if t != base_minus_1_period
3632+
]
3633+
if len(pre_periods_effective) == 0:
3634+
raise ValueError(
3635+
f"joint_pretrends_test(trends_lin=True): no testable "
3636+
f"placebo horizons remain after dropping the consumed "
3637+
f"placebo at base_period - 1 = {base_minus_1_period!r}. "
3638+
f"Pass at least one earlier observed pre-period when "
3639+
f"using trends_lin=True."
3640+
)
36133641

36143642
d_arr, dy_by_horizon, _ = _aggregate_for_joint_test(
36153643
data_filtered,
@@ -3915,6 +3943,14 @@ def joint_homogeneity_test(
39153943
# time-varying post-dose would make the per-horizon refit on
39163944
# `[1, D_g]` misspecify the regressor.
39173945
n_periods = int(data[time_col].nunique())
3946+
if trends_lin and n_periods < 3:
3947+
raise ValueError(
3948+
f"joint_homogeneity_test(trends_lin=True) requires a "
3949+
f"panel with at least 3 distinct time periods so the "
3950+
f"per-group slope Y[g, base] - Y[g, base - 1] is "
3951+
f"identified. Got n_periods={n_periods}."
3952+
)
3953+
base_minus_1_period_validated: Any = None # set inside validator block under trends_lin
39183954
data_filtered: pd.DataFrame = data
39193955
if n_periods >= 3:
39203956
F_val, t_pre_list, t_post_list, data_filtered, _filter_info = (
@@ -3946,6 +3982,30 @@ def joint_homogeneity_test(
39463982
f"periods. Not-post entries: {not_post!r}. Validator's "
39473983
f"post-period set: {list(t_post_list)!r}."
39483984
)
3985+
# PR #392 R3 P1 (non-terminal base guard + observed-period
3986+
# base-1 lookup, twin of joint_pretrends_test). Eq 17 anchors
3987+
# at F-1 and uses Y[F-1] - Y[F-2] as slope; require base ==
3988+
# t_pre_list[-1] AND derive base-1 from t_pre_list[-2].
3989+
if trends_lin and base_period != t_pre_list[-1]:
3990+
raise ValueError(
3991+
f"joint_homogeneity_test(trends_lin=True) requires "
3992+
f"base_period to equal the last validated pre-period "
3993+
f"({t_pre_list[-1]!r}, the canonical Eq 17 anchor "
3994+
f"F-1). Got base_period={base_period!r}. Anchoring at "
3995+
f"any other pre-period would compute a different "
3996+
f"slope and detrending that does not match paper "
3997+
f"Eq 17 / page 32 or R DIDHAD::did_had(trends_lin=TRUE)."
3998+
)
3999+
if trends_lin and len(t_pre_list) < 2:
4000+
raise ValueError(
4001+
f"joint_homogeneity_test(trends_lin=True) requires "
4002+
f"at least 2 validated pre-periods so the per-group "
4003+
f"slope Y[g, F-1] - Y[g, F-2] is identified. Got "
4004+
f"t_pre_list={list(t_pre_list)!r}."
4005+
)
4006+
# Capture the validator's predecessor for downstream use.
4007+
if trends_lin:
4008+
base_minus_1_period_validated = t_pre_list[-2]
39494009

39504010
d_arr, dy_by_horizon, _ = _aggregate_for_joint_test(
39514011
data_filtered,
@@ -3988,20 +4048,13 @@ def joint_homogeneity_test(
39884048
# dy_t. The post-period delta = t_rank - base_rank > 0, so the
39894049
# subtraction extrapolates the linear trend FORWARD into post-periods.
39904050
if trends_lin:
3991-
base_minus_1_period_h: Any = None
3992-
for p, r in period_rank.items():
3993-
if r == base_rank - 1:
3994-
base_minus_1_period_h = p
3995-
break
3996-
if base_minus_1_period_h is None:
3997-
raise ValueError(
3998-
f"joint_homogeneity_test(trends_lin=True) requires the "
3999-
f"period immediately before base_period={base_period!r} "
4000-
f"to exist in the panel (rank {base_rank - 1}). The "
4001-
f"per-group linear-trend slope Y[g, base] - Y[g, base-1] "
4002-
f"is not identified without it. Available periods: "
4003-
f"{sorted(period_rank.keys(), key=lambda t: period_rank[t])!r}."
4004-
)
4051+
# PR #392 R3 P1: use the validator's t_pre_list[-2] as the
4052+
# predecessor (captured above as base_minus_1_period_validated).
4053+
# This is robust to ordered-categorical panels with unused
4054+
# intermediate levels because the validator builds t_pre_list
4055+
# from observed contiguous pre-periods, not the full dtype
4056+
# category list.
4057+
base_minus_1_period_h = base_minus_1_period_validated
40054058
slope_subset_h = data_filtered[
40064059
data_filtered[time_col].isin([base_period, base_minus_1_period_h])
40074060
]

tests/test_had_pretests.py

Lines changed: 97 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4539,7 +4539,7 @@ def test_homogeneity_trends_lin_missing_base_minus_1_raises(self):
45394539
# before treatment), post=[4, 5].
45404540
df = self._panel(rng_seed=14, F=4, T=5)
45414541
df_trim = df[df["time"] >= 3].copy()
4542-
with pytest.raises(ValueError, match="period immediately before base_period"):
4542+
with pytest.raises(ValueError, match=r"at least 2 (validated )?pre-periods"):
45434543
joint_homogeneity_test(
45444544
df_trim,
45454545
"y",
@@ -4716,6 +4716,102 @@ def test_workflow_trends_lin_minimal_panel_skips_step2_gracefully(self):
47164716
), "expected homogeneity_joint to still run after step 2 skip"
47174717
assert np.isfinite(report.homogeneity_joint.p_value)
47184718

4719+
def test_pretrends_trends_lin_nonterminal_base_raises(self):
4720+
"""Direct caller passing base_period < t_pre_list[-1] under
4721+
trends_lin=True must raise — Eq 17 anchors at F-1.
4722+
Regression for PR #392 R3 P1 (methodology guard)."""
4723+
df = self._panel(rng_seed=40)
4724+
# Panel periods 1..5, F=4. t_pre_list = [1, 2, 3], F-1 = 3.
4725+
# Pass base_period=2 (non-terminal pre-period).
4726+
with pytest.raises(ValueError, match="last validated pre-period"):
4727+
joint_pretrends_test(
4728+
df,
4729+
"y",
4730+
"d",
4731+
"time",
4732+
"unit",
4733+
pre_periods=[1],
4734+
base_period=2,
4735+
n_bootstrap=99,
4736+
seed=42,
4737+
trends_lin=True,
4738+
)
4739+
4740+
def test_homogeneity_trends_lin_nonterminal_base_raises(self):
4741+
"""Twin guard for joint_homogeneity_test."""
4742+
df = self._panel(rng_seed=41)
4743+
with pytest.raises(ValueError, match="last validated pre-period"):
4744+
joint_homogeneity_test(
4745+
df,
4746+
"y",
4747+
"d",
4748+
"time",
4749+
"unit",
4750+
post_periods=[4, 5],
4751+
base_period=2,
4752+
n_bootstrap=99,
4753+
seed=42,
4754+
trends_lin=True,
4755+
)
4756+
4757+
def test_pretrends_trends_lin_unused_categorical_observed_only(self):
4758+
"""Ordered categorical time column with an unused intermediate
4759+
level: trends_lin must resolve base_period - 1 to the previous
4760+
OBSERVED period, not to the unused level (which would KeyError
4761+
on the slope-pivot lookup). Regression for PR #392 R3 P1."""
4762+
df_int = self._panel(rng_seed=42)
4763+
# Convert time to ordered categorical with an unused intermediate
4764+
# level inserted between observed levels t3 and t4.
4765+
cat_levels = ["t1", "t2", "t3", "t_unused", "t4", "t5"]
4766+
time_map = {1: "t1", 2: "t2", 3: "t3", 4: "t4", 5: "t5"}
4767+
df = df_int.copy()
4768+
df["time"] = pd.Categorical(
4769+
df["time"].map(time_map),
4770+
categories=cat_levels,
4771+
ordered=True,
4772+
)
4773+
# Sanity: t_unused is in the dtype but absent from data.
4774+
assert "t_unused" in df["time"].cat.categories
4775+
assert "t_unused" not in set(df["time"].dropna().unique())
4776+
# F=4 → t_pre_list = [t1, t2, t3], base must equal t3 under
4777+
# trends_lin. Under the OLD period_rank lookup, base_minus_1
4778+
# by rank would resolve to t_unused (rank 2 below t3=rank 4...
4779+
# wait actually rank-1 = rank(t3)-1 = 3-1 = 2, which is t3
4780+
# itself wait no t3 = rank 2). Let me reason: cat_levels
4781+
# ranks t1=0, t2=1, t3=2, t_unused=3, t4=4, t5=5. For
4782+
# base=t3 (rank 2), base-1 by rank = rank 1 = t2 (correct).
4783+
# Better demonstration: pass base=t4 (post-period) — but that
4784+
# would be invalid by other guards. Use a setup where the
4785+
# unused level lies BEFORE base in chronology: place
4786+
# t_unused between t2 and t3, then base=t3.
4787+
cat_levels2 = ["t1", "t2", "t_unused", "t3", "t4", "t5"]
4788+
df2 = df_int.copy()
4789+
df2["time"] = pd.Categorical(
4790+
df2["time"].map(time_map),
4791+
categories=cat_levels2,
4792+
ordered=True,
4793+
)
4794+
# Now base=t3 (rank 3); base-1 by rank = t_unused (rank 2,
4795+
# not in data). Old code would KeyError on the slope pivot;
4796+
# new observed-only lookup resolves to t2 (the previous
4797+
# observed period). Verify the call SUCCEEDS.
4798+
with warnings.catch_warnings():
4799+
warnings.simplefilter("ignore", UserWarning)
4800+
r = joint_pretrends_test(
4801+
df2,
4802+
"y",
4803+
"d",
4804+
"time",
4805+
"unit",
4806+
pre_periods=["t1"],
4807+
base_period="t3",
4808+
n_bootstrap=99,
4809+
seed=42,
4810+
trends_lin=True,
4811+
)
4812+
assert np.isfinite(r.cvm_stat_joint)
4813+
assert np.isfinite(r.p_value)
4814+
47194815
def test_workflow_trends_lin_with_overall_aggregate_raises(self):
47204816
"""trends_lin=True only valid on event_study aggregate."""
47214817
df = self._panel(rng_seed=34)

0 commit comments

Comments
 (0)