Skip to content

Commit 5788e26

Browse files
igerberclaude
andcommitted
Address forty-second round of CI review findings on PR #318
Round-42 landed two P1 findings: 1. All-undefined pre-period surface routed to ``skipped`` instead of ``inconclusive`` (``diagnostic_report.py``). When every pre-row is dropped by ``_collect_pre_period_coefs`` for undefined inference (all ``se <= 0`` / non-finite effect/se), the collector returns ``([], n_dropped_undefined > 0)``. Both the applicability gate and ``_pt_event_study`` treated that as "no coefficients available" and skipped, letting BR drop the identifying-assumption warning. Fixed both sites to detect the all-undefined case and route to the explicit ``method="inconclusive"`` runner alongside the partial- undefined case already covered by R33. BR's existing inconclusive phrasing lifts through unchanged. 2. Source-faithful assumption text for ``ImputationDiDResults`` and ``TwoStageDiDResults`` (``business_report.py``). BR's ``_describe_assumption`` was grouping both with CS / SA / Wooldridge under the generic "parallel trends across treatment cohorts and time periods (group-time ATT)" template, but BJS (2024) and Gardner (2022) both identify through an untreated-potential-outcome model: unit+time FE fitted on untreated observations (``Omega_0`` = never-treated + not-yet-treated) deliver the counterfactual, and the identifying restriction is on ``E[Y_it(0)] = alpha_i + beta_t`` — not on cohort-time ATT equality. Split each into its own branch mirroring REGISTRY.md §ImputationDiD (lines 1000-1013) and §TwoStageDiD (lines 1113-1128), including the Gardner-BJS algebraic-equivalence note. Tests: 3 new regressions. - ``test_all_pre_periods_undefined_yields_inconclusive_not_skipped``: all pre-rows with ``se == 0``, asserts DR emits ``method="inconclusive"`` / ``status="ran"`` / ``n_pre_periods=0`` / ``n_dropped_undefined=2``, and BR summary emits "inconclusive". - ``test_imputation_did_assumption_uses_untreated_fe_model`` and ``test_two_stage_did_assumption_uses_untreated_fe_model``: lock the new ``parallel_trends_variant="untreated_outcome_fe_model"`` tag, require the registry-backed source attribution and untreated-subset detail, and reject the pre-R42 generic-PT template. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 020d537 commit 5788e26

4 files changed

Lines changed: 277 additions & 18 deletions

File tree

diff_diff/business_report.py

Lines changed: 70 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1232,11 +1232,79 @@ def _describe_assumption(estimator_name: str, results: Any = None) -> Dict[str,
12321232
block["control_group"] = clean_control
12331233
block["clean_control"] = clean_control
12341234
return block
1235+
if estimator_name == "ImputationDiDResults":
1236+
# Borusyak, Jaravel & Spiess (2024) — identification is through
1237+
# an untreated-potential-outcome model: unit+time FE (optionally
1238+
# plus covariates) fitted on untreated observations only
1239+
# (``Omega_0``) deliver the counterfactual ``Y_it(0)``, and the
1240+
# treatment effect ``tau_it`` is the residual on treated
1241+
# observations. Writing this as generic "group-time ATT
1242+
# parallel trends" misstates the identifying model — the
1243+
# restriction is on the UNTREATED outcome's additive FE
1244+
# structure, not on cohort-time ATT equality. REGISTRY.md
1245+
# §ImputationDiD lines 1000-1013 and Assumption 1 (parallel
1246+
# trends) + Assumption 2 (no anticipation on untreated
1247+
# observations). Round-42 P1 CI review on PR #318 flagged this
1248+
# source-faithfulness gap.
1249+
return {
1250+
"parallel_trends_variant": "untreated_outcome_fe_model",
1251+
"no_anticipation": True,
1252+
"description": (
1253+
"Identification under Imputation DiD (Borusyak, Jaravel "
1254+
"& Spiess 2024): the untreated potential outcome "
1255+
"``Y_it(0)`` follows an additive unit+time fixed-effects "
1256+
"model ``Y_it(0) = alpha_i + beta_t [+ X'_it * delta] + "
1257+
"epsilon_it``. Step 1 estimates those FE on untreated "
1258+
"observations only (``Omega_0`` = never-treated plus "
1259+
"not-yet-treated cells); Step 2 imputes the "
1260+
"counterfactual for treated observations from the "
1261+
"fitted FE; Step 3 aggregates ``tau_hat_it = Y_it - "
1262+
"Y_hat_it(0)`` with researcher-chosen weights. The "
1263+
"identifying restriction is therefore parallel trends "
1264+
"of the UNTREATED outcome model (Assumption 1) — "
1265+
"``E[Y_it(0)] = alpha_i + beta_t``, holding across all "
1266+
"observations — rather than equality of cohort-time "
1267+
"ATTs. Also assumes no anticipation on untreated "
1268+
"observations (Assumption 2) and absorbing treatment."
1269+
),
1270+
}
1271+
if estimator_name == "TwoStageDiDResults":
1272+
# Gardner (2022) — identification is the same as BJS
1273+
# ImputationDiD (point estimates are algebraically equivalent
1274+
# per REGISTRY.md §TwoStageDiD line 1130): unit+time FE
1275+
# estimated on untreated observations only deliver the
1276+
# untreated potential-outcome trajectory; Stage 2 regresses
1277+
# the resulting residuals on treatment indicators. Writing
1278+
# this as generic "group-time ATT parallel trends" loses the
1279+
# load-bearing detail that Stage 1 operates only on untreated
1280+
# cells. REGISTRY.md §TwoStageDiD lines 1113-1128 and
1281+
# Assumption (same as ImputationDiD). Round-42 P1 CI review on
1282+
# PR #318 flagged this source-faithfulness gap.
1283+
return {
1284+
"parallel_trends_variant": "untreated_outcome_fe_model",
1285+
"no_anticipation": True,
1286+
"description": (
1287+
"Identification under Two-Stage DiD (Gardner 2022): "
1288+
"Stage 1 fits unit + time fixed effects on untreated "
1289+
"observations only (``Omega_0``), residualizing the "
1290+
"outcome as ``y_tilde_it = Y_it - alpha_hat_i - "
1291+
"beta_hat_t``; Stage 2 regresses residualized outcomes "
1292+
"on the treatment indicator across treated observations "
1293+
"to recover the ATT. The point estimates are "
1294+
"algebraically equivalent to Borusyak-Jaravel-Spiess "
1295+
"imputation (both rely on the same untreated-outcome FE "
1296+
"model to construct the counterfactual). The "
1297+
"identifying restriction is therefore parallel trends "
1298+
"of the UNTREATED outcome: ``E[Y_it(0)] = alpha_i + "
1299+
"beta_t`` for all observations (not a group-time ATT "
1300+
"equality across cohorts). Also assumes no anticipation "
1301+
"(``Y_it = Y_it(0)`` for all untreated observations) "
1302+
"and absorbing / irreversible treatment."
1303+
),
1304+
}
12351305
if estimator_name in {
12361306
"CallawaySantAnnaResults",
12371307
"SunAbrahamResults",
1238-
"ImputationDiDResults",
1239-
"TwoStageDiDResults",
12401308
"WooldridgeDiDResults",
12411309
}:
12421310
return {

diff_diff/diagnostic_report.py

Lines changed: 29 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -615,8 +615,17 @@ def _instance_skip_reason(self, check: str) -> Optional[str]:
615615
"opt in."
616616
)
617617
if method == "event_study":
618-
pre_coefs, _ = _collect_pre_period_coefs(r)
619-
if not pre_coefs:
618+
pre_coefs, n_dropped_undefined = _collect_pre_period_coefs(r)
619+
# Round-42 P1 CI review on PR #318: the all-undefined
620+
# pre-period case (every pre-row dropped for ``se <= 0``
621+
# / non-finite inference) is the twin of the partial-
622+
# undefined case from round-33. It must route to the
623+
# inconclusive runner rather than skip, so the explicit
624+
# ``method="inconclusive"`` / ``n_dropped_undefined``
625+
# provenance is surfaced through DR's schema and BR's
626+
# summary emits the "inconclusive" identifying-
627+
# assumption warning rather than silently dropping PT.
628+
if not pre_coefs and n_dropped_undefined == 0:
620629
return (
621630
"No pre-period event-study coefficients are exposed on "
622631
"this fit. For staggered estimators, re-fit with "
@@ -1083,20 +1092,19 @@ def _pt_event_study(self) -> Dict[str, Any]:
10831092
"""
10841093
r = self._results
10851094
pre_coefs, n_dropped_undefined = _collect_pre_period_coefs(r)
1086-
if not pre_coefs:
1087-
return {
1088-
"status": "skipped",
1089-
"reason": "No pre-period event-study coefficients available.",
1090-
}
1091-
# Round-33 P0 CI review on PR #318: if any real pre-period was
1092-
# rejected for undefined inference (``se <= 0`` or non-finite
1093-
# ``effect`` / ``se``), the Bonferroni fallback used to silently
1094-
# shrink the test family on the remaining subset and publish a
1095-
# finite joint p-value that then lifted into clean BR prose.
1096-
# That violates the ``safe_inference`` contract (``se <= 0`` ->
1097-
# NaN downstream). Return an explicit inconclusive PT result
1098-
# instead — the user cannot conclude "PT holds" from a
1099-
# partially-undefined pre-period surface.
1095+
# Round-33 P0 / Round-42 P1 CI review on PR #318: undefined-
1096+
# inference rows must drive an explicit ``inconclusive`` PT
1097+
# result rather than either (a) silently shrinking the
1098+
# Bonferroni family on the remaining subset and publishing a
1099+
# finite joint p-value (R33, mixed-partial case), or (b)
1100+
# routing through the empty-coefs ``skipped`` path when every
1101+
# pre-row was rejected (R42, all-undefined case). Both violate
1102+
# the ``safe_inference`` contract: ``se <= 0`` / non-finite
1103+
# effect or SE yields NaN downstream per ``utils.py`` line
1104+
# 175, REGISTRY.md line 197. The inconclusive block preserves
1105+
# the undefined-row count on the schema so BR's summary can
1106+
# quote it and stakeholders see an explicit "PT could not be
1107+
# assessed" warning rather than a silent PT-absent narrative.
11001108
if n_dropped_undefined > 0:
11011109
return {
11021110
"status": "ran",
@@ -1119,6 +1127,11 @@ def _pt_event_study(self) -> Dict[str, Any]:
11191127
"investigate why the per-period SE collapsed."
11201128
),
11211129
}
1130+
if not pre_coefs:
1131+
return {
1132+
"status": "skipped",
1133+
"reason": "No pre-period event-study coefficients available.",
1134+
}
11221135
interaction_indices = getattr(r, "interaction_indices", None)
11231136
vcov = getattr(r, "vcov", None)
11241137

tests/test_business_report.py

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -750,6 +750,101 @@ class StaggeredTripleDiffResults:
750750
# Must NOT be the generic group-time PT text.
751751
assert "group-time ATT" not in desc
752752

753+
def test_imputation_did_assumption_uses_untreated_fe_model(self):
754+
"""Round-42 P1 regression: BJS (2024) identifies through the
755+
untreated-outcome FE model (Step 1 estimates FE on ``Omega_0``
756+
= never-treated + not-yet-treated observations, Assumption 1
757+
parallel trends applies to ``E[Y_it(0)]``). The old generic
758+
"group-time ATT" wording misstated this: the identifying
759+
restriction is on the UNTREATED outcome's additive FE
760+
structure, not on cohort-time ATT equality. REGISTRY.md
761+
§ImputationDiD lines 1000-1013 and Assumption 1/2.
762+
"""
763+
764+
class ImputationDiDResults:
765+
pass
766+
767+
obj = ImputationDiDResults()
768+
obj.overall_att = 1.0
769+
obj.overall_se = 0.1
770+
obj.overall_p_value = 0.001
771+
obj.overall_conf_int = (0.8, 1.2)
772+
obj.alpha = 0.05
773+
obj.n_obs = 100
774+
obj.n_treated = 40
775+
obj.n_control = 60
776+
obj.survey_metadata = None
777+
obj.event_study_effects = None
778+
obj.inference_method = "analytical"
779+
obj.anticipation = 0
780+
781+
br = BusinessReport(obj, auto_diagnostics=False)
782+
assumption = br.to_dict()["assumption"]
783+
assert assumption["parallel_trends_variant"] == "untreated_outcome_fe_model"
784+
desc = assumption["description"]
785+
# Registry-backed: Borusyak-Jaravel-Spiess attribution.
786+
assert "Borusyak" in desc or "BJS" in desc or "2024" in desc
787+
# Load-bearing source detail: untreated-observation FE model.
788+
assert "untreated" in desc.lower()
789+
assert "Omega_0" in desc or "fixed effect" in desc.lower()
790+
# Must NOT render the pre-R42 generic group-time-ATT template
791+
# that grouped BJS in with CS / SA.
792+
assert (
793+
"parallel trends across treatment cohorts and time periods (group-time ATT)" not in desc
794+
), (
795+
"ImputationDiD identifies via untreated-outcome FE modelling "
796+
"(BJS 2024 Assumption 1), not generic group-time ATT PT. The "
797+
f"assumption description must not use the pre-R42 template. Got: {desc!r}"
798+
)
799+
800+
def test_two_stage_did_assumption_uses_untreated_fe_model(self):
801+
"""Round-42 P1 regression: Gardner (2022) two-stage DiD shares
802+
BJS's untreated-outcome FE identification (REGISTRY.md explicitly
803+
states "Parallel trends (same as ImputationDiD)" and the point
804+
estimates are algebraically equivalent). Stage 1 fits FE on
805+
untreated observations, Stage 2 residualizes treated observations.
806+
The old generic "group-time ATT" wording dropped the untreated-
807+
subset detail. REGISTRY.md §TwoStageDiD lines 1113-1128.
808+
"""
809+
810+
class TwoStageDiDResults:
811+
pass
812+
813+
obj = TwoStageDiDResults()
814+
obj.overall_att = 1.0
815+
obj.overall_se = 0.1
816+
obj.overall_p_value = 0.001
817+
obj.overall_conf_int = (0.8, 1.2)
818+
obj.alpha = 0.05
819+
obj.n_obs = 100
820+
obj.n_treated = 40
821+
obj.n_control = 60
822+
obj.survey_metadata = None
823+
obj.event_study_effects = None
824+
obj.inference_method = "analytical"
825+
obj.anticipation = 0
826+
827+
br = BusinessReport(obj, auto_diagnostics=False)
828+
assumption = br.to_dict()["assumption"]
829+
assert assumption["parallel_trends_variant"] == "untreated_outcome_fe_model"
830+
desc = assumption["description"]
831+
# Registry-backed: Gardner 2022 attribution.
832+
assert "Gardner" in desc or "2022" in desc
833+
# Load-bearing: Stage 1 operates on untreated observations.
834+
assert "untreated" in desc.lower()
835+
assert "Stage 1" in desc or "stage 1" in desc.lower()
836+
# Must mention the two-stage procedure.
837+
assert "two-stage" in desc.lower() or "Two-Stage" in desc
838+
# Must NOT render the pre-R42 generic group-time-ATT template
839+
# that grouped Gardner in with CS / SA.
840+
assert (
841+
"parallel trends across treatment cohorts and time periods (group-time ATT)" not in desc
842+
), (
843+
"TwoStageDiD identifies via the same untreated-outcome FE "
844+
"model as ImputationDiD (Gardner 2022); the assumption "
845+
f"description must not use the pre-R42 template. Got: {desc!r}"
846+
)
847+
753848

754849
class TestEfficientDiDAssumptionPtAllPtPost:
755850
"""Round-8 regression: EfficientDiD has two distinct PT regimes

tests/test_diagnostic_report.py

Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1388,6 +1388,89 @@ class MultiPeriodDiDResults:
13881388
assert pt["method"] == "inconclusive"
13891389
assert pt["n_dropped_undefined"] >= 1
13901390

1391+
def test_all_pre_periods_undefined_yields_inconclusive_not_skipped(self):
1392+
"""Round-42 P1 regression: the twin of the partially-undefined
1393+
case. When every pre-period row is dropped by the collector
1394+
for undefined inference (all ``se <= 0`` or non-finite effect/SE),
1395+
``_collect_pre_period_coefs`` returns ``([], n_dropped_undefined > 0)``.
1396+
The prior behavior routed through the empty-coefs ``skipped``
1397+
path ("No pre-period event-study coefficients available"),
1398+
which let BR drop the identifying-assumption warning and render
1399+
a silent-PT-absent narrative. That violates the inconclusive
1400+
contract documented in REPORTING.md: when any pre-row is
1401+
dropped for undefined inference, the joint PT test is
1402+
inconclusive, not skipped.
1403+
"""
1404+
from diff_diff import BusinessReport
1405+
1406+
class StackedDiDResults:
1407+
pass
1408+
1409+
obj = StackedDiDResults()
1410+
obj.overall_att = 1.0
1411+
obj.overall_se = 0.2
1412+
obj.overall_p_value = 0.001
1413+
obj.overall_conf_int = (0.6, 1.4)
1414+
obj.alpha = 0.05
1415+
obj.n_obs = 400
1416+
obj.n_treated_units = 100
1417+
obj.n_control_units = 300
1418+
obj.survey_metadata = None
1419+
# All pre-rows have ``se == 0`` — undefined inference per the
1420+
# safe-inference contract (``utils.py:175``). The collector's
1421+
# ``se > 0`` filter drops all of them, leaving pre_coefs=[]
1422+
# with n_dropped_undefined=2 (the R42 all-undefined case).
1423+
obj.event_study_effects = {
1424+
-2: {
1425+
"effect": 0.1,
1426+
"se": 0.0,
1427+
"p_value": 1.0,
1428+
"n_obs": 400,
1429+
},
1430+
-1: {
1431+
"effect": 0.05,
1432+
"se": 0.0,
1433+
"p_value": 1.0,
1434+
"n_obs": 400,
1435+
},
1436+
}
1437+
1438+
dr = DiagnosticReport(obj, run_sensitivity=False, run_bacon=False)
1439+
# Applicability gate: PT must be marked applicable (runs as
1440+
# inconclusive), not skipped with "no coefficients available".
1441+
assert "parallel_trends" in dr.applicable_checks, (
1442+
"All-undefined pre-period case must keep PT applicable so "
1443+
"the inconclusive runner can emit the explicit "
1444+
"n_dropped_undefined provenance. Current skipped reasons: "
1445+
f"{dr.skipped_checks}"
1446+
)
1447+
pt = dr.to_dict()["parallel_trends"]
1448+
assert pt["status"] == "ran", pt
1449+
assert pt["method"] == "inconclusive", (
1450+
f"All-undefined pre-period family must route to the "
1451+
f"inconclusive runner, not 'skipped'. Got status="
1452+
f"{pt.get('status')!r}, method={pt.get('method')!r}, "
1453+
f"reason={pt.get('reason')!r}"
1454+
)
1455+
assert pt["verdict"] == "inconclusive"
1456+
assert pt["joint_p_value"] is None
1457+
# All-undefined: n_dropped_undefined equals attempted pre-period
1458+
# count (2 rows here), and the valid subset is empty.
1459+
assert pt["n_dropped_undefined"] == 2
1460+
assert pt["n_pre_periods"] == 0
1461+
1462+
# BR must surface this as an inconclusive identifying-
1463+
# assumption warning, not silently omit PT. The "inconclusive"
1464+
# verdict phrasing is the load-bearing contract for
1465+
# stakeholders.
1466+
br_summary = BusinessReport(obj).summary().lower()
1467+
assert "inconclusive" in br_summary, (
1468+
f"All-undefined PT must surface 'inconclusive' in BR " f"summary. Got: {br_summary!r}"
1469+
)
1470+
# And must not claim PT was untested / no-coefs.
1471+
assert "no pre-period event-study coefficients" not in br_summary
1472+
assert "consistent with parallel trends" not in br_summary
1473+
13911474
def test_pretrends_power_adapter_filters_zero_se_cs(self):
13921475
"""Round-33 P0 regression: CS / SA ``compute_pretrends_power``
13931476
adapters also use the ``se > 0`` filter alongside

0 commit comments

Comments
 (0)