Address forty-second round of CI review findings on PR #318

igerber · claude · igerber · commit 5788e26300ef · 2026-04-19T16:24:03.000-04:00
Round-42 landed two P1 findings:

1. All-undefined pre-period surface routed to ``skipped`` instead of
   ``inconclusive`` (``diagnostic_report.py``). When every pre-row is
   dropped by ``_collect_pre_period_coefs`` for undefined inference
   (all ``se <= 0`` / non-finite effect/se), the collector returns
   ``([], n_dropped_undefined > 0)``. Both the applicability gate and
   ``_pt_event_study`` treated that as "no coefficients available" and
   skipped, letting BR drop the identifying-assumption warning. Fixed
   both sites to detect the all-undefined case and route to the
   explicit ``method="inconclusive"`` runner alongside the partial-
   undefined case already covered by R33. BR's existing inconclusive
   phrasing lifts through unchanged.

2. Source-faithful assumption text for ``ImputationDiDResults`` and
   ``TwoStageDiDResults`` (``business_report.py``). BR's
   ``_describe_assumption`` was grouping both with CS / SA / Wooldridge
   under the generic "parallel trends across treatment cohorts and
   time periods (group-time ATT)" template, but BJS (2024) and Gardner
   (2022) both identify through an untreated-potential-outcome model:
   unit+time FE fitted on untreated observations (``Omega_0`` =
   never-treated + not-yet-treated) deliver the counterfactual, and
   the identifying restriction is on ``E[Y_it(0)] = alpha_i + beta_t``
   — not on cohort-time ATT equality. Split each into its own branch
   mirroring REGISTRY.md §ImputationDiD (lines 1000-1013) and
   §TwoStageDiD (lines 1113-1128), including the Gardner-BJS
   algebraic-equivalence note.
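
Illustrative sketch of the ordering fix in item 1. This is a minimal
stand-in, not the code in ``diagnostic_report.py``: the function name
``route_pt_event_study`` and the ``"joint_test"`` label are hypothetical,
and only the branch order is taken from the diff.

```python
from typing import Any, Dict, List


def route_pt_event_study(
    pre_coefs: List[Any], n_dropped_undefined: int
) -> Dict[str, Any]:
    """Hypothetical stand-in for the branch order in ``_pt_event_study``."""
    # Undefined-inference rows are checked first. This covers both the
    # partial case (R33) and the all-undefined case (R42), where
    # pre_coefs is empty but rows were dropped.
    if n_dropped_undefined > 0:
        return {
            "status": "ran",
            "method": "inconclusive",
            "n_pre_periods": len(pre_coefs),
            "n_dropped_undefined": n_dropped_undefined,
        }
    # Only a surface with nothing kept AND nothing dropped is skipped.
    if not pre_coefs:
        return {
            "status": "skipped",
            "reason": "No pre-period event-study coefficients available.",
        }
    # Placeholder for the real joint test; the label is an assumption.
    return {"status": "ran", "method": "joint_test"}
```

With this order, ``route_pt_event_study([], 2)`` reports
``method="inconclusive"`` while ``route_pt_event_study([], 0)`` still
skips, which is the distinction the pre-R42 check could not make.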

Tests: 3 new regressions.
- ``test_all_pre_periods_undefined_yields_inconclusive_not_skipped``:
  all pre-rows with ``se == 0``, asserts DR emits ``method="inconclusive"``
  / ``status="ran"`` / ``n_pre_periods=0`` / ``n_dropped_undefined=2``,
  and BR summary emits "inconclusive".
- ``test_imputation_did_assumption_uses_untreated_fe_model`` and
  ``test_two_stage_did_assumption_uses_untreated_fe_model``: lock the
  new ``parallel_trends_variant="untreated_outcome_fe_model"`` tag,
  require the registry-backed source attribution and untreated-subset
  detail, and reject the pre-R42 generic-PT template.
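
Illustrative sketch of why the first test's fixture yields an empty
valid subset. The helper below is a hypothetical approximation of the
``se > 0`` / finite filter described above, not the real
``_collect_pre_period_coefs``; treating negative relative-time keys as
the pre-period rows is an assumption made for the sketch.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[int, float, float]


def collect_pre_period_coefs_sketch(
    event_study_effects: Dict[int, Dict[str, float]],
) -> Tuple[List[Row], int]:
    """Return (kept pre-period rows, count dropped for undefined inference)."""
    kept: List[Row] = []
    n_dropped_undefined = 0
    for rel_time in sorted(event_study_effects):
        if rel_time >= 0:
            continue  # assumption: only negative relative times are pre-periods
        row = event_study_effects[rel_time]
        effect, se = row["effect"], row["se"]
        # Inference is defined only for finite effect/SE with se > 0.
        if math.isfinite(effect) and math.isfinite(se) and se > 0:
            kept.append((rel_time, effect, se))
        else:
            n_dropped_undefined += 1
    return kept, n_dropped_undefined


# Same shape as the test fixture: every pre-row has se == 0.
fixture = {
    -2: {"effect": 0.1, "se": 0.0},
    -1: {"effect": 0.05, "se": 0.0},
}
kept, n_dropped = collect_pre_period_coefs_sketch(fixture)
```

Here ``kept`` is empty and ``n_dropped`` is 2, matching the
``n_pre_periods=0`` / ``n_dropped_undefined=2`` values the test asserts.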

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
@@ -1232,11 +1232,79 @@ def _describe_assumption(estimator_name: str, results: Any = None) -> Dict[str,
             block["control_group"] = clean_control
             block["clean_control"] = clean_control
         return block
+    if estimator_name == "ImputationDiDResults":
+        # Borusyak, Jaravel & Spiess (2024) — identification is through
+        # an untreated-potential-outcome model: unit+time FE (optionally
+        # plus covariates) fitted on untreated observations only
+        # (``Omega_0``) deliver the counterfactual ``Y_it(0)``, and the
+        # treatment effect ``tau_it`` is the residual on treated
+        # observations. Writing this as generic "group-time ATT
+        # parallel trends" misstates the identifying model — the
+        # restriction is on the UNTREATED outcome's additive FE
+        # structure, not on cohort-time ATT equality. See REGISTRY.md
+        # §ImputationDiD lines 1000-1013 and Assumption 1 (parallel
+        # trends) + Assumption 2 (no anticipation on untreated
+        # observations). Round-42 P1 CI review on PR #318 flagged this
+        # source-faithfulness gap.
+        return {
+            "parallel_trends_variant": "untreated_outcome_fe_model",
+            "no_anticipation": True,
+            "description": (
+                "Identification under Imputation DiD (Borusyak, Jaravel "
+                "& Spiess 2024): the untreated potential outcome "
+                "``Y_it(0)`` follows an additive unit+time fixed-effects "
+                "model ``Y_it(0) = alpha_i + beta_t [+ X'_it * delta] + "
+                "epsilon_it``. Step 1 estimates those FE on untreated "
+                "observations only (``Omega_0`` = never-treated plus "
+                "not-yet-treated cells); Step 2 imputes the "
+                "counterfactual for treated observations from the "
+                "fitted FE; Step 3 aggregates ``tau_hat_it = Y_it - "
+                "Y_hat_it(0)`` with researcher-chosen weights. The "
+                "identifying restriction is therefore parallel trends "
+                "of the UNTREATED outcome model (Assumption 1) — "
+                "``E[Y_it(0)] = alpha_i + beta_t``, holding across all "
+                "observations — rather than equality of cohort-time "
+                "ATTs. Also assumes no anticipation on untreated "
+                "observations (Assumption 2) and absorbing treatment."
+            ),
+        }
+    if estimator_name == "TwoStageDiDResults":
+        # Gardner (2022) — identification is the same as BJS
+        # ImputationDiD (point estimates are algebraically equivalent
+        # per REGISTRY.md §TwoStageDiD line 1130): unit+time FE
+        # estimated on untreated observations only deliver the
+        # untreated potential-outcome trajectory; Stage 2 regresses
+        # the resulting residuals on treatment indicators. Writing
+        # this as generic "group-time ATT parallel trends" loses the
+        # load-bearing detail that Stage 1 operates only on untreated
+        # cells. See REGISTRY.md §TwoStageDiD lines 1113-1128 and its
+        # assumptions (same as ImputationDiD). Round-42 P1 CI review on
+        # PR #318 flagged this source-faithfulness gap.
+        return {
+            "parallel_trends_variant": "untreated_outcome_fe_model",
+            "no_anticipation": True,
+            "description": (
+                "Identification under Two-Stage DiD (Gardner 2022): "
+                "Stage 1 fits unit + time fixed effects on untreated "
+                "observations only (``Omega_0``), residualizing the "
+                "outcome as ``y_tilde_it = Y_it - alpha_hat_i - "
+                "beta_hat_t``; Stage 2 regresses residualized outcomes "
+                "on the treatment indicator across treated observations "
+                "to recover the ATT. The point estimates are "
+                "algebraically equivalent to Borusyak-Jaravel-Spiess "
+                "imputation (both rely on the same untreated-outcome FE "
+                "model to construct the counterfactual). The "
+                "identifying restriction is therefore parallel trends "
+                "of the UNTREATED outcome: ``E[Y_it(0)] = alpha_i + "
+                "beta_t`` for all observations (not a group-time ATT "
+                "equality across cohorts). Also assumes no anticipation "
+                "(``Y_it = Y_it(0)`` for all untreated observations) "
+                "and absorbing / irreversible treatment."
+            ),
+        }
     if estimator_name in {
         "CallawaySantAnnaResults",
         "SunAbrahamResults",
-        "ImputationDiDResults",
-        "TwoStageDiDResults",
         "WooldridgeDiDResults",
     }:
         return {
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
@@ -615,8 +615,17 @@ def _instance_skip_reason(self, check: str) -> Optional[str]:
                         "opt in."
                     )
             if method == "event_study":
-                pre_coefs, _ = _collect_pre_period_coefs(r)
-                if not pre_coefs:
+                pre_coefs, n_dropped_undefined = _collect_pre_period_coefs(r)
+                # Round-42 P1 CI review on PR #318: the all-undefined
+                # pre-period case (every pre-row dropped for ``se <= 0``
+                # / non-finite inference) is the twin of the partial-
+                # undefined case from round-33. It must route to the
+                # inconclusive runner rather than skip, so the explicit
+                # ``method="inconclusive"`` / ``n_dropped_undefined``
+                # provenance is surfaced through DR's schema and BR's
+                # summary emits the "inconclusive" identifying-
+                # assumption warning rather than silently dropping PT.
+                if not pre_coefs and n_dropped_undefined == 0:
                     return (
                         "No pre-period event-study coefficients are exposed on "
                         "this fit. For staggered estimators, re-fit with "
@@ -1083,20 +1092,19 @@ def _pt_event_study(self) -> Dict[str, Any]:
         """
         r = self._results
         pre_coefs, n_dropped_undefined = _collect_pre_period_coefs(r)
-        if not pre_coefs:
-            return {
-                "status": "skipped",
-                "reason": "No pre-period event-study coefficients available.",
-            }
-        # Round-33 P0 CI review on PR #318: if any real pre-period was
-        # rejected for undefined inference (``se <= 0`` or non-finite
-        # ``effect`` / ``se``), the Bonferroni fallback used to silently
-        # shrink the test family on the remaining subset and publish a
-        # finite joint p-value that then lifted into clean BR prose.
-        # That violates the ``safe_inference`` contract (``se <= 0`` ->
-        # NaN downstream). Return an explicit inconclusive PT result
-        # instead — the user cannot conclude "PT holds" from a
-        # partially-undefined pre-period surface.
+        # Round-33 P0 / Round-42 P1 CI review on PR #318: undefined-
+        # inference rows must drive an explicit ``inconclusive`` PT
+        # result rather than either (a) silently shrinking the
+        # Bonferroni family on the remaining subset and publishing a
+        # finite joint p-value (R33, mixed-partial case), or (b)
+        # routing through the empty-coefs ``skipped`` path when every
+        # pre-row was rejected (R42, all-undefined case). Both violate
+        # the ``safe_inference`` contract: ``se <= 0`` / non-finite
+        # effect or SE yields NaN downstream per ``utils.py`` line
+        # 175, REGISTRY.md line 197. The inconclusive block preserves
+        # the undefined-row count on the schema so BR's summary can
+        # quote it and stakeholders see an explicit "PT could not be
+        # assessed" warning rather than a silent PT-absent narrative.
         if n_dropped_undefined > 0:
             return {
                 "status": "ran",
@@ -1119,6 +1127,11 @@ def _pt_event_study(self) -> Dict[str, Any]:
                     "investigate why the per-period SE collapsed."
                 ),
             }
+        if not pre_coefs:
+            return {
+                "status": "skipped",
+                "reason": "No pre-period event-study coefficients available.",
+            }
         interaction_indices = getattr(r, "interaction_indices", None)
         vcov = getattr(r, "vcov", None)
 
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
@@ -750,6 +750,101 @@ class StaggeredTripleDiffResults:
         # Must NOT be the generic group-time PT text.
         assert "group-time ATT" not in desc
 
+    def test_imputation_did_assumption_uses_untreated_fe_model(self):
+        """Round-42 P1 regression: BJS (2024) identifies through the
+        untreated-outcome FE model (Step 1 estimates FE on ``Omega_0``
+        = never-treated + not-yet-treated observations, Assumption 1
+        parallel trends applies to ``E[Y_it(0)]``). The old generic
+        "group-time ATT" wording misstated this: the identifying
+        restriction is on the UNTREATED outcome's additive FE
+        structure, not on cohort-time ATT equality. REGISTRY.md
+        §ImputationDiD lines 1000-1013 and Assumption 1/2.
+        """
+
+        class ImputationDiDResults:
+            pass
+
+        obj = ImputationDiDResults()
+        obj.overall_att = 1.0
+        obj.overall_se = 0.1
+        obj.overall_p_value = 0.001
+        obj.overall_conf_int = (0.8, 1.2)
+        obj.alpha = 0.05
+        obj.n_obs = 100
+        obj.n_treated = 40
+        obj.n_control = 60
+        obj.survey_metadata = None
+        obj.event_study_effects = None
+        obj.inference_method = "analytical"
+        obj.anticipation = 0
+
+        br = BusinessReport(obj, auto_diagnostics=False)
+        assumption = br.to_dict()["assumption"]
+        assert assumption["parallel_trends_variant"] == "untreated_outcome_fe_model"
+        desc = assumption["description"]
+        # Registry-backed: Borusyak-Jaravel-Spiess attribution.
+        assert "Borusyak" in desc or "BJS" in desc or "2024" in desc
+        # Load-bearing source detail: untreated-observation FE model.
+        assert "untreated" in desc.lower()
+        assert "Omega_0" in desc or "fixed effect" in desc.lower()
+        # Must NOT render the pre-R42 generic group-time-ATT template
+        # that grouped BJS in with CS / SA.
+        assert (
+            "parallel trends across treatment cohorts and time periods (group-time ATT)" not in desc
+        ), (
+            "ImputationDiD identifies via untreated-outcome FE modelling "
+            "(BJS 2024 Assumption 1), not generic group-time ATT PT. The "
+            f"assumption description must not use the pre-R42 template. Got: {desc!r}"
+        )
+
+    def test_two_stage_did_assumption_uses_untreated_fe_model(self):
+        """Round-42 P1 regression: Gardner (2022) two-stage DiD shares
+        BJS's untreated-outcome FE identification (REGISTRY.md explicitly
+        states "Parallel trends (same as ImputationDiD)" and the point
+        estimates are algebraically equivalent). Stage 1 fits FE on
+        untreated observations; Stage 2 regresses the residualized
+        outcomes on the treatment indicator. The old generic "group-time
+        ATT" wording dropped the untreated-subset detail. REGISTRY.md §TwoStageDiD lines 1113-1128.
+        """
+
+        class TwoStageDiDResults:
+            pass
+
+        obj = TwoStageDiDResults()
+        obj.overall_att = 1.0
+        obj.overall_se = 0.1
+        obj.overall_p_value = 0.001
+        obj.overall_conf_int = (0.8, 1.2)
+        obj.alpha = 0.05
+        obj.n_obs = 100
+        obj.n_treated = 40
+        obj.n_control = 60
+        obj.survey_metadata = None
+        obj.event_study_effects = None
+        obj.inference_method = "analytical"
+        obj.anticipation = 0
+
+        br = BusinessReport(obj, auto_diagnostics=False)
+        assumption = br.to_dict()["assumption"]
+        assert assumption["parallel_trends_variant"] == "untreated_outcome_fe_model"
+        desc = assumption["description"]
+        # Registry-backed: Gardner 2022 attribution.
+        assert "Gardner" in desc or "2022" in desc
+        # Load-bearing: Stage 1 operates on untreated observations.
+        assert "untreated" in desc.lower()
+        assert "Stage 1" in desc or "stage 1" in desc.lower()
+        # Must mention the two-stage procedure.
+        assert "two-stage" in desc.lower() or "Two-Stage" in desc
+        # Must NOT render the pre-R42 generic group-time-ATT template
+        # that grouped Gardner in with CS / SA.
+        assert (
+            "parallel trends across treatment cohorts and time periods (group-time ATT)" not in desc
+        ), (
+            "TwoStageDiD identifies via the same untreated-outcome FE "
+            "model as ImputationDiD (Gardner 2022); the assumption "
+            f"description must not use the pre-R42 template. Got: {desc!r}"
+        )
+
 
 class TestEfficientDiDAssumptionPtAllPtPost:
     """Round-8 regression: EfficientDiD has two distinct PT regimes
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
@@ -1388,6 +1388,89 @@ class MultiPeriodDiDResults:
         assert pt["method"] == "inconclusive"
         assert pt["n_dropped_undefined"] >= 1
 
+    def test_all_pre_periods_undefined_yields_inconclusive_not_skipped(self):
+        """Round-42 P1 regression: the twin of the partially-undefined
+        case. When every pre-period row is dropped by the collector
+        for undefined inference (all ``se <= 0`` or non-finite effect/SE),
+        ``_collect_pre_period_coefs`` returns ``([], n_dropped_undefined > 0)``.
+        The prior behavior routed through the empty-coefs ``skipped``
+        path ("No pre-period event-study coefficients available"),
+        which let BR drop the identifying-assumption warning and render
+        a silent-PT-absent narrative. That violates the inconclusive
+        contract documented in REPORTING.md: when any pre-row is
+        dropped for undefined inference, the joint PT test is
+        inconclusive, not skipped.
+        """
+        from diff_diff import BusinessReport
+
+        class StackedDiDResults:
+            pass
+
+        obj = StackedDiDResults()
+        obj.overall_att = 1.0
+        obj.overall_se = 0.2
+        obj.overall_p_value = 0.001
+        obj.overall_conf_int = (0.6, 1.4)
+        obj.alpha = 0.05
+        obj.n_obs = 400
+        obj.n_treated_units = 100
+        obj.n_control_units = 300
+        obj.survey_metadata = None
+        # All pre-rows have ``se == 0`` — undefined inference per the
+        # safe-inference contract (``utils.py:175``). The collector's
+        # ``se > 0`` filter drops all of them, leaving pre_coefs=[]
+        # with n_dropped_undefined=2 (the R42 all-undefined case).
+        obj.event_study_effects = {
+            -2: {
+                "effect": 0.1,
+                "se": 0.0,
+                "p_value": 1.0,
+                "n_obs": 400,
+            },
+            -1: {
+                "effect": 0.05,
+                "se": 0.0,
+                "p_value": 1.0,
+                "n_obs": 400,
+            },
+        }
+
+        dr = DiagnosticReport(obj, run_sensitivity=False, run_bacon=False)
+        # Applicability gate: PT must be marked applicable (runs as
+        # inconclusive), not skipped with "no coefficients available".
+        assert "parallel_trends" in dr.applicable_checks, (
+            "All-undefined pre-period case must keep PT applicable so "
+            "the inconclusive runner can emit the explicit "
+            "n_dropped_undefined provenance. Current skipped reasons: "
+            f"{dr.skipped_checks}"
+        )
+        pt = dr.to_dict()["parallel_trends"]
+        assert pt["status"] == "ran", pt
+        assert pt["method"] == "inconclusive", (
+            f"All-undefined pre-period family must route to the "
+            f"inconclusive runner, not 'skipped'. Got status="
+            f"{pt.get('status')!r}, method={pt.get('method')!r}, "
+            f"reason={pt.get('reason')!r}"
+        )
+        assert pt["verdict"] == "inconclusive"
+        assert pt["joint_p_value"] is None
+        # All-undefined: n_dropped_undefined equals attempted pre-period
+        # count (2 rows here), and the valid subset is empty.
+        assert pt["n_dropped_undefined"] == 2
+        assert pt["n_pre_periods"] == 0
+
+        # BR must surface this as an inconclusive identifying-
+        # assumption warning, not silently omit PT. The "inconclusive"
+        # verdict phrasing is the load-bearing contract for
+        # stakeholders.
+        br_summary = BusinessReport(obj).summary().lower()
+        assert "inconclusive" in br_summary, (
+            f"All-undefined PT must surface 'inconclusive' in BR summary. Got: {br_summary!r}"
+        )
+        # And must not claim PT was untested / no-coefs.
+        assert "no pre-period event-study coefficients" not in br_summary
+        assert "consistent with parallel trends" not in br_summary
+
     def test_pretrends_power_adapter_filters_zero_se_cs(self):
         """Round-33 P0 regression: CS / SA ``compute_pretrends_power``
         adapters also use the ``se > 0`` filter alongside