Address PR #347 R4: propagate no_scalar_headline through BR/DR; Wooldridge method-aware

igerber · claude · igerber · commit a8e1719058c4 · 2026-04-20T19:55:12.000-04:00
R4 surfaced one P1 + one P2, both addressed.

P1 (methodology): the dCDH no-scalar branch was documented in the
schema but not plumbed through BR/DR rendering. When
``aggregation="no_scalar_headline"`` and ``headline_attribute=None``
(``trends_linear=True`` + ``L_max>=2``), BR/DR still extracted
``overall_att`` (NaN by design) and narrated it via the estimation-
failure path, producing internally inconsistent output — the
``target_parameter`` block said "no scalar aggregate; consult
linear_trends_effects" while the headline prose told users to
inspect rank deficiency.

Fix (both surfaces):

- BR ``_build_schema``: compute ``target_parameter`` BEFORE
  ``_extract_headline``; if the aggregation tag is
  ``no_scalar_headline``, route through a dedicated headline block
  with ``status="no_scalar_by_design"`` / ``effect=None`` /
  ``sign="none"`` and an explicit ``reason`` field naming the
  ``linear_trends_effects`` alternative.
- BR ``_render_headline_sentence``: detect
  ``status == "no_scalar_by_design"`` and emit explicit "does not
  produce a scalar aggregate effect ... by design" prose instead
  of the non-finite / estimation-failure sentence.
- BR ``_build_caveats``: unchanged. The existing
  ``sign == "undefined"`` estimation-failure caveat does not fire
  because the no-scalar case emits ``sign == "none"``.
- DR ``_execute``: analogous headline-metric short-circuit with
  ``status="no_scalar_by_design"`` on detection of the
  no_scalar_headline tag.
- DR ``_render_overall_interpretation``: explicit no-scalar
  sentence takes precedence over the non-finite estimation-failure
  branch.
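
A minimal sketch of the routing described above, not the actual BR/DR
code. ``build_headline`` and ``render_headline`` are hypothetical
stand-ins, and the ``"estimation_failure"`` / ``"ok"`` status values
and the ``"positive"`` / ``"negative"`` / ``"zero"`` signs are
assumptions made for illustration; only ``"no_scalar_by_design"``,
``sign="none"`` and ``sign="undefined"`` come from this change.

```python
import math
from typing import Any, Dict


def build_headline(target_parameter: Dict[str, Any], overall_att: float) -> Dict[str, Any]:
    """Hypothetical stand-in for the BR/DR headline builders."""
    # Check the aggregation tag first, so an intentional NaN never
    # reaches the non-finite branch below.
    if target_parameter.get("aggregation") == "no_scalar_headline":
        return {
            "status": "no_scalar_by_design",
            "effect": None,
            "sign": "none",
            "reason": "see linear_trends_effects for per-horizon effects",
        }
    if not math.isfinite(overall_att):
        # Assumed shape of the estimation-failure headline.
        return {"status": "estimation_failure", "effect": overall_att, "sign": "undefined"}
    sign = "positive" if overall_att > 0 else "negative" if overall_att < 0 else "zero"
    return {"status": "ok", "effect": overall_att, "sign": sign}


def render_headline(headline: Dict[str, Any], treatment: str = "The treatment") -> str:
    """Hypothetical renderer: the status check takes precedence."""
    if headline.get("status") == "no_scalar_by_design":
        return f"{treatment} does not produce a scalar aggregate effect (by design)."
    if headline.get("sign") == "undefined":
        return f"{treatment}'s effect is non-finite (no usable point estimate)."
    return f"{treatment} has an estimated effect of {headline['effect']:.3f}."


nan = float("nan")
by_design = build_headline({"aggregation": "no_scalar_headline"}, nan)
failed = build_headline({"aggregation": "simple"}, nan)
print(render_headline(by_design))
print(render_headline(failed))
```

Both inputs carry a NaN ``overall_att``; only the aggregation tag
distinguishes the intentional case from a genuine failure, which is
why the tag has to be resolved before the headline is extracted.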

P2 (Wooldridge method awareness): the Wooldridge branch previously
labeled every fit as ASF-based, but REGISTRY.md Sec. WooldridgeDiD
splits OLS ETWFE (observation-count-weighted average of ATT(g,t)
from a saturated regression) from the nonlinear (logit / Poisson)
ASF path. Branch on ``results.method`` ("ols" -> coefficient-
aggregation wording; other -> ASF wording).
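
A minimal sketch of the method branch, assuming a results object that
exposes ``method``. ``wooldridge_target_name`` is a hypothetical
reduction of ``describe_target_parameter`` that returns only the
``name`` string, and ``count_weighted_average`` is an illustrative
helper (not library code) showing what "observation-count-weighted
average" means on made-up ``(att, n_obs)`` cells.

```python
from types import SimpleNamespace
from typing import Any, Sequence, Tuple


def wooldridge_target_name(results: Any) -> str:
    # Default to "ols" when the attribute is missing, as in the diff.
    method = getattr(results, "method", "ols")
    if method == "ols":
        return (
            "overall ATT (observation-count-weighted average of "
            "ATT(g,t) from saturated OLS ETWFE)"
        )
    return f"overall ATT (ASF-based average from Wooldridge ETWFE, method={method!r})"


def count_weighted_average(cells: Sequence[Tuple[float, int]]) -> float:
    """Illustrative only: average (att, n_obs) pairs using n_obs as weights."""
    total = sum(n for _, n in cells)
    return sum(att * n for att, n in cells) / total


ols_name = wooldridge_target_name(SimpleNamespace(method="ols"))
logit_name = wooldridge_target_name(SimpleNamespace(method="logit"))
# Made-up cells: ATT 0.5 on 100 observations, ATT 1.0 on 300 observations.
example = count_weighted_average([(0.5, 100), (1.0, 300)])
print(ols_name)
print(logit_name)
print(example)
```

Any ``method`` other than ``"ols"`` falls through to the ASF wording,
so a future nonlinear link would get the ASF label without a code
change; whether that default is appropriate is a design choice.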

Tests: added 4 regression tests (two end-to-end on real dCDH fits,
two unit-level on ``describe_target_parameter``).

- ``test_dcdh_trends_linear_no_scalar_propagates_through_br``:
  real dCDH fit with ``trends_linear=True`` + ``L_max=2``; asserts
  BR schema emits ``status="no_scalar_by_design"``, summary prose
  contains "no scalar" / "does not produce a scalar", does NOT
  contain "rank deficiency" / "estimation failed", and caveats do
  NOT include ``estimation_failure``.
- ``test_dcdh_trends_linear_no_scalar_propagates_through_dr``:
  mirror on the DR side (``headline_metric`` status and
  ``overall_interpretation`` prose).
- ``test_wooldridge_ols``: asserts the OLS branch names
  ATT(g,t) aggregation and does NOT include "ASF" in the name.
- ``test_wooldridge_nonlinear``: asserts logit/poisson routes
  through the ASF branch.

336 BR/DR tests pass. Black and ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/_reporting_helpers.py b/diff_diff/_reporting_helpers.py
@@ -218,19 +218,56 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
         }
 
     if name == "WooldridgeDiDResults":
+        # PR #347 R4 P2: Wooldridge ETWFE has two identification paths
+        # (REGISTRY.md splits them at Sec. WooldridgeDiD): the OLS
+        # path computes ``overall_att`` as an observation-count-
+        # weighted aggregation of ``ATT(g, t)`` coefficients from the
+        # saturated regression, while the nonlinear (logit / Poisson)
+        # paths produce an ASF-based ATT from the average-structural-
+        # function contrast. ``WooldridgeDiDResults.method`` persists
+        # the choice; branch on it so OLS fits aren't mislabeled with
+        # nonlinear-ASF wording.
+        method = getattr(results, "method", "ols")
+        if method == "ols":
+            return {
+                "name": (
+                    "overall ATT (observation-count-weighted average of "
+                    "ATT(g,t) from saturated OLS ETWFE)"
+                ),
+                "definition": (
+                    "The overall ATT under OLS ETWFE (Wooldridge 2023): the "
+                    "saturated regression fits cohort x time ATT(g, t) "
+                    "coefficients, and ``overall_att`` is their "
+                    "observation-count-weighted average across post-"
+                    'treatment cells. Calling ``.aggregate("event")`` '
+                    "populates additional event-study tables but does NOT "
+                    "change the ``overall_att`` scalar."
+                ),
+                "aggregation": "simple",
+                "headline_attribute": "overall_att",
+                "reference": ("Wooldridge (2023); REGISTRY.md Sec. WooldridgeDiD (OLS path)"),
+            }
         return {
-            "name": "overall ATT (observation-count-weighted ASF ATT across cohort x time cells)",
+            "name": (
+                f"overall ATT (ASF-based average from Wooldridge ETWFE, method={method!r})"
+            ),
             "definition": (
-                "The overall ATT under Wooldridge's ETWFE: the average-structural-"
-                "function (ASF) contrast between treated and counterfactual "
-                "untreated outcomes, averaged across cohort x time cells with "
-                'observation-count weights. Calling ``.aggregate("event")`` '
-                "populates additional event-study tables but does NOT change "
-                "the ``overall_att`` scalar."
+                f"The overall ATT under Wooldridge ETWFE with a nonlinear "
+                f"link function (``method={method!r}``, typically logit or "
+                f"Poisson QMLE): the average-structural-function (ASF) "
+                f"contrast between treated and counterfactual untreated "
+                f"outcomes averaged across cohort x time cells with "
+                f"observation-count weights. The ASF handles the "
+                f"nonlinearity; OLS ETWFE uses the saturated-regression "
+                f'coefficient path instead. Calling ``.aggregate("event")`` '
+                f"populates additional event-study tables but does NOT "
+                f"change the ``overall_att`` scalar."
             ),
             "aggregation": "simple",
             "headline_attribute": "overall_att",
-            "reference": "Wooldridge (2023); REGISTRY.md Sec. WooldridgeDiD",
+            "reference": (
+                "Wooldridge (2023, 2025); REGISTRY.md Sec. WooldridgeDiD (nonlinear / ASF path)"
+            ),
         }
 
     if name == "EfficientDiDResults":
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
@@ -433,9 +433,44 @@ def _build_schema(self) -> Dict[str, Any]:
             diagnostics_results.schema if diagnostics_results is not None else None
         )
 
-        headline = self._extract_headline(dr_schema)
-        sample = self._extract_sample()
+        # PR #347 R4 P1: compute target_parameter BEFORE extracting
+        # the headline so the no-scalar-by-design case
+        # (``aggregation == "no_scalar_headline"``, e.g., dCDH
+        # ``trends_linear=True`` with ``L_max >= 2``) can route the
+        # headline through a dedicated branch that names the intentional
+        # NaN rather than an estimation-failure path.
         target_parameter = describe_target_parameter(self._results)
+        if target_parameter.get("aggregation") == "no_scalar_headline":
+            headline = {
+                "status": "no_scalar_by_design",
+                "effect": None,
+                "se": None,
+                "ci_lower": None,
+                "ci_upper": None,
+                "alpha_was_honored": True,
+                "alpha_override_caveat": None,
+                "ci_level": int(round((1.0 - self._context.alpha) * 100)),
+                "p_value": None,
+                "is_significant": False,
+                "near_significance_threshold": False,
+                "unit": self._context.outcome_unit,
+                "unit_kind": _UNIT_KINDS.get(
+                    self._context.outcome_unit.lower() if self._context.outcome_unit else "",
+                    "unknown",
+                ),
+                "sign": "none",
+                "breakdown_M": None,
+                "reason": (
+                    "The fitted estimator intentionally does not produce a "
+                    "scalar overall ATT on this configuration "
+                    "(``trends_linear=True`` with ``L_max >= 2``). Per-horizon "
+                    "cumulated level effects are on "
+                    "``results.linear_trends_effects[l]``."
+                ),
+            }
+        else:
+            headline = self._extract_headline(dr_schema)
+        sample = self._extract_sample()
         heterogeneity = _lift_heterogeneity(dr_schema)
         pre_trends = _lift_pre_trends(dr_schema)
         sensitivity = _lift_sensitivity(dr_schema)
@@ -1931,6 +1966,22 @@ def _render_headline_sentence(schema: Dict[str, Any]) -> str:
     """
     ctx = schema.get("context", {})
     h = schema.get("headline", {})
+    # PR #347 R4 P1: the dCDH ``trends_linear=True`` + ``L_max>=2``
+    # configuration does not produce a scalar headline by design —
+    # ``overall_att`` is intentionally NaN (per
+    # ``chaisemartin_dhaultfoeuille.py:2828-2834``). Render explicit
+    # "no scalar headline by design" prose instead of routing through
+    # the non-finite / estimation-failure path.
+    if h.get("status") == "no_scalar_by_design":
+        treatment = ctx.get("treatment_label", "the treatment")
+        outcome_label = ctx.get("outcome_label", "the outcome")
+        treatment_sentence = _sentence_first_upper(treatment)
+        return (
+            f"{treatment_sentence} does not produce a scalar aggregate effect "
+            f"on {outcome_label} under this configuration (by design; see "
+            f"``linear_trends_effects`` for per-horizon cumulated level "
+            f"effects)."
+        )
     effect = h.get("effect")
     outcome = ctx.get("outcome_label", "the outcome")
     treatment = ctx.get("treatment_label", "the treatment")
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
@@ -928,7 +928,35 @@ def _execute(self) -> DiagnosticReportResults:
             }
 
         # Headline metric — best-effort across estimator types.
-        headline = self._extract_headline_metric()
+        # PR #347 R4 P1: the dCDH ``trends_linear=True`` + ``L_max>=2``
+        # configuration does not produce a scalar headline by design
+        # (``overall_att`` is intentionally NaN per
+        # ``chaisemartin_dhaultfoeuille.py:2828-2834``). Route the
+        # headline through a dedicated no-scalar block when the
+        # target-parameter helper flags this case so prose does not
+        # narrate it as an estimation failure.
+        _tp_agg = describe_target_parameter(self._results).get("aggregation")
+        if _tp_agg == "no_scalar_headline":
+            headline = {
+                "status": "no_scalar_by_design",
+                "name": "no scalar headline (see linear_trends_effects)",
+                "value": None,
+                "se": None,
+                "p_value": None,
+                "conf_int": (None, None),
+                "alpha": self._alpha,
+                "is_significant": False,
+                "sign": "none",
+                "reason": (
+                    "The fitted estimator intentionally does not produce a "
+                    "scalar overall ATT on this configuration "
+                    "(``trends_linear=True`` with ``L_max >= 2``). Per-horizon "
+                    "cumulated level effects are on "
+                    "``results.linear_trends_effects[l]``."
+                ),
+            }
+        else:
+            headline = self._extract_headline_metric()
 
         # Pull suggested next steps from the practitioner workflow.
         next_steps = self._collect_next_steps(sections)
@@ -2979,7 +3007,19 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
     ci = headline.get("conf_int") if isinstance(headline, dict) else None
     p = headline.get("p_value") if isinstance(headline, dict) else None
     val_finite = isinstance(val, (int, float)) and np.isfinite(val)
-    if val is not None and not val_finite:
+    # PR #347 R4 P1: if the estimator intentionally produces no scalar
+    # aggregate (dCDH ``trends_linear=True`` + ``L_max>=2``), route
+    # through explicit no-scalar prose rather than the
+    # estimation-failure branch below. The headline block carries
+    # ``status="no_scalar_by_design"`` in that case.
+    if isinstance(headline, dict) and headline.get("status") == "no_scalar_by_design":
+        sentences.append(
+            f"On {est}, {treatment} does not produce a scalar aggregate "
+            f"effect on {outcome} under this configuration (by design; "
+            f"see ``linear_trends_effects`` for per-horizon cumulated "
+            f"level effects)."
+        )
+    elif val is not None and not val_finite:
         sentences.append(
             f"On {est}, {treatment}'s effect on {outcome} is non-finite "
             "(the estimation did not produce a usable point estimate). "
diff --git a/tests/test_target_parameter.py b/tests/test_target_parameter.py
@@ -99,11 +99,30 @@ def test_stacked(self):
         assert tp["headline_attribute"] == "overall_att"
         assert "sub-experiment" in tp["definition"].lower()
 
-    def test_wooldridge(self):
-        tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults"))
+    def test_wooldridge_ols(self):
+        """PR #347 R4 P2: OLS Wooldridge ETWFE must not be labeled with
+        ASF wording. The OLS path aggregates ATT(g,t) coefficients with
+        observation-count weights; the ASF path is for nonlinear links.
+        """
+        tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults", method="ols"))
         assert tp["aggregation"] == "simple"
         assert tp["headline_attribute"] == "overall_att"
-        assert "ETWFE" in tp["name"] or "ETWFE" in tp["definition"] or "ASF" in tp["name"]
+        # OLS wording: mentions ATT(g,t) aggregation, not ASF.
+        assert (
+            "ATT(g,t)" in tp["name"] or "ATT(g,t)" in tp["definition"] or "OLS ETWFE" in tp["name"]
+        )
+        assert "ASF" not in tp["name"]
+
+    def test_wooldridge_nonlinear(self):
+        """Nonlinear (logit/Poisson) Wooldridge ETWFE uses the ASF-based
+        ATT path — different wording, different REGISTRY reference.
+        """
+        for method in ("logit", "poisson"):
+            tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults", method=method))
+            assert tp["aggregation"] == "simple"
+            assert tp["headline_attribute"] == "overall_att"
+            assert "ASF" in tp["name"]
+            assert method in tp["name"] or method in tp["definition"]
 
     def test_efficient_did_pt_all(self):
         tp = describe_target_parameter(_minimal_result("EfficientDiDResults", pt_assumption="all"))
@@ -589,6 +608,73 @@ def test_dcdh_delta_fit_real(self):
         assert tp["headline_attribute"] == "overall_att"
         assert hasattr(fit, "overall_att")
 
+    def test_dcdh_trends_linear_no_scalar_propagates_through_br(self):
+        """PR #347 R4 P1 end-to-end: on the dCDH no-scalar
+        configuration (``trends_linear=True`` + ``L_max>=2``), BR's
+        ``to_dict()`` headline must carry ``status="no_scalar_by_design"``
+        and BR's summary / full report must emit explicit no-scalar
+        prose — NOT the generic "non-finite effect / inspect the fit
+        for rank deficiency" estimation-failure messaging.
+        """
+        import warnings
+
+        from diff_diff import BusinessReport, ChaisemartinDHaultfoeuille
+
+        warnings.filterwarnings("ignore")
+        df = self._dcdh_reversible_panel(seed=16)
+        fit = ChaisemartinDHaultfoeuille().fit(
+            df,
+            outcome="outcome",
+            group="unit",
+            time="period",
+            treatment="treated",
+            L_max=2,
+            trends_linear=True,
+        )
+        br = BusinessReport(fit, outcome_label="the outcome", auto_diagnostics=False)
+        schema = br.to_dict()
+        assert schema["headline"]["status"] == "no_scalar_by_design"
+        assert schema["headline"]["effect"] is None
+        # BR's summary prose must be explicit no-scalar, not
+        # "non-finite estimate / inspect rank deficiency".
+        summary = br.summary()
+        assert "no scalar" in summary.lower() or "does not produce a scalar" in summary.lower()
+        assert "rank deficiency" not in summary.lower()
+        assert "estimation failed" not in summary.lower()
+        # Must NOT emit the "estimation_failure" caveat either.
+        caveats = br.caveats()
+        topics = {c.get("topic") for c in caveats}
+        assert "estimation_failure" not in topics
+
+    def test_dcdh_trends_linear_no_scalar_propagates_through_dr(self):
+        """Same contract on the DR side: ``headline_metric`` carries
+        ``status="no_scalar_by_design"`` and the overall-interpretation
+        prose is explicit no-scalar, not an estimation-failure sentence.
+        """
+        import warnings
+
+        from diff_diff import ChaisemartinDHaultfoeuille, DiagnosticReport
+
+        warnings.filterwarnings("ignore")
+        df = self._dcdh_reversible_panel(seed=17)
+        fit = ChaisemartinDHaultfoeuille().fit(
+            df,
+            outcome="outcome",
+            group="unit",
+            time="period",
+            treatment="treated",
+            L_max=2,
+            trends_linear=True,
+        )
+        dr = DiagnosticReport(fit).run_all()
+        schema = dr.schema
+        assert schema["headline_metric"]["status"] == "no_scalar_by_design"
+        # DR interpretation must not narrate estimation failure.
+        prose = dr.interpretation
+        assert "does not produce a scalar" in prose.lower() or "no scalar" in prose.lower()
+        assert "rank deficiency" not in prose.lower()
+        assert "zero effective sample" not in prose.lower()
+
     def test_dcdh_trends_linear_with_l_max_geq_2_fit_real(self):
         """Real ``trends_linear=True`` + ``L_max>=2`` fit: the library
         intentionally sets ``overall_att=NaN`` and populates the