Skip to content

Commit a8e1719

Browse files
igerberclaude
andcommitted
Address PR #347 R4: propagate no_scalar_headline through BR/DR; Wooldridge method-aware
R4 surfaced one P1 + one P2, both addressed. P1 (methodology): the dCDH no-scalar branch was documented in the schema but not plumbed through BR/DR rendering. When ``aggregation="no_scalar_headline"`` and ``headline_attribute=None`` (``trends_linear=True`` + ``L_max>=2``), BR/DR still extracted ``overall_att`` (NaN by design) and narrated it via the estimation- failure path, producing internally inconsistent output — the ``target_parameter`` block said "no scalar aggregate; consult linear_trends_effects" while the headline prose told users to inspect rank deficiency. Fix (both surfaces): - BR ``_build_schema``: compute ``target_parameter`` BEFORE ``_extract_headline``; if the aggregation tag is ``no_scalar_headline``, route through a dedicated headline block with ``status="no_scalar_by_design"`` / ``effect=None`` / ``sign="none"`` and an explicit ``reason`` field naming the ``linear_trends_effects`` alternative. - BR ``_render_headline_sentence``: detect ``status == "no_scalar_by_design"`` and emit explicit "does not produce a scalar aggregate effect ... by design" prose instead of the non-finite / estimation-failure sentence. - BR ``_build_caveats``: the existing ``sign == "undefined"`` estimation-failure caveat does not fire because we emit ``sign == "none"`` (not ``"undefined"``) on the no-scalar case. - DR ``_execute``: analogous headline-metric short-circuit with ``status="no_scalar_by_design"`` on detection of the no_scalar_headline tag. - DR ``_render_overall_interpretation``: explicit no-scalar sentence takes precedence over the non-finite estimation-failure branch. P2 (Wooldridge method awareness): the Wooldridge branch previously labeled every fit as ASF-based, but REGISTRY.md Sec. WooldridgeDiD splits OLS ETWFE (observation-count-weighted average of ATT(g,t) from a saturated regression) from the nonlinear (logit / Poisson) ASF path. Branch on ``results.method`` ("ols" -> coefficient- aggregation wording; other -> ASF wording). Tests: added 4 end-to-end regressions. - ``test_dcdh_trends_linear_no_scalar_propagates_through_br``: real dCDH fit with ``trends_linear=True`` + ``L_max=2``; asserts BR schema emits ``status="no_scalar_by_design"``, summary prose contains "no scalar" / "does not produce a scalar", does NOT contain "rank deficiency" / "estimation failed", and caveats do NOT include ``estimation_failure``. - ``test_dcdh_trends_linear_no_scalar_propagates_through_dr``: mirror on the DR side (``headline_metric`` status and ``overall_interpretation`` prose). - ``test_wooldridge_ols``: asserts the OLS branch names ATT(g,t) aggregation and does NOT include "ASF" in the name. - ``test_wooldridge_nonlinear``: asserts logit/poisson routes through the ASF branch. 336 BR/DR tests pass. Black and ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 450c592 commit a8e1719

4 files changed

Lines changed: 229 additions & 15 deletions

File tree

diff_diff/_reporting_helpers.py

Lines changed: 45 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -218,19 +218,56 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
218218
}
219219

220220
if name == "WooldridgeDiDResults":
221+
# PR #347 R4 P2: Wooldridge ETWFE has two identification paths
222+
# (REGISTRY.md splits them at Sec. WooldridgeDiD): the OLS
223+
# path computes ``overall_att`` as an observation-count-
224+
# weighted aggregation of ``ATT(g, t)`` coefficients from the
225+
# saturated regression, while the nonlinear (logit / Poisson)
226+
# paths produce an ASF-based ATT from the average-structural-
227+
# function contrast. ``WooldridgeDiDResults.method`` persists
228+
# the choice; branch on it so OLS fits aren't mislabeled with
229+
# nonlinear-ASF wording.
230+
method = getattr(results, "method", "ols")
231+
if method == "ols":
232+
return {
233+
"name": (
234+
"overall ATT (observation-count-weighted average of "
235+
"ATT(g,t) from saturated OLS ETWFE)"
236+
),
237+
"definition": (
238+
"The overall ATT under OLS ETWFE (Wooldridge 2023): the "
239+
"saturated regression fits cohort x time ATT(g, t) "
240+
"coefficients, and ``overall_att`` is their "
241+
"observation-count-weighted average across post-"
242+
'treatment cells. Calling ``.aggregate("event")`` '
243+
"populates additional event-study tables but does NOT "
244+
"change the ``overall_att`` scalar."
245+
),
246+
"aggregation": "simple",
247+
"headline_attribute": "overall_att",
248+
"reference": ("Wooldridge (2023); REGISTRY.md Sec. WooldridgeDiD (OLS path)"),
249+
}
221250
return {
222-
"name": "overall ATT (observation-count-weighted ASF ATT across cohort x time cells)",
251+
"name": (
252+
f"overall ATT (ASF-based average from Wooldridge ETWFE, " f"method={method!r})"
253+
),
223254
"definition": (
224-
"The overall ATT under Wooldridge's ETWFE: the average-structural-"
225-
"function (ASF) contrast between treated and counterfactual "
226-
"untreated outcomes, averaged across cohort x time cells with "
227-
'observation-count weights. Calling ``.aggregate("event")`` '
228-
"populates additional event-study tables but does NOT change "
229-
"the ``overall_att`` scalar."
255+
f"The overall ATT under Wooldridge ETWFE with a nonlinear "
256+
f"link function (``method={method!r}``, typically logit or "
257+
f"Poisson QMLE): the average-structural-function (ASF) "
258+
f"contrast between treated and counterfactual untreated "
259+
f"outcomes averaged across cohort x time cells with "
260+
f"observation-count weights. The ASF handles the "
261+
f"nonlinearity; OLS ETWFE uses the saturated-regression "
262+
f'coefficient path instead. Calling ``.aggregate("event")`` '
263+
f"populates additional event-study tables but does NOT "
264+
f"change the ``overall_att`` scalar."
230265
),
231266
"aggregation": "simple",
232267
"headline_attribute": "overall_att",
233-
"reference": "Wooldridge (2023); REGISTRY.md Sec. WooldridgeDiD",
268+
"reference": (
269+
"Wooldridge (2023, 2025); REGISTRY.md Sec. WooldridgeDiD " "(nonlinear / ASF path)"
270+
),
234271
}
235272

236273
if name == "EfficientDiDResults":

diff_diff/business_report.py

Lines changed: 53 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -433,9 +433,44 @@ def _build_schema(self) -> Dict[str, Any]:
433433
diagnostics_results.schema if diagnostics_results is not None else None
434434
)
435435

436-
headline = self._extract_headline(dr_schema)
437-
sample = self._extract_sample()
436+
# PR #347 R4 P1: compute target_parameter BEFORE extracting
437+
# the headline so the no-scalar-by-design case
438+
# (``aggregation == "no_scalar_headline"``, e.g., dCDH
439+
# ``trends_linear=True`` with ``L_max >= 2``) can route the
440+
# headline through a dedicated branch that names the intentional
441+
# NaN rather than an estimation-failure path.
438442
target_parameter = describe_target_parameter(self._results)
443+
if target_parameter.get("aggregation") == "no_scalar_headline":
444+
headline = {
445+
"status": "no_scalar_by_design",
446+
"effect": None,
447+
"se": None,
448+
"ci_lower": None,
449+
"ci_upper": None,
450+
"alpha_was_honored": True,
451+
"alpha_override_caveat": None,
452+
"ci_level": int(round((1.0 - self._context.alpha) * 100)),
453+
"p_value": None,
454+
"is_significant": False,
455+
"near_significance_threshold": False,
456+
"unit": self._context.outcome_unit,
457+
"unit_kind": _UNIT_KINDS.get(
458+
self._context.outcome_unit.lower() if self._context.outcome_unit else "",
459+
"unknown",
460+
),
461+
"sign": "none",
462+
"breakdown_M": None,
463+
"reason": (
464+
"The fitted estimator intentionally does not produce a "
465+
"scalar overall ATT on this configuration "
466+
"(``trends_linear=True`` with ``L_max >= 2``). Per-horizon "
467+
"cumulated level effects are on "
468+
"``results.linear_trends_effects[l]``."
469+
),
470+
}
471+
else:
472+
headline = self._extract_headline(dr_schema)
473+
sample = self._extract_sample()
439474
heterogeneity = _lift_heterogeneity(dr_schema)
440475
pre_trends = _lift_pre_trends(dr_schema)
441476
sensitivity = _lift_sensitivity(dr_schema)
@@ -1931,6 +1966,22 @@ def _render_headline_sentence(schema: Dict[str, Any]) -> str:
19311966
"""
19321967
ctx = schema.get("context", {})
19331968
h = schema.get("headline", {})
1969+
# PR #347 R4 P1: the dCDH ``trends_linear=True`` + ``L_max>=2``
1970+
# configuration does not produce a scalar headline by design —
1971+
# ``overall_att`` is intentionally NaN (per
1972+
# ``chaisemartin_dhaultfoeuille.py:2828-2834``). Render explicit
1973+
# "no scalar headline by design" prose instead of routing through
1974+
# the non-finite / estimation-failure path.
1975+
if h.get("status") == "no_scalar_by_design":
1976+
treatment = ctx.get("treatment_label", "the treatment")
1977+
outcome_label = ctx.get("outcome_label", "the outcome")
1978+
treatment_sentence = _sentence_first_upper(treatment)
1979+
return (
1980+
f"{treatment_sentence} does not produce a scalar aggregate effect "
1981+
f"on {outcome_label} under this configuration (by design; see "
1982+
f"``linear_trends_effects`` for per-horizon cumulated level "
1983+
f"effects)."
1984+
)
19341985
effect = h.get("effect")
19351986
outcome = ctx.get("outcome_label", "the outcome")
19361987
treatment = ctx.get("treatment_label", "the treatment")

diff_diff/diagnostic_report.py

Lines changed: 42 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -928,7 +928,35 @@ def _execute(self) -> DiagnosticReportResults:
928928
}
929929

930930
# Headline metric — best-effort across estimator types.
931-
headline = self._extract_headline_metric()
931+
# PR #347 R4 P1: the dCDH ``trends_linear=True`` + ``L_max>=2``
932+
# configuration does not produce a scalar headline by design
933+
# (``overall_att`` is intentionally NaN per
934+
# ``chaisemartin_dhaultfoeuille.py:2828-2834``). Route the
935+
# headline through a dedicated no-scalar block when the
936+
# target-parameter helper flags this case so prose does not
937+
# narrate it as an estimation failure.
938+
_tp_agg = describe_target_parameter(self._results).get("aggregation")
939+
if _tp_agg == "no_scalar_headline":
940+
headline = {
941+
"status": "no_scalar_by_design",
942+
"name": "no scalar headline (see linear_trends_effects)",
943+
"value": None,
944+
"se": None,
945+
"p_value": None,
946+
"conf_int": (None, None),
947+
"alpha": self._alpha,
948+
"is_significant": False,
949+
"sign": "none",
950+
"reason": (
951+
"The fitted estimator intentionally does not produce a "
952+
"scalar overall ATT on this configuration "
953+
"(``trends_linear=True`` with ``L_max >= 2``). Per-horizon "
954+
"cumulated level effects are on "
955+
"``results.linear_trends_effects[l]``."
956+
),
957+
}
958+
else:
959+
headline = self._extract_headline_metric()
932960

933961
# Pull suggested next steps from the practitioner workflow.
934962
next_steps = self._collect_next_steps(sections)
@@ -2979,7 +3007,19 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
29793007
ci = headline.get("conf_int") if isinstance(headline, dict) else None
29803008
p = headline.get("p_value") if isinstance(headline, dict) else None
29813009
val_finite = isinstance(val, (int, float)) and np.isfinite(val)
2982-
if val is not None and not val_finite:
3010+
# PR #347 R4 P1: if the estimator intentionally produces no scalar
3011+
# aggregate (dCDH ``trends_linear=True`` + ``L_max>=2``), route
3012+
# through explicit no-scalar prose rather than the
3013+
# estimation-failure branch below. The headline block carries
3014+
# ``status="no_scalar_by_design"`` in that case.
3015+
if isinstance(headline, dict) and headline.get("status") == "no_scalar_by_design":
3016+
sentences.append(
3017+
f"On {est}, {treatment} does not produce a scalar aggregate "
3018+
f"effect on {outcome} under this configuration (by design; "
3019+
f"see ``linear_trends_effects`` for per-horizon cumulated "
3020+
f"level effects)."
3021+
)
3022+
elif val is not None and not val_finite:
29833023
sentences.append(
29843024
f"On {est}, {treatment}'s effect on {outcome} is non-finite "
29853025
"(the estimation did not produce a usable point estimate). "

tests/test_target_parameter.py

Lines changed: 89 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -99,11 +99,30 @@ def test_stacked(self):
9999
assert tp["headline_attribute"] == "overall_att"
100100
assert "sub-experiment" in tp["definition"].lower()
101101

102-
def test_wooldridge(self):
103-
tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults"))
102+
def test_wooldridge_ols(self):
103+
"""PR #347 R4 P2: OLS Wooldridge ETWFE must not be labeled with
104+
ASF wording. The OLS path aggregates ATT(g,t) coefficients with
105+
observation-count weights; the ASF path is for nonlinear links.
106+
"""
107+
tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults", method="ols"))
104108
assert tp["aggregation"] == "simple"
105109
assert tp["headline_attribute"] == "overall_att"
106-
assert "ETWFE" in tp["name"] or "ETWFE" in tp["definition"] or "ASF" in tp["name"]
110+
# OLS wording: mentions ATT(g,t) aggregation, not ASF.
111+
assert (
112+
"ATT(g,t)" in tp["name"] or "ATT(g,t)" in tp["definition"] or "OLS ETWFE" in tp["name"]
113+
)
114+
assert "ASF" not in tp["name"]
115+
116+
def test_wooldridge_nonlinear(self):
117+
"""Nonlinear (logit/Poisson) Wooldridge ETWFE uses the ASF-based
118+
ATT path — different wording, different REGISTRY reference.
119+
"""
120+
for method in ("logit", "poisson"):
121+
tp = describe_target_parameter(_minimal_result("WooldridgeDiDResults", method=method))
122+
assert tp["aggregation"] == "simple"
123+
assert tp["headline_attribute"] == "overall_att"
124+
assert "ASF" in tp["name"]
125+
assert method in tp["name"] or method in tp["definition"]
107126

108127
def test_efficient_did_pt_all(self):
109128
tp = describe_target_parameter(_minimal_result("EfficientDiDResults", pt_assumption="all"))
@@ -589,6 +608,73 @@ def test_dcdh_delta_fit_real(self):
589608
assert tp["headline_attribute"] == "overall_att"
590609
assert hasattr(fit, "overall_att")
591610

611+
def test_dcdh_trends_linear_no_scalar_propagates_through_br(self):
612+
"""PR #347 R4 P1 end-to-end: on the dCDH no-scalar
613+
configuration (``trends_linear=True`` + ``L_max>=2``), BR's
614+
``to_dict()`` headline must carry ``status="no_scalar_by_design"``
615+
and BR's summary / full report must emit explicit no-scalar
616+
prose — NOT the generic "non-finite effect / inspect the fit
617+
for rank deficiency" estimation-failure messaging.
618+
"""
619+
import warnings
620+
621+
from diff_diff import BusinessReport, ChaisemartinDHaultfoeuille
622+
623+
warnings.filterwarnings("ignore")
624+
df = self._dcdh_reversible_panel(seed=16)
625+
fit = ChaisemartinDHaultfoeuille().fit(
626+
df,
627+
outcome="outcome",
628+
group="unit",
629+
time="period",
630+
treatment="treated",
631+
L_max=2,
632+
trends_linear=True,
633+
)
634+
br = BusinessReport(fit, outcome_label="the outcome", auto_diagnostics=False)
635+
schema = br.to_dict()
636+
assert schema["headline"]["status"] == "no_scalar_by_design"
637+
assert schema["headline"]["effect"] is None
638+
# BR's summary prose must be explicit no-scalar, not
639+
# "non-finite estimate / inspect rank deficiency".
640+
summary = br.summary()
641+
assert "no scalar" in summary.lower() or "does not produce a scalar" in summary.lower()
642+
assert "rank deficiency" not in summary.lower()
643+
assert "estimation failed" not in summary.lower()
644+
# Must NOT emit the "estimation_failure" caveat either.
645+
caveats = br.caveats()
646+
topics = {c.get("topic") for c in caveats}
647+
assert "estimation_failure" not in topics
648+
649+
def test_dcdh_trends_linear_no_scalar_propagates_through_dr(self):
650+
"""Same contract on the DR side: ``headline_metric`` carries
651+
``status="no_scalar_by_design"`` and the overall-interpretation
652+
prose is explicit no-scalar, not an estimation-failure sentence.
653+
"""
654+
import warnings
655+
656+
from diff_diff import ChaisemartinDHaultfoeuille, DiagnosticReport
657+
658+
warnings.filterwarnings("ignore")
659+
df = self._dcdh_reversible_panel(seed=17)
660+
fit = ChaisemartinDHaultfoeuille().fit(
661+
df,
662+
outcome="outcome",
663+
group="unit",
664+
time="period",
665+
treatment="treated",
666+
L_max=2,
667+
trends_linear=True,
668+
)
669+
dr = DiagnosticReport(fit).run_all()
670+
schema = dr.schema
671+
assert schema["headline_metric"]["status"] == "no_scalar_by_design"
672+
# DR interpretation must not narrate estimation failure.
673+
prose = dr.interpretation
674+
assert "does not produce a scalar" in prose.lower() or "no scalar" in prose.lower()
675+
assert "rank deficiency" not in prose.lower()
676+
assert "zero effective sample" not in prose.lower()
677+
592678
def test_dcdh_trends_linear_with_l_max_geq_2_fit_real(self):
593679
"""Real ``trends_linear=True`` + ``L_max>=2`` fit: the library
594680
intentionally sets ``overall_att=NaN`` and populates the

0 commit comments

Comments
 (0)