Commit b1946fb

igerber and claude committed
Close BR/DR gap #6: target-parameter clarity block in schemas
Closes BR/DR foundation gap #6 from project_br_dr_foundation.md: BusinessReport and DiagnosticReport now name what the headline scalar actually represents as an estimand, for each of the 16 result classes. Baker et al. (2025) Step 2 ("define the target parameter") was previously in BR's next_steps list but not done by BR itself; this PR closes that gap.

New top-level ``target_parameter`` block (additive schema change; experimental per REPORTING.md stability policy):

    {
      "name": str,                # stakeholder-facing name
      "definition": str,          # plain-English description
      "aggregation": str,         # machine-readable dispatch tag
      "headline_attribute": str,  # which raw result attribute
      "reference": str,           # REGISTRY.md citation pointer
    }

Schema placement: top-level block (user preference, selected via AskUserQuestion in planning). Aggregation tags include "simple", "event_study", "group", "2x2", "twfe", "iw", "stacked", "ddd", "staggered_ddd", "synthetic", "factor_model", "M", "l", "l_x", "l_fd", "l_x_fd", "dose_overall", "pt_all_combined", "pt_post_single_baseline", "unknown".

Per-estimator dispatch lives in the new ``diff_diff/_reporting_helpers.py::describe_target_parameter`` (own module rather than business_report / diagnostic_report to avoid circular-import risk, per plan-review LOW #7). All 17 result classes covered (16 from _APPLICABILITY + BaconDecompositionResults); exhaustiveness locked in by TestTargetParameterCoversEveryResultClass.

Fit-time config reads:

- ``EfficientDiDResults.pt_assumption`` branches the aggregation tag between pt_all_combined and pt_post_single_baseline.
- ``StackedDiDResults.clean_control`` varies the definition clause (never_treated / strict / not_yet_treated).
- ``ChaisemartinDHaultfoeuilleResults.L_max`` + ``covariate_residuals`` + ``linear_trends_effects`` branch the dCDH estimand between DID_M / DID_l / DID^X_l / DID^{fd}_l / DID^{X,fd}_l.

Fixed-tag branches (per plan-review CRITICAL #1 and #2):

- ``CallawaySantAnna`` / ``ImputationDiD`` / ``TwoStageDiD`` / ``WooldridgeDiD``: the fit-time ``aggregate`` kwarg does not change the ``overall_att`` scalar; it only populates additional horizon / group tables on the result object. Disambiguating those tables in prose is tracked under gap #9.
- ``ContinuousDiDResults``: the PT-vs-SPT regime is a user-level assumption, not a library setting. Emits a single "dose_overall" tag with disjunctive definition naming both regime readings (ATT^loc under PT, ATT^glob under SPT).

Prose rendering:

- BR ``_render_summary``: emits "Target parameter: <name>." after the headline sentence (short name only; full definition lives in the full_report and schema).
- BR ``_render_full_report``: "## Target Parameter" section between "## Headline" and "## Identifying Assumption".
- DR ``_render_overall_interpretation``: mirror sentence.
- DR ``_render_dr_full_report``: "## Target Parameter" section with name, definition, aggregation tag, headline attribute, and reference.

Cross-surface parity: both BR and DR consume the same helper (the single source of truth), so their ``target_parameter`` blocks are byte-identical (verified by TestTargetParameterCrossSurfaceParity).

Tests: 37 new (TestTargetParameterPerEstimator + TestTargetParameterFitConfigReads + TestTargetParameterCoversEveryResultClass + TestTargetParameterCrossSurfaceParity + TestTargetParameterProseRendering). Existing BR/DR top-level-key contract tests updated to include ``target_parameter``. Total 319 tests pass (282 prior + 37 new).

Docs: REPORTING.md gains a "Target parameter" section documenting the per-estimator dispatch and schema shape. business_report.rst and diagnostic_report.rst note the new field with a pointer to REPORTING.md. CHANGELOG entry under Unreleased.

Out of scope: REGISTRY.md per-estimator "Target parameter" sub-sections (plan-review additional-note); the reporting-layer doc in REPORTING.md is the current source of truth. A follow-up docs PR can land those sub-sections if maintainers want the registry to own the canonical wording directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
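As a rough illustration of how a downstream agent could consume the new block, here is a minimal sketch. It assumes the report schema is already available as a plain dict; `summarize_target_parameter`, the tag groupings, and `example_schema` are hypothetical names introduced for this example and are not part of diff_diff. The field names and tag values are the ones listed in the commit message.

```python
from typing import Any, Dict

# Tag values come from the commit message; grouping them into these two
# sets is an illustrative assumption, not a library convention.
DCDH_TAGS = {"M", "l", "l_x", "l_fd", "l_x_fd"}
EFFICIENT_DID_TAGS = {"pt_all_combined", "pt_post_single_baseline"}


def summarize_target_parameter(schema: Dict[str, Any]) -> str:
    """Return a one-line description of the headline estimand.

    Hypothetical helper. It reuses the defensive ``schema.get(...) or {}``
    pattern that the renderers in this commit use.
    """
    tp = schema.get("target_parameter") or {}
    name = tp.get("name")
    if not name:
        return "Target parameter: not reported."
    tag = tp.get("aggregation", "unknown")
    if tag in DCDH_TAGS:
        family = "dCDH estimand"
    elif tag in EFFICIENT_DID_TAGS:
        family = "EfficientDiD estimand"
    elif tag == "unknown":
        family = "unclassified estimand"
    else:
        family = "aggregation '" + tag + "'"
    attr = tp.get("headline_attribute", "?")
    return f"Target parameter: {name} [{family}; read from `{attr}`]."


# Sample values copied from the REPORTING.md example in this commit.
example_schema = {
    "target_parameter": {
        "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
        "definition": "A cohort-size-weighted average of group-time ATTs ...",
        "aggregation": "simple",
        "headline_attribute": "overall_att",
        "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna",
    }
}
print(summarize_target_parameter(example_schema))
```

Branching on `aggregation` rather than on `name` keeps the consumer stable if the stakeholder-facing wording changes, which matters because the block is marked experimental.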
1 parent 9146f1e commit b1946fb

10 files changed

Lines changed: 950 additions & 1 deletion

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **`target_parameter` block in BR/DR schemas (experimental)** — BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
+
 ## [3.2.0] - 2026-04-19
 
 ### Added

diff_diff/_reporting_helpers.py

Lines changed: 439 additions & 0 deletions
Large diffs are not rendered by default.
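
Since the diff for this file is not shown, the following is only a hedged sketch of what a per-estimator dispatch with a fit-time config read could look like. It is not the code in `_reporting_helpers.py`: the function name `sketch_describe_target_parameter`, the dummy result classes, the `CallawaySantAnnaResults` class name, the placeholder strings, and the fallback branch are all assumptions. Only the five field names, the tag values, the `pt_assumption` branching rule, and the Callaway-Sant'Anna sample values come from this commit.

```python
from typing import Any, Dict


def sketch_describe_target_parameter(results: Any) -> Dict[str, str]:
    """Hypothetical stand-in, NOT the implementation shipped in this commit."""
    cls_name = type(results).__name__

    if cls_name == "CallawaySantAnnaResults":  # class name is an assumption
        # Fixed-tag branch: the fit-time ``aggregate`` kwarg is not consulted.
        return {
            "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
            "definition": "A cohort-size-weighted average of group-time ATTs ...",
            "aggregation": "simple",
            "headline_attribute": "overall_att",
            "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna",
        }

    if cls_name == "EfficientDiDResults":
        # Fit-time config read: "post" selects the single-baseline tag,
        # anything else falls back to the combined tag (assumed default).
        pt = getattr(results, "pt_assumption", None)
        tag = "pt_post_single_baseline" if pt == "post" else "pt_all_combined"
        return {
            "name": f"placeholder name for {tag}",
            "definition": "placeholder definition",
            "aggregation": tag,
            "headline_attribute": "overall_att",  # assumed attribute name
            "reference": "placeholder reference",
        }

    # Assumed fallback for classes without an explicit branch.
    return {
        "name": "unknown target parameter",
        "definition": "placeholder definition",
        "aggregation": "unknown",
        "headline_attribute": "unknown",
        "reference": "placeholder reference",
    }


class EfficientDiDResults:
    """Dummy stand-in used only to exercise the sketch."""

    def __init__(self, pt_assumption: str) -> None:
        self.pt_assumption = pt_assumption


print(sketch_describe_target_parameter(EfficientDiDResults("post"))["aggregation"])
```

Dispatching on the class name instead of importing the result classes is one way such a helper could sidestep the circular-import risk the commit message mentions; whether the real helper does this is not visible here.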

diff_diff/business_report.py

Lines changed: 29 additions & 0 deletions
@@ -42,6 +42,7 @@
 
 import numpy as np
 
+from diff_diff._reporting_helpers import describe_target_parameter
 from diff_diff.diagnostic_report import DiagnosticReport, DiagnosticReportResults
 
 BUSINESS_REPORT_SCHEMA_VERSION = "1.0"
@@ -434,6 +435,7 @@ def _build_schema(self) -> Dict[str, Any]:
 
         headline = self._extract_headline(dr_schema)
         sample = self._extract_sample()
+        target_parameter = describe_target_parameter(self._results)
         heterogeneity = _lift_heterogeneity(dr_schema)
         pre_trends = _lift_pre_trends(dr_schema)
         sensitivity = _lift_sensitivity(dr_schema)
@@ -475,6 +477,7 @@ def _build_schema(self) -> Dict[str, Any]:
                 "alpha": self._context.alpha,
             },
             "headline": headline,
+            "target_parameter": target_parameter,
             "assumption": assumption,
             "pre_trends": pre_trends,
             "sensitivity": sensitivity,
@@ -1993,6 +1996,17 @@ def _render_summary(schema: Dict[str, Any]) -> str:
 
     # Headline sentence with significance phrase.
     sentences.append(_render_headline_sentence(schema))
+    # BR/DR gap #6 (target-parameter clarity): name what the headline
+    # scalar actually represents so the stakeholder can map the number
+    # to a specific estimand. Rendered immediately after the headline
+    # and before the significance phrase. The summary surfaces only
+    # the short ``name`` so the paragraph stays within the
+    # 6-10-sentence target; ``definition`` lives in the full report
+    # and in the structured schema for agents that want the long form.
+    tp = schema.get("target_parameter", {}) or {}
+    tp_name = tp.get("name")
+    if tp_name:
+        sentences.append(f"Target parameter: {tp_name}.")
     h = schema.get("headline", {})
     p = h.get("p_value")
     alpha = ctx.get("alpha", 0.05)
@@ -2314,6 +2328,21 @@ def _render_full_report(schema: Dict[str, Any]) -> str:
     lines.append(f"Statistically, {_significance_phrase(p, alpha)}.")
     lines.append("")
 
+    # Target parameter (BR/DR gap #6): name what the headline scalar
+    # represents so the stakeholder can map the number to a specific
+    # estimand. Rendered between "Headline" and "Identifying Assumption"
+    # because the target parameter is about what the scalar IS, whereas
+    # identifying assumption is about what makes it valid.
+    tp = schema.get("target_parameter", {}) or {}
+    if tp.get("name") or tp.get("definition"):
+        lines.append("## Target Parameter")
+        lines.append("")
+        if tp.get("name"):
+            lines.append(f"- **{tp['name']}**")
+        if tp.get("definition"):
+            lines.append(f"- {tp['definition']}")
+        lines.append("")
+
     # Identifying assumption
     lines.append("## Identifying Assumption")
     lines.append("")

diff_diff/diagnostic_report.py

Lines changed: 35 additions & 1 deletion
@@ -38,6 +38,8 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff._reporting_helpers import describe_target_parameter  # noqa: E402 (top-level import)
+
 DIAGNOSTIC_REPORT_SCHEMA_VERSION = "1.0"
 
 __all__ = [
@@ -962,6 +964,7 @@ def _execute(self) -> DiagnosticReportResults:
             "schema_version": DIAGNOSTIC_REPORT_SCHEMA_VERSION,
             "estimator": type(self._results).__name__,
             "headline_metric": headline,
+            "target_parameter": describe_target_parameter(self._results),
             "parallel_trends": sections["parallel_trends"],
             "pretrends_power": sections["pretrends_power"],
             "sensitivity": sections["sensitivity"],
@@ -3003,7 +3006,19 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
             f"On {est}, {treatment} {direction} {outcome} by {val:.3g}{ci_str}{p_str}."
         )
 
-    # Sentence 2: parallel trends + power (method-aware prose per the
+    # Sentence 2: name the target parameter (BR/DR gap #6). Rendered
+    # right after the headline so the reader sees what the scalar
+    # represents before pre-trends / sensitivity context. Only the
+    # terse ``name`` goes in the interpretation paragraph; the full
+    # ``definition`` lives in DR's "## Target Parameter" markdown
+    # section and in the structured ``schema["target_parameter"]``
+    # dict for agents that want the long form.
+    tp = schema.get("target_parameter") or {}
+    tp_name = tp.get("name")
+    if tp_name:
+        sentences.append(f"Target parameter: {tp_name}.")
+
+    # Sentence 3: parallel trends + power (method-aware prose per the
     # round-8 CI review on PR #318; PT method can be slope_difference
     # (2x2), joint_wald / bonferroni (event study), hausman (EfficientDiD
     # PT-All vs PT-Post), synthetic_fit (SDiD), or factor (TROP), and the
@@ -3221,6 +3236,25 @@ def _render_dr_full_report(results: "DiagnosticReportResults") -> str:
         f"(SE {headline.get('se')}, p = {headline.get('p_value')})"
     )
     lines.append("")
+
+    # BR/DR gap #6: target-parameter section between headline metadata
+    # and the overall-interpretation paragraph.
+    tp = schema.get("target_parameter") or {}
+    if tp.get("name") or tp.get("definition"):
+        lines.append("## Target Parameter")
+        lines.append("")
+        if tp.get("name"):
+            lines.append(f"- **{tp['name']}**")
+        if tp.get("definition"):
+            lines.append(f"- {tp['definition']}")
+        if tp.get("aggregation"):
+            lines.append(f"- Aggregation tag: `{tp['aggregation']}`")
+        if tp.get("headline_attribute"):
+            lines.append(f"- Headline attribute: `{tp['headline_attribute']}`")
+        if tp.get("reference"):
+            lines.append(f"- Reference: {tp['reference']}")
+        lines.append("")
+
     lines.append("## Overall Interpretation")
     lines.append("")
     lines.append(schema.get("overall_interpretation", "") or "_No synthesis available._")

docs/api/business_report.rst

Lines changed: 7 additions & 0 deletions
@@ -49,6 +49,13 @@ Methodology deviations (no traffic-light gates, pre-trends verdict
 thresholds, power-aware phrasing, unit-translation policy, schema
 stability) are documented in :doc:`../methodology/REPORTING`.
 
+The schema carries a top-level ``target_parameter`` block
+(experimental) naming what the headline scalar represents per
+estimator — simple ATT, event-study average, DID_M, DID_l,
+dose-response aggregate, factor-model residual, etc. See the
+"Target parameter" section of :doc:`../methodology/REPORTING` for
+the per-estimator dispatch and schema shape.
+
 Example
 -------
 

docs/api/diagnostic_report.rst

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,12 @@ Methodology deviations (no traffic-light gates, opt-in placebo
 battery, estimator-native diagnostic routing, power-aware phrasing
 threshold) are documented in :doc:`../methodology/REPORTING`.
 
+The schema carries a top-level ``target_parameter`` block
+(experimental) naming what the headline scalar represents per
+estimator. See the "Target parameter" section of
+:doc:`../methodology/REPORTING` for the per-estimator dispatch and
+schema shape.
+
 Data-dependent checks (2x2 parallel trends on simple DiD,
 Goodman-Bacon decomposition on staggered estimators, the EfficientDiD
 Hausman PT-All vs PT-Post pretest) require the raw panel + column

docs/methodology/REPORTING.md

Lines changed: 83 additions & 0 deletions
@@ -53,6 +53,89 @@ effects, pre-period and reference-marker rows excluded). These are
 reporting-layer aggregations of inputs already in the result object,
 not new inference.
 
+## Target parameter
+
+The BusinessReport and DiagnosticReport schemas both carry a
+top-level `target_parameter` block that names what scalar the
+headline number actually represents. The 16 result classes have
+meaningfully different estimands — a stakeholder reading
+`overall_att = -0.0214` on a Callaway-Sant'Anna fit cannot tell
+whether that is the simple-weighted average across `ATT(g,t)`
+cells, an event-study-weighted aggregate, or a group-weighted
+aggregate. Baker et al. (2025) Step 2 is "Define the target
+parameter"; BR/DR does that work for the user.
+
+Schema shape:
+
+```json
+"target_parameter": {
+  "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
+  "definition": "A cohort-size-weighted average of group-time ATTs ...",
+  "aggregation": "simple",
+  "headline_attribute": "overall_att",
+  "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna"
+}
+```
+
+Field semantics:
+
+- `name` — short stakeholder-facing name. Rendered verbatim in
+  BR's summary paragraph and DR's overall-interpretation
+  paragraph. Always non-empty.
+- `definition` — plain-English description of what the scalar is
+  and how it is aggregated. Rendered in BR's and DR's full-report
+  markdown (under "## Target Parameter") but omitted from the
+  summary paragraph so stakeholder prose stays within the 6-10-
+  sentence target.
+- `aggregation` — machine-readable tag dispatching agents can
+  branch on: `"simple"`, `"event_study"`, `"group"`, `"2x2"`,
+  `"twfe"`, `"iw"`, `"stacked"`, `"ddd"`, `"staggered_ddd"`,
+  `"synthetic"`, `"factor_model"`, `"M"`, `"l"`, `"l_x"`,
+  `"l_fd"`, `"l_x_fd"`, `"dose_overall"`,
+  `"pt_all_combined"`, `"pt_post_single_baseline"`, `"unknown"`.
+- `headline_attribute` — the raw result attribute the scalar
+  comes from (`"overall_att"` / `"att"` / `"avg_att"` /
+  `"twfe_estimate"`). Different result classes use different
+  attribute names; agents that want to re-read the raw value
+  can dispatch on this.
+- `reference` — one-line citation pointer to the canonical paper
+  and the REGISTRY.md section.
+
+Per-estimator dispatch lives in
+`diff_diff/_reporting_helpers.py::describe_target_parameter`. Each
+branch is sourced from the corresponding estimator's section in
+REGISTRY.md; new result classes must add an explicit branch (the
+exhaustiveness test `TestTargetParameterCoversEveryResultClass`
+locks this in).
+
+A few branches read fit-time config from the result object:
+
+- `EfficientDiDResults.pt_assumption`: `"all"` (over-identified
+  combined) vs `"post"` (just-identified single-baseline) branches
+  `aggregation` between `"pt_all_combined"` and
+  `"pt_post_single_baseline"`.
+- `StackedDiDResults.clean_control`: `"never_treated"` /
+  `"strict"` / `"not_yet_treated"` varies the `definition` clause
+  describing which units qualify as controls.
+- `ChaisemartinDHaultfoeuilleResults.L_max` +
+  `covariate_residuals` + `linear_trends_effects`: branches the
+  dCDH estimand tag between `DID_M` / `DID_l` / `DID^X_l` /
+  `DID^{fd}_l` / `DID^{X,fd}_l`.
+
+A few branches emit a fixed tag regardless of fit-time config —
+notably `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, and
+`WooldridgeDiD`. For these estimators the `overall_att`
+(or `att` / `avg_att`) scalar is ALWAYS the simple weighted
+aggregation; the fit-time `aggregate` kwarg populates additional
+horizon / group tables on the result object but does not change
+the headline scalar. Disambiguating those tables in prose is
+tracked under BR/DR gap #9 (per-cohort narrative rendering).
+
+`ContinuousDiDResults` emits a single `"dose_overall"` tag with a
+disjunctive definition (`ATT^loc` under PT; `ATT^glob` under
+SPT) because the PT-vs-SPT regime is a user-level assumption, not
+a library setting.
+
 ## Design deviations
 
 - **Note:** No hard pass/fail gates. `DiagnosticReport` does not produce

tests/test_business_report.py

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@
     "estimator",
     "context",
     "headline",
+    "target_parameter",
     "assumption",
     "pre_trends",
     "sensitivity",

tests/test_diagnostic_report.py

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@
     "schema_version",
     "estimator",
     "headline_metric",
+    "target_parameter",
     "parallel_trends",
     "pretrends_power",
     "sensitivity",
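
The cross-surface parity claim in the commit message can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the body of `TestTargetParameterCrossSurfaceParity`: `check_target_parameter_parity`, `REQUIRED_FIELDS`, and the sample dicts are hypothetical, and both schemas are assumed to be plain dicts.

```python
import json
from typing import Any, Dict

# The five fields named in the commit message.
REQUIRED_FIELDS = ("name", "definition", "aggregation", "headline_attribute", "reference")


def check_target_parameter_parity(
    br_schema: Dict[str, Any], dr_schema: Dict[str, Any]
) -> None:
    """Raise AssertionError if either block is missing, malformed, or they differ."""
    br_tp = br_schema.get("target_parameter")
    dr_tp = dr_schema.get("target_parameter")
    assert isinstance(br_tp, dict) and isinstance(dr_tp, dict), "block missing"
    for field in REQUIRED_FIELDS:
        assert isinstance(br_tp.get(field), str), f"BR field {field!r} missing or not a str"
        assert isinstance(dr_tp.get(field), str), f"DR field {field!r} missing or not a str"
    # Compare serialized forms so the check covers content, not object identity.
    br_bytes = json.dumps(br_tp, sort_keys=True).encode("utf-8")
    dr_bytes = json.dumps(dr_tp, sort_keys=True).encode("utf-8")
    assert br_bytes == dr_bytes, "BR and DR target_parameter blocks differ"


# Illustrative data only: both surfaces receive a copy of the same block.
shared_block = {
    "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
    "definition": "A cohort-size-weighted average of group-time ATTs ...",
    "aggregation": "simple",
    "headline_attribute": "overall_att",
    "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna",
}
br_example = {"headline": {}, "target_parameter": dict(shared_block)}
dr_example = {"headline_metric": {}, "target_parameter": dict(shared_block)}
check_target_parameter_parity(br_example, dr_example)
```

Because both reports call the same helper, a check like this mainly guards against one surface later post-processing its copy of the block.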
