Close BR/DR gap #6: target-parameter clarity block in schemas

igerber · claude · igerber · commit b1946fb4a1cd · 2026-04-20T18:47:57.000-04:00
Closes BR/DR foundation gap #6 from project_br_dr_foundation.md:
BusinessReport and DiagnosticReport now name what the headline scalar
actually represents as an estimand, for each of the 16 result classes.
Baker et al. (2025) Step 2 ("define the target parameter") was
previously in BR's next_steps list but not done by BR itself; this PR
closes that gap.

New top-level ``target_parameter`` block (additive schema change;
experimental per REPORTING.md stability policy):

    {
      "name": str,                # stakeholder-facing name
      "definition": str,          # plain-English description
      "aggregation": str,         # machine-readable dispatch tag
      "headline_attribute": str,  # which raw result attribute
      "reference": str,           # REGISTRY.md citation pointer
    }

Schema placement: top-level block (user preference, selected via
AskUserQuestion in planning).

Aggregation tags include "simple", "event_study", "group", "2x2",
"twfe", "iw", "stacked", "ddd", "staggered_ddd", "synthetic",
"factor_model", "M", "l", "l_x", "l_fd", "l_x_fd", "dose_overall",
"pt_all_combined", "pt_post_single_baseline", "unknown".

Per-estimator dispatch lives in the new
``diff_diff/_reporting_helpers.py::describe_target_parameter`` (its own
module rather than business_report / diagnostic_report, to avoid
circular-import risk; plan-review LOW #7). All 17 result classes are
covered (16 from _APPLICABILITY + BaconDecompositionResults);
exhaustiveness is locked in by TestTargetParameterCoversEveryResultClass.

Fit-time config reads:
- ``EfficientDiDResults.pt_assumption`` branches the aggregation tag
  between pt_all_combined and pt_post_single_baseline.
- ``StackedDiDResults.clean_control`` varies the definition clause
  (never_treated / strict / not_yet_treated).
- ``ChaisemartinDHaultfoeuilleResults.L_max`` + ``covariate_residuals``
  + ``linear_trends_effects`` branch the dCDH estimand between DID_M /
  DID_l / DID^X_l / DID^{fd}_l / DID^{X,fd}_l.

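A minimal sketch of how the tag-changing config reads listed above could be dispatched. This is not the contents of ``describe_target_parameter`` (that file's body is not shown in this diff): ``EfficientDiDStub``, ``DCDHStub`` and ``sketch_aggregation_tag`` are hypothetical names, and the dCDH rule used here (no ``L_max`` means DID_M; otherwise DID_l with "_x" for covariates and "_fd" for linear trends) is an assumption inferred from the tag names. ``StackedDiDResults.clean_control`` is left out because it only changes the definition text, not the tag.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for the real result classes: only the fit-time
# attributes named in this commit are modelled.
@dataclass
class EfficientDiDStub:
    pt_assumption: str = "all"  # "all" (PT-All) or "post" (PT-Post)


@dataclass
class DCDHStub:
    L_max: Optional[int] = None
    covariate_residuals: Optional[Any] = None
    linear_trends_effects: Optional[Any] = None


def sketch_aggregation_tag(results: Any) -> str:
    """Map fit-time config to an aggregation tag (illustrative only)."""
    if isinstance(results, EfficientDiDStub):
        # Just-identified single-baseline vs over-identified combined.
        if results.pt_assumption == "post":
            return "pt_post_single_baseline"
        return "pt_all_combined"
    if isinstance(results, DCDHStub):
        # Assumption: without L_max the headline is DID_M; with L_max it
        # is DID_l, suffixed "_x" for covariates and "_fd" for trends.
        if results.L_max is None:
            return "M"
        tag = "l"
        if results.covariate_residuals is not None:
            tag += "_x"
        if results.linear_trends_effects is not None:
            tag += "_fd"
        return tag
    # Anything unrecognised falls back to the documented "unknown" tag.
    return "unknown"


if __name__ == "__main__":
    print(sketch_aggregation_tag(EfficientDiDStub(pt_assumption="post")))  # pt_post_single_baseline
    print(sketch_aggregation_tag(DCDHStub(L_max=3, covariate_residuals=[0.1])))  # l_x
```

Keeping the dispatch in one function like this is what lets an exhaustiveness test enumerate result classes and assert that none hits the fallback branch.
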
Fixed-tag branches (per plan-review CRITICAL #1 and #2):
- ``CallawaySantAnna`` / ``ImputationDiD`` / ``TwoStageDiD`` /
  ``WooldridgeDiD``: the fit-time ``aggregate`` kwarg does not change
  the ``overall_att`` scalar; it only populates additional horizon /
  group tables on the result object. Disambiguating those tables in
  prose is tracked under gap #9.
- ``ContinuousDiDResults``: the PT-vs-SPT regime is a user-level
  assumption, not a library setting, so the helper emits a single
  "dose_overall" tag with a disjunctive definition naming both regime
  readings (ATT^loc under PT, ATT^glob under SPT).

Prose rendering:
- BR ``_render_summary``: emits "Target parameter: <name>." after the
  headline sentence (short name only; the full definition lives in the
  full report and the schema).
- BR ``_render_full_report``: "## Target Parameter" section between
  "## Headline" and "## Identifying Assumption".
- DR ``_render_overall_interpretation``: mirror sentence.
- DR ``_render_dr_full_report``: "## Target Parameter" section with
  name, definition, aggregation tag, headline attribute, and reference.

Cross-surface parity: both BR and DR consume the same helper (the
single source of truth), so their ``target_parameter`` blocks are
byte-identical (verified by TestTargetParameterCrossSurfaceParity).

Tests: 37 new (TestTargetParameterPerEstimator +
TestTargetParameterFitConfigReads +
TestTargetParameterCoversEveryResultClass +
TestTargetParameterCrossSurfaceParity +
TestTargetParameterProseRendering). Existing BR/DR top-level-key
contract tests are updated to include ``target_parameter``. Total: 319
tests pass (282 prior + 37 new).

Docs: REPORTING.md gains a "Target parameter" section documenting the
per-estimator dispatch and schema shape. business_report.rst and
diagnostic_report.rst note the new field with a pointer to
REPORTING.md. CHANGELOG entry under Unreleased.

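To show what a consumer of the block sees, the DR full-report and summary logic from this commit's ``_render_dr_full_report`` and ``_render_overall_interpretation`` hunks can be lifted into free functions. ``render_target_parameter_section`` and ``summary_sentence`` are hypothetical names for this sketch, not library API; the sample dict reuses the illustrative values (including the elided definition) from the REPORTING.md schema example.

```python
from typing import Any, Dict, List


def render_target_parameter_section(tp: Dict[str, Any]) -> List[str]:
    """Mirror the DR "## Target Parameter" markdown block for one dict."""
    lines: List[str] = []
    if not (tp.get("name") or tp.get("definition")):
        return lines  # nothing to say: the section is skipped entirely
    lines.append("## Target Parameter")
    lines.append("")
    if tp.get("name"):
        lines.append(f"- **{tp['name']}**")
    if tp.get("definition"):
        lines.append(f"- {tp['definition']}")
    # DR-only metadata rows; BR's full report stops after the definition.
    if tp.get("aggregation"):
        lines.append(f"- Aggregation tag: `{tp['aggregation']}`")
    if tp.get("headline_attribute"):
        lines.append(f"- Headline attribute: `{tp['headline_attribute']}`")
    if tp.get("reference"):
        lines.append(f"- Reference: {tp['reference']}")
    lines.append("")
    return lines


def summary_sentence(tp: Dict[str, Any]) -> str:
    """Short-name sentence shared by BR's summary and DR's interpretation."""
    name = (tp or {}).get("name")
    return f"Target parameter: {name}." if name else ""


if __name__ == "__main__":
    example = {
        "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
        "definition": "A cohort-size-weighted average of group-time ATTs ...",
        "aggregation": "simple",
        "headline_attribute": "overall_att",
        "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna",
    }
    print(summary_sentence(example))
    print("\n".join(render_target_parameter_section(example)))
```

Because every row is guarded by ``tp.get(...)``, a block with missing optional fields degrades to fewer bullets instead of raising, and an empty block renders nothing at all.
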
Out of scope: REGISTRY.md per-estimator "Target parameter"
sub-sections (plan-review additional-note); the reporting-layer doc in
REPORTING.md is the current source of truth. A follow-up docs PR can
land those sub-sections if maintainers want the registry to own the
canonical wording directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **`target_parameter` block in BR/DR schemas (experimental)** — BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
+
 ## [3.2.0] - 2026-04-19
 
 ### Added
diff --git a/diff_diff/_reporting_helpers.py b/diff_diff/_reporting_helpers.py
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
@@ -42,6 +42,7 @@
 
 import numpy as np
 
+from diff_diff._reporting_helpers import describe_target_parameter
 from diff_diff.diagnostic_report import DiagnosticReport, DiagnosticReportResults
 
 BUSINESS_REPORT_SCHEMA_VERSION = "1.0"
@@ -434,6 +435,7 @@ def _build_schema(self) -> Dict[str, Any]:
 
         headline = self._extract_headline(dr_schema)
         sample = self._extract_sample()
+        target_parameter = describe_target_parameter(self._results)
         heterogeneity = _lift_heterogeneity(dr_schema)
         pre_trends = _lift_pre_trends(dr_schema)
         sensitivity = _lift_sensitivity(dr_schema)
@@ -475,6 +477,7 @@ def _build_schema(self) -> Dict[str, Any]:
                 "alpha": self._context.alpha,
             },
             "headline": headline,
+            "target_parameter": target_parameter,
             "assumption": assumption,
             "pre_trends": pre_trends,
             "sensitivity": sensitivity,
@@ -1993,6 +1996,17 @@ def _render_summary(schema: Dict[str, Any]) -> str:
 
     # Headline sentence with significance phrase.
     sentences.append(_render_headline_sentence(schema))
+    # BR/DR gap #6 (target-parameter clarity): name what the headline
+    # scalar actually represents so the stakeholder can map the number
+    # to a specific estimand. Rendered immediately after the headline
+    # and before the significance phrase. The summary surfaces only
+    # the short ``name`` so the paragraph stays within the
+    # 6-10-sentence target; ``definition`` lives in the full report
+    # and in the structured schema for agents that want the long form.
+    tp = schema.get("target_parameter", {}) or {}
+    tp_name = tp.get("name")
+    if tp_name:
+        sentences.append(f"Target parameter: {tp_name}.")
     h = schema.get("headline", {})
     p = h.get("p_value")
     alpha = ctx.get("alpha", 0.05)
@@ -2314,6 +2328,21 @@ def _render_full_report(schema: Dict[str, Any]) -> str:
         lines.append(f"Statistically, {_significance_phrase(p, alpha)}.")
     lines.append("")
 
+    # Target parameter (BR/DR gap #6): name what the headline scalar
+    # represents so the stakeholder can map the number to a specific
+    # estimand. Rendered between "Headline" and "Identifying Assumption"
+    # because the target parameter is about what the scalar IS, whereas
+    # identifying assumption is about what makes it valid.
+    tp = schema.get("target_parameter", {}) or {}
+    if tp.get("name") or tp.get("definition"):
+        lines.append("## Target Parameter")
+        lines.append("")
+        if tp.get("name"):
+            lines.append(f"- **{tp['name']}**")
+        if tp.get("definition"):
+            lines.append(f"- {tp['definition']}")
+        lines.append("")
+
     # Identifying assumption
     lines.append("## Identifying Assumption")
     lines.append("")
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
@@ -38,6 +38,8 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff._reporting_helpers import describe_target_parameter  # noqa: E402 (top-level import)
+
 DIAGNOSTIC_REPORT_SCHEMA_VERSION = "1.0"
 
 __all__ = [
@@ -962,6 +964,7 @@ def _execute(self) -> DiagnosticReportResults:
             "schema_version": DIAGNOSTIC_REPORT_SCHEMA_VERSION,
             "estimator": type(self._results).__name__,
             "headline_metric": headline,
+            "target_parameter": describe_target_parameter(self._results),
             "parallel_trends": sections["parallel_trends"],
             "pretrends_power": sections["pretrends_power"],
             "sensitivity": sections["sensitivity"],
@@ -3003,7 +3006,19 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
             f"On {est}, {treatment} {direction} {outcome} by {val:.3g}{ci_str}{p_str}."
         )
 
-    # Sentence 2: parallel trends + power (method-aware prose per the
+    # Sentence 2: name the target parameter (BR/DR gap #6). Rendered
+    # right after the headline so the reader sees what the scalar
+    # represents before pre-trends / sensitivity context. Only the
+    # terse ``name`` goes in the interpretation paragraph; the full
+    # ``definition`` lives in DR's "## Target Parameter" markdown
+    # section and in the structured ``schema["target_parameter"]``
+    # dict for agents that want the long form.
+    tp = schema.get("target_parameter") or {}
+    tp_name = tp.get("name")
+    if tp_name:
+        sentences.append(f"Target parameter: {tp_name}.")
+
+    # Sentence 3: parallel trends + power (method-aware prose per the
     # round-8 CI review on PR #318; PT method can be slope_difference
     # (2x2), joint_wald / bonferroni (event study), hausman (EfficientDiD
     # PT-All vs PT-Post), synthetic_fit (SDiD), or factor (TROP), and the
@@ -3221,6 +3236,25 @@ def _render_dr_full_report(results: "DiagnosticReportResults") -> str:
             f"(SE {headline.get('se')}, p = {headline.get('p_value')})"
         )
     lines.append("")
+
+    # BR/DR gap #6: target-parameter section between headline metadata
+    # and the overall-interpretation paragraph.
+    tp = schema.get("target_parameter") or {}
+    if tp.get("name") or tp.get("definition"):
+        lines.append("## Target Parameter")
+        lines.append("")
+        if tp.get("name"):
+            lines.append(f"- **{tp['name']}**")
+        if tp.get("definition"):
+            lines.append(f"- {tp['definition']}")
+        if tp.get("aggregation"):
+            lines.append(f"- Aggregation tag: `{tp['aggregation']}`")
+        if tp.get("headline_attribute"):
+            lines.append(f"- Headline attribute: `{tp['headline_attribute']}`")
+        if tp.get("reference"):
+            lines.append(f"- Reference: {tp['reference']}")
+        lines.append("")
+
     lines.append("## Overall Interpretation")
     lines.append("")
     lines.append(schema.get("overall_interpretation", "") or "_No synthesis available._")
diff --git a/docs/api/business_report.rst b/docs/api/business_report.rst
@@ -49,6 +49,13 @@ Methodology deviations (no traffic-light gates, pre-trends verdict
 thresholds, power-aware phrasing, unit-translation policy, schema
 stability) are documented in :doc:`../methodology/REPORTING`.
 
+The schema carries a top-level ``target_parameter`` block
+(experimental) naming what the headline scalar represents per
+estimator — simple ATT, event-study average, DID_M, DID_l,
+dose-response aggregate, factor-model residual, etc. See the
+"Target parameter" section of :doc:`../methodology/REPORTING` for
+the per-estimator dispatch and schema shape.
+
 Example
 -------
 
diff --git a/docs/api/diagnostic_report.rst b/docs/api/diagnostic_report.rst
@@ -15,6 +15,12 @@ Methodology deviations (no traffic-light gates, opt-in placebo
 battery, estimator-native diagnostic routing, power-aware phrasing
 threshold) are documented in :doc:`../methodology/REPORTING`.
 
+The schema carries a top-level ``target_parameter`` block
+(experimental) naming what the headline scalar represents per
+estimator. See the "Target parameter" section of
+:doc:`../methodology/REPORTING` for the per-estimator dispatch and
+schema shape.
+
 Data-dependent checks (2x2 parallel trends on simple DiD,
 Goodman-Bacon decomposition on staggered estimators, the EfficientDiD
 Hausman PT-All vs PT-Post pretest) require the raw panel + column
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
@@ -53,6 +53,89 @@ effects, pre-period and reference-marker rows excluded). These are
 reporting-layer aggregations of inputs already in the result object,
 not new inference.
 
+## Target parameter
+
+The BusinessReport and DiagnosticReport schemas both carry a
+top-level `target_parameter` block that names what the headline
+scalar actually represents. The 16 result classes have
+meaningfully different estimands: a stakeholder reading
+`overall_att = -0.0214` on a Callaway-Sant'Anna fit cannot tell
+whether that is the simple-weighted average across `ATT(g,t)`
+cells, an event-study-weighted aggregate, or a group-weighted
+aggregate. Baker et al. (2025) Step 2 is "Define the target
+parameter"; BR/DR does that work for the user.
+
+Schema shape:
+
+```json
+"target_parameter": {
+  "name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
+  "definition": "A cohort-size-weighted average of group-time ATTs ...",
+  "aggregation": "simple",
+  "headline_attribute": "overall_att",
+  "reference": "Callaway & Sant'Anna (2021); REGISTRY.md Sec. CallawaySantAnna"
+}
+```
+
+Field semantics:
+
+- `name`: short stakeholder-facing name. Rendered verbatim in
+  BR's summary paragraph and DR's overall-interpretation
+  paragraph. Always non-empty.
+- `definition`: plain-English description of what the scalar is
+  and how it is aggregated. Rendered in BR's and DR's full-report
+  markdown (under "## Target Parameter") but omitted from the
+  summary paragraph so stakeholder prose stays within the
+  6-10-sentence target.
+- `aggregation`: machine-readable tag that downstream agents can
+  branch on: `"simple"`, `"event_study"`, `"group"`, `"2x2"`,
+  `"twfe"`, `"iw"`, `"stacked"`, `"ddd"`, `"staggered_ddd"`,
+  `"synthetic"`, `"factor_model"`, `"M"`, `"l"`, `"l_x"`,
+  `"l_fd"`, `"l_x_fd"`, `"dose_overall"`,
+  `"pt_all_combined"`, `"pt_post_single_baseline"`, `"unknown"`.
+- `headline_attribute`: the raw result attribute the scalar
+  comes from (`"overall_att"` / `"att"` / `"avg_att"` /
+  `"twfe_estimate"`). Different result classes use different
+  attribute names; agents that want to re-read the raw value
+  can dispatch on this.
+- `reference`: one-line citation pointer to the canonical paper
+  and the REGISTRY.md section.
+
+Per-estimator dispatch lives in
+`diff_diff/_reporting_helpers.py::describe_target_parameter`. Each
+branch is sourced from the corresponding estimator's section in
+REGISTRY.md; new result classes must add an explicit branch (the
+exhaustiveness test `TestTargetParameterCoversEveryResultClass`
+locks this in).
+
+A few branches read fit-time config from the result object:
+
+- `EfficientDiDResults.pt_assumption`: `"all"` (over-identified
+  combined) vs `"post"` (just-identified single-baseline) branches
+  `aggregation` between `"pt_all_combined"` and
+  `"pt_post_single_baseline"`.
+- `StackedDiDResults.clean_control`: `"never_treated"` /
+  `"strict"` / `"not_yet_treated"` varies the `definition` clause
+  describing which units qualify as controls.
+- `ChaisemartinDHaultfoeuilleResults.L_max` +
+  `covariate_residuals` + `linear_trends_effects`: branch the dCDH
+  estimand between `DID_M` / `DID_l` / `DID^X_l` / `DID^{fd}_l` /
+  `DID^{X,fd}_l` (tags `"M"` / `"l"` / `"l_x"` / `"l_fd"` / `"l_x_fd"`).
+
+A few branches emit a fixed tag regardless of fit-time config —
+notably `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, and
+`WooldridgeDiD`. For these estimators the `overall_att`
+(or `att` / `avg_att`) scalar is ALWAYS the simple weighted
+aggregation; the fit-time `aggregate` kwarg populates additional
+horizon / group tables on the result object but does not change
+the headline scalar. Disambiguating those tables in prose is
+tracked under BR/DR gap #9 (per-cohort narrative rendering).
+
+`ContinuousDiDResults` emits a single `"dose_overall"` tag with a
+disjunctive definition (`ATT^loc` under PT; `ATT^glob` under
+SPT) because the PT-vs-SPT regime is a user-level assumption, not
+a library setting.
+
 ## Design deviations
 
 - **Note:** No hard pass/fail gates. `DiagnosticReport` does not produce
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
@@ -49,6 +49,7 @@
     "estimator",
     "context",
     "headline",
+    "target_parameter",
     "assumption",
     "pre_trends",
     "sensitivity",
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
@@ -48,6 +48,7 @@
     "schema_version",
     "estimator",
     "headline_metric",
+    "target_parameter",
     "parallel_trends",
     "pretrends_power",
     "sensitivity",
diff --git a/tests/test_target_parameter.py b/tests/test_target_parameter.py