Address PR #347 R7: bump schema versions to 2.0 + EfficientDiD library vs ES_avg note

igerber · claude · igerber · commit fdaf94d81627 · 2026-04-21T05:58:03.000-04:00
Two P1 findings from R7, both addressed.

P1 #1 (schema version bump): the new ``headline.status`` / ``headline_metric.status`` value ``"no_scalar_by_design"``, added in R4 for the dCDH ``trends_linear=True, L_max>=2`` configuration, is a breaking change per the REPORTING.md stability policy (new status-enum values are breaking because agents doing an exhaustive match will fail on unknown enum values). Bumped ``BUSINESS_REPORT_SCHEMA_VERSION`` and ``DIAGNOSTIC_REPORT_SCHEMA_VERSION`` from ``"1.0"`` to ``"2.0"``, updated the in-tree schema-version tests (one explicit ``== "1.0"`` assertion and six ``"schema_version": "1.0"`` stub dicts in BR / DR test files), added a REPORTING.md "Schema version 2.0" note, and documented the bump in the CHANGELOG Unreleased entry. The schemas remain marked experimental, so the formal deprecation policy does not yet apply.

P1 #2 (EfficientDiD library vs paper estimand): both EfficientDiD branches now explicitly state that BR/DR's headline ``overall_att`` is the library's cohort-size-weighted average over post-treatment ``(g, t)`` cells, NOT the paper's ``ES_avg`` uniform event-time average. The regime (PT-All / PT-Post) describes identification; the aggregation choice is a separate library-level policy that REGISTRY.md Sec. EfficientDiD documents. Added ``cohort-size-weighted`` and ``ES_avg`` assertions to ``test_efficient_did_pt_all`` and ``test_efficient_did_pt_post``, plus a ``post-treatment`` assertion in ``test_efficient_did_pt_all``, so the wording is pinned.

354 BR/DR + guide + target-parameter tests pass. Black and ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
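
To illustrate why a new status-enum value counts as breaking, here is a minimal sketch of a downstream consumer that matches exhaustively on ``headline_metric.status``. Everything in it is hypothetical: ``read_headline`` and ``SUPPORTED_MAJOR`` are not part of ``diff_diff``, and the handled statuses are only the ones visible in this PR (``"ran"`` and ``"no_scalar_by_design"``); the real enum may contain others.

```python
from typing import Any, Dict

# Hypothetical consumer-side constant; not part of diff_diff.
SUPPORTED_MAJOR = 2


def read_headline(schema: Dict[str, Any]) -> str:
    """Render a DR-style headline, failing loudly on anything unrecognized."""
    # Gate on the major version first: a consumer written against "1.x"
    # would not know about statuses introduced in "2.0".
    major = int(str(schema["schema_version"]).split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema major version: {major}")

    headline = schema.get("headline_metric", {})
    status = headline.get("status")

    # Exhaustive match: every known status gets an explicit branch.
    if status == "ran":
        return f"{headline['name']} = {headline['value']}"
    if status == "no_scalar_by_design":
        return "no headline scalar (by design)"

    # Any status added later lands here, which is why adding an enum
    # value without a version bump would break this consumer silently.
    raise ValueError(f"unknown headline status: {status!r}")
```

A consumer pinned to major version 1 would reject a ``"2.0"`` payload at the version gate instead of reaching the unknown-status branch, which is the behavior the bump is meant to enable.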
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
-- **`target_parameter` block in BR/DR schemas (experimental)** — BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
+- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
 
 ## [3.2.0] - 2026-04-19
 
diff --git a/diff_diff/_reporting_helpers.py b/diff_diff/_reporting_helpers.py
@@ -275,7 +275,23 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
         }
 
     if name == "EfficientDiDResults":
+        # PR #347 R7 P1: the BR/DR headline ``overall_att`` is the
+        # library's cohort-size-weighted average over post-treatment
+        # ``(g, t)`` cells (see ``efficient_did.py`` around line 1274
+        # and REGISTRY.md Sec. EfficientDiD). This is distinct from
+        # the paper's ``ES_avg`` uniform event-time average.
+        # Disambiguating this in the stakeholder-facing definition
+        # keeps the user from mistaking one for the other — the
+        # regime (PT-All vs PT-Post) describes identification, not
+        # the aggregation choice for the headline scalar.
         pt_assumption = getattr(results, "pt_assumption", "all")
+        library_aggregation_note = (
+            " The BR/DR headline ``overall_att`` is the library's "
+            "cohort-size-weighted average of ATT(g, t) over post-"
+            "treatment cells, NOT the paper's ``ES_avg`` uniform event-"
+            "time average (see REGISTRY.md Sec. EfficientDiD for the "
+            "distinction)."
+        )
         if pt_assumption == "post":
             return {
                 "name": "overall ATT under PT-Post (single-baseline)",
@@ -284,7 +300,7 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
                     "regime (parallel trends hold only in post-treatment "
                     "periods). The baseline is period ``g - 1`` only; the "
                     "estimator is just-identified and reduces to standard "
-                    "single-baseline DiD (Corollary 3.2)."
+                    "single-baseline DiD (Corollary 3.2)." + library_aggregation_note
                 ),
                 "aggregation": "pt_post_single_baseline",
                 "headline_attribute": "overall_att",
@@ -297,7 +313,7 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
                 "(parallel trends hold for all groups and all periods). The "
                 "estimator is over-identified (Lemma 2.1) and applies "
                 "optimal-combination weights to achieve the semiparametric "
-                "efficiency bound on the no-covariate path."
+                "efficiency bound on the no-covariate path." + library_aggregation_note
             ),
             "aggregation": "pt_all_combined",
             "headline_attribute": "overall_att",
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
@@ -45,7 +45,7 @@
 from diff_diff._reporting_helpers import describe_target_parameter
 from diff_diff.diagnostic_report import DiagnosticReport, DiagnosticReportResults
 
-BUSINESS_REPORT_SCHEMA_VERSION = "1.0"
+BUSINESS_REPORT_SCHEMA_VERSION = "2.0"
 
 __all__ = [
     "BusinessReport",
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
@@ -40,7 +40,7 @@
 
 from diff_diff._reporting_helpers import describe_target_parameter  # noqa: E402 (top-level import)
 
-DIAGNOSTIC_REPORT_SCHEMA_VERSION = "1.0"
+DIAGNOSTIC_REPORT_SCHEMA_VERSION = "2.0"
 
 __all__ = [
     "DiagnosticReport",
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
@@ -358,6 +358,17 @@ a library setting.
   anchor tooling on them prematurely; a formal deprecation policy will
   land within two subsequent PRs.
 
+- **Note:** Schema version 2.0 (both BR and DR). The BR/DR gap #6
+  target-parameter PR adds the `headline.status` /
+  `headline_metric.status` value `"no_scalar_by_design"` (used for
+  the dCDH `trends_linear=True, L_max>=2` configuration where
+  `overall_att` is intentionally NaN). Per the stability policy
+  above, new enum values are breaking changes, so
+  `BUSINESS_REPORT_SCHEMA_VERSION` and
+  `DIAGNOSTIC_REPORT_SCHEMA_VERSION` are bumped from `"1.0"` to
+  `"2.0"`. The schemas remain marked experimental, so the formal
+  deprecation policy does not yet apply.
+
 ## Reference implementation(s)
 
 The phrasing rules follow the guidance in:
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
@@ -959,7 +959,7 @@ class DiDResults:
         from diff_diff.diagnostic_report import DiagnosticReportResults
 
         fake_schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": "DiDResults",
             "headline_metric": {"name": "att", "value": 1.0},
             "parallel_trends": {
@@ -1058,7 +1058,7 @@ class DiDResults:
             pt_block["joint_p_value"] = 0.40
 
         fake_schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": "DiDResults",
             "headline_metric": {"name": "att", "value": 1.0},
             "parallel_trends": pt_block,
@@ -2321,7 +2321,7 @@ def test_br_schema_tier_is_downgraded(self):
         from diff_diff.diagnostic_report import DiagnosticReportResults
 
         schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": "CallawaySantAnnaResults",
             "headline_metric": {"name": "overall_att", "value": 1.0},
             "parallel_trends": {
@@ -4028,7 +4028,7 @@ def _fragile_dr_schema(self, breakdown_m: float, grid=None):
             for row in grid
         ]
         schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": {"class_name": "CallawaySantAnnaResults", "display_name": "CS"},
             "headline_metric": {},
             "parallel_trends": {"status": "skipped", "reason": "stub"},
@@ -4215,7 +4215,7 @@ def _bacon_schema_with_high_forbidden_weight():
         from diff_diff.diagnostic_report import DiagnosticReportResults
 
         schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": {"class_name": "Stub", "display_name": "Stub"},
             "headline_metric": {},
             "parallel_trends": {"status": "skipped", "reason": "stub"},
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
@@ -173,7 +173,7 @@ def test_schema_version_constant(self, multi_period_fit):
         fit, _ = multi_period_fit
         schema = DiagnosticReport(fit).to_dict()
         assert schema["schema_version"] == DIAGNOSTIC_REPORT_SCHEMA_VERSION
-        assert DIAGNOSTIC_REPORT_SCHEMA_VERSION == "1.0"
+        assert DIAGNOSTIC_REPORT_SCHEMA_VERSION == "2.0"
 
     def test_all_statuses_use_closed_enum(self, cs_fit):
         fit, sdf = cs_fit
@@ -1931,7 +1931,7 @@ def _render(self, sens_block):
         from diff_diff.diagnostic_report import _render_overall_interpretation
 
         schema = {
-            "schema_version": "1.0",
+            "schema_version": "2.0",
             "estimator": {"class_name": "CallawaySantAnnaResults", "display_name": "CS"},
             "headline_metric": {
                 "status": "ran",
diff --git a/tests/test_target_parameter.py b/tests/test_target_parameter.py
@@ -128,11 +128,21 @@ def test_efficient_did_pt_all(self):
         tp = describe_target_parameter(_minimal_result("EfficientDiDResults", pt_assumption="all"))
         assert tp["aggregation"] == "pt_all_combined"
         assert "PT-All" in tp["name"]
+        # PR #347 R7 P1 regression: the definition must disambiguate
+        # the library's cohort-size-weighted ``overall_att`` from the
+        # paper's uniform-event-time ``ES_avg``.
+        defn = tp["definition"]
+        assert "cohort-size-weighted" in defn
+        assert "ES_avg" in defn
+        assert "post-treatment" in defn.lower()
 
     def test_efficient_did_pt_post(self):
         tp = describe_target_parameter(_minimal_result("EfficientDiDResults", pt_assumption="post"))
         assert tp["aggregation"] == "pt_post_single_baseline"
         assert "PT-Post" in tp["name"]
+        defn = tp["definition"]
+        assert "cohort-size-weighted" in defn
+        assert "ES_avg" in defn
 
     def test_continuous_did(self):
         tp = describe_target_parameter(_minimal_result("ContinuousDiDResults"))
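
The distinction pinned by the EfficientDiD assertions above can be made concrete with toy numbers. The sketch below is an illustration only, not the library's implementation: the ATT(g, t) values and cohort sizes are made up, both function names are hypothetical, and the within-event-time weighting used for the uniform event-time average (cohort-size weights across cohorts observed at each event time) is an assumption rather than something this PR states.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Toy inputs (made up): cohort g -> number of units, and ATT(g, t) per cell.
cohort_size: Dict[int, int] = {2: 100, 3: 300}
att: Dict[Tuple[int, int], float] = {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 1.0}


def cohort_size_weighted_att(att, cohort_size) -> float:
    """Average ATT(g, t) over post-treatment cells, weighting by cohort size."""
    post = {(g, t): v for (g, t), v in att.items() if t >= g}
    total = sum(cohort_size[g] for (g, _t) in post)
    return sum(cohort_size[g] * v for (g, _t), v in post.items()) / total


def uniform_event_time_average(att, cohort_size) -> float:
    """Average per-event-time effects with equal weight on each event time."""
    by_event_time = defaultdict(list)
    for (g, t), v in att.items():
        if t >= g:
            by_event_time[t - g].append((cohort_size[g], v))
    # Assumed: within an event time, cohorts are combined by cohort size.
    es = [
        sum(n * v for n, v in cells) / sum(n for n, _v in cells)
        for cells in by_event_time.values()
    ]
    return sum(es) / len(es)


weighted = cohort_size_weighted_att(att, cohort_size)
uniform = uniform_event_time_average(att, cohort_size)
print(f"cohort-size-weighted: {weighted:.2f}, uniform event-time: {uniform:.2f}")
```

With these inputs the cell-level average is pulled toward the large cohort's single post-treatment cell (about 1.2), while the event-time average gives event time 1 half the weight (about 1.5), so the two scalars can differ materially on the same ATT(g, t) table.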