Skip to content

Commit 3167f05

Browse files
igerberclaude
andcommitted
Address twenty-fifth round of CI review findings on PR #318
P2 code quality (full_report PT label). ``BusinessReport. full_report()`` hard-coded ``joint p = ...`` in the Pre-Trends section, which mislabeled the 2x2 ``slope_difference`` and EfficientDiD ``hausman`` single-statistic tests (both emit a single ``p``, not a joint p) and invented a nonexistent label for design-enforced SDiD ``synthetic_fit`` / TROP ``factor`` paths that have no p-value at all. ``summary()`` was already method-aware via ``_pt_method_stat_label``; the markdown path now uses the same helper and omits the parenthetical entirely for no-p-value methods. P3 docs. ``REPORTING.md``'s "single-knob alpha" note said ``alpha`` drives both the CI level and the phrasing threshold. The implementation and regression tests actually preserve the fit's native CI on alpha mismatch (the stored CI is the only quantile the underlying estimator supplied; bootstrap distributions and finite-df analytical variances are not always retained) and only change the significance phrasing, with an ``alpha_override_preserved`` caveat. Updated the note to describe the preserved-native-CI fallback and the reason for the conservative choice. P3 coverage. Add ``TestFullReportMethodAwarePTLabel`` with three regressions using the same fake-DR-schema pattern the summary tests use: * ``slope_difference`` -> markdown uses ``p = ...``, not ``joint p``; * ``hausman`` -> markdown uses ``p = ...``, not ``joint p``; * ``synthetic_fit`` -> markdown omits any p-value label; verdict still renders. 233 BR / DR / practitioner tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dae62ac commit 3167f05

3 files changed

Lines changed: 139 additions & 7 deletions

File tree

diff_diff/business_report.py

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1820,8 +1820,21 @@ def _render_full_report(schema: Dict[str, Any]) -> str:
18201820
jp = pt.get("joint_p_value")
18211821
verdict = pt.get("verdict")
18221822
tier = pt.get("power_tier")
1823-
jp_str = f"joint p = {jp:.3g}" if isinstance(jp, (int, float)) else "joint p unavailable"
1824-
lines.append(f"- Verdict: `{verdict}` ({jp_str})")
1823+
# Use the method-aware statistic label the summary path already
1824+
# uses: "joint p" for Wald / Bonferroni event-study, "p" for
1825+
# slope-difference / Hausman single-statistic tests, and None
1826+
# for design-enforced SDiD / TROP paths where there is no
1827+
# p-value at all. Round-25 P2 CI review on PR #318 flagged the
1828+
# hard-coded "joint p" wording as misdescribing 2x2 / Hausman
1829+
# fits and inventing a nonexistent p-value for SDiD / TROP.
1830+
method = pt.get("method")
1831+
stat_label = _pt_method_stat_label(method)
1832+
if stat_label and isinstance(jp, (int, float)):
1833+
lines.append(f"- Verdict: `{verdict}` ({stat_label} = {jp:.3g})")
1834+
elif stat_label:
1835+
lines.append(f"- Verdict: `{verdict}` ({stat_label} unavailable)")
1836+
else:
1837+
lines.append(f"- Verdict: `{verdict}`")
18251838
if tier:
18261839
lines.append(f"- Power tier: `{tier}`")
18271840
mdv = pt.get("mdv")

docs/methodology/REPORTING.md

Lines changed: 18 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -153,11 +153,24 @@ not new inference.
153153
coefficients), which would be unsafe in the presence of non-linear
154154
link functions (Poisson QMLE, logit).
155155

156-
- **Note:** Single-knob `alpha`. BusinessReport exposes only `alpha`
157-
(defaults to `results.alpha`); there is no separate
158-
`significance_threshold` parameter. `alpha` drives both the CI level
159-
(`(1 - alpha) * 100`% interval) and the phrasing tier threshold
160-
("statistically significant at the (1 - alpha) * 100% level").
156+
- **Note:** Single-knob `alpha` with preserved-native-CI fallback.
157+
BusinessReport exposes only `alpha` (defaults to `results.alpha`);
158+
there is no separate `significance_threshold` parameter. When the
159+
requested `alpha` matches the fit's native level, it drives both the
160+
CI level (`(1 - alpha) * 100`% interval) and the phrasing tier
161+
threshold ("statistically significant at the (1 - alpha) * 100%
162+
level"). When the requested `alpha` differs from the fit's native
163+
level (e.g., the user asks for `alpha=0.10` on a result fit with
164+
`alpha=0.05`), BusinessReport does NOT recompute the CI at the
165+
requested level, because the stored CI is the only quantile the
166+
underlying estimator supplied (bootstrap distributions and
167+
finite-df analytical variances are not always retained on the
168+
result). Instead, the schema preserves the fit's native CI (with its
169+
original level) and uses the requested `alpha` only for the
170+
significance-phrasing threshold, and emits an
171+
`alpha_override_preserved` caveat describing the mismatch. This is
172+
the conservative choice: it avoids silently recomputing CIs under
173+
assumptions the estimator may not support.
161174

162175
- **Note:** Schema stability policy for the AI-legible `to_dict()`
163176
surface. New top-level keys count as additive (no version bump); new

tests/test_business_report.py

Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -921,6 +921,112 @@ def test_dr_summary_uses_hausman_wording_for_efficient_did(self, edid_fit):
921921
assert "event-study coefficients" not in summary
922922

923923

924+
class TestFullReportMethodAwarePTLabel:
925+
"""Round-25 P2 CI review on PR #318: ``BusinessReport.full_report()``
926+
previously hard-coded ``joint p = ...`` in the Pre-Trends section,
927+
which mislabels the 2x2 ``slope_difference`` and EfficientDiD
928+
``hausman`` single-statistic tests and invents a nonexistent
929+
``joint p`` label for design-enforced SDiD / TROP paths that have
930+
no p-value at all. The markdown path must use the same
931+
method-aware label helper the summary path already uses
932+
(``_pt_method_stat_label``).
933+
"""
934+
935+
@staticmethod
936+
def _stub_result_with_method(method: str):
937+
from diff_diff.diagnostic_report import DiagnosticReportResults
938+
939+
class DiDResults:
940+
pass
941+
942+
stub = DiDResults()
943+
stub.att = 1.0
944+
stub.se = 0.2
945+
stub.p_value = 0.001
946+
stub.conf_int = (0.6, 1.4)
947+
stub.alpha = 0.05
948+
stub.n_obs = 100
949+
stub.n_treated = 40
950+
stub.n_control = 60
951+
stub.survey_metadata = None
952+
stub.inference_method = "analytical"
953+
954+
pt_block: dict = {
955+
"status": "ran",
956+
"method": method,
957+
"verdict": "no_detected_violation",
958+
}
959+
# SDiD's synthetic_fit path has no p-value by design; the other
960+
# methods do.
961+
if method != "synthetic_fit":
962+
pt_block["joint_p_value"] = 0.40
963+
964+
fake_schema = {
965+
"schema_version": "1.0",
966+
"estimator": "DiDResults",
967+
"headline_metric": {"name": "att", "value": 1.0},
968+
"parallel_trends": pt_block,
969+
"pretrends_power": {"status": "not_applicable"},
970+
"sensitivity": {"status": "not_applicable"},
971+
"placebo": {"status": "skipped", "reason": "opt-in"},
972+
"bacon": {"status": "not_applicable"},
973+
"design_effect": {"status": "not_applicable"},
974+
"heterogeneity": {"status": "not_applicable"},
975+
"epv": {"status": "not_applicable"},
976+
"estimator_native_diagnostics": {"status": "not_applicable"},
977+
"skipped": {},
978+
"warnings": [],
979+
"overall_interpretation": "",
980+
"next_steps": [],
981+
}
982+
fake_dr = DiagnosticReportResults(
983+
schema=fake_schema,
984+
interpretation="",
985+
applicable_checks=("parallel_trends",),
986+
skipped_checks={},
987+
warnings=(),
988+
)
989+
return stub, fake_dr
990+
991+
def _pt_section(self, md: str) -> str:
992+
# The Pre-Trends section is delimited by the next ``##`` heading.
993+
after = md.split("## Pre-Trends", 1)[1]
994+
return after.split("\n## ", 1)[0]
995+
996+
def test_full_report_slope_difference_uses_single_p_label(self):
997+
stub, fake_dr = self._stub_result_with_method("slope_difference")
998+
md = BusinessReport(stub, diagnostics=fake_dr).full_report()
999+
section = self._pt_section(md)
1000+
assert "joint p" not in section, (
1001+
f"2x2 slope_difference is a single-statistic test and must "
1002+
f"not be labeled ``joint p`` in the markdown. Got: {section!r}"
1003+
)
1004+
# The single-statistic label ``p = ...`` must be present.
1005+
assert "p = 0.4" in section
1006+
1007+
def test_full_report_hausman_uses_single_p_label(self):
1008+
stub, fake_dr = self._stub_result_with_method("hausman")
1009+
section = self._pt_section(
1010+
BusinessReport(stub, diagnostics=fake_dr).full_report()
1011+
)
1012+
assert "joint p" not in section, (
1013+
f"EfficientDiD Hausman is a single-statistic test and must "
1014+
f"not be labeled ``joint p`` in the markdown. Got: {section!r}"
1015+
)
1016+
assert "p = 0.4" in section
1017+
1018+
def test_full_report_synthetic_fit_omits_p_label(self):
1019+
stub, fake_dr = self._stub_result_with_method("synthetic_fit")
1020+
section = self._pt_section(
1021+
BusinessReport(stub, diagnostics=fake_dr).full_report()
1022+
)
1023+
# No p-value of any kind for design-enforced SDiD PT analogue.
1024+
assert "joint p" not in section
1025+
assert "p = " not in section
1026+
# Verdict must still render.
1027+
assert "Verdict:" in section
1028+
1029+
9241030
class TestHausmanPretestPropagatesFitDesign:
9251031
"""Round-9 regression: ``_pt_hausman`` must propagate the fitted
9261032
result's ``control_group`` and ``anticipation`` into

0 commit comments

Comments
 (0)