Skip to content

Commit 59d7df7

Browse files
igerberclaude
andcommitted
Address twenty-ninth round of CI review findings on PR #318
P3 code quality (power_reason provenance). REPORTING.md lines 118-125 say the ``pretrends_power`` fallback reason is recorded on the BR pre-trends block, but ``_lift_pre_trends`` only carried the enum status and dropped the reason. Downstream schema consumers saw ``power_status="not_applicable"`` with no explanation — e.g., on ``StackedDiDResults`` / ``EfficientDiDResults`` / ``StaggeredTripleDiffResults`` / ``WooldridgeDiDResults`` / ``ChaisemartinDHaultfoeuilleResults`` fits where the power adapter is not yet available. Add a dedicated ``power_reason`` field alongside the existing ``power_status`` enum (additive, no breaking change) and update ``REPORTING.md`` to describe both fields. P3 docs / tests (DR prose for survey PT variants). Round-28 added the ``_survey`` suffix and ``df_denom`` to ``_pt_event_study``, and BR's method-aware helpers were updated to recognize the variants. ``DiagnosticReport``'s own ``_pt_subject_phrase`` / ``_pt_stat_label`` prose helpers were not, so DR ``summary()`` / ``full_report()`` still rendered the generic "Pre-treatment data" subject on survey-backed fits. Recognize ``joint_wald_survey`` and ``joint_wald_event_study_survey`` alongside the non-survey variants: subject is the pre-period event-study coefficient vector, statistic label is ``joint p`` (the F-reference correction is a different reference distribution, not a different test). Tests: 2 new regressions. * ``test_lift_pre_trends_exposes_power_reason`` under ``TestSurveyPTProsePropagation``: a fake DR block with a skipped power section surfaces both the enum status and the plain-English reason on the BR schema. * ``test_dr_prose_uses_event_study_subject_for_survey_pt`` under ``TestJointWaldAlignment``: DR's own subject / stat-label helpers return the event-study phrasing and ``joint p`` for both survey variants. 244 BR / DR / practitioner tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent cd85eda commit 59d7df7

5 files changed

Lines changed: 105 additions & 4 deletions

File tree

diff_diff/business_report.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -704,6 +704,15 @@ def _lift_pre_trends(dr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
704704
# CI review on PR #318).
705705
"df_denom": pt.get("df_denom"),
706706
"power_status": pp.get("status"),
707+
# Dedicated reason field so schema consumers see the fallback
708+
# explanation when ``compute_pretrends_power`` cannot run
709+
# (``status in {"skipped", "error", "not_applicable"}``).
710+
# REPORTING.md lines 118-125 promise this provenance; round-29
711+
# P3 CI review on PR #318 flagged that only the enum status was
712+
# being exposed and the reason was dropped at the lift boundary.
713+
# ``power_status`` stays the machine-readable enum; ``power_reason``
714+
# carries the plain-English explanation.
715+
"power_reason": pp.get("reason"),
707716
"power_tier": pp.get("tier"),
708717
"mdv": pp.get("mdv"),
709718
"mdv_share_of_att": pp.get("mdv_share_of_att"),

diff_diff/diagnostic_report.py

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2540,7 +2540,21 @@ def _pt_subject_phrase(method: Optional[str]) -> str:
25402540
return "The pre-period slope-difference test"
25412541
if method == "hausman":
25422542
return "The Hausman PT-All vs PT-Post pretest"
2543-
if method in {"joint_wald", "joint_wald_event_study", "joint_wald_no_vcov", "bonferroni"}:
2543+
if method in {
2544+
"joint_wald",
2545+
"joint_wald_event_study",
2546+
"joint_wald_no_vcov",
2547+
"bonferroni",
2548+
# Survey-aware event-study PT variants use an F(k, df_survey)
2549+
# reference rather than chi-square(k); the subject is still the
2550+
# pre-period event-study coefficient vector — only the
2551+
# reference distribution changes (round-28 / round-29 CI
2552+
# review on PR #318). Recognizing the ``_survey`` suffix here
2553+
# lets DR prose match the BR prose and the REPORTING.md
2554+
# contract.
2555+
"joint_wald_survey",
2556+
"joint_wald_event_study_survey",
2557+
}:
25442558
return "Pre-treatment event-study coefficients"
25452559
if method == "synthetic_fit":
25462560
return "The synthetic-control pre-treatment fit"
@@ -2555,9 +2569,19 @@ def _pt_stat_label(method: Optional[str]) -> Optional[str]:
25552569
Wald / Bonferroni paths take a joint p-value (``joint p``); the 2x2
25562570
slope-difference and Hausman paths are single-statistic tests
25572571
(``p``). Design-enforced paths return ``None`` so the sentence
2558-
omits a statistic.
2572+
omits a statistic. Survey F-reference variants remain joint tests
2573+
on the pre-period coefficient vector and keep the ``joint p``
2574+
label — the correction is a different reference distribution, not
2575+
a different test.
25592576
"""
2560-
if method in {"joint_wald", "joint_wald_event_study", "joint_wald_no_vcov", "bonferroni"}:
2577+
if method in {
2578+
"joint_wald",
2579+
"joint_wald_event_study",
2580+
"joint_wald_no_vcov",
2581+
"bonferroni",
2582+
"joint_wald_survey",
2583+
"joint_wald_event_study_survey",
2584+
}:
25612585
return "joint p"
25622586
if method in {"slope_difference", "hausman"}:
25632587
return "p"

docs/methodology/REPORTING.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,10 @@ not new inference.
120120
`ChaisemartinDHaultfoeuilleResults`) do not yet have a power
121121
adapter and therefore render the `no_detected_violation` tier as
122122
`underpowered` with the fallback reason recorded in
123-
`schema["pre_trends"]["power_status"]`. BusinessReport then reads
123+
`schema["pre_trends"]["power_reason"]` (plain-English explanation)
124+
while `schema["pre_trends"]["power_status"]` carries the
125+
machine-readable enum (`"ran"` / `"skipped"` / `"error"` /
126+
`"not_applicable"`). BusinessReport then reads
124127
`mdv_share_of_att = mdv / abs(att)` and selects a tier:
125128

126129
- `< 0.25` &rarr; `well_powered` &mdash; "the test has 80% power to

tests/test_business_report.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2506,6 +2506,42 @@ def test_lift_pre_trends_preserves_df_denom(self):
25062506
assert lifted["method"] == "joint_wald_event_study_survey"
25072507
assert lifted["df_denom"] == 30.0
25082508

2509+
def test_lift_pre_trends_exposes_power_reason(self):
2510+
"""Round-29 P3 regression: when ``compute_pretrends_power`` cannot
2511+
run, REPORTING.md lines 118-125 promise the fallback reason is
2512+
recorded in the BR pre-trends block. Previously only the enum
2513+
status surfaced and the reason was dropped at the lift
2514+
boundary; the new ``power_reason`` field carries the
2515+
plain-English explanation alongside the existing enum
2516+
``power_status``.
2517+
"""
2518+
from diff_diff.business_report import _lift_pre_trends
2519+
2520+
fake_dr = {
2521+
"parallel_trends": {
2522+
"status": "ran",
2523+
"method": "joint_wald_event_study",
2524+
"joint_p_value": 0.35,
2525+
"n_pre_periods": 3,
2526+
"verdict": "no_detected_violation",
2527+
},
2528+
"pretrends_power": {
2529+
"status": "not_applicable",
2530+
"reason": (
2531+
"StackedDiDResults does not yet have a "
2532+
"compute_pretrends_power adapter."
2533+
),
2534+
},
2535+
}
2536+
lifted = _lift_pre_trends(fake_dr)
2537+
# Machine-readable status preserved.
2538+
assert lifted["power_status"] == "not_applicable"
2539+
# Plain-English reason now exposed on the schema.
2540+
assert lifted["power_reason"] == (
2541+
"StackedDiDResults does not yet have a "
2542+
"compute_pretrends_power adapter."
2543+
)
2544+
25092545
def test_survey_pt_method_stat_label_uses_joint_p(self):
25102546
from diff_diff.business_report import (
25112547
_pt_method_stat_label,

tests/test_diagnostic_report.py

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -887,6 +887,35 @@ def test_precomputed_survey_pt_replay_preserves_df_denom(self, cs_fit):
887887
assert pt["df_denom"] == 20.0
888888
assert pt["df"] == 3
889889

890+
def test_dr_prose_uses_event_study_subject_for_survey_pt(self):
891+
"""Round-29 P3 regression: DR's own ``_pt_subject_phrase`` /
892+
``_pt_stat_label`` helpers previously didn't recognize the
893+
``_survey`` variants, so summary / full_report prose fell
894+
through to the generic "Pre-treatment data" wording — BR's
895+
helpers were fixed last round but DR's were not. The survey
896+
variants must render with the event-study subject and the
897+
``joint p`` label; the F-reference correction is a different
898+
reference distribution, not a different test.
899+
"""
900+
from diff_diff.diagnostic_report import (
901+
_pt_stat_label,
902+
_pt_subject_phrase,
903+
)
904+
905+
for method in (
906+
"joint_wald_survey",
907+
"joint_wald_event_study_survey",
908+
):
909+
assert (
910+
_pt_subject_phrase(method)
911+
== "Pre-treatment event-study coefficients"
912+
), (
913+
f"DR subject for {method!r} must match the non-survey "
914+
f"event-study phrasing; got "
915+
f"{_pt_subject_phrase(method)!r}"
916+
)
917+
assert _pt_stat_label(method) == "joint p"
918+
890919
def test_joint_wald_ignores_non_finite_survey_df(self):
891920
"""If ``df_survey`` is NaN / inf / non-positive, fall back to
892921
chi-square (no finite-sample correction available).

0 commit comments

Comments
 (0)