
Commit 345f65c

igerber and claude committed
Address fourth round of CI review findings on PR #318
P0 fix:

* **``inference_method == 'wild_bootstrap'`` was not detected as bootstrap-like.** My prior bootstrap check caught ``'bootstrap'`` and ``variance_method in {bootstrap, jackknife, placebo}`` plus an attached ``bootstrap_distribution``, but ``DifferenceInDifferences(inference='wild_bootstrap')`` returns ``inference_method='wild_bootstrap'`` and a percentile-bootstrap CI without necessarily attaching the raw distribution. The override path therefore silently replaced that CI with a normal-approximation one. Fixed by matching both ``'bootstrap'`` and ``'wild_bootstrap'``; the preserved-CI caveat now names "wild cluster bootstrap" specifically when that path is triggered. Regression: ``TestWildBootstrapAlphaOverride``.

P1 fix:

* **``_describe_assumption()`` emitted generic DiD PT text for ContinuousDiD / TripleDifference / StaggeredTripleDiff**, all of which have identifying logic different from ordinary group-time PT per ``docs/methodology/REGISTRY.md``. Replaced the generic text with source-backed branches (and removed ``StaggeredTripleDiffResults`` from the group-time PT set):

  - ``ContinuousDiDResults``: two-level parallel trends (PT vs Strong PT) per Callaway, Goodman-Bacon & Sant'Anna (2024), with explicit mention of which of ATT(d|d), ATT(d), and ACRT each level identifies.
  - ``TripleDifferenceResults`` / ``StaggeredTripleDiffResults``: triple-difference cancellation across the 2x2x2 cells per Ortiz-Villavicencio & Sant'Anna (2025); notes that the identifying restriction is weaker than ordinary DiD PT and relies on additive separability across the three dimensions.

  The ``parallel_trends_variant`` schema field gains two new values: ``"dose_pt_or_strong_pt"`` and ``"triple_difference_cancellation"``. Direct regressions in ``TestAssumptionBlockSourceFaithful`` assert that registry-backed language (attribution phrases + method names) is present and, for StaggeredTripleDiff, that the generic group-time PT text is absent.

150 targeted tests pass; black, ruff, mypy clean on the new modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
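The P0 fix comes down to two decisions: whether a fit counts as bootstrap-like (so its native CI is kept), and which label the caveat uses. A minimal standalone sketch of that logic follows; `classify_inference` and `alpha_caveat` are hypothetical helpers written for illustration, not functions in `diff_diff`, and the enclosing condition that selects the preserved-CI path in `_extract_headline` is not reproduced.

```python
from typing import Optional, Tuple

# Inference methods treated as bootstrap-like after the fix (mirrors the diff).
BOOTSTRAP_INFERENCE = {"bootstrap", "wild_bootstrap"}
# Variance methods that carry their own resampling-based CI (mirrors the diff).
RESAMPLING_VARIANCE = {"bootstrap", "jackknife", "placebo"}


def classify_inference(
    inference_method: Optional[str],
    variance_method: Optional[str],
    has_bootstrap_dist: bool,
) -> Tuple[bool, str]:
    """Hypothetical helper: return (bootstrap_like, caveat label)."""
    bootstrap_like = (
        inference_method in BOOTSTRAP_INFERENCE
        or has_bootstrap_dist
        or variance_method in RESAMPLING_VARIANCE
    )
    # In the diff this labeling happens only on the preserved-CI path;
    # here it is computed unconditionally for simplicity.
    if inference_method == "wild_bootstrap":
        label = "wild cluster bootstrap"
    elif bootstrap_like:
        label = "bootstrap"
    else:
        label = "finite-df"
    return bootstrap_like, label


def alpha_caveat(label: str, result_alpha: float) -> str:
    """Hypothetical helper: build the preserved-CI caveat text."""
    # round() before int() protects against a product landing just below
    # an integer because of floating-point representation error.
    level = int(round((1.0 - result_alpha) * 100))
    return (
        "Requested alpha was not honored for the confidence "
        f"interval because this fit uses {label} "
        "inference; the displayed CI remains at the fit's "
        f"native level ({level}%). "
        "The significance phrasing still uses the requested alpha."
    )


# A wild-bootstrap fit with no stored distribution is still bootstrap-like.
like, label = classify_inference("wild_bootstrap", None, False)
print(like, label)  # True wild cluster bootstrap
print(alpha_caveat(label, 0.05))
```

Keeping the label separate from the `bootstrap_like` flag lets the caveat name the wild cluster bootstrap without changing which fits preserve their CI.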
1 parent 959f84e commit 345f65c

2 files changed

Lines changed: 176 additions & 4 deletions


diff_diff/business_report.py

Lines changed: 55 additions & 4 deletions
@@ -364,8 +364,11 @@ def _extract_headline(self, dr_schema: Optional[Dict[str, Any]]) -> Dict[str, An
 )
 variance_method = getattr(r, "variance_method", None)
 
+# Any non-analytic inference surface that stores a sampling /
+# resampling distribution (wild cluster bootstrap, percentile
+# bootstrap, jackknife, placebo) should preserve its native CI.
 bootstrap_like = (
-    inference_method == "bootstrap"
+    inference_method in {"bootstrap", "wild_bootstrap"}
     or has_bootstrap_dist
     or variance_method in {"bootstrap", "jackknife", "placebo"}
 )
@@ -375,10 +378,15 @@ def _extract_headline(self, dr_schema: Optional[Dict[str, Any]]) -> Dict[str, An
 # Preserve the fitted CI at its native level.
 alpha_was_honored = False
 alpha = float(result_alpha)
+if inference_method == "wild_bootstrap":
+    inference_label = "wild cluster bootstrap"
+elif bootstrap_like:
+    inference_label = "bootstrap"
+else:
+    inference_label = "finite-df"
 alpha_override_caveat = (
     f"Requested alpha was not honored for the confidence "
-    f"interval because this fit uses "
-    f"{'bootstrap' if bootstrap_like else 'finite-df'} "
+    f"interval because this fit uses {inference_label} "
     f"inference; the displayed CI remains at the fit's "
     f"native level ({int(round((1.0 - result_alpha) * 100))}%). "
     f"The significance phrasing still uses the requested alpha."
@@ -611,6 +619,50 @@ def _describe_assumption(estimator_name: str) -> Dict[str, Any]:
             "captured through latent factor loadings."
         ),
     }
+if estimator_name == "ContinuousDiDResults":
+    # Callaway, Goodman-Bacon & Sant'Anna (2024), two-level PT:
+    # REGISTRY.md §ContinuousDiD > Identification.
+    return {
+        "parallel_trends_variant": "dose_pt_or_strong_pt",
+        "no_anticipation": True,
+        "description": (
+            "ContinuousDiD identifies dose-specific treatment effects "
+            "under two possible parallel-trends conditions (Callaway, "
+            "Goodman-Bacon & Sant'Anna 2024). Parallel Trends (PT) "
+            "assumes untreated potential outcome paths are equal across "
+            "all dose groups and the untreated group (conditional on "
+            "dose), identifying ATT(d|d) and the binarized ATT^loc but "
+            "NOT ATT(d), ACRT, or cross-dose comparisons. Strong "
+            "Parallel Trends (SPT) additionally rules out selection "
+            "into dose on the basis of treatment effects and is "
+            "required to identify the dose-response curve ATT(d), "
+            "marginal effect ACRT(d), and cross-dose contrasts."
+        ),
+    }
+if estimator_name in {"TripleDifferenceResults", "StaggeredTripleDiffResults"}:
+    # Ortiz-Villavicencio & Sant'Anna (2025) — identification is the
+    # triple-difference cancellation across the 2x2x2 cells, not
+    # ordinary DiD parallel trends; see REGISTRY.md §TripleDifference
+    # and §StaggeredTripleDifference.
+    return {
+        "parallel_trends_variant": "triple_difference_cancellation",
+        "no_anticipation": True,
+        "description": (
+            "Triple-difference identification relies on the DDD "
+            "decomposition (Ortiz-Villavicencio & Sant'Anna 2025): "
+            "the ATT is recovered from `DDD = DiD_A + DiD_B - DiD_C` "
+            "across the Group x Period x Eligibility (or Treatment) "
+            "cells, which differences out group-specific and "
+            "period-specific unobservables without requiring separate "
+            "parallel trends to hold between each cell pair. The "
+            "identifying restriction is therefore weaker than ordinary "
+            "DiD parallel trends but assumes that the residual "
+            "unobservable component is additively separable across the "
+            "three dimensions; practical overlap and common-support "
+            "conditions still apply on the propensity score when "
+            "covariates are used."
+        ),
+    }
 if estimator_name in {
     "CallawaySantAnnaResults",
     "SunAbrahamResults",
@@ -620,7 +672,6 @@ def _describe_assumption(estimator_name: str) -> Dict[str, Any]:
     "EfficientDiDResults",
     "WooldridgeDiDResults",
     "ChaisemartinDHaultfoeuilleResults",
-    "StaggeredTripleDiffResults",
 }:
     return {
         "parallel_trends_variant": "conditional_or_group_time",
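The `DDD = DiD_A + DiD_B - DiD_C` expression in the new TripleDifference description is an algebraic identity that can be checked by hand. The sketch below assumes one reading of the notation (not confirmed by the diff): each of DiD_A, DiD_B, and DiD_C compares the pre/post change in the treated, eligible cell against one of the other three Group x Eligibility cells. The cell means and the `change`/`did` helpers are made up for illustration.

```python
# Hypothetical pre/post cell means for a 2x2x2 design.
# Keys are (group, eligible); values are (pre, post).
cells = {
    ("treated", True): (10.0, 15.0),
    ("treated", False): (8.0, 10.0),
    ("control", True): (9.0, 12.0),
    ("control", False): (7.0, 8.0),
}


def change(key):
    """Pre-to-post change in one cell's mean outcome."""
    pre, post = cells[key]
    return post - pre


def did(target, comparison):
    """Two-period DiD of the target cell against one comparison cell."""
    return change(target) - change(comparison)


target = ("treated", True)

# Classic DDD: the eligible-vs-ineligible contrast in the treated group,
# minus the same contrast in the control group.
classic = (change(("treated", True)) - change(("treated", False))) - (
    change(("control", True)) - change(("control", False))
)

# Assumed reading of DiD_A + DiD_B - DiD_C: three DiDs sharing the target cell.
did_a = did(target, ("treated", False))
did_b = did(target, ("control", True))
did_c = did(target, ("control", False))
decomposed = did_a + did_b - did_c

print(classic, decomposed)  # 1.0 1.0
```

With changes a, b, c, d for the four cells, (a - b) - (c - d) equals (a - b) + (a - c) - (a - d), so the two forms agree for any inputs, not just these.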

tests/test_business_report.py

Lines changed: 121 additions & 0 deletions
@@ -606,6 +606,127 @@ def test_finite_df_fit_preserves_fitted_ci_on_alpha_mismatch(self):
         assert "alpha_override_preserved" in caveat_topics
 
 
+class TestWildBootstrapAlphaOverride:
+    """Regression for the round-4 P0 finding that ``inference='wild_bootstrap'``
+    results were falling through to a normal-approximation recomputation."""
+
+    def test_wild_bootstrap_preserves_fitted_ci(self):
+        class _WildBootstrapStub:
+            def __init__(self):
+                self.att = 1.0
+                self.se = 0.5
+                self.p_value = 0.04
+                # 95% CI produced by the wild cluster bootstrap surface.
+                self.conf_int = (0.10, 1.90)
+                self.alpha = 0.05
+                self.n_obs = 100
+                self.n_treated = 40
+                self.n_control = 60
+                self.inference_method = "wild_bootstrap"
+                self.survey_metadata = None
+                # Wild-boot fits don't necessarily carry a raw distribution;
+                # the inference_method string alone must be enough.
+                self.bootstrap_distribution = None
+
+        stub = _WildBootstrapStub()
+        br = BusinessReport(stub, alpha=0.10, auto_diagnostics=False)
+        h = br.to_dict()["headline"]
+        assert h["ci_level"] == 95, (
+            "Wild cluster bootstrap must preserve fitted CI level on alpha "
+            f"mismatch; got {h['ci_level']}"
+        )
+        assert h["ci_lower"] == pytest.approx(0.10)
+        assert h["ci_upper"] == pytest.approx(1.90)
+        caveats = br.caveats()
+        assert any(c.get("topic") == "alpha_override_preserved" for c in caveats)
+        # Caveat message should call out wild cluster bootstrap specifically.
+        preserved_msg = next(
+            c["message"] for c in caveats if c.get("topic") == "alpha_override_preserved"
+        )
+        assert "wild cluster bootstrap" in preserved_msg
+
+
+class TestAssumptionBlockSourceFaithful:
+    """Regression for the round-4 P1 finding that ``_describe_assumption``
+    was producing generic DiD PT text for ContinuousDiD, TripleDifference,
+    and StaggeredTripleDifference — all of which have different identifying
+    logic per the Methodology Registry."""
+
+    def _stub(self, class_name):
+        cls = type(class_name, (), {})
+        obj = cls()
+        obj.att = 1.0
+        obj.se = 0.1
+        obj.p_value = 0.001
+        obj.conf_int = (0.8, 1.2)
+        obj.alpha = 0.05
+        obj.n_obs = 100
+        obj.n_treated = 40
+        obj.n_control = 60
+        obj.survey_metadata = None
+        obj.event_study_effects = None
+        obj.inference_method = "analytical"
+        return obj
+
+    def test_continuous_did_assumption_uses_two_level_pt(self):
+        br = BusinessReport(self._stub("ContinuousDiDResults"), auto_diagnostics=False)
+        assumption = br.to_dict()["assumption"]
+        assert assumption["parallel_trends_variant"] == "dose_pt_or_strong_pt"
+        desc = assumption["description"]
+        # Registry-backed language: PT vs Strong PT + ACRT mention.
+        assert "Strong Parallel Trends" in desc or "SPT" in desc
+        assert "ATT(d" in desc or "ACRT" in desc
+        assert "Callaway" in desc  # attribution to CGBS 2024
+
+    def test_triple_difference_assumption_uses_ddd_decomposition(self):
+        class TripleDifferenceResults:
+            pass
+
+        obj = TripleDifferenceResults()
+        obj.att = 1.0
+        obj.se = 0.1
+        obj.p_value = 0.001
+        obj.conf_int = (0.8, 1.2)
+        obj.alpha = 0.05
+        obj.n_obs = 100
+        obj.n_treated = 40
+        obj.n_control = 60
+        obj.survey_metadata = None
+        obj.inference_method = "analytical"
+
+        br = BusinessReport(obj, auto_diagnostics=False)
+        assumption = br.to_dict()["assumption"]
+        assert assumption["parallel_trends_variant"] == "triple_difference_cancellation"
+        desc = assumption["description"]
+        assert "DDD" in desc
+        assert "Ortiz-Villavicencio" in desc or "2025" in desc
+
+    def test_staggered_triple_diff_assumption_uses_ddd_not_generic_pt(self):
+        class StaggeredTripleDiffResults:
+            pass
+
+        obj = StaggeredTripleDiffResults()
+        obj.overall_att = 1.0
+        obj.overall_se = 0.1
+        obj.overall_p_value = 0.001
+        obj.overall_conf_int = (0.8, 1.2)
+        obj.alpha = 0.05
+        obj.n_obs = 100
+        obj.n_treated = 40
+        obj.n_control = 60
+        obj.survey_metadata = None
+        obj.event_study_effects = None
+        obj.inference_method = "analytical"
+
+        br = BusinessReport(obj, auto_diagnostics=False)
+        assumption = br.to_dict()["assumption"]
+        assert assumption["parallel_trends_variant"] == "triple_difference_cancellation"
+        desc = assumption["description"]
+        assert "triple-difference" in desc.lower() or "DDD" in desc
+        # Must NOT be the generic group-time PT text.
+        assert "group-time ATT" not in desc
+
+
 class TestFullReportSingleM:
     """Regression: ``full_report()`` must not claim full-grid robustness for a
     single-M HonestDiDResults passthrough. The summary path was fixed earlier;
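The new tests create stand-ins with `type(class_name, (), {})` or a local class of the same name because `_describe_assumption` receives the estimator's class name as a string. A minimal sketch of that name-based dispatch, assuming `BusinessReport` passes `type(result).__name__`; `assumption_variant` and `make_stub` are hypothetical helpers, the group-time set lists only the names visible in the diff, and the final fallback is a placeholder:

```python
from typing import Any

# Only the group-time names visible in the diff hunks; the real set is longer.
GROUP_TIME = {
    "CallawaySantAnnaResults",
    "SunAbrahamResults",
    "EfficientDiDResults",
    "WooldridgeDiDResults",
    "ChaisemartinDHaultfoeuilleResults",
}


def assumption_variant(estimator_name: str) -> str:
    """Hypothetical: map a results-class name to a parallel_trends_variant."""
    if estimator_name == "ContinuousDiDResults":
        return "dose_pt_or_strong_pt"
    if estimator_name in {"TripleDifferenceResults", "StaggeredTripleDiffResults"}:
        return "triple_difference_cancellation"
    if estimator_name in GROUP_TIME:
        return "conditional_or_group_time"
    return "unknown"  # placeholder; the real fallback is not shown in the diff


def make_stub(class_name: str) -> Any:
    # The three-argument form of type() builds a new class whose __name__
    # is class_name, so name-based dispatch treats the stub like a real result.
    cls = type(class_name, (), {})
    return cls()


stub = make_stub("StaggeredTripleDiffResults")
print(type(stub).__name__)  # StaggeredTripleDiffResults
print(assumption_variant(type(stub).__name__))  # triple_difference_cancellation
```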
