Skip to content

Commit 7e8e264

Browse files
igerberclaude
andcommitted
Address PR #347 R1: fix StackedDiD wording, drop dead TWFE branch, fix dCDH headline_attribute
R1 surfaced three P1s, all legitimate: 1. StackedDiD wording mismatch. Claimed ``overall_att`` is a treated-share-weighted aggregate across sub-experiments; actual implementation (``stacked_did.py`` ~line 541) computes ``overall_att`` as the simple average of post-treatment event- study coefficients ``delta_h`` with delta-method SE. Per-horizon ``delta_h`` is the paper's ``theta_kappa^e`` cross-event aggregate, but the headline is an equally-weighted average over those per-horizon coefficients, not a separate cross-event weighting at the ATT level. Definition rewritten to describe the actual estimand. 2. Dead ``TwoWayFixedEffectsResults`` branch. ``TwoWayFixedEffects`` is a subclass of ``DifferenceInDifferences`` and its ``fit()`` returns ``DiDResults`` — there is no separate TWFE result class, so the ``type(results).__name__ == "TwoWayFixedEffectsResults"`` dispatch branch was unreachable on any real fit. Removed the dead branch and rewrote the ``DiDResults`` branch to cover both 2x2 DiD and TWFE interpretations explicitly (both estimators route here). Follow-up for future PR: persist estimator provenance on ``DiDResults`` (or return a dedicated TWFE result class) so the branch can split again; documented inline. 3. dCDH ``headline_attribute="att"``. Both dCDH branches (``DID_M`` for ``L_max=None``, ``DID_l``/derivatives for ``L_max >= 1``) named ``"att"`` as the headline attribute, but ``ChaisemartinDHaultfoeuilleResults`` stores the headline in ``overall_att`` (``chaisemartin_dhaultfoeuille_results.py:357``). Fixed both branches to ``"overall_att"``; downstream consumers using the machine-readable contract now point at the correct attribute. Tests: new ``TestTargetParameterRealFitIntegration`` covers the gap R1 P2 flagged — prior coverage was stub-based and would not have caught any of the three P1s. Four new real-fit tests: - ``TwoWayFixedEffects().fit(...)`` returns ``DiDResults``; target- parameter block uses the shared DiD/TWFE branch. - ``StackedDiD(...).fit(...)`` on a staggered panel; the ``headline_attribute`` matches the actual real attribute and the definition names the event-study-coefficient estimand. - ``ChaisemartinDHaultfoeuille().fit(...)`` on a reversible- treatment panel (both ``DID_M`` and ``DID_l`` regimes); ``headline_attribute == "overall_att"`` and the named attribute actually exists on the real fit object. Existing stub-based dispatch tests updated: the ``test_twfe_results`` test is now ``test_did_results_mentions_twfe`` (asserts the DiD branch describes both estimators). The dCDH stub tests now also assert ``headline_attribute == "overall_att"``. All 323 BR/DR tests pass (319 prior + 4 new real-fit integration). Out of scope (plan-review MEDIUM #2 — centralizing report metadata in a single registry shared by estimator outputs and reporting helpers): queued as a separate PR. Current approach (string dispatch on ``type(results).__name__`` + REGISTRY.md references) is working but brittle; a centralized registry is the principled fix for the TWFE-dispatch-dead-code class of bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b1946fb commit 7e8e264

2 files changed

Lines changed: 184 additions & 34 deletions

File tree

diff_diff/_reporting_helpers.py

Lines changed: 37 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -70,17 +70,34 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
7070
name = type(results).__name__
7171

7272
if name == "DiDResults":
73+
# Covers both ``DifferenceInDifferences`` (2x2 DiD) and
74+
# ``TwoWayFixedEffects`` (TWFE with unit + time FE). Both
75+
# estimators return ``DiDResults``; there is no separate
76+
# ``TwoWayFixedEffectsResults`` class as of this PR
77+
# (confirmed in PR #347 R1 review). The description covers
78+
# both interpretations because the result carries no
79+
# estimator-provenance marker BR/DR can dispatch on. Adding a
80+
# dedicated TWFE result class (or persisting provenance on
81+
# DiDResults) is queued as follow-up so this branch can split
82+
# in a future PR.
7383
return {
74-
"name": "ATT (2x2)",
84+
"name": "ATT (2x2 or TWFE within-transformed coefficient)",
7585
"definition": (
76-
"The average treatment effect on the treated, estimated as a "
77-
"single 2x2 Difference-in-Differences contrast between the "
78-
"treated-unit change and the control-unit change across the "
79-
"pre / post period."
86+
"The average treatment effect on the treated. For "
87+
"``DifferenceInDifferences``, this is the 2x2 DiD "
88+
"contrast between treated-unit change and control-unit "
89+
"change across pre / post. For ``TwoWayFixedEffects``, "
90+
"this is the coefficient on the treatment-by-post "
91+
"interaction in a regression with unit and time fixed "
92+
"effects; under homogeneous treatment effects it is "
93+
"the ATT, and under heterogeneous effects with staggered "
94+
"adoption it is a weighted average of 2x2 comparisons "
95+
"that may include forbidden later-vs-earlier comparisons "
96+
"(see Goodman-Bacon)."
8097
),
8198
"aggregation": "2x2",
8299
"headline_attribute": "att",
83-
"reference": "REGISTRY.md Sec. DifferenceInDifferences",
100+
"reference": ("REGISTRY.md Sec. DifferenceInDifferences / TwoWayFixedEffects"),
84101
}
85102

86103
if name == "MultiPeriodDiDResults":
@@ -97,22 +114,6 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
97114
"reference": "REGISTRY.md Sec. MultiPeriodDiD",
98115
}
99116

100-
if name == "TwoWayFixedEffectsResults":
101-
return {
102-
"name": "TWFE ATT (within-transformed DiD coefficient)",
103-
"definition": (
104-
"The coefficient on the treatment-by-post interaction in a "
105-
"two-way-fixed-effects regression (unit + time FE). Under "
106-
"homogeneous treatment effects this is the ATT; under "
107-
"heterogeneous effects with staggered adoption it is a weighted "
108-
"average of 2x2 comparisons, possibly including forbidden "
109-
"comparisons (see Goodman-Bacon)."
110-
),
111-
"aggregation": "twfe",
112-
"headline_attribute": "att",
113-
"reference": "REGISTRY.md Sec. TwoWayFixedEffects",
114-
}
115-
116117
if name == "CallawaySantAnnaResults":
117118
return {
118119
"name": "overall ATT (cohort-size-weighted average of ATT(g,t))",
@@ -197,13 +198,19 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
197198
"(``A_s > a + kappa_post``)."
198199
)
199200
return {
200-
"name": "overall ATT (sub-experiment-weighted aggregate across stacked events)",
201+
"name": "overall ATT (average of post-treatment event-study coefficients)",
201202
"definition": (
202-
"A weighted aggregate of per-sub-experiment ATTs across stacked "
203-
"adoption events. Each sub-experiment aligns a treated cohort "
204-
"with its clean-control set over the event window "
205-
"``[-kappa_pre, +kappa_post]``; the overall ATT averages these "
206-
"sub-experiment ATTs using treated-unit share weights. " + control_clause
203+
"The average of post-treatment event-study coefficients "
204+
"``delta_h`` (h >= -anticipation), estimated from the stacked "
205+
"sub-experiment panel with delta-method SE "
206+
"(``stacked_did.py`` around line 541). Each sub-experiment "
207+
"aligns a treated cohort with its clean-control set over the "
208+
"event window ``[-kappa_pre, +kappa_post]``; each per-horizon "
209+
"``delta_h`` is the paper's ``theta_kappa^e`` "
210+
"treated-share-weighted cross-event aggregate. The "
211+
"``overall_att`` headline is the equally-weighted average of "
212+
"these per-horizon coefficients, not a separate cross-event "
213+
"weighted aggregate at the ATT level. " + control_clause
207214
),
208215
"aggregation": "stacked",
209216
"headline_attribute": "overall_att",
@@ -322,7 +329,7 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
322329
"contrasts."
323330
),
324331
"aggregation": "M",
325-
"headline_attribute": "att",
332+
"headline_attribute": "overall_att",
326333
"reference": (
327334
"de Chaisemartin & D'Haultfoeuille (2020); "
328335
"REGISTRY.md Sec. ChaisemartinDHaultfoeuille"
@@ -362,7 +369,7 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
362369
"share weights." + extra
363370
),
364371
"aggregation": agg_tag,
365-
"headline_attribute": "att",
372+
"headline_attribute": "overall_att",
366373
"reference": (
367374
"de Chaisemartin & D'Haultfoeuille (2020, 2024); "
368375
"REGISTRY.md Sec. ChaisemartinDHaultfoeuille"

tests/test_target_parameter.py

Lines changed: 147 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -54,10 +54,15 @@ def test_multi_period_did_results(self):
5454
assert tp["headline_attribute"] == "avg_att"
5555
assert "event-study" in tp["name"].lower()
5656

57-
def test_twfe_results(self):
58-
tp = describe_target_parameter(_minimal_result("TwoWayFixedEffectsResults"))
59-
assert tp["aggregation"] == "twfe"
60-
assert "TWFE" in tp["name"]
57+
def test_did_results_mentions_twfe(self):
58+
"""``TwoWayFixedEffects.fit()`` returns ``DiDResults`` (verified in
59+
PR #347 R1 review), so the DiDResults branch must cover both
60+
the 2x2 DiD and TWFE interpretations. There is no separate
61+
``TwoWayFixedEffectsResults`` class.
62+
"""
63+
tp = describe_target_parameter(_minimal_result("DiDResults"))
64+
assert tp["aggregation"] == "2x2"
65+
assert "TWFE" in tp["name"] or "TWFE" in tp["definition"]
6166

6267
def test_callaway_santanna(self):
6368
tp = describe_target_parameter(_minimal_result("CallawaySantAnnaResults"))
@@ -133,6 +138,10 @@ def test_dcdh_m(self):
133138
)
134139
assert tp["aggregation"] == "M"
135140
assert "DID_M" in tp["name"]
141+
# R1 PR #347 review: headline lives in ``overall_att``, not
142+
# ``att``. Machine-readable field must match the raw attribute
143+
# downstream consumers should read.
144+
assert tp["headline_attribute"] == "overall_att"
136145

137146
def test_dcdh_l(self):
138147
tp = describe_target_parameter(
@@ -145,6 +154,7 @@ def test_dcdh_l(self):
145154
)
146155
assert tp["aggregation"] == "l"
147156
assert "DID_l" in tp["name"]
157+
assert tp["headline_attribute"] == "overall_att"
148158

149159
def test_dcdh_l_with_controls(self):
150160
tp = describe_target_parameter(
@@ -344,3 +354,136 @@ def test_dr_full_report_emits_target_parameter_section(self):
344354
assert "## Target Parameter" in md
345355
assert "Aggregation tag:" in md
346356
assert "Headline attribute:" in md
357+
358+
359+
class TestTargetParameterRealFitIntegration:
360+
"""PR #347 R1 review P2: stub-based dispatch tests missed (a) the
361+
TWFE-returns-DiDResults mismatch, (b) the dCDH headline_attribute
362+
bug. Exercise real fits so these contracts are enforced end-to-end.
363+
"""
364+
365+
def test_twfe_fit_returns_did_results_branch(self):
366+
"""``TwoWayFixedEffects.fit()`` returns ``DiDResults``, so the
367+
target-parameter block must be the DiD/TWFE-covering branch.
368+
Guards against reintroducing a dead-code
369+
``TwoWayFixedEffectsResults`` branch.
370+
"""
371+
import warnings
372+
373+
from diff_diff import TwoWayFixedEffects, generate_did_data
374+
375+
warnings.filterwarnings("ignore")
376+
df = generate_did_data(n_units=40, n_periods=4, seed=7)
377+
fit = TwoWayFixedEffects().fit(
378+
df, outcome="outcome", treatment="treated", time="post", unit="unit"
379+
)
380+
# Real TWFE fit returns DiDResults (no separate TWFE result class).
381+
assert type(fit).__name__ == "DiDResults"
382+
tp = describe_target_parameter(fit)
383+
assert tp["aggregation"] == "2x2"
384+
# The DiDResults branch must name TWFE explicitly so the
385+
# description is source-faithful for both DiD and TWFE fits.
386+
assert "TWFE" in tp["name"] or "TWFE" in tp["definition"]
387+
388+
def test_stacked_did_fit_headline_attribute_matches_real_estimand(self):
389+
"""``StackedDiDResults.overall_att`` is the average of
390+
post-treatment event-study coefficients ``delta_h`` with
391+
delta-method SE (``stacked_did.py`` around line 541). Real-fit
392+
regression against the reviewer's R1 P1 wording catch.
393+
"""
394+
import warnings
395+
396+
from diff_diff import StackedDiD, generate_staggered_data
397+
398+
warnings.filterwarnings("ignore")
399+
df = generate_staggered_data(n_units=60, n_periods=6, seed=13)
400+
fit = StackedDiD(kappa_pre=1, kappa_post=1).fit(
401+
df,
402+
outcome="outcome",
403+
unit="unit",
404+
time="period",
405+
first_treat="first_treat",
406+
)
407+
assert type(fit).__name__ == "StackedDiDResults"
408+
tp = describe_target_parameter(fit)
409+
assert tp["headline_attribute"] == "overall_att"
410+
# R1 P1 fix: the definition must describe the actual estimand —
411+
# the average of post-treatment delta_h event-study coefficients.
412+
assert (
413+
"event-study" in tp["definition"].lower()
414+
or "delta_h" in tp["definition"]
415+
or "post-treatment" in tp["definition"].lower()
416+
)
417+
418+
def _dcdh_reversible_panel(self, seed):
419+
"""Build a minimal reversible-treatment panel that dCDH
420+
accepts (group / time / treatment columns, at least one
421+
switcher)."""
422+
import numpy as np
423+
import pandas as pd
424+
425+
rng = np.random.default_rng(seed)
426+
units = list(range(20))
427+
periods = list(range(6))
428+
rows = []
429+
for u in units:
430+
# Half of units switch from 0 -> 1 at period 3.
431+
for t in periods:
432+
d = 1 if (u < 10 and t >= 3) else 0
433+
y = d * 2.0 + rng.normal(0.0, 0.5)
434+
rows.append({"unit": u, "period": t, "treated": d, "outcome": y})
435+
return pd.DataFrame(rows)
436+
437+
def test_dcdh_did_m_fit_headline_attribute_is_overall_att(self):
438+
"""``ChaisemartinDHaultfoeuilleResults.overall_att`` holds the
439+
DID_M headline scalar (``chaisemartin_dhaultfoeuille_results.py``
440+
line ~357). R1 P1: previously ``headline_attribute="att"``
441+
pointed at a non-existent attribute.
442+
"""
443+
import warnings
444+
445+
from diff_diff import ChaisemartinDHaultfoeuille
446+
447+
warnings.filterwarnings("ignore")
448+
df = self._dcdh_reversible_panel(seed=11)
449+
# DID_M regime: L_max not supplied.
450+
fit = ChaisemartinDHaultfoeuille().fit(
451+
df,
452+
outcome="outcome",
453+
group="unit",
454+
time="period",
455+
treatment="treated",
456+
)
457+
assert type(fit).__name__ == "ChaisemartinDHaultfoeuilleResults"
458+
tp = describe_target_parameter(fit)
459+
assert tp["aggregation"] == "M"
460+
assert tp["headline_attribute"] == "overall_att"
461+
# Sanity: the attribute BR/DR points at actually exists on the
462+
# real fit object.
463+
assert hasattr(fit, tp["headline_attribute"]), (
464+
f"headline_attribute={tp['headline_attribute']!r} must name "
465+
"an attribute that actually exists on the real result object."
466+
)
467+
468+
def test_dcdh_did_l_fit_headline_attribute_is_overall_att(self):
469+
"""Same guard for the DID_l dynamic-horizon regime
470+
(``L_max >= 1``). Real-fit regression.
471+
"""
472+
import warnings
473+
474+
from diff_diff import ChaisemartinDHaultfoeuille
475+
476+
warnings.filterwarnings("ignore")
477+
df = self._dcdh_reversible_panel(seed=12)
478+
fit = ChaisemartinDHaultfoeuille().fit(
479+
df,
480+
outcome="outcome",
481+
group="unit",
482+
time="period",
483+
treatment="treated",
484+
L_max=2,
485+
)
486+
tp = describe_target_parameter(fit)
487+
assert tp["aggregation"] == "l"
488+
assert tp["headline_attribute"] == "overall_att"
489+
assert hasattr(fit, tp["headline_attribute"])

0 commit comments

Comments
 (0)