Skip to content

Commit 6ca617e

Browse files
igerberclaude
andcommitted
Fix remaining step-numbering drift and NaN check for avg_att
P1: Align complete example and Bacon cross-reference to canonical numbering (Step 3=Test PT, Step 4=Choose estimator, Step 5=Estimate). Complete example now executes check_parallel_trends() and prints cluster count before estimation. Evaluation rubric S3-S5 labels updated to match. P2: _check_nan_att() now checks .avg_att (MultiPeriodDiDResults) in addition to .att and .overall_att. Added regression test. P2: Added TODO.md entry for extending snippet smoke tests to .txt AI guides (deferred, low priority). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 50bc3fc commit 6ca617e

5 files changed

Lines changed: 27 additions & 10 deletions

File tree

TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
7979
| ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` || Low |
8080
| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
8181
| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
82+
| Doc-snippet smoke tests only cover `.rst` files; new `.txt` AI guides are outside CI validation | `tests/test_doc_snippets.py` | #239 | Low |
8283

8384
---
8485

diff_diff/practitioner.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -591,10 +591,12 @@ def _handle_generic(results: Any):
591591
# ---------------------------------------------------------------------------
592592
def _check_nan_att(results: Any) -> List[str]:
593593
"""Return warnings if ATT is NaN."""
594-
# Check both .att (DiDResults) and .overall_att (staggered results)
594+
# Check .att (DiDResults), .overall_att (staggered), .avg_att (MultiPeriod)
595595
att = getattr(results, "att", None)
596596
if att is None:
597597
att = getattr(results, "overall_att", None)
598+
if att is None:
599+
att = getattr(results, "avg_att", None)
598600
if att is not None and isinstance(att, float) and math.isnan(att):
599601
return [
600602
"Estimation produced NaN ATT — check data preparation and "

docs/llms-practitioner.txt

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -255,7 +255,7 @@ for test_name, result in placebo.items():
255255

256256
### Bacon decomposition (for TWFE users)
257257
If you used TWFE, always run BaconDecomposition to check whether the
258-
estimate is contaminated by forbidden comparisons (see Step 3).
258+
estimate is contaminated by forbidden comparisons (see Step 4).
259259

260260
---
261261

@@ -389,16 +389,21 @@ data = load_mpdta()
389389
# Step 2: Assumptions — PT-GT-NYT (not-yet-treated parallel trends),
390390
# no anticipation, doubly robust for conditional PT
391391

392-
# Step 3: Estimator selection — staggered adoption → CS (primary), SA (robustness)
393-
# First, diagnose TWFE bias:
392+
# Step 3: Test parallel trends
393+
pt = check_parallel_trends(data, outcome='lemp', time='year',
394+
treatment_group='first_treat')
395+
print(f"Pre-trends p-value: {pt['p_value']:.4f}")
396+
397+
# Step 4: Choose estimator — staggered adoption → CS (primary), SA (robustness)
398+
# Diagnose TWFE bias first:
394399
bacon = BaconDecomposition()
395400
bacon_result = bacon.fit(data, outcome='lemp', unit='countyreal',
396401
time='year', first_treat='first_treat')
397402
print(bacon_result.summary())
398403

399-
# Step 4: Inference — cluster at county level (treatment assignment unit)
400-
401-
# Step 5: Estimate
404+
# Step 5: Estimate (cluster at county level treatment assignment unit)
405+
n_clusters = data['countyreal'].nunique()
406+
print(f"Clusters: {n_clusters} -> {'cluster-robust SEs' if n_clusters >= 50 else 'wild bootstrap'}")
402407
cs = CallawaySantAnna(
403408
control_group='not_yet_treated', estimation_method='dr',
404409
cluster='countyreal',

docs/practitioner-guide-evaluation.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -25,9 +25,9 @@ instances with no shared context.
2525
|------|-------------|---|---|---|
2626
| S1 | Define target parameter | Not mentioned | Mentions ATT types | Explicitly defines weighted/unweighted, policy question |
2727
| S2 | State assumptions | Not mentioned | Mentions parallel trends | Formally names PT variant (PT-GT-NYT etc.) |
28-
| S3 | Appropriate estimator | Uses naive TWFE | Uses CS but no diagnostic | CS + Bacon diagnostic, explains choice |
29-
| S4 | Inference approach | Not discussed | Clusters SEs | Clusters + discusses alternatives (wild bootstrap) |
30-
| S5 | Estimation | No code | Partial code | Complete, working code |
28+
| S3 | Test parallel trends | Not done | Informal check (event study eyeball) | Runs check_parallel_trends / equivalence_test_trends |
29+
| S4 | Choose estimator | Uses naive TWFE | Uses CS but no diagnostic | CS + Bacon diagnostic, explains choice |
30+
| S5 | Estimate (with cluster check) | No code | Partial code | Complete code with cluster count check |
3131
| S6 | Sensitivity analysis | Not done | Mentions but doesn't run | Runs HonestDiD and/or placebo tests |
3232
| S7 | Heterogeneity | Not done | Some aggregation | Group + event study + subgroup |
3333
| S8 | Robustness | Not done | Compares 2 estimators | 3+ estimators + with/without covariates |

tests/test_practitioner.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -341,6 +341,15 @@ def test_nan_att_produces_warning(self):
341341
assert len(output["warnings"]) > 0
342342
assert any("NaN" in w for w in output["warnings"])
343343

344+
def test_nan_avg_att_multi_period(self):
345+
"""MultiPeriodDiDResults uses avg_att, not att."""
346+
from diff_diff.results import MultiPeriodDiDResults
347+
348+
r = MultiPeriodDiDResults.__new__(MultiPeriodDiDResults)
349+
r.avg_att = float("nan")
350+
output = practitioner_next_steps(r, verbose=False)
351+
assert any("NaN" in w for w in output["warnings"])
352+
344353

345354
# ---------------------------------------------------------------------------
346355
# Tests: Bacon handler warnings

0 commit comments

Comments
 (0)