Add base_period parameter to CallawaySantAnna for pre-treatment effects by igerber · Pull Request #97 · igerber/diff-diff

igerber · 2026-01-21T19:06:52Z

Summary

Add base_period parameter to CallawaySantAnna matching R's did::att_gt() API
Implement "varying" mode (default): pre-treatment uses t-1 as base (consecutive comparisons)
Implement "universal" mode: all comparisons use g-anticipation-1 as base
Both modes produce identical post-treatment ATT(g,t); differ only in pre-treatment presentation
Update aggregation to only include post-treatment effects in overall ATT
Add 11 new tests for pre-treatment effects functionality

Methodology references (required if estimator / math changes)

Method name(s): Callaway & Sant'Anna (2021) Difference-in-Differences with Multiple Time Periods
Paper / source link(s): https://doi.org/10.1016/j.jeconom.2020.12.001
Any intentional deviations from the source (and why): None - implementation matches R's did::att_gt() base_period parameter exactly

Validation

Tests added/updated: tests/test_staggered.py - added TestCallawaySantAnnaPreTreatment class with 11 tests
Backtest / simulation / notebook evidence (if applicable): Validated against R's did package v2.3.0 with max numerical difference of 4.91e-05 across all ATT(g,t), event study effects, and overall ATT

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implement the base_period parameter matching R's did::att_gt() API to enable computation of pre-treatment ATT(g,t) values for parallel trends assessment. Two modes are supported: - "varying" (default): Pre-treatment uses t-1 as base (consecutive comparisons) - "universal": All comparisons use g-anticipation-1 as base Both modes produce identical post-treatment ATT(g,t) values. They differ only in how pre-treatment effects are computed. The overall ATT aggregation only includes post-treatment effects, matching R's behavior. Changes: - Add base_period parameter to CallawaySantAnna.__init__ with validation - Modify _compute_att_gt_fast to select base period based on mode - Update fit() to compute pre-treatment ATT(g,t) where t < g - anticipation - Filter _aggregate_simple and bootstrap to only aggregate post-treatment effects - Add base_period to CallawaySantAnnaResults and display in summary() - Update methodology registry with base_period edge case documentation - Add 11 new tests for pre-treatment effects Validated against R's did package v2.3.0 with max numerical difference of 4.91e-05. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-21T19:15:21Z

Overall assessment: ⚠️ Needs changes

Executive summary

Affects CallawaySantAnna (Callaway & Sant’Anna 2021); registry updated for base_period.
P1: base_period="varying" falls back to non‑consecutive base periods when t-1 is missing, which diverges from the registry’s consecutive‑comparison definition.
P1: overall ATT can silently compute with no post‑treatment effects (pre‑treatment only), yielding misleading 0/NaN results.
Docs/tests lag: base_period isn’t documented in the estimator API; add a regression test for “no post‑treatment” cases.

Methodology

Severity: P1. Impact: Registry says “varying” uses consecutive t-1 as base, but implementation falls back to the closest earlier period when t-1 is absent; for irregular time grids, pre‑treatment placebo effects won’t match the stated definition or R did::att_gt expectations, weakening parallel‑trends diagnostics. Concrete fix: either enforce consecutive periods (skip/NaN or raise when t-1 missing) or explicitly document and test the fallback behavior in the registry and API docs (diff_diff/staggered.py:L452-L469, docs/methodology/REGISTRY.md:L212-L215).
Severity: P1. Impact: With pre‑treatment effects now computed, cohorts with no post‑treatment periods can still populate group_time_effects, and overall ATT/SE is computed on an empty post‑treatment set, producing misleading zeros or NaNs instead of a hard failure. Concrete fix: guard for at least one post‑treatment (g,t) before aggregation; if none, raise or return NaN with warning, and mirror the guard in bootstrap (diff_diff/staggered.py:L664-L710, diff_diff/staggered_aggregation.py:L65-L90, diff_diff/staggered_bootstrap.py:L314-L399).

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P3. Impact: New public parameter base_period is not documented in the estimator’s parameter list, so users won’t discover or understand the new behavior from API docs. Concrete fix: add base_period to the parameter docs and consider a regression test for the “no post‑treatment periods” case to lock the intended error/NaN behavior (diff_diff/staggered.py:L165-L212, tests/test_staggered.py:L1970-L2207).

- Remove fallback to non-consecutive base periods in varying mode - Add base_period parameter to CallawaySantAnna docstring - Add validation and warning for empty post-treatment effect sets - Add tests for no-fallback behavior and NaN result with warning Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T19:29:52Z

/ai-review

github-actions · 2026-01-21T19:38:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c40c46cfb8e274e9997c4dfe21e07a3fed40e879

Overall assessment: ⚠️ Needs changes

Executive Summary

CallawaySantAnna base_period modes are added with registry updates, and overall ATT remains post‑treatment only.
P1: bootstrap overall ATT is still computed when no post‑treatment effects exist, yielding invalid SE/p‑values.
P2: analytic overall t/p default to 0/1 when overall_att is NaN in the no‑post‑treatment case.
Tests cover no‑post‑treatment without bootstrap but miss the bootstrap edge case.

Methodology

P1 — CallawaySantAnna: bootstrap overall ATT still computes with an empty post‑treatment set, producing finite SE/p‑values despite undefined overall ATT; Impact: incorrect inference when only pre‑treatment effects exist, conflicting with invalid‑inference handling; Fix: short‑circuit when post_treatment_indices is empty (set original_overall/bootstrap_overall to NaN and return NaN overall bootstrap stats, and skip overall_* updates in diff_diff/staggered.py). Refs: diff_diff/staggered_bootstrap.py:L314-L340, diff_diff/staggered.py:L735-L751
P3 — CallawaySantAnna base_period modes and post‑treatment‑only aggregation align with the Methodology Registry and cited paper; Impact: n/a. Fix: none. Refs: docs/methodology/REGISTRY.md:L165-L217, diff_diff/staggered.py:L461-L475, diff_diff/staggered.py:L670-L681

Code Quality

P2 — When _aggregate_simple returns NaN (no post‑treatment), overall_t/overall_p are set to 0/1 via the overall_se > 0 guard; Impact: misleading inference fields alongside NaN ATT; Fix: if overall_att or overall_se is non‑finite, propagate NaN to overall_t, overall_p, and overall_ci, and avoid bootstrap overwrite unless overall_att is finite. Refs: diff_diff/staggered.py:L712-L718, diff_diff/staggered.py:L747-L751

Performance

P3 — No performance issues found in the diff. Impact: n/a. Fix: none.

Maintainability

P3 — No maintainability issues found in the diff. Impact: n/a. Fix: none.

Tech Debt

P3 — No tech debt issues found in the diff. Impact: n/a. Fix: none.

Security

P3 — No security issues found in the diff. Impact: n/a. Fix: none.

Documentation/Tests

P3 — Missing bootstrap coverage for the no‑post‑treatment path; Impact: bootstrap regressions like the P1 issue could slip; Fix: add a test (next to the existing no‑post‑treatment test) with n_bootstrap>0 asserting NaN overall bootstrap stats and a warning. Ref: tests/test_staggered.py:L2242-L2284

- Fix bootstrap to return early with NaN when no post-treatment effects - Fix t-stat/p-value to use NaN (not 0/1) when SE is undefined - Add comprehensive tests for bootstrap edge case - Document edge case in Methodology Registry Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T20:01:50Z

/ai-review

github-actions · 2026-01-21T20:21:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cbc28dd50bba61b97ebebec079d8cfc47135d93d

Overall assessment: ⚠️ Needs changes

Methodology

Severity: P1; Impact: For Callaway & Sant’Anna with anticipation > 0, cohort-level ATT(g) and bootstrap group effects drop early treated periods because aggregation uses t >= g instead of t >= g - anticipation, diverging from the registry’s shifted reference period and from overall ATT; Concrete fix: change post-treatment filters in _aggregate_by_group and _prepare_group_aggregation to t >= g - self.anticipation, and adjust related tests/docs. (diff_diff/staggered_aggregation.py:435, diff_diff/staggered_bootstrap.py:613, docs/methodology/REGISTRY.md:202)
Severity: P2; Impact: With new pre-treatment outputs, non-finite/zero SEs still produce t_stat=0 and p=1 for ATT(g,t), event-study, and group effects, contradicting the registry’s NaN-propagation rule and masking inference failures; Concrete fix: mirror overall-ATT handling by setting t_stat/p_value to NaN when SE is non-finite or <=0 in group-time and aggregation paths. (diff_diff/staggered.py:688, diff_diff/staggered_aggregation.py:404, diff_diff/staggered_aggregation.py:460, docs/methodology/REGISTRY.md:207)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3; Impact: Tests cover base_period basics but not the new anticipation boundary or NaN inference behavior, so regressions in group aggregation and invalid-SE handling would go undetected; Concrete fix: add tests with anticipation > 0 validating group effects/bootstraps and a case that forces invalid SEs to assert NaN t/p. (tests/test_staggered.py:1970)

Open Questions

Assumption: time periods are consecutive integers; otherwise base_period="varying" uses t-1 and silently drops pre-treatment effects—intended? (diff_diff/staggered.py:461)

Executive Summary

Adds base_period and pre-treatment ATT(g,t) while keeping post-treatment base anchored at g-1-anticipation.
Overall ATT aggregation/bootstrapping now exclude pre-treatment and return NaN when no post-treatment exists.
Main blocker: group-level aggregations ignore anticipation periods, making ATT(g) inconsistent when anticipation > 0.
Also need to align per-effect inference with registry NaN propagation and add anticipation/inference tests.

Fixes: - P1 (Critical): Group aggregation now uses t >= g - anticipation instead of t >= g for post-treatment filtering (staggered_aggregation.py, staggered_bootstrap.py) - P2: Per-effect t_stat now returns NaN (not 0.0) when SE is non-finite or zero, consistent with overall_t_stat behavior (4 locations in staggered.py) Tests: - Added TestCallawaySantAnnaAnticipation with 2 tests for anticipation boundaries - Added TestCallawaySantAnnaTStatNaN with 2 tests for NaN t_stat consistency Documentation: - Updated REGISTRY.md to document anticipation handling in group aggregation - Updated REGISTRY.md to document per-effect t_stat NaN behavior Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-22T12:24:21Z

/ai-review

github-actions · 2026-01-22T12:35:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 39b6f93e15b931d4e6a51cb9e6435a4a4c6f3faf

Overall assessment: ⚠️ Needs changes
Executive Summary

CallawaySantAnna is the only estimator touched; base_period and post‑treatment aggregation align with the registry, but two inference paths diverge.
Event‑study and group‑level t_stats still default to 0.0 on invalid SEs, conflicting with the registry’s NaN policy.
Bootstrap short‑circuits when no post‑treatment effects exist, skipping bootstrap SEs for pre‑treatment effects despite n_bootstrap>0.
No performance or security regressions spotted in the diff.

Methodology

(P1) Event‑study and group‑level t_stats use 0.0 when SE is non‑finite/zero, but the Methodology Registry specifies NaN propagation. Impact: misleading p‑values/inference for edge cases and doc/code mismatch. Fix: apply the same np.isfinite(se) and se > 0 guard used for group‑time/overall and propagate NaN in _aggregate_event_study and _aggregate_by_group. (diff_diff/staggered_aggregation.py:404, diff_diff/staggered_aggregation.py:460)
(P1) Bootstrap returns early when no post‑treatment effects exist, leaving group‑time/event‑study/group SEs as analytic despite n_bootstrap>0. Impact: incorrect variance/SEs for pre‑treatment diagnostics and inconsistent bootstrap behavior. Fix: keep bootstrap resampling for per‑effect outputs and only set overall ATT stats to NaN when post‑treatment weights are empty. (diff_diff/staggered_bootstrap.py:327)

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found.

…reatment - Fix aggregated t_stat in _aggregate_event_study and _aggregate_by_group to use NaN when SE is non-finite or zero (was defaulting to 0.0) - Refactor bootstrap to continue computing per-effect SEs when no post-treatment effects exist (only overall ATT stats are NaN) - Add tests for aggregated t_stat NaN behavior - Add test verifying bootstrap runs for pre-treatment effects - Update methodology REGISTRY.md documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-22T12:50:31Z

/ai-review

github-actions · 2026-01-22T12:59:56Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 573496480eaac779b99574722675fe9fec0bfabc

Overall assessment: ⚠️ Needs changes

Executive Summary

P1 methodology issue: pre-treatment ATT(g,t) for control_group="not_yet_treated" uses overlapping treated/control samples, biasing placebo checks.
Base-period modes and post-treatment-only aggregation align with the Methodology Registry and Callaway & Sant’Anna (2021).
Tests added for base_period and NaN t-stats, but no coverage for not-yet-treated pre-treatment behavior.

Methodology

Severity: P1 | Impact: For control_group="not_yet_treated" and pre-treatment periods (t < g), the control mask includes the treated cohort (G=g) because it is defined as unit_cohorts > t, so treated and control samples overlap; this biases pre-treatment ATT(g,t) and undermines parallel-trends diagnostics. | Concrete fix: Exclude treated cohort units from controls when computing ATT(g,t), e.g., control_mask = (never_treated_mask | (unit_cohorts > t)) & ~treated_mask (or unit_cohorts != g) before applying valid_mask. (diff_diff/staggered.py:L487-L506)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P2 | Impact: Missing regression coverage for the not-yet-treated control case means the overlap bug above isn’t detected; pre-treatment diagnostics could silently regress. | Concrete fix: Add a targeted test in tests/test_staggered.py alongside TestCallawaySantAnnaPreTreatment that runs with control_group="not_yet_treated" and asserts the treated cohort is excluded from controls (e.g., by checking n_control against the expected count or constructing a dataset where overlap would force ATT≈0). (tests/test_staggered.py:L1970-L2320)

Fix critical bug where control_group="not_yet_treated" incorrectly included treated cohort g in controls for pre-treatment periods (t < g). When computing ATT(g,t), cohort g should never be in the control group, regardless of whether t < g or t >= g. The fix adds explicit exclusion of cohort g from the not-yet-treated control mask. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-22T13:17:28Z

/ai-review

github-actions · 2026-01-22T13:39:25Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 021bd6676e0746cccb81871341f516dc076dc733

Overall assessment: ✅ Looks good
Executive Summary

CallawaySantAnna (Callaway & Sant'Anna 2021) is the only method affected; base_period adds pre-treatment ATT(g,t) while overall ATT now aggregates post-treatment only.
Implementation aligns with the Methodology Registry for base_period, anticipation-aware aggregation, and not_yet_treated control exclusion.
One doc-clarity item: the ATT(g,t) equation still implies Y_{g-1} for all t, while varying pre-treatment uses t-1.

Methodology

P3 | Impact: The ATT(g,t) equation in docs/methodology/REGISTRY.md:180 and diff_diff/staggered.py:275 still implies a g-1 base for all t, which can mislead users now that base_period="varying" uses t-1 for pre-treatment (documented at docs/methodology/REGISTRY.md:225). | Concrete fix: Add a short note near the equation and in the class docstring clarifying the equation applies to post-treatment/universal base and that varying pre-treatment uses consecutive t vs t-1 comparisons for diagnostics.

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found; tests for base_period and bootstrap behavior added in tests/test_staggered.py:1970 (not run in this review).

Add notes explaining that the canonical ATT(g,t) equation (using g-1 as base) applies to post-treatment effects and base_period="universal". With base_period="varying" (default), pre-treatment uses t-1 for consecutive comparisons useful in parallel trends diagnostics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create /pre-merge-check skill for automated checks before submitting PRs - Pattern checks for NaN handling issues - Test coverage verification - Context-specific checklists based on changed files - Update PR review prompt with edge case checklist from PR #97 analysis - Empty result set handling - NaN/Inf propagation checks - Parameter interaction verification - Control group logic validation - Pattern consistency checks - Add Task Implementation Workflow section to CLAUDE.md - 4-phase workflow: Planning, Implementation, Pre-Merge Review, Submit - Quick reference commands for common pattern checks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…rence Replace `else 0.0` with `else np.nan` when SE is non-finite or zero in t-stat calculations across sun_abraham.py, triple_diff.py, and diagnostics.py. Add CI guards returning (NaN, NaN) for 4 downstream confidence interval computations. Matches the CallawaySantAnna pattern established in PR #97. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit 14bc71f into main Jan 22, 2026
4 checks passed

igerber deleted the feature/base-period-pretreatment branch January 22, 2026 14:19

igerber mentioned this pull request Jan 22, 2026

Add pre-merge-check skill and improve PR review workflow #98

Merged

igerber mentioned this pull request Jan 31, 2026

Fix NaN t-statistics across 7 locations for consistent undefined inference #118

Merged

github-actions Bot mentioned this pull request Mar 8, 2026

Update Codex AI review model to gpt-5.4 #193

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add base_period parameter to CallawaySantAnna for pre-treatment effects#97

Add base_period parameter to CallawaySantAnna for pre-treatment effects#97
igerber merged 7 commits into
mainfrom
feature/base-period-pretreatment

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jan 21, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jan 21, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions Bot commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant