Skip to content

Add base_period parameter to CallawaySantAnna for pre-treatment effects#97

Merged
igerber merged 7 commits into
mainfrom
feature/base-period-pretreatment
Jan 22, 2026
Merged

Add base_period parameter to CallawaySantAnna for pre-treatment effects#97
igerber merged 7 commits into
mainfrom
feature/base-period-pretreatment

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jan 21, 2026

Summary

  • Add base_period parameter to CallawaySantAnna matching R's did::att_gt() API
  • Implement "varying" mode (default): pre-treatment uses t-1 as base (consecutive comparisons)
  • Implement "universal" mode: all comparisons use g-anticipation-1 as base
  • Both modes produce identical post-treatment ATT(g,t); differ only in pre-treatment presentation
  • Update aggregation to only include post-treatment effects in overall ATT
  • Add 11 new tests for pre-treatment effects functionality

Methodology references (required if estimator / math changes)

  • Method name(s): Callaway & Sant'Anna (2021) Difference-in-Differences with Multiple Time Periods
  • Paper / source link(s): https://doi.org/10.1016/j.jeconom.2020.12.001
  • Any intentional deviations from the source (and why): None - implementation matches R's did::att_gt() base_period parameter exactly

Validation

  • Tests added/updated: tests/test_staggered.py - added TestCallawaySantAnnaPreTreatment class with 11 tests
  • Backtest / simulation / notebook evidence (if applicable): Validated against R's did package v2.3.0 with max numerical difference of 4.91e-05 across all ATT(g,t), event study effects, and overall ATT

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implement the base_period parameter matching R's did::att_gt() API to enable
computation of pre-treatment ATT(g,t) values for parallel trends assessment.

Two modes are supported:
- "varying" (default): Pre-treatment uses t-1 as base (consecutive comparisons)
- "universal": All comparisons use g-anticipation-1 as base

Both modes produce identical post-treatment ATT(g,t) values. They differ only
in how pre-treatment effects are computed. The overall ATT aggregation only
includes post-treatment effects, matching R's behavior.

Changes:
- Add base_period parameter to CallawaySantAnna.__init__ with validation
- Modify _compute_att_gt_fast to select base period based on mode
- Update fit() to compute pre-treatment ATT(g,t) where t < g - anticipation
- Filter _aggregate_simple and bootstrap to only aggregate post-treatment effects
- Add base_period to CallawaySantAnnaResults and display in summary()
- Update methodology registry with base_period edge case documentation
- Add 11 new tests for pre-treatment effects

Validated against R's did package v2.3.0 with max numerical difference of 4.91e-05.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall assessment: ⚠️ Needs changes

Executive summary

  • Affects CallawaySantAnna (Callaway & Sant’Anna 2021); registry updated for base_period.
  • P1: base_period="varying" falls back to non‑consecutive base periods when t-1 is missing, which diverges from the registry’s consecutive‑comparison definition.
  • P1: overall ATT can silently compute with no post‑treatment effects (pre‑treatment only), yielding misleading 0/NaN results.
  • Docs/tests lag: base_period isn’t documented in the estimator API; add a regression test for “no post‑treatment” cases.

Methodology

  • Severity: P1. Impact: Registry says “varying” uses consecutive t-1 as base, but implementation falls back to the closest earlier period when t-1 is absent; for irregular time grids, pre‑treatment placebo effects won’t match the stated definition or R did::att_gt expectations, weakening parallel‑trends diagnostics. Concrete fix: either enforce consecutive periods (skip/NaN or raise when t-1 missing) or explicitly document and test the fallback behavior in the registry and API docs (diff_diff/staggered.py:L452-L469, docs/methodology/REGISTRY.md:L212-L215).
  • Severity: P1. Impact: With pre‑treatment effects now computed, cohorts with no post‑treatment periods can still populate group_time_effects, and overall ATT/SE is computed on an empty post‑treatment set, producing misleading zeros or NaNs instead of a hard failure. Concrete fix: guard for at least one post‑treatment (g,t) before aggregation; if none, raise or return NaN with warning, and mirror the guard in bootstrap (diff_diff/staggered.py:L664-L710, diff_diff/staggered_aggregation.py:L65-L90, diff_diff/staggered_bootstrap.py:L314-L399).

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3. Impact: New public parameter base_period is not documented in the estimator’s parameter list, so users won’t discover or understand the new behavior from API docs. Concrete fix: add base_period to the parameter docs and consider a regression test for the “no post‑treatment periods” case to lock the intended error/NaN behavior (diff_diff/staggered.py:L165-L212, tests/test_staggered.py:L1970-L2207).

- Remove fallback to non-consecutive base periods in varying mode
- Add base_period parameter to CallawaySantAnna docstring
- Add validation and warning for empty post-treatment effect sets
- Add tests for no-fallback behavior and NaN result with warning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: c40c46cfb8e274e9997c4dfe21e07a3fed40e879


Overall assessment: ⚠️ Needs changes

Executive Summary

  • CallawaySantAnna base_period modes are added with registry updates, and overall ATT remains post‑treatment only.
  • P1: bootstrap overall ATT is still computed when no post‑treatment effects exist, yielding invalid SE/p‑values.
  • P2: analytic overall t/p default to 0/1 when overall_att is NaN in the no‑post‑treatment case.
  • Tests cover no‑post‑treatment without bootstrap but miss the bootstrap edge case.

Methodology

  • P1 — CallawaySantAnna: bootstrap overall ATT still computes with an empty post‑treatment set, producing finite SE/p‑values despite undefined overall ATT; Impact: incorrect inference when only pre‑treatment effects exist, conflicting with invalid‑inference handling; Fix: short‑circuit when post_treatment_indices is empty (set original_overall/bootstrap_overall to NaN and return NaN overall bootstrap stats, and skip overall_* updates in diff_diff/staggered.py). Refs: diff_diff/staggered_bootstrap.py:L314-L340, diff_diff/staggered.py:L735-L751
  • P3 — CallawaySantAnna base_period modes and post‑treatment‑only aggregation align with the Methodology Registry and cited paper; Impact: n/a. Fix: none. Refs: docs/methodology/REGISTRY.md:L165-L217, diff_diff/staggered.py:L461-L475, diff_diff/staggered.py:L670-L681

Code Quality

  • P2 — When _aggregate_simple returns NaN (no post‑treatment), overall_t/overall_p are set to 0/1 via the overall_se > 0 guard; Impact: misleading inference fields alongside NaN ATT; Fix: if overall_att or overall_se is non‑finite, propagate NaN to overall_t, overall_p, and overall_ci, and avoid bootstrap overwrite unless overall_att is finite. Refs: diff_diff/staggered.py:L712-L718, diff_diff/staggered.py:L747-L751

Performance

  • P3 — No performance issues found in the diff. Impact: n/a. Fix: none.

Maintainability

  • P3 — No maintainability issues found in the diff. Impact: n/a. Fix: none.

Tech Debt

  • P3 — No tech debt issues found in the diff. Impact: n/a. Fix: none.

Security

  • P3 — No security issues found in the diff. Impact: n/a. Fix: none.

Documentation/Tests

  • P3 — Missing bootstrap coverage for the no‑post‑treatment path; Impact: bootstrap regressions like the P1 issue could slip; Fix: add a test (next to the existing no‑post‑treatment test) with n_bootstrap>0 asserting NaN overall bootstrap stats and a warning. Ref: tests/test_staggered.py:L2242-L2284

- Fix bootstrap to return early with NaN when no post-treatment effects
- Fix t-stat/p-value to use NaN (not 0/1) when SE is undefined
- Add comprehensive tests for bootstrap edge case
- Document edge case in Methodology Registry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: cbc28dd50bba61b97ebebec079d8cfc47135d93d


Overall assessment: ⚠️ Needs changes

Methodology

  • Severity: P1; Impact: For Callaway & Sant’Anna with anticipation > 0, cohort-level ATT(g) and bootstrap group effects drop early treated periods because aggregation uses t >= g instead of t >= g - anticipation, diverging from the registry’s shifted reference period and from overall ATT; Concrete fix: change post-treatment filters in _aggregate_by_group and _prepare_group_aggregation to t >= g - self.anticipation, and adjust related tests/docs. (diff_diff/staggered_aggregation.py:435, diff_diff/staggered_bootstrap.py:613, docs/methodology/REGISTRY.md:202)
  • Severity: P2; Impact: With new pre-treatment outputs, non-finite/zero SEs still produce t_stat=0 and p=1 for ATT(g,t), event-study, and group effects, contradicting the registry’s NaN-propagation rule and masking inference failures; Concrete fix: mirror overall-ATT handling by setting t_stat/p_value to NaN when SE is non-finite or <=0 in group-time and aggregation paths. (diff_diff/staggered.py:688, diff_diff/staggered_aggregation.py:404, diff_diff/staggered_aggregation.py:460, docs/methodology/REGISTRY.md:207)

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3; Impact: Tests cover base_period basics but not the new anticipation boundary or NaN inference behavior, so regressions in group aggregation and invalid-SE handling would go undetected; Concrete fix: add tests with anticipation > 0 validating group effects/bootstraps and a case that forces invalid SEs to assert NaN t/p. (tests/test_staggered.py:1970)

Open Questions

  • Assumption: time periods are consecutive integers; otherwise base_period="varying" uses t-1 and silently drops pre-treatment effects—intended? (diff_diff/staggered.py:461)

Executive Summary

  • Adds base_period and pre-treatment ATT(g,t) while keeping post-treatment base anchored at g-1-anticipation.
  • Overall ATT aggregation/bootstrapping now exclude pre-treatment and return NaN when no post-treatment exists.
  • Main blocker: group-level aggregations ignore anticipation periods, making ATT(g) inconsistent when anticipation > 0.
  • Also need to align per-effect inference with registry NaN propagation and add anticipation/inference tests.

Fixes:
- P1 (Critical): Group aggregation now uses t >= g - anticipation instead of
  t >= g for post-treatment filtering (staggered_aggregation.py, staggered_bootstrap.py)
- P2: Per-effect t_stat now returns NaN (not 0.0) when SE is non-finite or zero,
  consistent with overall_t_stat behavior (4 locations in staggered.py)

Tests:
- Added TestCallawaySantAnnaAnticipation with 2 tests for anticipation boundaries
- Added TestCallawaySantAnnaTStatNaN with 2 tests for NaN t_stat consistency

Documentation:
- Updated REGISTRY.md to document anticipation handling in group aggregation
- Updated REGISTRY.md to document per-effect t_stat NaN behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 22, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 39b6f93e15b931d4e6a51cb9e6435a4a4c6f3faf


Overall assessment: ⚠️ Needs changes
Executive Summary

  • CallawaySantAnna is the only estimator touched; base_period and post‑treatment aggregation align with the registry, but two inference paths diverge.
  • Event‑study and group‑level t_stats still default to 0.0 on invalid SEs, conflicting with the registry’s NaN policy.
  • Bootstrap short‑circuits when no post‑treatment effects exist, skipping bootstrap SEs for pre‑treatment effects despite n_bootstrap>0.
  • No performance or security regressions spotted in the diff.

Methodology

  • (P1) Event‑study and group‑level t_stats use 0.0 when SE is non‑finite/zero, but the Methodology Registry specifies NaN propagation. Impact: misleading p‑values/inference for edge cases and doc/code mismatch. Fix: apply the same np.isfinite(se) and se > 0 guard used for group‑time/overall and propagate NaN in _aggregate_event_study and _aggregate_by_group. (diff_diff/staggered_aggregation.py:404, diff_diff/staggered_aggregation.py:460)
  • (P1) Bootstrap returns early when no post‑treatment effects exist, leaving group‑time/event‑study/group SEs as analytic despite n_bootstrap>0. Impact: incorrect variance/SEs for pre‑treatment diagnostics and inconsistent bootstrap behavior. Fix: keep bootstrap resampling for per‑effect outputs and only set overall ATT stats to NaN when post‑treatment weights are empty. (diff_diff/staggered_bootstrap.py:327)

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found.

…reatment

- Fix aggregated t_stat in _aggregate_event_study and _aggregate_by_group
  to use NaN when SE is non-finite or zero (was defaulting to 0.0)
- Refactor bootstrap to continue computing per-effect SEs when no
  post-treatment effects exist (only overall ATT stats are NaN)
- Add tests for aggregated t_stat NaN behavior
- Add test verifying bootstrap runs for pre-treatment effects
- Update methodology REGISTRY.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 22, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 573496480eaac779b99574722675fe9fec0bfabc


Overall assessment: ⚠️ Needs changes

Executive Summary

  • P1 methodology issue: pre-treatment ATT(g,t) for control_group="not_yet_treated" uses overlapping treated/control samples, biasing placebo checks.
  • Base-period modes and post-treatment-only aggregation align with the Methodology Registry and Callaway & Sant’Anna (2021).
  • Tests added for base_period and NaN t-stats, but no coverage for not-yet-treated pre-treatment behavior.

Methodology

  • Severity: P1 | Impact: For control_group="not_yet_treated" and pre-treatment periods (t < g), the control mask includes the treated cohort (G=g) because it is defined as unit_cohorts > t, so treated and control samples overlap; this biases pre-treatment ATT(g,t) and undermines parallel-trends diagnostics. | Concrete fix: Exclude treated cohort units from controls when computing ATT(g,t), e.g., control_mask = (never_treated_mask | (unit_cohorts > t)) & ~treated_mask (or unit_cohorts != g) before applying valid_mask. (diff_diff/staggered.py:L487-L506)

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P2 | Impact: Missing regression coverage for the not-yet-treated control case means the overlap bug above isn’t detected; pre-treatment diagnostics could silently regress. | Concrete fix: Add a targeted test in tests/test_staggered.py alongside TestCallawaySantAnnaPreTreatment that runs with control_group="not_yet_treated" and asserts the treated cohort is excluded from controls (e.g., by checking n_control against the expected count or constructing a dataset where overlap would force ATT≈0). (tests/test_staggered.py:L1970-L2320)

Fix critical bug where control_group="not_yet_treated" incorrectly included
treated cohort g in controls for pre-treatment periods (t < g).

When computing ATT(g,t), cohort g should never be in the control group,
regardless of whether t < g or t >= g. The fix adds explicit exclusion
of cohort g from the not-yet-treated control mask.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 22, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 021bd6676e0746cccb81871341f516dc076dc733


Overall assessment: ✅ Looks good
Executive Summary

  • CallawaySantAnna (Callaway & Sant'Anna 2021) is the only method affected; base_period adds pre-treatment ATT(g,t) while overall ATT now aggregates post-treatment only.
  • Implementation aligns with the Methodology Registry for base_period, anticipation-aware aggregation, and not_yet_treated control exclusion.
  • One doc-clarity item: the ATT(g,t) equation still implies Y_{g-1} for all t, while varying pre-treatment uses t-1.

Methodology

  • P3 | Impact: The ATT(g,t) equation in docs/methodology/REGISTRY.md:180 and diff_diff/staggered.py:275 still implies a g-1 base for all t, which can mislead users now that base_period="varying" uses t-1 for pre-treatment (documented at docs/methodology/REGISTRY.md:225). | Concrete fix: Add a short note near the equation and in the class docstring clarifying the equation applies to post-treatment/universal base and that varying pre-treatment uses consecutive t vs t-1 comparisons for diagnostics.

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found; tests for base_period and bootstrap behavior added in tests/test_staggered.py:1970 (not run in this review).

Add notes explaining that the canonical ATT(g,t) equation (using g-1 as base)
applies to post-treatment effects and base_period="universal". With
base_period="varying" (default), pre-treatment uses t-1 for consecutive
comparisons useful in parallel trends diagnostics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber igerber merged commit 14bc71f into main Jan 22, 2026
4 checks passed
@igerber igerber deleted the feature/base-period-pretreatment branch January 22, 2026 14:19
igerber added a commit that referenced this pull request Jan 22, 2026
- Create /pre-merge-check skill for automated checks before submitting PRs
  - Pattern checks for NaN handling issues
  - Test coverage verification
  - Context-specific checklists based on changed files
- Update PR review prompt with edge case checklist from PR #97 analysis
  - Empty result set handling
  - NaN/Inf propagation checks
  - Parameter interaction verification
  - Control group logic validation
  - Pattern consistency checks
- Add Task Implementation Workflow section to CLAUDE.md
  - 4-phase workflow: Planning, Implementation, Pre-Merge Review, Submit
  - Quick reference commands for common pattern checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
igerber added a commit that referenced this pull request Jan 31, 2026
…rence

Replace `else 0.0` with `else np.nan` when SE is non-finite or zero in
t-stat calculations across sun_abraham.py, triple_diff.py, and
diagnostics.py. Add CI guards returning (NaN, NaN) for 4 downstream
confidence interval computations. Matches the CallawaySantAnna pattern
established in PR #97.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant