Commit 5a3af95

Merge pull request #102 from igerber/docs/cs-sa-preperiod-explanation
Explain CS vs SA pre-period discrepancy in Tutorial 02
2 parents: 63e403b + 2ac6605

3 files changed

Lines changed: 112 additions & 96 deletions

File tree

docs/methodology/REGISTRY.md

Lines changed: 6 additions & 1 deletion
@@ -229,6 +229,11 @@ Aggregations:
 - "universal": All comparisons use g-anticipation-1 as base
 - Both produce identical post-treatment ATT(g,t); differ only pre-treatment
 - Matches R `did::att_gt()` base_period parameter
+- Base period interaction with Sun-Abraham comparison:
+- CS with `base_period="varying"` produces different pre-treatment estimates than SA
+- This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1-anticipation)
+- Use `base_period="universal"` for methodologically comparable pre-treatment effects
+- Post-treatment effects match regardless of base_period setting
 - Control group with `control_group="not_yet_treated"`:
 - Always excludes cohort g from controls when computing ATT(g,t)
 - This applies to both pre-treatment (t < g) and post-treatment (t >= g) periods
@@ -257,7 +262,7 @@ Aggregations:
 *Assumption checks / warnings:*
 - Requires never-treated units as control group
 - Warns if treatment effects may be heterogeneous across cohorts (which the method handles)
-- Reference period must be specified (default: e=-1)
+- Reference period: e=-1-anticipation (defaults to e=-1 when anticipation=0)
 
 *Estimator equation (as implemented):*
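The varying-vs-universal distinction documented in the registry entry is plain arithmetic on group-time gaps, so it can be checked with toy numbers. A minimal self-contained sketch (all values invented for illustration; this does not call any estimator in this repo):

```python
import numpy as np

# Toy pre-treatment group means for a cohort first treated at g = 5,
# so the base period is g - 1 = 4 (numbers are made up).
y_treat = np.array([1.0, 1.1, 1.3, 1.4, 1.8])  # treated-cohort means, t = 0..4
y_ctrl = np.array([1.0, 1.0, 1.1, 1.1, 1.4])   # control means, t = 0..4
d = y_treat - y_ctrl                            # treated-control gap per period

# base_period="varying": each pre-period t is compared to t - 1,
# so coefficients are one-period incremental gaps.
varying = d[1:] - d[:-1]

# base_period="universal" (and SA's fixed reference): every period is
# compared to the single base period g - 1, so coefficients are cumulative.
universal = d - d[-1]

# Identity: the universal coefficient at t is minus the sum of the
# varying increments between t and the base period. This is why the two
# conventions agree post-treatment but differ pre-treatment.
for t in range(len(d) - 1):
    assert np.isclose(universal[t], -varying[t:].sum())

print(varying)    # one-period incremental gaps
print(universal)  # cumulative gaps relative to g - 1 (last entry is 0)
```

The identity also explains the t-statistic point in the registry: each varying coefficient is a single increment, while a universal coefficient accumulates several of them.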

docs/tutorials/02_staggered_did.ipynb

Lines changed: 5 additions & 95 deletions
@@ -3,31 +3,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"# Staggered Difference-in-Differences\n",
-"\n",
-"This notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n",
-"\n",
-"- Different units get treated at different times\n",
-"- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n",
-"- Modern estimators compute group-time specific effects and aggregate them properly\n",
-"\n",
-"We'll cover:\n",
-"1. Understanding staggered adoption\n",
-"2. The problem with TWFE (and Goodman-Bacon decomposition)\n",
-"3. The Callaway-Sant'Anna estimator\n",
-"4. Group-time effects ATT(g,t)\n",
-"5. Aggregating effects (simple, group, event-study)\n",
-"6. Bootstrap inference for valid standard errors\n",
-"7. Visualization\n",
-"8. **Pre-treatment effects and parallel trends testing**\n",
-"9. Different control group options\n",
-"10. Handling anticipation effects\n",
-"11. Adding covariates\n",
-"12. Comparing with MultiPeriodDiD\n",
-"13. Sun-Abraham interaction-weighted estimator\n",
-"14. Comparing CS and SA as a robustness check"
-]
+"source": "# Staggered Difference-in-Differences\n\nThis notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n\n- Different units get treated at different times\n- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n- Modern estimators compute group-time specific effects and aggregate them properly\n\nWe'll cover:\n1. Understanding staggered adoption\n2. The problem with TWFE (and Goodman-Bacon decomposition)\n3. The Callaway-Sant'Anna estimator\n4. Group-time effects ATT(g,t)\n5. Aggregating effects (simple, group, event-study)\n6. Bootstrap inference for valid standard errors\n7. Visualization\n8. Pre-treatment effects and parallel trends testing\n9. Different control group options\n10. Handling anticipation effects\n11. Adding covariates\n12. Comparing with MultiPeriodDiD\n13. Sun-Abraham interaction-weighted estimator\n14. Comparing CS and SA as a robustness check"
 },
 {
 "cell_type": "code",
@@ -834,85 +810,19 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"## 14. Comparing CS and SA as a Robustness Check\n",
-"\n",
-"Running both estimators provides a useful robustness check. When they agree, results are more credible."
-]
+"source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n   - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n   - Each pre-period coefficient represents a one-period change\n   - These smaller incremental changes often yield lower t-statistics\n\n2. **Sun-Abraham**:\n   - Uses a **fixed reference period** (e=-1 when anticipation=0, or e=-1-anticipation otherwise)\n   - All coefficients are deviations from this single reference\n   - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates."
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": [
-"# Compare overall ATT from both estimators\n",
-"print(\"Robustness Check: CS vs SA\")\n",
-"print(\"=\" * 50)\n",
-"print(f\"{'Estimator':<25} {'Overall ATT':>12} {'SE':>10}\")\n",
-"print(\"-\" * 50)\n",
-"print(f\"{'Callaway-SantAnna':<25} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\n",
-"print(f\"{'Sun-Abraham':<25} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n",
-"\n",
-"# Compare event study effects\n",
-"print(\"\\n\\nEvent Study Comparison:\")\n",
-"print(f\"{'Rel. Time':>12} {'CS ATT':>10} {'SA ATT':>10} {'Difference':>12}\")\n",
-"print(\"-\" * 50)\n",
-"\n",
-"# Use the pre-computed event_study_effects from results_cs\n",
-"for rel_time in sorted(results_sa.event_study_effects.keys()):\n",
-"    sa_eff = results_sa.event_study_effects[rel_time]['effect']\n",
-"    if results_cs.event_study_effects and rel_time in results_cs.event_study_effects:\n",
-"        cs_eff = results_cs.event_study_effects[rel_time]['effect']\n",
-"        diff = sa_eff - cs_eff\n",
-"        print(f\"{rel_time:>12} {cs_eff:>10.4f} {sa_eff:>10.4f} {diff:>12.4f}\")\n",
-"\n",
-"print(\"\\nSimilar results indicate robust findings across estimation methods\")"
-]
+"source": "# Compare overall ATT from both estimators\nprint(\"Robustness Check: CS vs SA\")\nprint(\"=\" * 60)\nprint(f\"{'Estimator':<30} {'Overall ATT':>12} {'SE':>10}\")\nprint(\"-\" * 60)\ncs_label = \"Callaway-Sant'Anna (varying)\"  # avoid backslash escape inside an f-string (SyntaxError before Python 3.12)\nprint(f\"{cs_label:<30} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\nprint(f\"{'Sun-Abraham':<30} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n\n# Also fit CS with universal base period for comparison\ncs_universal = CallawaySantAnna(control_group=\"never_treated\", base_period=\"universal\")\nresults_cs_univ = cs_universal.fit(\n    df, outcome=\"outcome\", unit=\"unit\",\n    time=\"period\", first_treat=\"first_treat\",\n    aggregate=\"event_study\"\n)\n\n# Compare event study effects\nprint(\"\\n\\nEvent Study Comparison:\")\nprint(\"Note: Pre-periods differ due to base period methodology (see explanation above)\")\nprint(f\"{'Rel. Time':>10} {'CS (vary)':>12} {'CS (univ)':>12} {'SA':>10} {'Note':>20}\")\nprint(\"-\" * 70)\n\nfor rel_time in sorted(results_sa.event_study_effects.keys()):\n    sa_eff = results_sa.event_study_effects[rel_time]['effect']\n    cs_vary = results_cs.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n    cs_univ = results_cs_univ.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n\n    note = \"pre (differs)\" if rel_time < 0 else \"post (matches)\"\n    print(f\"{rel_time:>10} {cs_vary:>12.4f} {cs_univ:>12.4f} {sa_eff:>10.4f} {note:>20}\")\n\nprint(\"\\nPost-treatment effects should be similar across all methods\")\nprint(\"Pre-treatment differences are expected due to base period methodology\")"
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"## Summary\n",
-"\n",
-"Key takeaways:\n",
-"\n",
-"1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n",
-"2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n",
-"   - The implicit 2x2 comparisons and their weights\n",
-"   - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n",
-"3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n",
-"   - Computing group-time specific effects ATT(g,t)\n",
-"   - Only using valid comparison groups\n",
-"   - Properly aggregating effects\n",
-"4. **Sun-Abraham** provides an alternative approach using:\n",
-"   - Interaction-weighted regression with cohort x relative-time indicators\n",
-"   - Different weighting scheme than CS\n",
-"   - More efficient under homogeneous effects\n",
-"5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n",
-"6. **Aggregation options**:\n",
-"   - `\"simple\"`: Overall ATT\n",
-"   - `\"group\"`: ATT by cohort\n",
-"   - `\"event\"`: ATT by event time (for event-study plots)\n",
-"7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n",
-"   - Use `n_bootstrap` parameter to enable multiplier bootstrap\n",
-"   - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n",
-"   - Bootstrap results include SEs, CIs, and p-values for all aggregations\n",
-"8. **Pre-treatment effects** provide parallel trends diagnostics:\n",
-"   - Use `base_period=\"varying\"` for consecutive period comparisons\n",
-"   - Pre-treatment ATT(g,t) should be near zero\n",
-"   - 95% CIs including zero is consistent with parallel trends\n",
-"   - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n",
-"9. **Control group choices** affect efficiency and assumptions:\n",
-"   - `\"never_treated\"`: Stronger parallel trends assumption\n",
-"   - `\"not_yet_treated\"`: Weaker assumption, uses more data\n",
-"\n",
-"For more details, see:\n",
-"- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n",
-"- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n",
-"- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
-]
+"source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n   - The implicit 2x2 comparisons and their weights\n   - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n   - Computing group-time specific effects ATT(g,t)\n   - Only using valid comparison groups\n   - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n   - Interaction-weighted regression with cohort x relative-time indicators\n   - Different weighting scheme than CS\n   - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n   - `\"simple\"`: Overall ATT\n   - `\"group\"`: ATT by cohort\n   - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n   - Use `n_bootstrap` parameter to enable multiplier bootstrap\n   - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n   - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n   - Use `base_period=\"varying\"` for consecutive period comparisons\n   - Pre-treatment ATT(g,t) should be near zero\n   - 95% CIs including zero is consistent with parallel trends\n   - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n   - `\"never_treated\"`: Stronger parallel trends assumption\n   - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n   - Post-treatment effects should be similar (robustness check)\n   - Pre-treatment effects differ due to base period methodology\n   - CS (varying): consecutive comparisons → one-period changes\n   - SA: fixed reference (e=-1-anticipation) → cumulative deviations\n   - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
 }
 ],
 "metadata": {
@@ -922,4 +832,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
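The reference-period rule stated in the registry and the notebook (e = -1 - anticipation, defaulting to e = -1 when anticipation = 0) can be captured in a tiny helper. This is purely illustrative; `reference_event_time` is a hypothetical name, not an API in `diff_diff`:

```python
def reference_event_time(anticipation: int = 0) -> int:
    """Event time used as the fixed reference: e = -1 - anticipation.

    With anticipation = 0 this is the usual e = -1 (the period just
    before treatment); with anticipation = k the reference shifts back
    to e = -1 - k so anticipation periods are excluded from the base.
    """
    if anticipation < 0:
        raise ValueError("anticipation must be non-negative")
    return -1 - anticipation

print(reference_event_time())   # -1
print(reference_event_time(2))  # -3
```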

tests/test_sun_abraham.py

Lines changed: 101 additions & 0 deletions
@@ -526,6 +526,107 @@ def test_both_recover_treatment_effect(self):
         assert abs(sa_results.overall_att - 3.0) < 2.0
         assert abs(cs_results.overall_att - 3.0) < 2.0
 
+    def test_pre_period_difference_expected_between_cs_sa(self):
+        """Pre-periods differ between CS (varying) and SA; post-periods match.
+
+        This is expected: CS uses consecutive comparisons, SA uses fixed reference.
+        CS with base_period="universal" should be closer to SA for pre-periods.
+        """
+        from diff_diff import CallawaySantAnna
+
+        data = generate_staggered_data(
+            n_units=200, treatment_effect=3.0, seed=42
+        )
+
+        # Sun-Abraham (uses fixed reference period e=-1)
+        sa = SunAbraham()
+        sa_results = sa.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+        )
+
+        # Callaway-Sant'Anna with varying base (default: consecutive comparisons)
+        cs_varying = CallawaySantAnna(base_period="varying")
+        cs_varying_results = cs_varying.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="event_study",
+        )
+
+        # Callaway-Sant'Anna with universal base (all compare to g-1)
+        cs_universal = CallawaySantAnna(base_period="universal")
+        cs_universal_results = cs_universal.fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            aggregate="event_study",
+        )
+
+        # Find common event times
+        sa_times = set(sa_results.event_study_effects.keys())
+        cs_varying_times = set(cs_varying_results.event_study_effects.keys())
+        cs_universal_times = set(cs_universal_results.event_study_effects.keys())
+        common_times = sa_times & cs_varying_times & cs_universal_times
+
+        # Separate pre and post periods
+        pre_times = [t for t in common_times if t < 0]
+        post_times = [t for t in common_times if t > 0]
+
+        # Post-treatment effects should match across all methods
+        for t in post_times:
+            sa_eff = sa_results.event_study_effects[t]["effect"]
+            cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
+            cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]
+
+            # All three should be similar for post-treatment
+            max_se = max(
+                sa_results.event_study_effects[t]["se"],
+                cs_varying_results.event_study_effects[t]["se"],
+                cs_universal_results.event_study_effects[t]["se"],
+            )
+            assert abs(sa_eff - cs_vary_eff) < 3 * max_se, (
+                f"Post-period t={t}: SA and CS(varying) differ too much: "
+                f"SA={sa_eff:.4f}, CS(vary)={cs_vary_eff:.4f}"
+            )
+            assert abs(sa_eff - cs_univ_eff) < 3 * max_se, (
+                f"Post-period t={t}: SA and CS(universal) differ too much: "
+                f"SA={sa_eff:.4f}, CS(univ)={cs_univ_eff:.4f}"
+            )
+
+        # Require pre-periods exist for this test to be meaningful
+        assert len(pre_times) > 0, (
+            "Test requires pre-treatment periods to validate methodology difference. "
+            "Increase n_periods or adjust cohort timing in test data."
+        )
+
+        # Compute total absolute differences
+        total_diff_varying = 0.0
+        total_diff_universal = 0.0
+        for t in pre_times:
+            sa_eff = sa_results.event_study_effects[t]["effect"]
+            cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
+            cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]
+
+            total_diff_varying += abs(sa_eff - cs_vary_eff)
+            total_diff_universal += abs(sa_eff - cs_univ_eff)
+
+        # CS(universal) should generally be closer to SA than CS(varying)
+        # for pre-treatment periods (due to similar reference period approach)
+        # Allow some tolerance since weighting schemes still differ
+        assert total_diff_universal <= total_diff_varying + 0.5, (
+            f"Expected CS(universal) to be closer to SA than CS(varying) for pre-periods. "
+            f"Got: CS(univ)-SA diff={total_diff_universal:.4f}, "
+            f"CS(vary)-SA diff={total_diff_varying:.4f}"
+        )
+
     def test_agreement_under_homogeneous_effects(self):
         """Test that SA and CS agree under homogeneous treatment effects."""
         from diff_diff import CallawaySantAnna
