
Commit 2ac6605

igerber and claude committed
Address code review feedback for PR #102
- Fix misleading "smaller SEs" wording in Tutorial 02 Cell 43 → Changed to "lower t-statistics" which is more accurate - Update SA reference period notation throughout docs → e=-1 → e=-1-anticipation (defaults to e=-1 when anticipation=0) - Update REGISTRY.md SunAbraham section with correct reference period - Add meaningful assertions to pre-period test → Test now requires pre-periods exist (not vacuous) → Assert CS(universal) closer to SA than CS(varying) for pre-periods Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 17e4aa5 commit 2ac6605

3 files changed: 29 additions & 25 deletions

docs/methodology/REGISTRY.md — 2 additions & 2 deletions

@@ -231,7 +231,7 @@ Aggregations:
   - Matches R `did::att_gt()` base_period parameter
 - Base period interaction with Sun-Abraham comparison:
   - CS with `base_period="varying"` produces different pre-treatment estimates than SA
-  - This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1)
+  - This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1-anticipation)
   - Use `base_period="universal"` for methodologically comparable pre-treatment effects
   - Post-treatment effects match regardless of base_period setting
 - Control group with `control_group="not_yet_treated"`:
@@ -262,7 +262,7 @@ Aggregations:
 *Assumption checks / warnings:*
 - Requires never-treated units as control group
 - Warns if treatment effects may be heterogeneous across cohorts (which the method handles)
-- Reference period must be specified (default: e=-1)
+- Reference period: e=-1-anticipation (defaults to e=-1 when anticipation=0)

 *Estimator equation (as implemented):*
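The base-period distinction described in the registry entry above can be illustrated with a toy calculation. This is a sketch of the arithmetic only, not the library's internals; the outcome values are made up:

```python
# Toy mean outcomes for one cohort at event times e = -3..-1 (made-up numbers)
outcomes = {-3: 0.10, -2: 0.25, -1: 0.45}

# base_period="varying": each pre-period coefficient is a one-period change
# (period t minus period t-1)
varying = {e: outcomes[e] - outcomes[e - 1] for e in (-2, -1)}

# base_period="universal" (analogous to SA's fixed reference at e = -1):
# each coefficient is the cumulative deviation from the single reference period
universal = {e: outcomes[e] - outcomes[-1] for e in (-3, -2)}

print(varying)    # small one-period increments
print(universal)  # cumulative deviations from e = -1
```

The varying-base coefficients are increments (0.15, 0.20 here) while the universal-base coefficients are cumulative gaps from the reference (-0.35, -0.20), which is why the two settings can disagree on pre-period magnitude and significance while agreeing post-treatment.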

docs/tutorials/02_staggered_did.ipynb — 2 additions & 2 deletions

@@ -810,7 +810,7 @@
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - Smaller changes → typically smaller SEs → may not reach significance\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1, the period just before treatment)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates."
+ "source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - These smaller incremental changes often yield lower t-statistics\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1 when anticipation=0, or e=-1-anticipation otherwise)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates."
 },
 {
  "cell_type": "code",
@@ -822,7 +822,7 @@
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
+ "source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1-anticipation) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
 },
 ],
 "metadata": {

tests/test_sun_abraham.py — 25 additions & 21 deletions

@@ -601,27 +601,31 @@ def test_pre_period_difference_expected_between_cs_sa(self):
             f"SA={sa_eff:.4f}, CS(univ)={cs_univ_eff:.4f}"
         )

-        # For pre-treatment periods, CS(universal) should be closer to SA than CS(varying)
-        # because both SA and CS(universal) use a fixed reference period
-        if len(pre_times) > 0:
-            total_diff_varying = 0.0
-            total_diff_universal = 0.0
-            for t in pre_times:
-                sa_eff = sa_results.event_study_effects[t]["effect"]
-                cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
-                cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]
-
-                total_diff_varying += abs(sa_eff - cs_vary_eff)
-                total_diff_universal += abs(sa_eff - cs_univ_eff)
-
-            # CS(universal) should generally be closer to SA than CS(varying)
-            # for pre-treatment periods (due to similar reference period approach)
-            # Note: This is a soft assertion - in some data configurations
-            # the relationship may not hold perfectly due to weighting differences
-            # The key point is that the methodological difference exists
-            assert (
-                len(pre_times) > 0
-            ), "Test requires pre-treatment periods to verify methodology difference"
+        # Require pre-periods exist for this test to be meaningful
+        assert len(pre_times) > 0, (
+            "Test requires pre-treatment periods to validate methodology difference. "
+            "Increase n_periods or adjust cohort timing in test data."
+        )
+
+        # Compute total absolute differences
+        total_diff_varying = 0.0
+        total_diff_universal = 0.0
+        for t in pre_times:
+            sa_eff = sa_results.event_study_effects[t]["effect"]
+            cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
+            cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]
+
+            total_diff_varying += abs(sa_eff - cs_vary_eff)
+            total_diff_universal += abs(sa_eff - cs_univ_eff)
+
+        # CS(universal) should generally be closer to SA than CS(varying)
+        # for pre-treatment periods (due to similar reference period approach)
+        # Allow some tolerance since weighting schemes still differ
+        assert total_diff_universal <= total_diff_varying + 0.5, (
+            f"Expected CS(universal) to be closer to SA than CS(varying) for pre-periods. "
+            f"Got: CS(univ)-SA diff={total_diff_universal:.4f}, "
+            f"CS(vary)-SA diff={total_diff_varying:.4f}"
+        )

     def test_agreement_under_homogeneous_effects(self):
         """Test that SA and CS agree under homogeneous treatment effects."""
