
Commit 00476e2

igerber and claude committed
Reject strata/PSU/FPC for CallawaySantAnna, accept weights-only survey
CallawaySantAnna per-cell and aggregation SEs use IF-based variance, which does not incorporate the full survey design structure (strata/PSU/FPC). Rather than silently accepting a full design that does not affect SE magnitude, reject it at runtime with NotImplementedError.

- Guard: strata/PSU/FPC in SurveyDesign raises NotImplementedError
- REGISTRY.md: updated to reflect weights-only support, removed overstated claim about design effects entering via WIF
- Roadmap: added CS strata/PSU/FPC, covariates+IPW/DR, and efficient DRDID IF to Phase 5 deferred work
- TODO.md: updated CS entry to reflect weights-only constraint
- Tests: updated CS tests to use weights-only designs, added strata/PSU/FPC rejection tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent b816013 commit 00476e2

5 files changed

Lines changed: 45 additions & 65 deletions


TODO.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
-| CallawaySantAnna per-cell ATT(g,t) SEs under survey use influence-function variance, not full design-based TSL with strata/PSU/FPC. Design effects enter at aggregation via WIF and survey df. Full per-cell TSL would require constructing unit-level influence functions on the global index and passing through `compute_survey_vcov()`. | `staggered.py` | #233 | Medium |
+| CallawaySantAnna survey: strata/PSU/FPC rejected at runtime. Full design-based SEs require routing the combined IF/WIF through `compute_survey_vcov()`. Currently weights-only. | `staggered.py` | #233 | Medium |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
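The new TODO entry defers full design-based SEs, which would route the combined IF/WIF through `compute_survey_vcov()`. As a rough illustration of what such a step computes, the sketch below applies the textbook stratified, clustered Taylor-linearization variance estimator to per-unit influence-function values. The helper name `linearized_variance`, the convention that `fpc` maps each stratum to its population number of PSUs, the handling of single-PSU strata, and the `n_psu - n_strata` degrees-of-freedom rule are all assumptions for this sketch; it is not the library's `compute_survey_vcov()`.

```python
import numpy as np


def linearized_variance(psi, strata, psu, fpc=None):
    """Stratified, clustered TSL variance of sum(psi).

    Hypothetical helper, not diff_diff's compute_survey_vcov().
    psi    : per-unit influence-function values, scaled so that the
             estimator's error is approximately sum(psi).
    strata : stratum label per unit.
    psu    : PSU label per unit (PSUs are nested within strata).
    fpc    : optional dict {stratum: population number of PSUs}
             (an assumed convention for this sketch).
    Returns (variance, survey_df).
    """
    psi = np.asarray(psi, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    variance = 0.0
    n_psu_total = 0
    stratum_labels = np.unique(strata)
    for h in stratum_labels:
        in_h = strata == h
        # Sum the IF within each PSU: PSUs, not units, are the independent draws.
        psu_labels = np.unique(psu[in_h])
        totals = np.array([psi[in_h & (psu == c)].sum() for c in psu_labels])
        n_h = len(totals)
        n_psu_total += n_h
        if n_h < 2:
            # A single-PSU stratum has no estimable within-stratum variance;
            # this sketch simply skips it (survey software offers other options).
            continue
        finite_pop = 1.0 if fpc is None else 1.0 - n_h / fpc[h]
        variance += finite_pop * n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    # Conventional survey degrees of freedom: #PSUs minus #strata.
    survey_df = n_psu_total - len(stratum_labels)
    return variance, survey_df
```

With one stratum and every unit its own PSU, the estimator reduces to `n/(n-1) * sum((psi - mean(psi))**2)`; clustering and the finite-population correction only change how the IF values are grouped and scaled.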

diff_diff/staggered.py

Lines changed: 12 additions & 0 deletions
@@ -1197,6 +1197,18 @@ def fit(
                     f"got '{resolved_survey.weight_type}'. The survey variance math "
                     f"assumes probability weights (pweight)."
                 )
+            if (
+                resolved_survey.strata is not None
+                or resolved_survey.psu is not None
+                or resolved_survey.fpc is not None
+            ):
+                raise NotImplementedError(
+                    "CallawaySantAnna does not yet support strata/PSU/FPC in "
+                    "SurveyDesign. Per-cell and aggregation SEs use IF-based "
+                    "variance which does not incorporate the full survey design "
+                    "structure. Use SurveyDesign(weights=...) only. Full "
+                    "design-based SEs via compute_survey_vcov() are planned."
+                )
 
         # Guard bootstrap + survey
         if self.n_bootstrap > 0 and resolved_survey is not None:
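The guard added to `fit()` lets weights-only designs through and rejects any design that sets strata, PSU, or FPC. The standalone sketch below mirrors that control flow so it can run without the package. `SurveyDesignStub` and `check_cs_survey_support` are hypothetical stand-ins, not diff_diff's `SurveyDesign` or its internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyDesignStub:
    """Hypothetical stand-in for SurveyDesign: column names only."""

    weights: Optional[str] = None
    strata: Optional[str] = None
    psu: Optional[str] = None
    fpc: Optional[str] = None


def check_cs_survey_support(design: Optional[SurveyDesignStub]) -> None:
    """Mirror of the commit's guard: allow weights-only designs, reject the rest."""
    if design is None:
        return  # no survey design supplied, nothing to validate
    if design.strata is not None or design.psu is not None or design.fpc is not None:
        raise NotImplementedError(
            "CallawaySantAnna does not yet support strata/PSU/FPC in SurveyDesign. "
            "Use SurveyDesign(weights=...) only."
        )
```

Because `pytest.raises(..., match=...)` searches the exception message with a regular expression, the updated tests only need the literal substring `strata/PSU/FPC` to appear in the message.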

docs/methodology/REGISTRY.md

Lines changed: 2 additions & 2 deletions
@@ -416,9 +416,9 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
 a base period later than `t` (matching R's `did::att_gt()`)
 - Does not require never-treated units: when all units are eventually treated,
 not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
-- **Note:** CallawaySantAnna survey weights: regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR+survey raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Bootstrap + survey deferred.
+- **Note:** CallawaySantAnna survey support: weights-only (strata/PSU/FPC raise NotImplementedError — full design-based SEs via compute_survey_vcov not yet implemented). Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Bootstrap + survey deferred.
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
-- **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization with strata/PSU/FPC structure. Consequently, specifying strata/PSU/FPC in SurveyDesign does NOT change per-cell SE magnitudes — it only affects aggregation-level SEs (via the WIF), survey degrees of freedom (for t-distribution p-values/CIs), and reported metadata. This is consistent with R's approach where per-cell SEs are influence-function-based and design effects enter at the aggregation stage.
+- **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. Strata/PSU/FPC are rejected at runtime until full design-based SEs via `compute_survey_vcov()` on the combined IF/WIF are implemented.
 
 **Reference implementation(s):**
 - R: `did::att_gt()` (Callaway & Sant'Anna's official package)
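The reg+covariates deviation note defines two plug-in influence functions. The sketch below builds them from residual vectors and survey weights exactly as the note writes them, then takes the generic IF-based standard error `sqrt(sum of squared IF values)`. That final SE step is an assumption for illustration and is not copied from `staggered.py`; the function name `plug_in_if_se` is hypothetical.

```python
import numpy as np


def plug_in_if_se(resid_t, sw_t, att, wls_resid_c, sw_c):
    """Hypothetical sketch of the plug-in IF from the REGISTRY note.

    resid_t     : treated-unit residuals as defined in the note
    sw_t        : survey weights of treated units
    att         : the cell's ATT(g,t) estimate
    wls_resid_c : WLS residuals of control units
    sw_c        : survey weights of control units
    """
    resid_t = np.asarray(resid_t, dtype=float)
    sw_t = np.asarray(sw_t, dtype=float)
    wls_resid_c = np.asarray(wls_resid_c, dtype=float)
    sw_c = np.asarray(sw_c, dtype=float)
    # Treated IF, normalized by the treated weight sum.
    inf_treated = (sw_t / sw_t.sum()) * (resid_t - att)
    # Control IF, normalized by the control weight sum (note the minus sign).
    inf_control = -(sw_c / sw_c.sum()) * wls_resid_c
    # Generic IF-based SE (assumed here): square root of the summed squares.
    se = np.sqrt(np.sum(inf_treated**2) + np.sum(inf_control**2))
    return inf_treated, inf_control, se
```

With equal weights the two pieces reduce to `(resid - ATT)/n_t` and `-resid/n_c`, as the note states. Because each IF uses weights normalized by their group sum, multiplying every weight by a constant leaves the SE unchanged. This is the weight-scale invariance the note mentions.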

docs/survey-roadmap.md

Lines changed: 4 additions & 1 deletion
@@ -46,7 +46,7 @@ message pointing to the planned phase or describing the limitation.
 |-----------|------|----------------|-------|
 | ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred |
 | TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred |
-| CallawaySantAnna | `staggered.py` | Analytical | Survey-weighted regression (all cases), IPW and DR (no-covariate only); survey-weighted WIF in aggregation; covariates+IPW/DR deferred (needs DRDID nuisance IF); bootstrap+survey deferred |
+| CallawaySantAnna | `staggered.py` | Weights-only | Weights-only SurveyDesign (strata/PSU/FPC rejected); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; full design SEs, covariates+IPW/DR, and bootstrap+survey deferred |
 
 **Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights
 enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked
@@ -59,6 +59,9 @@ TripleDifference IPW/DR from Phase 3 deferred work.
 | ImputationDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
 | TwoStageDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
 | CallawaySantAnna | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
+| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | Phase 5: route combined IF/WIF through `compute_survey_vcov()` for design-based aggregation SEs |
+| CallawaySantAnna | Covariates + IPW/DR + survey | Phase 5: DRDID panel nuisance IF corrections |
+| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Phase 5: replace conservative plug-in IF with semiparametrically efficient IF |
 
 ### Remaining for Phase 5
 
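The roadmap's infrastructure note says that survey weights enter the IRLS working weights of `solve_logit()` as `w_survey * mu * (1 - mu)`. The minimal numpy sketch below shows this idea for logistic regression. `weighted_logit_irls` is a hypothetical name and is not the library's `solve_logit()`. It has no step-halving, separation detection, or other safeguards, and it assumes `X` already contains an intercept column if one is wanted.

```python
import numpy as np


def weighted_logit_irls(X, y, w_survey, max_iter=50, tol=1e-10):
    """Survey-weighted logistic regression by IRLS (Newton) iterations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w_survey = np.asarray(w_survey, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # Working weights: survey weight times the Bernoulli variance mu(1 - mu).
        working = w_survey * mu * (1.0 - mu)
        # Weighted score and Hessian of the survey-weighted log-likelihood.
        score = X.T @ (w_survey * (y - mu))
        hessian = X.T @ (working[:, None] * X)
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Multiplying every weight by a constant scales both the score and the Hessian by that constant, so the Newton steps and the fitted coefficients do not change. This matches the pweight convention that only relative weights matter for point estimates.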
tests/test_survey_phase4.py

Lines changed: 26 additions & 61 deletions
@@ -717,54 +717,34 @@ def test_uniform_weights_match_unweighted(self, staggered_survey_data):
             )
             assert abs(r_unw.overall_att - r_w.overall_att) < 1e-8, f"method={method}: ATT mismatch"
 
-    def test_survey_metadata_fields(self, staggered_survey_data, survey_design_full):
-        """survey_metadata has correct fields with full design."""
+    def test_survey_metadata_fields(self, staggered_survey_data, survey_design_weights_only):
+        """survey_metadata has correct fields with weights-only design."""
         result = CallawaySantAnna(estimation_method="reg").fit(
             staggered_survey_data,
             "outcome",
             "unit",
             "period",
             "first_treat",
-            survey_design=survey_design_full,
+            survey_design=survey_design_weights_only,
         )
         sm = result.survey_metadata
         assert sm is not None
         assert sm.weight_type == "pweight"
         assert sm.effective_n > 0
         assert sm.design_effect > 0
-        assert sm.n_strata is not None
-        assert sm.n_psu is not None
 
-    def test_se_differs_with_design(self, staggered_survey_data):
-        """Weights-only vs full design: same ATT, different inference via survey df."""
-        sd_w = SurveyDesign(weights="weight")
+    def test_strata_psu_fpc_raises(self, staggered_survey_data):
+        """Strata/PSU/FPC should raise NotImplementedError."""
         sd_full = SurveyDesign(weights="weight", strata="stratum", psu="psu")
-
-        r_w = CallawaySantAnna(estimation_method="reg").fit(
-            staggered_survey_data,
-            "outcome",
-            "unit",
-            "period",
-            "first_treat",
-            survey_design=sd_w,
-        )
-        r_full = CallawaySantAnna(estimation_method="reg").fit(
-            staggered_survey_data,
-            "outcome",
-            "unit",
-            "period",
-            "first_treat",
-            survey_design=sd_full,
-        )
-        # ATTs should be the same (same weights)
-        assert abs(r_w.overall_att - r_full.overall_att) < 1e-10
-        # Full design should carry survey df (strata/PSU structure)
-        assert r_full.survey_metadata is not None
-        assert r_full.survey_metadata.n_strata is not None
-        assert r_full.survey_metadata.n_psu is not None
-        # P-values should differ due to t-distribution with survey df
-        if np.isfinite(r_w.overall_p_value) and np.isfinite(r_full.overall_p_value):
-            assert r_w.overall_p_value != r_full.overall_p_value
+        with pytest.raises(NotImplementedError, match="strata/PSU/FPC"):
+            CallawaySantAnna(estimation_method="reg").fit(
+                staggered_survey_data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                survey_design=sd_full,
+            )
 
     def test_bootstrap_survey_raises(self, staggered_survey_data, survey_design_weights_only):
         """Bootstrap + survey should raise NotImplementedError."""
@@ -1197,34 +1177,19 @@ def test_survey_weights_change_per_cell_att(self, staggered_survey_data):
                 effects_no, effects_sv, atol=1e-6
             ), f"{method}: survey weights should change per-cell ATT"
 
-    def test_survey_df_affects_pvalues(self, staggered_survey_data):
-        """Survey df (from strata/PSU) should affect p-values via t-distribution."""
-        data = staggered_survey_data
-        sd_weights = SurveyDesign(weights="weight")
+    def test_strata_psu_fpc_raises_inference(self, staggered_survey_data):
+        """Strata/PSU/FPC raises NotImplementedError in inference context."""
         sd_full = SurveyDesign(weights="weight", strata="stratum", psu="psu")
-        est = CallawaySantAnna(estimation_method="reg")
-        r_w = est.fit(
-            data,
-            outcome="outcome",
-            unit="unit",
-            time="period",
-            first_treat="first_treat",
-            aggregate="simple",
-            survey_design=sd_weights,
-        )
-        r_f = est.fit(
-            data,
-            outcome="outcome",
-            unit="unit",
-            time="period",
-            first_treat="first_treat",
-            aggregate="simple",
-            survey_design=sd_full,
-        )
-        # ATT should be same (same weights), but p-values differ (different df)
-        assert np.isclose(r_w.overall_att, r_f.overall_att, atol=1e-8)
-        # Survey df from strata/PSU should change inference
-        assert r_f.survey_metadata.df_survey is not None
+        with pytest.raises(NotImplementedError, match="strata/PSU/FPC"):
+            CallawaySantAnna(estimation_method="reg").fit(
+                staggered_survey_data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                aggregate="simple",
+                survey_design=sd_full,
+            )
 
 
 # =============================================================================
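The updated `test_survey_metadata_fields` still checks `effective_n > 0` and `design_effect > 0` for a weights-only design. A common way to compute both quantities from weights alone is Kish's approximation, which the sketch below implements. Assuming that `survey_metadata` uses this definition is a guess; the library may compute these fields differently, and `kish_summary` is a hypothetical name.

```python
import numpy as np


def kish_summary(weights):
    """Kish effective sample size and weighting design effect (assumed definitions)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    n = w.size
    # Effective n: (sum w)^2 / sum(w^2); equals n when all weights are equal.
    effective_n = w.sum() ** 2 / np.sum(w**2)
    # Design effect due to unequal weighting: n / effective_n = 1 + CV(w)^2.
    design_effect = n / effective_n
    return effective_n, design_effect
```

Under these definitions, both values are positive for any valid weight vector. This is consistent with the assertions that the weights-only test keeps after dropping the `n_strata` and `n_psu` checks.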
