Commit 66adcdc

igerber and claude committed
Document CS survey weight normalization on unbalanced panels, add test
The pweight normalization preserves relative unit weights on unbalanced panels because all IF/WIF formulas use weight ratios (sw_i/sum(sw)) where the normalization constant cancels. Added REGISTRY note explaining this and an unbalanced panel scale-invariance test confirming it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8d184c9 commit 66adcdc
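
The cancellation claimed in the commit message can be checked numerically. The following is a minimal sketch, not the library's code: the helper names `normalize_pweights` and `unit_weight_shares` and the toy panel are hypothetical, and it assumes weights are constant within each unit, so that `groupby(unit).first()` recovers the unit's weight.

```python
import numpy as np
import pandas as pd


def normalize_pweights(w):
    """Rescale weights so they sum to the number of rows: w * n_obs / sum(w)."""
    w = np.asarray(w, dtype=float)
    return w * len(w) / w.sum()


def unit_weight_shares(df, weight_col):
    """Per-unit shares sw_i / sum(sw), taken from the panel-normalized weights."""
    normalized = pd.Series(normalize_pweights(df[weight_col]), index=df.index)
    sw = normalized.groupby(df["unit"]).first()  # one weight per unit
    return sw / sw.sum()


# Toy unbalanced panel (hypothetical data): units contribute 3, 2 and 1 rows.
panel = pd.DataFrame(
    {
        "unit": [1, 1, 1, 2, 2, 3],
        "period": [1, 2, 3, 1, 2, 1],
        "weight": [1.0, 1.0, 1.0, 2.5, 2.5, 4.0],
    }
)
panel["weight2"] = panel["weight"] * 2.9  # same rescaling factor as the new test

shares_1 = unit_weight_shares(panel, "weight")
shares_2 = unit_weight_shares(panel, "weight2")

# Shares computed from the raw (un-normalized) unit weights, for comparison.
raw = panel.groupby("unit")["weight"].first()
raw_shares = raw / raw.sum()
```

Because the normalization multiplies every row by the same constant `n_obs / sum(w)`, the shares agree with those computed from the raw weights whether or not units contribute the same number of rows.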

2 files changed

Lines changed: 41 additions & 1 deletion

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -416,7 +416,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
   a base period later than `t` (matching R's `did::att_gt()`)
 - Does not require never-treated units: when all units are eventually treated,
   not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
-- **Note:** CallawaySantAnna survey support: weights-only (strata/PSU/FPC raise NotImplementedError — full design-based SEs via compute_survey_vcov not yet implemented). Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Bootstrap + survey deferred.
+- **Note:** CallawaySantAnna survey support: weights-only (strata/PSU/FPC raise NotImplementedError — full design-based SEs via compute_survey_vcov not yet implemented). Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Per-unit survey weights are extracted via `groupby(unit).first()` from the panel-normalized pweight array; on unbalanced panels the pweight normalization (`w * n_obs / sum(w)`) preserves relative unit weights since all IF/WIF formulas use weight ratios (`sw_i / sum(sw)`) where the normalization constant cancels. Scale-invariance tests pass on both balanced and unbalanced panels. Bootstrap + survey deferred.
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
 - **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. Strata/PSU/FPC are rejected at runtime until full design-based SEs via `compute_survey_vcov()` on the combined IF/WIF are implemented.
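
The treated and control influence functions in the reg+covariates note depend on the weights only through `sw_i/sum(sw_treated)` and `sw_i/sum(sw_control)`, which is why the SEs are scale-invariant. A hedged sketch of that property follows. `plug_in_se` is a hypothetical helper, not part of the library; the ATT is assumed to be the weighted mean of the treated residuals, and the SE is assumed to be the root sum of squared influence values, which may not match the exact expression the library evaluates.

```python
import numpy as np


def plug_in_se(resid_t, sw_t, resid_c, sw_c):
    """Plug-in ATT and SE from treated/control residuals and survey weights.

    Hypothetical helper. Assumes ATT is the weighted mean of treated residuals
    and that the SE is the root sum of squared influence-function values.
    """
    resid_t = np.asarray(resid_t, dtype=float)
    resid_c = np.asarray(resid_c, dtype=float)
    share_t = np.asarray(sw_t, dtype=float)
    share_c = np.asarray(sw_c, dtype=float)
    share_t = share_t / share_t.sum()  # sw_i / sum(sw_treated)
    share_c = share_c / share_c.sum()  # sw_i / sum(sw_control)

    att = np.sum(share_t * resid_t)
    inf_treated = share_t * (resid_t - att)
    inf_control = -share_c * resid_c
    se = np.sqrt(np.sum(inf_treated**2) + np.sum(inf_control**2))
    return att, se


# Synthetic residuals and weights (illustrative only).
rng = np.random.default_rng(0)
resid_t = rng.normal(loc=1.0, size=20)
resid_c = rng.normal(size=30)
sw_t = rng.uniform(0.5, 2.0, size=20)
sw_c = rng.uniform(0.5, 2.0, size=30)

att_1, se_1 = plug_in_se(resid_t, sw_t, resid_c, sw_c)
att_2, se_2 = plug_in_se(resid_t, 2.9 * sw_t, resid_c, 2.9 * sw_c)
```

Multiplying every weight by 2.9 leaves both shares unchanged, so `att_2` and `se_2` agree with `att_1` and `se_1` up to floating-point error. With unit weights and mean-zero control residuals, this SE reduces to the unweighted `sqrt(var_t/n_t + var_c/n_c)` quoted in the note.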

tests/test_survey_phase4.py

Lines changed: 40 additions & 0 deletions
@@ -1265,6 +1265,46 @@ def test_dr_covariate_survey_nonuniform(self, ddd_survey_data):
 class TestCallawaySantAnnaSurveyInference:
     """Validate CS survey inference beyond smoke tests."""

+    def test_unbalanced_panel_scale_invariance(self, staggered_survey_data):
+        """Scale invariance should hold on unbalanced panels."""
+        data = staggered_survey_data.copy()
+        # Drop ~10% of observations to create unbalanced panel
+        rng = np.random.default_rng(99)
+        keep = rng.random(len(data)) > 0.1
+        data = data[keep].copy()
+        assert data.groupby("unit")["period"].count().nunique() > 1, "Panel should be unbalanced"
+        data["weight2"] = data["weight"] * 2.9
+        sd1 = SurveyDesign(weights="weight")
+        sd2 = SurveyDesign(weights="weight2")
+        import warnings
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            r1 = CallawaySantAnna(estimation_method="reg").fit(
+                data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                aggregate="simple",
+                survey_design=sd1,
+            )
+            r2 = CallawaySantAnna(estimation_method="reg").fit(
+                data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                aggregate="simple",
+                survey_design=sd2,
+            )
+        assert np.isclose(
+            r1.overall_att, r2.overall_att, atol=1e-8
+        ), "ATT not scale-invariant on unbalanced panel"
+        assert np.isclose(
+            r1.overall_se, r2.overall_se, atol=1e-8
+        ), f"SE not scale-invariant on unbalanced panel: {r1.overall_se} vs {r2.overall_se}"
+
     def test_se_scale_invariance_all_methods(self, staggered_survey_data):
         """SE should be invariant under weight rescaling for all methods."""
         data = staggered_survey_data
