Document CS survey weight normalization on unbalanced panels, add test

igerber · claude · igerber · commit 66adcdcef78a · 2026-03-23T13:38:32.000-04:00
The pweight normalization (w * n_obs / sum(w)) scales every observation
weight by one global constant, so it preserves relative unit weights on
unbalanced panels: all IF/WIF formulas use weight ratios
(sw_i / sum(sw)), in which that constant cancels. Added a REGISTRY note
explaining this and an unbalanced-panel scale-invariance test confirming
it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
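The cancellation argument in the message can be checked with a minimal pandas sketch. It rests on a few assumptions. The helpers `normalize_pweights` and `unit_weight_ratios` are hypothetical, not the package's internals. Weights are assumed constant within each unit, so `groupby("unit").first()` is meaningful. The panel is synthetic, and about 10% of its rows are dropped to make it unbalanced.

```python
import numpy as np
import pandas as pd


def normalize_pweights(w):
    """Hypothetical helper: panel-level pweight normalization w * n_obs / sum(w)."""
    w = np.asarray(w, dtype=float)
    return w * len(w) / w.sum()


def unit_weight_ratios(panel, weight_col):
    """Hypothetical helper: normalize over all rows, keep one weight per unit,
    then form the ratios sw_i / sum(sw)."""
    sw = (
        panel.assign(_w=normalize_pweights(panel[weight_col]))
        .groupby("unit")["_w"]
        .first()
    )
    return sw / sw.sum()


rng = np.random.default_rng(0)
n_units, n_periods = 50, 4
panel = pd.DataFrame(
    {
        "unit": np.repeat(np.arange(n_units), n_periods),
        "period": np.tile(np.arange(n_periods), n_units),
    }
)
# Assumed time-invariant unit weights, repeated on every row of the unit.
unit_w = rng.uniform(0.5, 3.0, size=n_units)
panel["weight"] = unit_w[panel["unit"].to_numpy()]
# Drop ~10% of rows so units keep different numbers of observations.
panel = panel[rng.random(len(panel)) > 0.1].copy()
panel["weight2"] = panel["weight"] * 2.9  # same rescaling as the new test

r1 = unit_weight_ratios(panel, "weight")
r2 = unit_weight_ratios(panel, "weight2")
raw = panel.groupby("unit")["weight"].first()  # un-normalized unit weights
print(np.allclose(r1, r2), np.allclose(r1, raw / raw.sum()))  # True True
```

Because the normalization multiplies every row by the same constant, the per-unit ratios match the raw unit-weight ratios no matter how many rows each unit keeps. The only thing the unbalanced structure changes is the value of that constant.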
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -416,7 +416,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
     a base period later than `t` (matching R's `did::att_gt()`)
   - Does not require never-treated units: when all units are eventually treated,
     not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
-- **Note:** CallawaySantAnna survey support: weights-only (strata/PSU/FPC raise NotImplementedError — full design-based SEs via compute_survey_vcov not yet implemented). Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Bootstrap + survey deferred.
+- **Note:** CallawaySantAnna survey support: weights-only (strata/PSU/FPC raise NotImplementedError — full design-based SEs via compute_survey_vcov not yet implemented). Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Per-unit survey weights are extracted via `groupby(unit).first()` from the panel-normalized pweight array; on unbalanced panels the pweight normalization (`w * n_obs / sum(w)`) preserves relative unit weights since all IF/WIF formulas use weight ratios (`sw_i / sum(sw)`) where the normalization constant cancels. Scale-invariance tests pass on both balanced and unbalanced panels. Bootstrap + survey deferred.
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
 - **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. Strata/PSU/FPC are rejected at runtime until full design-based SEs via `compute_survey_vcov()` on the combined IF/WIF are implemented.
 
diff --git a/tests/test_survey_phase4.py b/tests/test_survey_phase4.py
@@ -1265,6 +1265,46 @@ def test_dr_covariate_survey_nonuniform(self, ddd_survey_data):
 class TestCallawaySantAnnaSurveyInference:
     """Validate CS survey inference beyond smoke tests."""
 
+    def test_unbalanced_panel_scale_invariance(self, staggered_survey_data):
+        """Scale invariance should hold on unbalanced panels."""
+        data = staggered_survey_data.copy()
+        # Drop ~10% of observations to create unbalanced panel
+        rng = np.random.default_rng(99)
+        keep = rng.random(len(data)) > 0.1
+        data = data[keep].copy()
+        assert data.groupby("unit")["period"].count().nunique() > 1, "Panel should be unbalanced"
+        data["weight2"] = data["weight"] * 2.9
+        sd1 = SurveyDesign(weights="weight")
+        sd2 = SurveyDesign(weights="weight2")
+        import warnings
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            r1 = CallawaySantAnna(estimation_method="reg").fit(
+                data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                aggregate="simple",
+                survey_design=sd1,
+            )
+            r2 = CallawaySantAnna(estimation_method="reg").fit(
+                data,
+                "outcome",
+                "unit",
+                "period",
+                "first_treat",
+                aggregate="simple",
+                survey_design=sd2,
+            )
+        assert np.isclose(
+            r1.overall_att, r2.overall_att, atol=1e-8
+        ), "ATT not scale-invariant on unbalanced panel"
+        assert np.isclose(
+            r1.overall_se, r2.overall_se, atol=1e-8
+        ), f"SE not scale-invariant on unbalanced panel: {r1.overall_se} vs {r2.overall_se}"
+
     def test_se_scale_invariance_all_methods(self, staggered_survey_data):
         """SE should be invariant under weight rescaling for all methods."""
         data = staggered_survey_data
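The same cancellation carries over to influence-function standard errors, which is what the new test asserts. The toy check below is a hypothetical simplification: a weighted difference in means, not the package's ATT(g,t) influence function. Each observation's IF uses the within-group ratio sw_i / sum(sw), so rescaling every weight by 2.9, as the test does, leaves both the estimate and sqrt(sum(IF^2)) unchanged. With equal weights the SE reduces to the unweighted sqrt(var_t/n_t + var_c/n_c), computed with population (ddof=0) variances.

```python
import numpy as np


def weighted_diff_and_se(y, treated, sw):
    """Hypothetical toy estimator: weighted difference in means whose IF is
    +/-(sw_i / sum(sw_group)) * (y_i - mean_group)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    sw = np.asarray(sw, dtype=float)
    inf = np.zeros_like(y)
    means = {}
    for group, sign in ((True, 1.0), (False, -1.0)):
        mask = treated == group
        ratio = sw[mask] / sw[mask].sum()  # any global weight scale cancels here
        means[group] = np.sum(ratio * y[mask])
        inf[mask] = sign * ratio * (y[mask] - means[group])
    att = means[True] - means[False]
    se = np.sqrt(np.sum(inf**2))
    return att, se


rng = np.random.default_rng(1)
n = 200
treated = rng.random(n) < 0.4
y = rng.normal(size=n) + 0.5 * treated
sw = rng.uniform(0.5, 3.0, size=n)

att1, se1 = weighted_diff_and_se(y, treated, sw)
att2, se2 = weighted_diff_and_se(y, treated, 2.9 * sw)  # rescaled weights
print(np.isclose(att1, att2), np.isclose(se1, se2))  # True True

# Equal weights recover the unweighted sqrt(var_t/n_t + var_c/n_c).
att_u, se_u = weighted_diff_and_se(y, treated, np.ones(n))
expected = np.sqrt(
    y[treated].var() / treated.sum() + y[~treated].var() / (~treated).sum()
)
print(np.isclose(se_u, expected))  # True
```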