Add survey support (pweight + strata/PSU/FPC) to dCDH estimator

igerber · claude · igerber · commit 531013ed8f55 · 2026-04-15T20:49:31.000-04:00
- Weighted cell aggregation in _validate_and_aggregate_to_cells()
- Survey resolution via _resolve_survey_for_fit() with pweight-only
  and group-constant validation
- IF expansion from group to observation level for TSL variance
- Survey-aware SE at all call sites (overall, joiners, leavers,
  multi-horizon, placebos) via _compute_se() dispatcher
- Warning when bootstrap is combined with survey_design (bootstrap stays group-level; PSU-level deferred)
- 12 new tests in test_survey_dcdh.py
- Documentation updates: REGISTRY.md, ROADMAP.md, llms-full.txt,
  choosing_estimator.rst
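
To make the IF-expansion bullet concrete, here is a minimal sketch of the idea described in the REGISTRY.md note (`psi_i = U[g] * (w_i / W_g)`, followed by a stratified between-PSU variance). The function names `expand_if_to_obs` and `tsl_variance` are hypothetical and are not the helpers added in this commit; the sketch also ignores FPC and skips single-PSU strata, which the library may handle differently.

```python
import numpy as np


def expand_if_to_obs(U, group_idx, weights):
    """Hypothetical helper: psi_i = U[g(i)] * (w_i / W_g).

    U         : group-level influence function values, shape (G,)
    group_idx : integer group index of each observation, shape (n,)
    weights   : survey pweights per observation, shape (n,)
    """
    U = np.asarray(U, dtype=float)
    group_idx = np.asarray(group_idx)
    weights = np.asarray(weights, dtype=float)
    # W_g: total survey weight inside each group
    W = np.bincount(group_idx, weights=weights, minlength=len(U))
    return U[group_idx] * weights / W[group_idx]


def tsl_variance(psi, strata, psu):
    """Hypothetical helper: stratified between-PSU variance of sum(psi).

    Assumption: no finite population correction; strata with a single
    PSU contribute nothing in this sketch.
    """
    psi, strata, psu = map(np.asarray, (psi, strata, psu))
    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psus_h, inv = np.unique(psu[in_h], return_inverse=True)
        n_h = len(psus_h)
        if n_h < 2:
            continue
        # PSU totals of the expanded influence function within stratum h
        totals = np.bincount(inv, weights=psi[in_h], minlength=n_h)
        var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return var


# Toy data: two groups, five observations, one stratum, PSU = group.
U = np.array([2.0, -1.0])
group_idx = np.array([0, 0, 1, 1, 1])
weights = np.array([1.0, 3.0, 2.0, 2.0, 4.0])
psi = expand_if_to_obs(U, group_idx, weights)
variance = tsl_variance(psi, strata=np.zeros(5, dtype=int), psu=group_idx)
```

Because the expansion is proportional to `w_i / W_g`, the observation-level values sum back to `U[g]` within each group, so a design whose PSUs coincide with groups sees the same PSU totals as the group-level influence function.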

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -181,7 +181,7 @@ The dynamic companion paper subsumes the AER 2020 paper: `DID_1 = DID_M`. The si
 
 These are referenced by the dCDH papers but live in *separate* efforts or *separate* companion papers we don't yet have:
 
-- **Survey design integration** — deferred to a separate effort after all three phases ship. Phase 1 documents "no survey support" in the compatibility matrix; the separate effort revisits when Phase 3 is complete.
+- **Survey design integration** — shipped. Supports pweight with strata/PSU/FPC via Taylor Series Linearization. Replicate weights and PSU-level bootstrap deferred.
 - **Fuzzy DiD** (within-cell-varying treatment, Web Appendix Section 1.7 of dynamic paper) → de Chaisemartin & D'Haultfœuille (2018), separate paper not yet reviewed
 - **Principled anticipation handling and trimming rules** (footnote 14 of dynamic paper) → de Chaisemartin (2021), separate paper not yet reviewed
 - **2SLS DiD** (referenced in AER appendix Section 3.4) → separate paper
@@ -195,7 +195,7 @@ These remain in **Future Estimators** below if/when we choose to extend.
 - **Conservative CI** under Assumption 8 (independent groups), exact only under iid sampling. Documented in REGISTRY.md as a `**Note:**` deviation from "default nominal coverage." Theorem 1 of the dynamic paper.
 - **Cohort recentering for variance is essential.** Cohorts are defined by the triple `(D_{g,1}, F_g, S_g)`. The plug-in variance subtracts cohort-conditional means, **NOT a single grand mean**. Test fixtures must catch this — a wrong implementation silently produces a smaller, incorrect variance.
 - **No Rust acceleration is planned for any phase.** The estimator's hot path is groupby + BLAS-accelerated matrix-vector products, where NumPy already operates near-optimally. If profiling on large panels (`G > 100K`) reveals a bottleneck post-ship, the existing `_rust_bootstrap_weights` helper can be reused for the bootstrap loop without writing new Rust code.
-- **No survey design integration in any phase.** Handled as a separate effort after all three phases ship. Phase 1 documents the absence in the compatibility matrix so survey users do not silently apply survey weights and get wrong answers.
+- **Survey design integration shipped.** Supports pweight with strata/PSU/FPC via TSL. Replicate weights and PSU-level bootstrap deferred to a follow-up.
 
 ---
 
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
@@ -911,6 +911,47 @@ def _validate_unit_constant_survey(data, unit_col, survey_design):
                 )
 
 
+def _validate_group_constant_survey(data, group_col, survey_design):
+    """Validate that survey design columns are constant within groups.
+
+    The dCDH estimator aggregates to ``(group, time)`` cells and then
+    works at the group level. Survey columns (weights, strata, PSU)
+    must not vary within groups for the IF expansion and survey variance
+    to be well-defined.
+
+    Parameters
+    ----------
+    data : pd.DataFrame
+        Input data (pre-aggregation).
+    group_col : str
+        Group identifier column name.
+    survey_design : SurveyDesign
+        Survey design specification (uses attribute names, not resolved arrays).
+
+    Raises
+    ------
+    ValueError
+        If any survey column varies within groups.
+    """
+    cols_to_check = [
+        survey_design.weights,
+        survey_design.strata,
+        survey_design.psu,
+        survey_design.fpc,
+    ]
+    for col in cols_to_check:
+        if col is not None and col in data.columns:
+            n_unique = data.groupby(group_col)[col].nunique()
+            varying_groups = n_unique[n_unique > 1]
+            if len(varying_groups) > 0:
+                raise ValueError(
+                    f"Survey column '{col}' varies within groups "
+                    f"(found {len(varying_groups)} groups with multiple values). "
+                    f"dCDH survey support requires survey design columns to be "
+                    f"constant within groups."
+                )
+
+
 def _resolve_pweight_only(resolved_survey, estimator_name):
     """Guard: reject non-pweight and strata/PSU/FPC for pweight-only estimators.
 
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
@@ -293,9 +293,9 @@ Phase 3 will add covariate adjustment.
 
 .. note::
 
-   ``ChaisemartinDHaultfoeuille`` does not yet support ``survey_design``;
-   passing it raises ``NotImplementedError``. Survey integration is
-   deferred to a separate effort after Phases 2 and 3 ship.
+   ``ChaisemartinDHaultfoeuille`` supports ``survey_design`` with pweight
+   and strata/PSU/FPC via Taylor Series Linearization. Replicate weights
+   are not yet supported.
 
 Synthetic DiD
 ~~~~~~~~~~~~~
@@ -726,10 +726,10 @@ estimation. The depth of support varies by estimator:
      - Full
      - Multiplier at PSU
    * - ``ChaisemartinDHaultfoeuille``
+     - pweight only
+     - Full (TSL)
      - --
-     - --
-     - --
-     - --
+     - Group-level (warning)
    * - ``TripleDifference``
      - pweight only
      - Full
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
@@ -265,8 +265,8 @@ est.fit(
     trends_linear: bool | None = None,           # Phase 3: DID^{fd}
     trends_nonparam: Any | None = None,          # Phase 3: DID^s
     honest_did: bool = False,                    # Phase 3: HonestDiD integration
-    # ---- deferred (separate effort) ----
-    survey_design: Any = None,
+    # ---- survey support ----
+    survey_design: SurveyDesign | None = None,    # pweight + strata/PSU/FPC (TSL)
 ) -> ChaisemartinDHaultfoeuilleResults
 ```
 
@@ -322,7 +322,7 @@ print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
 - Validated against R `DIDmultiplegtDYN` v2.3.3 at horizon `l = 1` via `tests/test_chaisemartin_dhaultfoeuille_parity.py`
 - Phase 1 placebo SE is intentionally `NaN` with a warning. The dynamic companion paper Section 3.7.3 derives the cohort-recentered analytical variance for `DID_l` only — not for the placebo `DID_M^pl`. Phase 2 will add multiplier-bootstrap support for the placebo. Until then, the placebo point estimate is meaningful but its inference fields stay NaN-consistent **even when `n_bootstrap > 0`** (bootstrap currently covers `DID_M`, `DID_+`, and `DID_-` only)
 - The analytical CI is conservative under Assumption 8 (independent groups) of the dynamic companion paper, exact only under iid sampling
-- Survey design (`survey_design`) is not yet supported and is deferred to a separate effort after all phases ship
+- Survey design supported: pweight with strata/PSU/FPC via Taylor Series Linearization. Replicate weights and PSU-level bootstrap deferred
 
 ### SunAbraham
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -627,7 +627,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 
 **Requirements checklist:**
 - [x] Single class `ChaisemartinDHaultfoeuille` (alias `DCDH`); not a family
-- [x] Forward-compat `fit()` signature with `NotImplementedError` gates for remaining parameters (`aggregate`, `survey_design`); Phase 3 gates lifted for `controls`, `trends_linear`, `trends_nonparam`, `honest_did`
+- [x] Forward-compat `fit()` signature with `NotImplementedError` gate for `aggregate`; `survey_design` now supported (pweight + strata/PSU/FPC via TSL); Phase 3 gates lifted for `controls`, `trends_linear`, `trends_nonparam`, `honest_did`
 - [x] `DID_M` point estimate with cohort-recentered analytical SE
 - [x] Joiners-only `DID_+` and leavers-only `DID_-` decompositions with their own inference
 - [x] Single-lag placebo `DID_M^pl` (point estimate; SE deferred to Phase 2)
@@ -648,6 +648,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 - [x] Heterogeneity testing via saturated OLS (Web Appendix Section 1.5, Lemma 7)
 - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6)
 - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface
+- [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization. Replicate weights and PSU-level bootstrap deferred.
+- **Note:** Survey IF expansion (`psi_i = U[g] * (w_i / W_g)`) is a library extension not in the dCDH papers. The paper's plug-in variance assumes iid sampling; the TSL variance accounts for complex survey design by expanding group-level influence functions to observation level proportionally to survey weights, then applying the standard Binder (1983) stratified PSU variance formula.
 
 ---
 
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -385,15 +385,21 @@ def test_honest_did_requires_lmax(self, data):
                 honest_did=True,
             )
 
-    def test_survey_design_raises_not_implemented(self, data):
-        with pytest.raises(NotImplementedError, match="separate effort"):
+    def test_survey_design_rejects_fweight(self, data):
+        """Survey support requires pweight; fweight rejected."""
+        from diff_diff import SurveyDesign
+
+        data = data.copy()
+        data["pw"] = 1.0
+        sd = SurveyDesign(weights="pw", weight_type="fweight")
+        with pytest.raises(ValueError, match="pweight"):
             self._est().fit(
                 data,
                 outcome="outcome",
                 group="group",
                 time="period",
                 treatment="treatment",
-                survey_design=object(),
+                survey_design=sd,
             )
 
     def test_cluster_parameter_raises_not_implemented(self, data):
diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py