Address PR #355 R1: weighted λ centering + weights=None survey designs

igerber · claude · igerber · commit 099797e7edef · 2026-04-24T05:45:16.000-04:00
Fixes two P1 issues flagged by the CI reviewer on the initial submission of PR #352.

P1 Methodology: `compute_time_weights_survey` was documented as solving the WLS-style weighted λ objective

    min Σ_u rw_u·(Σ_t λ_t·Y_u,pre[t] - Y_u,post_mean)² + ζ²·||λ||²

(with intercept=True, an intercept λ_0 sits inside the square and is profiled out by column-centering, which under WLS must use rw-weighted means). The code instead row-scaled Y by sqrt(rw) and then handed the scaled matrix to `_sc_weight_fw(intercept=True)`, whose column-centering uses an UNWEIGHTED mean across controls. Once rw varies, that is no longer the weighted objective, so non-uniform survey bootstrap draws were refitting λ on the wrong objective, which could bias the reported SE. Fix: weighted-center `Y_time` BEFORE the sqrt(rw) row-scaling, using `col_weighted_mean = (Y_time * rw[:, None]).sum(0) / rw.sum()`, and pass `intercept=False` to the kernel so no additional unweighted centering happens on the scaled matrix. All four `_sc_weight_fw` calls (both passes of the two-pass fit, with and without `return_convergence`) are updated. `compute_sdid_unit_weights_survey` is unchanged: its column-centering is PER-UNIT (time means within each control column), which is independent of rw.

P1 Code Quality: `SurveyDesign(weights=None, strata=..., psu=...)` is a valid configuration (`SurveyDesign.resolve()` synthesizes ones when weights is None), but `_extract_unit_survey_weights` indexed `survey_design.weights` as if it were always a column name, so the groupby failed with a KeyError before the bootstrap branch could run. Fix: `_extract_unit_survey_weights` now short-circuits to a vector of ones of length `len(unit_order)` when `survey_design.weights is None`, matching `SurveyDesign.resolve()`'s semantics.

Regression tests:
- `test_non_uniform_rw_beats_unweighted_centering_variant` (test_weighted_fw.py): reproduces the pre-fix buggy variant (row-scale Y by sqrt(rw), then call `_sc_weight_fw(intercept=True)`) and asserts that the fixed path's weighted SSR is ≤ the buggy variant's weighted SSR (within a 1e-6 tolerance), i.e. the survey helper never does worse on the weighted objective than the pre-fix variant.
- `test_bootstrap_full_design_without_explicit_weights` (test_methodology_sdid.py): `SurveyDesign(strata=..., psu=...)` with no explicit `weights` column now succeeds on the bootstrap path, with survey_metadata populated with n_strata / n_psu.

P3 Documentation:
- `SyntheticDiD.fit()` docstring (survey_design parameter + Raises block): replace the "bootstrap rejects all survey designs" language with the PR #352 support matrix (bootstrap ✓ for both pweight-only and full design; placebo/jackknife ✓ pweight-only, ✗ full design).
- `_placebo_variance_se` fallback-guidance messages (two sites): drop the "strata/PSU/FPC not yet supported by any SDID variance method" framing; recommend bootstrap as the fallback for full-design surveys, jackknife for pweight-only, and adding controls as the universal fallback.
- `docs/survey-roadmap.md` Current Limitations table: collapse the two SDID bootstrap-rejection rows into a single row for placebo/jackknife + full design (the bootstrap + full design row no longer applies).

Verified: 75 targeted tests pass (test_weighted_fw + TestBootstrapSE + TestScaleEquivariance + TestCoverageMCArtifact + test_survey_phase5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
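The P1 Methodology argument can be checked numerically. The sketch below is a minimal NumPy illustration, not diff_diff code: `weighted_center`, `loss_fixed`, `loss_buggy` and `loss_with_intercept` are hypothetical helpers, `Y_time` is assumed to be the (N_co, T_pre+1) matrix of pre-period outcomes plus the post-period mean, and `rw` is assumed strictly positive. It evaluates each loss at one fixed simplex λ instead of running Frank-Wolfe, and it omits the ζ²·||λ||² term because that term does not depend on the centering. Profiling out a WLS intercept gives λ_0 = Σ_u rw_u·(b_u - A_u·λ) / Σ_u rw_u, so the fixed path (weighted centering, then sqrt(rw) row-scaling) should reproduce the intercept objective exactly, and only that path should be unaffected by a constant shift of the post column, which any intercept must absorb.

```python
import numpy as np

# Hypothetical helpers for illustration; not the diff_diff API.


def weighted_center(Y: np.ndarray, rw: np.ndarray) -> np.ndarray:
    """Subtract each column's rw-weighted mean, taken across control rows."""
    col_weighted_means = (Y * rw[:, np.newaxis]).sum(axis=0) / rw.sum()
    return Y - col_weighted_means[np.newaxis, :]


def loss_fixed(Y: np.ndarray, rw: np.ndarray, lam: np.ndarray) -> float:
    """Fixed path: weighted centering, then sqrt(rw) row scaling, no further centering."""
    Z = weighted_center(Y, rw) * np.sqrt(rw)[:, np.newaxis]
    A, b = Z[:, :-1], Z[:, -1]
    return float(np.sum((A @ lam - b) ** 2))


def loss_buggy(Y: np.ndarray, rw: np.ndarray, lam: np.ndarray) -> float:
    """Pre-fix path: sqrt(rw) row scaling, then UNWEIGHTED centering of the scaled matrix."""
    Z = Y * np.sqrt(rw)[:, np.newaxis]
    Z = Z - Z.mean(axis=0, keepdims=True)
    A, b = Z[:, :-1], Z[:, -1]
    return float(np.sum((A @ lam - b) ** 2))


def loss_with_intercept(Y: np.ndarray, rw: np.ndarray, lam: np.ndarray) -> float:
    """Weighted loss with an explicit intercept lam0, profiled out in closed form."""
    A, b = Y[:, :-1], Y[:, -1]
    r = A @ lam - b
    lam0 = -np.sum(rw * r) / np.sum(rw)  # weighted-LS optimum for the intercept
    return float(np.sum(rw * (lam0 + r) ** 2))


rng = np.random.default_rng(0)
n_co, t_pre = 8, 3
Y_time = rng.normal(size=(n_co, t_pre + 1))        # pre-period columns, then post mean
rw = np.where(np.arange(n_co) % 4 == 0, 5.0, 0.5)  # non-uniform survey weights
lam = np.full(t_pre, 1.0 / t_pre)                  # any point on the simplex

# Weighted centering + sqrt(rw) scaling reproduces the intercept objective.
print(np.isclose(loss_fixed(Y_time, rw, lam), loss_with_intercept(Y_time, rw, lam)))  # True

# An intercept must absorb a level shift of the post column.
Y_shifted = Y_time.copy()
Y_shifted[:, -1] += 50.0
print(np.isclose(loss_fixed(Y_shifted, rw, lam), loss_fixed(Y_time, rw, lam)))  # True
print(np.isclose(loss_buggy(Y_shifted, rw, lam), loss_buggy(Y_time, rw, lam)))  # False
```

The last check comes out False because the shift enters scaled row u as 50·sqrt(rw_u), while unweighted centering removes only the average of those values; the leftover 50·(sqrt(rw_u) - mean(sqrt(rw))) vanishes only when rw is uniform, which is why the bug only bites on non-uniform bootstrap draws.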
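The `weights=None` short-circuit in the P1 Code Quality fix can be exercised on a toy panel. This is a minimal pandas sketch under stated assumptions: `DesignStub` is a hypothetical stand-in for `SurveyDesign` carrying only the attributes the helper reads, and `extract_unit_weights` mirrors the patched `_extract_unit_survey_weights` rather than importing it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np
import pandas as pd


@dataclass
class DesignStub:
    """Hypothetical stand-in for SurveyDesign; only the fields used here."""
    weights: Optional[str] = None
    strata: Optional[str] = None
    psu: Optional[str] = None


def extract_unit_weights(data: pd.DataFrame, unit_col: str,
                         design: DesignStub, unit_order: Sequence) -> np.ndarray:
    """Mirror of the patched helper: uniform ones when no weight column is named."""
    if design.weights is None:
        # No weight column to look up; return unit weights of 1.
        return np.ones(len(unit_order), dtype=np.float64)
    # One weight per unit (first row), then reorder to match unit_order.
    unit_w = data.groupby(unit_col)[design.weights].first()
    return np.array([unit_w[u] for u in unit_order], dtype=np.float64)


panel = pd.DataFrame({
    "unit":    [1, 1, 2, 2, 3, 3],
    "period":  [0, 1, 0, 1, 0, 1],
    "pw":      [2.0, 2.0, 0.5, 0.5, 1.5, 1.5],
    "stratum": [0, 0, 1, 1, 0, 0],
})
order = [3, 1, 2]

print(extract_unit_weights(panel, "unit", DesignStub(weights="pw"), order))
# [1.5 2.  0.5]
print(extract_unit_weights(panel, "unit", DesignStub(strata="stratum", psu="unit"), order))
# [1. 1. 1.]
```

With a named weight column, the per-unit weight is read from the first row of each unit and reordered to `unit_order`; with `weights=None` the helper never touches `data` and returns float64 ones, the same uniform weights `SurveyDesign.resolve()` is described as synthesizing.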
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
@@ -1058,7 +1058,12 @@ def _extract_unit_survey_weights(data, unit_col, survey_design, unit_order):
     unit_col : str
         Unit identifier column name.
     survey_design : SurveyDesign
-        Survey design (uses ``weights`` column name).
+        Survey design. When ``survey_design.weights`` is a column name,
+        the weights are pulled from ``data``. When ``survey_design.weights
+        is None`` (a valid configuration — ``SurveyDesign.resolve()`` then
+        synthesizes ones), returns a vector of ones of length
+        ``len(unit_order)`` so downstream estimators can treat all units
+        as having unit survey weight 1.
     unit_order : array-like
         Ordered sequence of unit identifiers to align weights to.
 
@@ -1067,6 +1072,14 @@ def _extract_unit_survey_weights(data, unit_col, survey_design, unit_order):
     np.ndarray
         Float64 array of unit-level weights, one per unit in ``unit_order``.
     """
+    if survey_design.weights is None:
+        # SurveyDesign(weights=None, strata=..., psu=...) is a valid
+        # configuration — the design element supplies clustering /
+        # stratification without explicit per-unit weights. Synthesize
+        # uniform unit weights of 1 to match SurveyDesign.resolve()'s
+        # behavior (which emits ones when weights is None). Without this
+        # branch the groupby below would raise a KeyError on ``None``.
+        return np.ones(len(unit_order), dtype=np.float64)
     unit_w = data.groupby(unit_col)[survey_design.weights].first()
     return np.array([unit_w[u] for u in unit_order], dtype=np.float64)
 
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
@@ -246,15 +246,19 @@ def fit(  # type: ignore[override]
             List of covariate column names. Covariates are residualized
             out before computing the SDID estimator.
         survey_design : SurveyDesign, optional
-            Survey design specification. Only pweight weight_type is supported.
-            ``variance_method='placebo'`` and ``variance_method='jackknife'``
-            accept pweight-only surveys (composed via ``w_control`` /
-            ``w_treated``). ``variance_method='bootstrap'`` rejects all
-            survey designs (including pweight-only) and strata/PSU/FPC are
-            not supported by any variance method on this release —
-            composing Rao-Wu rescaled weights with paper-faithful
-            Frank-Wolfe re-estimation requires a separate derivation
-            (tracked in TODO.md, sketched in REGISTRY.md §SyntheticDiD).
+            Survey design specification. Only pweight weight_type is
+            supported. Support matrix (PR #352):
+
+                method     pweight-only     strata/PSU/FPC
+                bootstrap  ✓ weighted FW    ✓ weighted FW + Rao-Wu
+                placebo    ✓                ✗ NotImplementedError
+                jackknife  ✓                ✗ NotImplementedError
+
+            The bootstrap path composes Rao-Wu rescaled weights per draw
+            with the weighted-Frank-Wolfe kernel; see REGISTRY.md
+            §SyntheticDiD ``Note (survey + bootstrap composition)``.
+            ``placebo`` and ``jackknife`` still reject strata/PSU/FPC
+            (separate methodology gap tracked in TODO.md).
 
         Returns
         -------
@@ -268,9 +272,10 @@ def fit(  # type: ignore[override]
             If required parameters are missing, data validation fails,
             or a non-pweight survey design is provided.
         NotImplementedError
-            If ``survey_design`` is provided with strata/PSU/FPC, or if
-            ``variance_method='bootstrap'`` is provided with any survey
-            design (including pweight-only).
+            If a ``survey_design`` with strata/PSU/FPC is combined with
+            ``variance_method='placebo'`` or ``'jackknife'``. Bootstrap
+            accepts any survey design (pweight-only or full design) via
+            PR #352's weighted-FW + Rao-Wu composition.
         """
         # Validate inputs
         if outcome is None or treatment is None or unit is None or time is None:
@@ -1249,14 +1254,13 @@ def _placebo_variance_se(
         # Ensure we have enough controls for the split
         n_pseudo_control = n_control - n_treated
         if n_pseudo_control < 1:
-            # Bootstrap rejects every survey design in this release, so
-            # steer survey users to jackknife (pweight-only only) or
-            # adding controls. Non-survey users can still fall back to
-            # bootstrap or jackknife.
+            # Fallback guidance. Placebo and jackknife reject strata/PSU/FPC,
+            # but bootstrap (PR #352) supports both pweight-only and
+            # full-design surveys, so it's always a valid fallback.
             fallback = (
-                "variance_method='jackknife' or adding more control units "
-                "(strata/PSU/FPC are not yet supported by any SDID variance "
-                "method)"
+                "variance_method='bootstrap' (supports pweight-only and "
+                "strata/PSU/FPC survey designs), variance_method='jackknife' "
+                "(pweight-only surveys), or adding more control units"
                 if w_control is not None
                 else "variance_method='bootstrap', variance_method='jackknife', "
                 "or adding more control units"
@@ -1353,13 +1357,14 @@ def _placebo_variance_se(
         n_successful = len(placebo_estimates)
 
         if n_successful < 2:
-            # Same survey-awareness branch as the pre-replication guard
-            # above — bootstrap rejects every survey design in this
-            # release, so suggest jackknife for pweight-only fits.
+            # Same fallback guidance as the pre-replication guard above.
+            # Bootstrap (PR #352) supports pweight-only + strata/PSU/FPC
+            # survey designs, so it's always a valid fallback for survey
+            # users even when placebo fails.
             fallback = (
-                "variance_method='jackknife' or increasing the number of "
-                "control units (strata/PSU/FPC are not yet supported by any "
-                "SDID variance method)"
+                "variance_method='bootstrap' (supports pweight-only and "
+                "strata/PSU/FPC survey designs), variance_method='jackknife' "
+                "(pweight-only surveys), or increasing the number of control units"
                 if w_control is not None
                 else "variance_method='bootstrap' or variance_method='jackknife' "
                 "or increasing the number of control units"
diff --git a/diff_diff/utils.py b/diff_diff/utils.py
@@ -1979,14 +1979,18 @@ def compute_time_weights_survey(
     Solves the WLS-style time-weight objective (PR #352 §2.2)::
 
         min_{λ on simplex}
-            Σ_u rw_control[u]·(Σ_t λ[t]·Y_pre_control[t,u] - Y_post_mean[u])²
+            Σ_u rw_control[u]·(Σ_t λ[t]·Y_u,pre-centered[t] - Y_u,post_mean-centered)²
             + ζ²·||λ||²
 
     Regularization stays uniform on λ (rw is per-control, λ is per-period —
-    no alignment for per-λ reg weighting). Loss term gets per-row weighting,
-    implemented as a √rw row-scale of the (transposed) Y_time matrix before
-    passing to the unweighted Rust kernel — equivalent to running the
-    standard FW on ``diag(√rw)·Y``.
+    no alignment for per-λ reg weighting). The loss term uses WLS-style
+    row weights. When ``intercept=True``, the "-centered" quantities are
+    centered at their survey-weighted mean across controls (weights
+    ``rw_control``), which is what profiling out a WLS intercept yields.
+    Writing A_u, b_u for control u's centered pre-period row and post
+    mean, the loss is ``Σ_u rw_u·(A_u·λ - b_u)²``, the stated objective.
+    The Rust kernel then receives the weighted-centered, sqrt(rw)-row-scaled
+    matrix with ``intercept=False`` (no additional unweighted centering).
 
     The returned λ is on the standard simplex.
 
@@ -2030,16 +2034,33 @@ def compute_time_weights_survey(
     post_means = np.mean(Y_post_control, axis=0)
     Y_time = np.column_stack([Y_pre_control.T, post_means])  # (N_co, T_pre+1)
 
-    # Row-scale by sqrt(rw): each control unit's contribution to the loss
-    # is weighted by rw_control[u]. Reg on λ stays uniform (no reg_weights).
+    # Column-center the (N_co, T_pre+1) matrix using the SURVEY-WEIGHTED
+    # mean across control units when ``intercept=True``. Plain
+    # ``intercept=True`` inside the FW kernel would use an unweighted
+    # column mean which does not correspond to the stated weighted-loss
+    # objective once ``rw_control`` varies. Perform the weighted
+    # centering here and pass ``intercept=False`` below so the kernel
+    # does not re-center on the row-scaled matrix.
+    rw_sum = float(np.sum(rw_control))
+    if intercept and rw_sum > 0:
+        col_weighted_means = (
+            (Y_time * rw_control[:, np.newaxis]).sum(axis=0) / rw_sum
+        )
+        Y_time = Y_time - col_weighted_means[np.newaxis, :]
+
+    # Row-scale by sqrt(rw): after weighted centering (if any), each
+    # control unit's contribution to the loss is weighted by
+    # ``rw_control[u]`` via the sqrt(rw) row scaling, which reproduces
+    # ``||diag(sqrt(rw))·(A·λ - b)||²`` = ``Σ_u rw_u·(A_u·λ - b_u)²``.
+    # Reg on λ stays uniform (no reg_weights).
     sqrt_rw = np.sqrt(np.maximum(rw_control, 0.0))
     Y_weighted = Y_time * sqrt_rw[:, np.newaxis]
 
     if return_convergence:
         lam, conv1 = _sc_weight_fw(
             Y_weighted,
             zeta=zeta_lambda,
-            intercept=intercept,
+            intercept=False,  # weighted centering already applied above
             init_weights=init_weights,
             min_decrease=min_decrease,
             max_iter=max_iter_pre_sparsify,
@@ -2049,7 +2070,7 @@ def compute_time_weights_survey(
         lam = _sc_weight_fw(
             Y_weighted,
             zeta=zeta_lambda,
-            intercept=intercept,
+            intercept=False,  # weighted centering already applied above
             init_weights=init_weights,
             min_decrease=min_decrease,
             max_iter=max_iter_pre_sparsify,
@@ -2061,7 +2082,7 @@ def compute_time_weights_survey(
         lam, conv2 = _sc_weight_fw(
             Y_weighted,
             zeta=zeta_lambda,
-            intercept=intercept,
+            intercept=False,  # weighted centering already applied above
             init_weights=lam,
             min_decrease=min_decrease,
             max_iter=max_iter,
@@ -2072,7 +2093,7 @@ def compute_time_weights_survey(
     return _sc_weight_fw(
         Y_weighted,
         zeta=zeta_lambda,
-        intercept=intercept,
+        intercept=False,  # weighted centering already applied above
         init_weights=lam,
         min_decrease=min_decrease,
         max_iter=max_iter,
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
@@ -223,8 +223,7 @@ the limitation and suggested alternative.
 
 | Estimator | Limitation | Alternative |
 |-----------|-----------|-------------|
-| SyntheticDiD | Strata/PSU/FPC (any variance method) | No SDID variance option in this release. Pweight-only works with `variance_method='placebo'` or `'jackknife'`. Strata/PSU/FPC + SDID requires composing Rao-Wu rescaled weights with paper-faithful Frank-Wolfe re-estimation; sketch in REGISTRY.md §SyntheticDiD. |
-| SyntheticDiD | `variance_method='bootstrap'` + any survey design (including pweight-only) | Use `variance_method='placebo'` or `'jackknife'` for pweight-only surveys. Refit bootstrap composed with survey weights requires the same weighted-FW derivation noted above. |
+| SyntheticDiD | `variance_method='placebo'` or `'jackknife'` + strata/PSU/FPC | Use `variance_method='bootstrap'` for full-design surveys (PR #352 weighted-FW + Rao-Wu composition). Placebo's control-index permutation and jackknife's LOO allocator need their own weighted derivations on top of the weighted-FW kernel; tracked in TODO.md as a follow-up. |
 | SyntheticDiD | Replicate weights | Pre-existing limitation: no replicate-weight survey support on SDID. |
 | TROP | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
 | BaconDecomposition | Replicate weights | Diagnostic only, no inference |
diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py
@@ -671,6 +671,35 @@ def test_bootstrap_summary_shows_replications(self, ci_params):
         assert "Bootstrap replications" in summary
         assert str(n_boot) in summary
 
+    def test_bootstrap_full_design_without_explicit_weights(self):
+        """SurveyDesign(strata=..., psu=..., weights=None) fits successfully.
+
+        Regression for PR #355 R1 code-quality finding: `SurveyDesign` allows
+        `weights=None` (resolve() synthesizes unit weights of 1), but the
+        SDID helper `_extract_unit_survey_weights` used to index
+        `survey_design.weights` directly and would fail before bootstrap
+        could run. The helper now returns ones for this configuration.
+        """
+        from diff_diff.survey import SurveyDesign
+        df = _make_panel(n_control=20, n_treated=3, seed=42)
+        df["stratum"] = df["unit"] % 2
+        df["psu"] = df["unit"]
+        result = SyntheticDiD(
+            variance_method="bootstrap", n_bootstrap=50, seed=1
+        ).fit(
+            df, outcome="outcome", treatment="treated",
+            unit="unit", time="period",
+            post_periods=[5, 6, 7],
+            survey_design=SurveyDesign(strata="stratum", psu="psu"),  # weights=None
+        )
+        assert np.isfinite(result.att)
+        assert np.isfinite(result.se)
+        assert result.se > 0
+        assert result.variance_method == "bootstrap"
+        assert result.survey_metadata is not None
+        assert result.survey_metadata.n_strata is not None
+        assert result.survey_metadata.n_psu is not None
+
     def test_bootstrap_single_psu_returns_nan(self):
         """Unstratified single-PSU survey design returns NaN SE (PR #352).
 
diff --git a/tests/test_weighted_fw.py b/tests/test_weighted_fw.py
@@ -241,6 +241,71 @@ def test_rw_shape_mismatch_raises(self, small_panel):
                 Y_pre_c, Y_post_c, wrong_rw, zeta_lambda=0.05,
             )
 
+    def test_non_uniform_rw_beats_unweighted_centering_variant(self, small_panel):
+        """Non-uniform rw: the weighted-centering solution achieves weighted
+        SSR no higher (within 1e-6) than the unweighted-centering variant.
+
+        Verifies the PR #355 R1 fix (weighted centering + intercept=False)
+        against the stated weighted loss ``Σ_u rw_u·(A_u·λ - b_u)²``.
+        Reproduces the pre-R1 unweighted-centering path by hand (row-scale
+        Y by sqrt(rw), then pass intercept=True to the kernel so it centers
+        on unweighted column means) and asserts that the survey helper's
+        weighted SSR is at most the variant's. Note that this ≤ check still
+        passes if the two solutions coincide, so on its own it does not
+        detect a revert to intercept=True after the row-scaling; it detects
+        a helper whose λ does worse on the weighted objective.
+        """
+        Y_pre_c = small_panel["Y_pre_control"]
+        Y_post_c = small_panel["Y_post_control"]
+        n_co = small_panel["n_control"]
+        rng = np.random.default_rng(23)
+        rw = np.where(rng.uniform(size=n_co) < 0.25, 5.0, 0.5)
+
+        # Correct path: what compute_time_weights_survey actually does.
+        lam_correct = compute_time_weights_survey(
+            Y_pre_c, Y_post_c, rw,
+            zeta_lambda=0.05,
+            min_decrease=1e-8,
+            max_iter=10000,
+        )
+
+        # Buggy variant: pre-R1 — row-scale by sqrt(rw) but let the kernel
+        # do UNWEIGHTED centering (intercept=True on the row-scaled matrix).
+        post_means = np.mean(Y_post_c, axis=0)
+        Y_time_raw = np.column_stack([Y_pre_c.T, post_means])
+        sqrt_rw = np.sqrt(np.maximum(rw, 0.0))
+        Y_weighted_unweighted_center = Y_time_raw * sqrt_rw[:, None]
+        lam_buggy = _sc_weight_fw(
+            Y_weighted_unweighted_center, zeta=0.05, intercept=True,
+            min_decrease=1e-8, max_iter=10000,
+        )
+        # Sparsify, then refit (second pass) to mirror the helper's two-pass fit.
+        from diff_diff.utils import _sparsify
+        lam_buggy = _sparsify(lam_buggy)
+        lam_buggy = _sc_weight_fw(
+            Y_weighted_unweighted_center, zeta=0.05, intercept=True,
+            init_weights=lam_buggy, min_decrease=1e-8, max_iter=10000,
+        )
+
+        # Compute the canonical (weighted-centered) objective on both.
+        wc_mean_pre = (Y_pre_c.T * rw[:, None]).sum(axis=0) / rw.sum()
+        wc_mean_post = (post_means * rw).sum() / rw.sum()
+        A_wc = Y_pre_c.T - wc_mean_pre
+        b_wc = post_means - wc_mean_post
+
+        def weighted_ssr(lam_val: np.ndarray) -> float:
+            resid = A_wc @ lam_val - b_wc
+            return float(np.sum(rw * resid ** 2))
+
+        ssr_correct = weighted_ssr(lam_correct)
+        ssr_buggy = weighted_ssr(lam_buggy)
+        assert ssr_correct <= ssr_buggy + 1e-6, (
+            f"weighted-centering λ (SSR={ssr_correct:.4f}) must achieve at "
+            f"least as low weighted SSR as the unweighted-centering variant "
+            f"(SSR={ssr_buggy:.4f}). PR #355 R1 regression: weighted SSR is "
+            "not being minimized by the survey λ helper."
+        )
+
     def test_zero_rw_subset_handled(self, small_panel):
         """rw with some zeros (Rao-Wu draws units to zero weight) still yields
         a valid simplex λ — the FW just down-weights those rows in the loss.