Address PR #365 R5 P1 + P3: zero-variance vs NaN; lonely_psu contract; REGISTRY docs

igerber · claude · igerber · commit ffd2e50bc16b · 2026-04-24T19:10:59.000-04:00
P1 (Methodology — zero computed variance conflated with undefined):
``_jackknife_se_survey`` previously collapsed ``total_variance <= 0.0``
into ``SE=NaN`` with an "every stratum was skipped" warning. That is
correct for the "no stratum contributed" branch (undefined per Rust &
Rao) but wrong for legitimate zero-variance outcomes. Full-census FPC
(``fpc[h] == n_h`` → ``f_h = 1`` → ``(1 - f_h) = 0``, which zeros every
stratum contribution even when within-stratum dispersion is non-zero)
and exact-zero within-stratum dispersion both give
``total_variance = 0`` by construction, not because the variance is
undefined.

Fix: split the terminal branch. Return ``SE=NaN`` only when no stratum
contributed; otherwise return ``SE = sqrt(max(total_variance, 0.0))``.
The ``max(..., 0.0)`` protects against sub-FP-epsilon negatives and
preserves the legitimate zero case at bit precision.
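
A minimal sketch of the split terminal branch, not the shipped
``_jackknife_se_survey``. The function name, the dict-based inputs, and
centering each stratum's replicates at their stratum mean are
assumptions made for illustration; only the ``(1 - f_h)`` and
``(n_h - 1) / n_h`` factors, the singleton skip, and the NaN-vs-zero
split come from this change.

```python
import math


def stratified_jackknife_se(replicates, fpc=None):
    """Illustrative stratified delete-one-PSU jackknife SE.

    replicates: dict mapping stratum -> list of delete-one-PSU estimates
                (hypothetical layout, one entry per PSU in the stratum).
    fpc:        optional dict mapping stratum -> population PSU count.
    """
    total_variance = 0.0
    any_stratum_contributed = False
    for h, taus in replicates.items():
        n_h = len(taus)
        if n_h < 2:
            continue  # singleton stratum: skipped, contributes nothing
        f_h = n_h / fpc[h] if fpc is not None else 0.0
        mean_h = sum(taus) / n_h  # assumption: center at the stratum mean
        ss = sum((t - mean_h) ** 2 for t in taus)
        total_variance += (1.0 - f_h) * (n_h - 1) / n_h * ss
        any_stratum_contributed = True

    if not any_stratum_contributed:
        return math.nan  # undefined: no stratum contributed
    # Legitimate zero stays exactly 0.0; tiny negatives are clamped.
    return math.sqrt(max(total_variance, 0.0))
```

With three PSUs per stratum and ``fpc`` set to 3 everywhere, the sketch
returns exactly ``0.0``; with only singleton strata it returns NaN.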

New regression
``test_jackknife_full_design_full_census_fpc_returns_zero_se``:
fits on ``sdid_survey_data_jk_well_formed`` with ``fpc=3`` (n_h=3
per stratum → f_h=1 → zero SE by design). Asserts
``result.se == 0.0`` (not NaN).

P1 (Methodology — lonely_psu silently ignored on jackknife path):
The full-design jackknife skipped singleton strata (``n_h <
2``) unconditionally, regardless of the user's
``SurveyDesign(lonely_psu=...)`` choice. ``"certainty"`` and
``"adjust"`` were silently degraded to ``"remove"``. For
``"certainty"`` the skip is equivalent on the jackknife path, but
``"adjust"`` users got an understated SE, and a case that should be
zero-variance under certainty could surface as NaN.

Fix: validate ``resolved_survey_unit.lonely_psu`` at fit-time on the
survey jackknife path. ``"remove"`` and ``"certainty"`` are both
accepted (they produce the same SE on this path — singleton strata
contribute 0 variance under both, matching canonical Rust & Rao /
``survey::svyjkn`` behavior for JKn). ``"adjust"`` (R's overall-mean
fallback for singleton strata) is rejected with
``NotImplementedError`` and a targeted message pointing to bootstrap
as the unconstrained alternative.
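
The fit-time check can be sketched as a standalone guard. This is an
illustration under assumptions: ``check_lonely_psu`` and
``SUPPORTED_LONELY_PSU`` are hypothetical names, and the message is
abbreviated relative to the one in the diff.

```python
# Hypothetical helper mirroring the guard added to fit(); the real check
# reads the mode from the resolved survey design object.
SUPPORTED_LONELY_PSU = ("remove", "certainty")


def check_lonely_psu(mode: str = "remove") -> str:
    """Return the mode if the jackknife survey path can honor it."""
    if mode not in SUPPORTED_LONELY_PSU:
        raise NotImplementedError(
            f"SurveyDesign(lonely_psu={mode!r}) is not supported on the "
            "SDID jackknife survey path; use variance_method='bootstrap' "
            "or switch the design to lonely_psu='remove'."
        )
    return mode
```

Rejecting up front keeps an unsupported mode from being reinterpreted
as ``"remove"`` later in the variance loop.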

Two regressions:
* ``test_jackknife_full_design_lonely_psu_adjust_raises`` —
  verifies the rejection message.
* ``test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove``
  — asserts ``SE_remove == SE_certainty`` at ``rel=1e-14`` on the
  well-formed fixture.

P3 (Documentation — REGISTRY lag):
* Placebo feasibility Notes documented Cases B and C but missed Case D
  (the exact-count degeneracy guard added in R4). Split the "Fit-time
  feasibility guards" paragraph into an explicit 3-case enumeration
  (B: zero-control-stratum; C: undersupplied stratum; D: all-exact-
  count strata → single allocation).
* ``get_loo_effects_df()`` description still said "Requires
  variance_method='jackknife'; raises ValueError otherwise." after R2
  taught it to also raise ``NotImplementedError`` on PSU-level survey
  jackknife. Rewrote to distinguish unit-level (available) vs PSU-
  level (blocked, with pointer to ``result.placebo_effects``).
* Added a Zero-variance-vs-undefined distinction paragraph and a
  "lonely_psu contract" paragraph to the jackknife survey Note,
  matching the shipped behavior from the two P1 fixes above.
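
The Case B / C / D enumeration in the first bullet can be checked by
counting allocations. A rough sketch, assuming each treated-containing
stratum is summarized as an ``(n_controls_h, n_treated_h)`` pair;
``permutation_support`` and ``classify_placebo_design`` are
hypothetical helpers, not functions in the package.

```python
from math import comb, prod


def permutation_support(strata):
    """Number of distinct pseudo-treated allocations, prod_h C(n_c_h, n_t_h)."""
    return prod(comb(n_c, n_t) for n_c, n_t in strata)


def classify_placebo_design(strata):
    """Return 'B', 'C', 'D', or 'ok' for a list of (n_c_h, n_t_h) pairs."""
    if any(n_c == 0 for n_c, _ in strata):
        return "B"  # a treated stratum has no controls
    if any(n_c < n_t for n_c, n_t in strata):
        return "C"  # a treated stratum has fewer controls than treated
    if all(n_c == n_t for n_c, n_t in strata):
        return "D"  # only one allocation exists
    return "ok"
```

For example, strata ``[(2, 2), (3, 3)]`` have support 1 (Case D), while
``[(3, 2), (3, 3)]`` have support 3 and pass.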

Verification: 93 passed (3 new regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
@@ -935,6 +935,31 @@ def fit(  # type: ignore[override]
             if _jackknife_use_survey_path:
                 # PSU-level LOO + stratum aggregation (Rust & Rao 1996).
                 assert w_control is not None and w_treated is not None
+                # R5 P1 fix: validate ``lonely_psu`` mode. The survey
+                # jackknife currently skips singleton strata (n_h < 2)
+                # unconditionally — equivalent to R ``survey::svyjkn``'s
+                # ``"remove"`` and ``"certainty"`` modes (both zero-
+                # contribution for singleton strata). ``"adjust"`` (use
+                # overall mean for singleton strata) is not implemented
+                # for SDID jackknife; reject upfront rather than silently
+                # treating it as ``"remove"``.
+                _lonely_psu_mode = getattr(
+                    resolved_survey_unit, "lonely_psu", "remove"
+                )
+                if _lonely_psu_mode not in ("remove", "certainty"):
+                    raise NotImplementedError(
+                        f"SurveyDesign(lonely_psu={_lonely_psu_mode!r}) is "
+                        "not supported on the SDID jackknife survey path. "
+                        "'remove' and 'certainty' are equivalent here "
+                        "(both contribute 0 variance for singleton strata, "
+                        "which is the canonical Rust & Rao 1996 behavior). "
+                        "'adjust' requires an overall-mean fallback per "
+                        "stratum that is not yet implemented for SDID "
+                        "jackknife; use variance_method='bootstrap' (which "
+                        "supports all three ``lonely_psu`` modes via the "
+                        "weighted-FW + Rao-Wu path) or switch the design "
+                        "to lonely_psu='remove'."
+                    )
                 # Unstratified designs use the synthesized single stratum
                 # (``_strata_*_eff``) so the loop reduces to classical
                 # JK1 (single-stratum PSU-LOO).
@@ -2431,7 +2456,7 @@ def _jackknife_se_survey(
                 stacklevel=3,
             )
             return np.nan, tau_loo_arr
-        if not any_stratum_contributed or total_variance <= 0.0:
+        if not any_stratum_contributed:
             warnings.warn(
                 "Jackknife survey SE is undefined because every stratum "
                 "was skipped (insufficient PSUs per stratum for variance "
@@ -2443,7 +2468,15 @@ def _jackknife_se_survey(
             )
             return np.nan, tau_loo_arr
 
-        return float(np.sqrt(total_variance)), tau_loo_arr
+        # R5 P1 fix: legitimate zero variance (e.g., full-census FPC with
+        # f_h = 1 for every contributing stratum → (1 - f_h) = 0 factor
+        # zeros the contribution even when within-stratum dispersion is
+        # non-zero; or exact-zero within-stratum dispersion when all
+        # LOOs produce identical τ̂). Rust & Rao gives V_J = 0, not
+        # undefined. Reserve NaN for the "all strata skipped" /
+        # undefined-replicate cases above; compute SE = 0 otherwise.
+        variance_nonneg = max(total_variance, 0.0)
+        return float(np.sqrt(variance_nonneg)), tau_loo_arr
 
     def get_params(self) -> Dict[str, Any]:
         """Get estimator parameters."""
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -1585,11 +1585,16 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
   3. Weighted Frank-Wolfe re-estimates ω and λ on the pseudo-panel using `compute_sdid_unit_weights_survey(rw_control=w_control[pseudo_control_idx], ...)` and `compute_time_weights_survey(...)`. Post-optimization composition `ω_eff = rw·ω/Σ(rw·ω)` with zero-mass retry.
   4. SDID estimator on the pseudo-panel; Algorithm 4 SE `sqrt((r-1)/r)·std(placebo_estimates, ddof=1)`.
 
-  **Fit-time feasibility guards** (per `feedback_front_door_over_retry_swallow.md`): for each stratum `h` containing treated units, require `n_controls_h >= n_treated_h`. Case B (`n_controls_h == 0`) and Case C (`0 < n_controls_h < n_treated_h`) both raise `ValueError` with distinct targeted messages *before* entering the retry loop. Partial-permutation fallback is rejected — it would silently change the null distribution and produce an incoherent test.
+  **Fit-time feasibility guards** (per `feedback_front_door_over_retry_swallow.md`): three distinct failure cases are rejected *before* entering the retry loop, each with a targeted `ValueError`:
+  * **Case B** (`n_controls_h == 0` for some treated-containing stratum): the stratum has treated units but no controls — no pseudo-treated set can be drawn.
+  * **Case C** (`0 < n_controls_h < n_treated_h`): the stratum has fewer controls than treated units, so exact-count without-replacement sampling is impossible.
+  * **Case D** (`n_controls_h == n_treated_h` for *every* treated stratum): the permutation support is `∏_h C(n_c_h, n_t_h) = 1` — only one allocation is possible, every placebo draw reproduces the same pseudo-treated set, and the null distribution collapses to a single point (SE = FP noise ~1e-16). At least one treated stratum must satisfy `n_c_h > n_t_h` for the test to have ≥2 distinct allocations.
+
+  Partial-permutation fallback is rejected for all three cases — it would silently change the null distribution and produce an incoherent test.
 
   **Scope note — what is NOT randomized:** the stratum marginal is preserved exactly by construction (each draw pulls the same count per treated stratum). The PSU axis is not randomized (permutation is unit-level within strata). This is conservative under clustering (ignores within-stratum PSU correlation in the null) but aligns with the classical stratified permutation test literature. See Pesarin (2001) *Multivariate Permutation Tests*, Ch. 3-4; Pesarin & Salmaso (2010) *Permutation Tests for Complex Data*.
 
-  **Validation:** no external R/Julia parity anchor (neither package defines survey-weighted SDID placebo). Correctness rests on: (a) stratum-membership contract enforced by construction + monkeypatch regression test, (b) Case B/C front-door guards with targeted-message regression tests, (c) SE-differs-from-pweight-only cross-surface sanity, (d) deterministic-dispatch regression.
+  **Validation:** no external R/Julia parity anchor (neither package defines survey-weighted SDID placebo). Correctness rests on: (a) stratum-membership contract enforced by construction + monkeypatch regression test, (b) Case B / Case C / Case D front-door guards with targeted-message regression tests, (c) SE-differs-from-pweight-only cross-surface sanity, (d) deterministic-dispatch regression.
 
 - **Note (survey + jackknife composition):** PSU-level leave-one-out with stratum aggregation (Rust & Rao 1996). For a design with strata `h = 1..H` and PSUs `j = 1..n_h` within each stratum:
 
@@ -1603,6 +1608,10 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 
   **Undefined-replicate handling** (return NaN, do NOT silently skip): the Rust & Rao formula requires `τ̂_{(h,j)}` be defined for every PSU `j` in every contributing stratum. If any single LOO in a contributing stratum (`n_h ≥ 2`) is not computable — (a) deletion removes all treated units (e.g., all treated in one PSU), (b) `ω_eff_kept.sum() ≤ 0` after composition, (c) `w_treated_kept.sum() ≤ 0`, (d) the SDID estimator raises or returns non-finite τ̂ — the overall SE is **undefined** and the method returns `SE=NaN` with a targeted `UserWarning` naming the stratum / PSU / reason. Silently skipping the missing LOO while still applying the `(n_h-1)/n_h` factor would systematically under-scale variance (silently wrong SE). Users needing a variance estimator that accommodates PSU-deletion infeasibility should use `variance_method="bootstrap"`, whose pairs-bootstrap has no per-LOO feasibility constraint.
 
+  **Zero-variance vs undefined distinction:** when every stratum contributes but `total_variance == 0.0` by legitimate design — full-census FPC (`f_h = 1` → `(1 - f_h) = 0` zeros the contribution even when within-stratum dispersion is non-zero) or exact-zero within-stratum dispersion — the jackknife SE is **zero**, not undefined. `_jackknife_se_survey` returns `SE = 0.0` in that case. `SE = NaN` is reserved for the truly-undefined cases documented above (all strata skipped; any undefined delete-one replicate).
+
+  **`lonely_psu` contract:** `SurveyDesign(lonely_psu="remove")` (default) and `"certainty"` are both accepted — each treats singleton strata (`n_h < 2`) as contributing 0 to the total variance, matching the canonical Rust & Rao (1996) / R `survey::svyjkn` behavior for single-PSU strata. `lonely_psu="adjust"` (R's overall-mean fallback) is **not yet supported** on the SDID jackknife path and raises `NotImplementedError` at fit-time; users needing that semantic should pick `variance_method="bootstrap"` (which supports all three modes via the weighted-FW + Rao-Wu path) or switch the design to `"remove"` / `"certainty"`.
+
   **Stratum-skip handling** (silent, documented): strata with `n_h < 2` are silently skipped (stratum-level variance unidentified — the `lonely-PSU` case in R `survey::svyjkn`). If every stratum is skipped, returns `SE=NaN` with a separate `UserWarning`. PSU-None designs: each unit is treated as its own PSU within its stratum (matches the implicit-PSU convention established in PR #355 R8 P1). Unstratified single-PSU short-circuits to `SE=NaN`.
 
   **Scope note — what is NOT randomized:** stratum membership and PSU composition are fixed by design. The formula only captures within-stratum variation; between-stratum variance is absorbed into the analytical-TSL / design assumption. This is canonical survey-jackknife behavior (Rust & Rao 1996) and matches R's `survey::svyjkn` under stratified designs.
@@ -1644,7 +1653,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 *Validation diagnostics (post-fit methods on `SyntheticDiDResults`):*
 
 - **Trajectories** (`synthetic_pre_trajectory`, `synthetic_post_trajectory`, `treated_pre_trajectory`, `treated_post_trajectory`): retained on results to support plotting and custom fit metrics. `synthetic_pre_trajectory = Y_pre_control @ ω_eff`; `treated_pre_trajectory` is the survey-weighted treated mean (matches the Frank-Wolfe target). `pre_treatment_fit` is recoverable as `RMSE(treated_pre_trajectory, synthetic_pre_trajectory)`.
-- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `placebo_effects`) to the underlying unit identities. First `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids` — positional ordering that mirrors `_jackknife_se`. `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`. Requires `variance_method='jackknife'`; raises `ValueError` otherwise.
+- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `placebo_effects`) to the underlying unit identities. **Unit-level LOO only** — available on the non-survey and pweight-only jackknife paths (classical Algorithm 3: one LOO per unit, first `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids`; `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`). Under the full-design survey jackknife path (PSU-level LOO with stratum aggregation, Rust & Rao 1996), the underlying replicates are PSU-level rather than unit-level — the accessor raises `NotImplementedError` pointing to `result.placebo_effects` for the raw PSU-level replicate array. Dispatch is gated by an explicit `_loo_granularity` flag set at fit-time (`"unit"` vs `"psu"`). Requires `variance_method='jackknife'`; raises `ValueError` otherwise.
 - **`get_weight_concentration(top_k=5)`**: returns `effective_n = 1/Σω²` (inverse Herfindahl), `herfindahl`, `top_k_share`, `top_k`. Operates on `self.unit_weights` which stores the composed `ω_eff`; for survey-weighted fits the metrics reflect the population-weighted concentration, not the raw Frank-Wolfe solution.
 - **`in_time_placebo(fake_treatment_periods=None, zeta_omega_override=None, zeta_lambda_override=None)`**: re-slices the pre-window at each fake treatment period and re-fits both ω and λ via Frank-Wolfe. Default sweeps every feasible pre-period (position index `i ≥ 2` so ≥2 pre-fake periods remain for weight estimation, `i ≤ n_pre - 1` so ≥1 post-fake period exists). Credible designs produce near-zero placebo ATTs; departures indicate pre-treatment dynamics the estimator is picking up.
   - **Note:** Regularization reuses `self.zeta_omega` / `self.zeta_lambda` from the original fit (matches R `synthdid` convention of treating regularization as a property of the fit). `*_override` re-fits with new values.
diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py
@@ -1161,6 +1161,117 @@ def test_get_loo_effects_df_works_on_pweight_only_jackknife(
         assert set(df.columns) == {"unit", "role", "att_loo", "delta_from_full"}
         assert set(df["role"].unique()) <= {"control", "treated"}
 
+    def test_jackknife_full_design_full_census_fpc_returns_zero_se(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """R5 P1 fix: full-census FPC → SE=0, not NaN.
+
+        Rust & Rao's stratified jackknife formula has an explicit
+        ``(1 - f_h)`` factor. When ``fpc[h] == n_h`` for every
+        contributing stratum, ``f_h = 1``, ``(1 - f_h) = 0``, and every
+        stratum contribution is zero → ``total_variance = 0`` by
+        legitimate design, not by "every stratum skipped". The correct
+        jackknife SE in that case is **zero** (full census: no sampling
+        variance), not NaN. Reserve NaN for the truly-undefined cases
+        (all strata skipped, undefined PSU-LOO replicate).
+        """
+        df = sdid_survey_data_jk_well_formed.copy()
+        # Each stratum has n_h=3 PSUs. Setting fpc=3 gives f_h=1 and
+        # (1 - f_h) = 0 — the formula collapses the stratum contribution
+        # to zero for legitimate design reasons.
+        df["fpc_full_census"] = 3.0
+
+        sd = SurveyDesign(
+            weights="weight",
+            strata="stratum",
+            psu="psu",
+            fpc="fpc_full_census",
+        )
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        result = est.fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd,
+        )
+        # SE must be exactly zero (legitimate full-census no-sampling
+        # variance), not NaN (undefined) and not a tiny positive number.
+        assert np.isfinite(result.se)
+        assert result.se == 0.0
+
+    def test_jackknife_full_design_lonely_psu_adjust_raises(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """R5 P1 fix: ``SurveyDesign(lonely_psu='adjust')`` on the jackknife
+        survey path raises NotImplementedError rather than silently being
+        treated as ``"remove"``.
+
+        ``"remove"`` and ``"certainty"`` both contribute 0 variance for
+        singleton strata on the jackknife path, matching canonical R
+        ``survey::svyjkn`` behavior. ``"adjust"`` requires an overall-
+        mean fallback per stratum that is not yet implemented; rejecting
+        upfront prevents silent variance miscomputation.
+        """
+        sd = SurveyDesign(
+            weights="weight",
+            strata="stratum",
+            psu="psu",
+            lonely_psu="adjust",
+        )
+        est = SyntheticDiD(variance_method="jackknife", seed=42)
+        with pytest.raises(
+            NotImplementedError,
+            match=r"lonely_psu='adjust'.*not supported on the SDID jackknife",
+        ):
+            est.fit(
+                sdid_survey_data_jk_well_formed,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=sd,
+            )
+
+    def test_jackknife_full_design_lonely_psu_certainty_equivalent_to_remove(
+        self, sdid_survey_data_jk_well_formed
+    ):
+        """``lonely_psu='certainty'`` is accepted and produces the same SE
+        as ``lonely_psu='remove'`` (both contribute 0 for singleton
+        strata on the jackknife path).
+        """
+        sd_remove = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="remove"
+        )
+        sd_certainty = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", lonely_psu="certainty"
+        )
+
+        est1 = SyntheticDiD(variance_method="jackknife", seed=42)
+        result_remove = est1.fit(
+            sdid_survey_data_jk_well_formed,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_remove,
+        )
+        est2 = SyntheticDiD(variance_method="jackknife", seed=42)
+        result_certainty = est2.fit(
+            sdid_survey_data_jk_well_formed,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="time",
+            post_periods=[6, 7, 8, 9],
+            survey_design=sd_certainty,
+        )
+        assert result_remove.se == pytest.approx(result_certainty.se, rel=1e-14)
+
     def test_jackknife_full_design_undefined_replicate_returns_nan(
         self, sdid_survey_data_full_design
     ):