Address PR #365 R4 P1 + P3: Case D guard for exact-count placebo strata; non-degenerate test fixture

igerber · claude · igerber · commit f039e2fcf9be · 2026-04-24T18:57:38.000-04:00
P1 (Methodology — degenerate exact-count placebo strata):
The Case B / Case C front-door guards rejected ``n_c_h == 0`` and
``n_c_h < n_t_h`` respectively, but allowed ``n_c_h == n_t_h``. For
the stratified-permutation allocator, the per-stratum support is
``C(n_c_h, n_t_h)``: when every treated-containing stratum has
``n_c_h == n_t_h``, the only allocation is to pick all ``n_c_h``
controls as pseudo-treated on every draw. All placebo draws produce
the same pseudo-treated set, the placebo null collapses to a single
point, and SE reduces to floating-point noise (~1e-16) arising from
order-dependence in the ``np.average`` call. A naïve ``result.se > 0``
check spuriously passes.
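
The support argument above can be checked with a few lines of Python. This is a minimal sketch, not code from the PR: ``permutation_support`` is a hypothetical helper, and the per-stratum counts are passed in by hand as ``(n_controls, n_treated)`` pairs.

```python
from math import comb, prod


def permutation_support(strata):
    """Count distinct stratified allocations: the product of C(n_c_h, n_t_h).

    `strata` maps a stratum label to (n_controls, n_treated) and should list
    only treated-containing strata. Hypothetical helper for illustration.
    """
    return prod(comb(n_c, n_t) for n_c, n_t in strata.values())


# Exact-count stratum (5 controls, 5 treated): a single allocation.
print(permutation_support({0: (5, 5)}))   # 1

# Non-degenerate stratum (10 controls, 5 treated).
print(permutation_support({0: (10, 5)}))  # 252
```

A support of 1 means every placebo draw selects the same pseudo-treated set, which is the degenerate case described above.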

Concretely, ``sdid_survey_data`` (stratum 0: 5 treated + 5 controls,
stratum 1: 10 controls, 0 treated) would return SE ≈ 3.79e-16 from
placebo, and the R2/R3-era ``test_full_design_placebo_succeeds``
test was passing only because of that sub-ULP noise — the test
assertion ``result.se > 0`` is satisfied even when the semantic SE
is zero.
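
To illustrate why ``result.se > 0`` is too weak, the sketch below uses hypothetical per-draw estimates that are mathematically equal but differ in the last bit. It assumes, for illustration only, that the SE is a sample standard deviation over draws; the estimator's actual SE formula is not shown in this commit.

```python
import numpy as np

# Hypothetical per-draw placebo estimates: the same value in exact
# arithmetic, but 0.1 + 0.2 and 0.3 differ by one unit in the last place.
draws = np.array([0.1 + 0.2, 0.3, 0.1 + 0.2, 0.3])

# Assumption for illustration: SE as the sample standard deviation of draws.
se = float(np.std(draws, ddof=1))

print(se > 0)     # True: the naive check passes on pure rounding noise
print(se > 1e-6)  # False: a materiality threshold catches the collapse
```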

Fix: add a Case D fit-time guard that rejects the design when every
treated-containing stratum has exactly ``n_c_h == n_t_h``. At least
one treated stratum must have ``n_c_h > n_t_h`` for the overall
permutation support (``∏_h C(n_c_h, n_t_h)``) to be ≥2.
ValueError message enumerates the per-stratum (n_c, n_t) counts
and points to ``variance_method='bootstrap'`` as the unconstrained
alternative.
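
The relationship between the three front-door cases can be summarized in a standalone sketch. ``placebo_support_case`` is a hypothetical function that mirrors the ordering of the guards (Case B and Case C per stratum, then Case D across strata); it returns a label where the real guards raise ``ValueError``.

```python
def placebo_support_case(strata):
    """Classify a design by the first front-door guard it would trip.

    `strata` maps a treated-containing stratum label to
    (n_controls, n_treated). Returns "B", "C", "D", or None if feasible.
    Hypothetical helper; not part of the library.
    """
    for n_c, n_t in strata.values():
        if n_c == 0:
            return "B"  # no controls to draw pseudo-treated units from
        if n_c < n_t:
            return "C"  # too few controls to fill the pseudo-treated set
    # Every stratum now has n_c >= n_t; Case D if none is strictly larger.
    if all(n_c == n_t for n_c, n_t in strata.values()):
        return "D"
    return None


print(placebo_support_case({0: (5, 5)}))   # D
print(placebo_support_case({0: (10, 5)}))  # None
```

A single stratum with ``n_c_h > n_t_h`` is enough to clear Case D, even if other treated strata are exact-count.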

Test changes:
* ``test_full_design_placebo_succeeds`` switched from
  ``sdid_survey_data`` (degenerate exact-count) to
  ``sdid_survey_data_full_design`` (stratum 0: 5 treated + 10 controls
  → ``C(10, 5) = 252`` distinct allocations). Tightened the SE
  assertion from ``> 0`` to ``> 1e-6`` so future regressions back to
  sub-ULP-noise SE fail loudly.
* New ``test_placebo_full_design_raises_on_exact_count_stratum``
  asserts the Case D ValueError fires on the old
  ``sdid_survey_data`` fixture (the regression target that surfaced
  this issue).

P3 (Documentation — remaining bootstrap-only stragglers):
* ``docs/methodology/survey-theory.md`` §"Estimator survey variance
  dispatch" table row for SyntheticDiD still said "Bootstrap only".
  Updated to "Bootstrap / permutation / PSU-LOO" with a note that
  all three variance methods support full strata/PSU/FPC designs.
* ``tests/test_methodology_sdid.py::TestCoverageMCArtifact``
  comment described ``stratified_survey`` as "bootstrap-only —
  placebo and jackknife reject strata/PSU/FPC at fit-time". Updated
  to reflect current state: bootstrap is the validation gate,
  jackknife is reported with an anti-conservatism caveat, and placebo
  is skipped because this DGP packs all treated units into a stratum
  with no never-treated controls (Case B).

Verification: 90 passed (1 new Case D regression test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
@@ -834,6 +834,7 @@ def fit(  # type: ignore[override]
             unique_treated_strata, treated_counts = np.unique(
                 _strata_treated_eff, return_counts=True
             )
+            has_nondegenerate_stratum = False
             for h, n_t_h in zip(unique_treated_strata, treated_counts):
                 n_c_h = int(np.sum(_strata_control_eff == h))
                 if n_c_h == 0:
@@ -859,6 +860,39 @@ def fit(  # type: ignore[override]
                         "same full survey design via weighted-FW + Rao-Wu "
                         "without a permutation-feasibility constraint)."
                     )
+                if n_c_h > int(n_t_h):
+                    has_nondegenerate_stratum = True
+            # Case D: every treated stratum is exact-count
+            # (``n_c_h == n_t_h``). The stratified permutation support
+            # collapses to a single allocation — every placebo draw
+            # reproduces the same pseudo-treated set, giving a degenerate
+            # null (SE ≈ 0 up to FP noise, no meaningful sampling
+            # distribution). Reject at fit-time rather than silently
+            # reporting a near-zero SE; the overall permutation support is
+            # ``∏_h C(n_c_h, n_t_h)``, so at least one treated stratum must
+            # satisfy ``n_c_h > n_t_h`` for the test to have ≥2 distinct
+            # allocations.
+            if not has_nondegenerate_stratum:
+                detail = ", ".join(
+                    f"stratum {h}: n_c={int(np.sum(_strata_control_eff == h))}, "
+                    f"n_t={int(n_t_h)}"
+                    for h, n_t_h in zip(unique_treated_strata, treated_counts)
+                )
+                raise ValueError(
+                    "Stratified-permutation placebo support is degenerate: "
+                    "every treated-containing stratum has exactly "
+                    "n_controls == n_treated, so the within-stratum "
+                    "permutation yields a single allocation across all "
+                    f"draws ({detail}). The resulting placebo distribution "
+                    "collapses to one point and SE is not a meaningful "
+                    "null estimate. At least one treated stratum must "
+                    "have n_controls > n_treated for the permutation to "
+                    "have ≥2 distinct allocations. Either rebalance the "
+                    "panel, or use variance_method='bootstrap' (which "
+                    "supports the same full survey design via weighted-FW "
+                    "+ Rao-Wu without a permutation-feasibility "
+                    "constraint)."
+                )
 
         # Compute standard errors on normalized Y, rescale to original units.
         # Variance procedures resample / permute indices (independent of Y
diff --git a/docs/methodology/survey-theory.md b/docs/methodology/survey-theory.md
@@ -700,7 +700,7 @@ Each estimator uses one of three variance strategies under survey designs:
 | EfficientDiD | TSL on EIFs | all weight types |
 | ContinuousDiD | TSL sandwich | all weight types |
 | StackedDiD | TSL sandwich | pweight only |
-| SyntheticDiD | Bootstrap only | Not IF-amenable (Section 4.2a) |
+| SyntheticDiD | Bootstrap / permutation / PSU-LOO | Not IF-amenable (Section 4.2a); all three variance methods support full strata/PSU/FPC designs |
 | TROP | Bootstrap only | Not IF-amenable (Section 4.2a) |
 | BaconDecomposition | Diagnostic only | Weighted descriptives, no inference |
 
diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py
@@ -3497,9 +3497,12 @@ def test_coverage_artifacts_present(self):
                         f"missing alpha {alpha_key} in {dgp}/{method} rejection_rate"
                     )
 
-        # PR #352: stratified_survey is bootstrap-only — placebo and
-        # jackknife reject strata/PSU/FPC at fit-time, so their blocks
-        # report n_successful_fits=0. Bootstrap must have the full 500
+        # Post-PR #365: stratified_survey runs bootstrap (validation
+        # gate) + jackknife (anti-conservative but reported for
+        # transparency); placebo is skipped on this DGP because its
+        # cohort packs all treated into stratum 1 which has 0 never-
+        # treated units (Case B at fit-time), so its block reports
+        # n_successful_fits=0. Bootstrap must have the full 500
         # successful fits + finite rejection rate at α=0.05 inside the
         # calibration gate [0.02, 0.10].
         survey_block = payload["per_dgp"]["stratified_survey"]
diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py
@@ -208,27 +208,36 @@ def test_full_design_bootstrap_succeeds(self, sdid_survey_data, survey_design_fu
         assert "Survey Design" in summary
         assert "Bootstrap replications" in summary
 
-    def test_full_design_placebo_succeeds(self, sdid_survey_data, survey_design_full):
+    def test_full_design_placebo_succeeds(self, sdid_survey_data_full_design):
         """Placebo variance with full design now succeeds (restored capability).
 
         Stratified-permutation allocator draws pseudo-treated indices
         within each stratum containing treated units; weighted-FW
-        re-estimates ω and λ per draw on the pseudo-panel. See REGISTRY
-        §SyntheticDiD "Note (survey + placebo composition)".
+        re-estimates ω and λ per draw on the pseudo-panel. Uses the
+        non-degenerate full-design fixture (stratum 0 has 5 treated +
+        10 controls, so the within-stratum permutation has ``C(10, 5) =
+        252`` distinct allocations — SE reflects a genuine null
+        distribution, not FP noise from a single-allocation collapse).
+        See REGISTRY §SyntheticDiD "Note (survey + placebo composition)".
         """
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
         est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42)
         result = est.fit(
-            sdid_survey_data,
+            sdid_survey_data_full_design,
             outcome="outcome",
             treatment="treated",
             unit="unit",
             time="time",
             post_periods=[6, 7, 8, 9],
-            survey_design=survey_design_full,
+            survey_design=sd,
         )
         assert np.isfinite(result.att)
         assert np.isfinite(result.se)
-        assert result.se > 0
+        # SE must be materially positive, not sub-ULP FP noise from a
+        # degenerate single-allocation permutation (R4 P1 regression —
+        # the prior fixture had n_c == n_t in stratum 0, yielding
+        # SE ≈ 1e-16; the Case D guard below rejects that shape).
+        assert result.se > 1e-6
         assert result.variance_method == "placebo"
         assert result.survey_metadata is not None
         assert result.survey_metadata.n_strata is not None
@@ -791,6 +800,36 @@ def test_placebo_full_design_raises_on_zero_control_stratum(
                 survey_design=sd,
             )
 
+    def test_placebo_full_design_raises_on_exact_count_stratum(
+        self, sdid_survey_data, survey_design_full
+    ):
+        """R4 P1 fix: Case D — every treated stratum has n_c == n_t.
+
+        The ``sdid_survey_data`` fixture has 5 treated units + 5 controls
+        in stratum 0 and 10 controls in stratum 1 (with no treated
+        units). For placebo stratified permutation, the pseudo-treated
+        set within stratum 0 is chosen from 5 controls, sized 5 — only
+        one allocation is possible. Every placebo draw reproduces the
+        same pseudo-treated set, the placebo null collapses to a
+        single point, and SE = FP noise (~1e-16). The new Case D guard
+        rejects this design at fit-time rather than silently reporting
+        a near-zero SE that would pass a naïve ``result.se > 0`` check.
+        """
+        est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42)
+        with pytest.raises(
+            ValueError,
+            match=r"permutation yields a single allocation across all draws",
+        ):
+            est.fit(
+                sdid_survey_data,
+                outcome="outcome",
+                treatment="treated",
+                unit="unit",
+                time="time",
+                post_periods=[6, 7, 8, 9],
+                survey_design=survey_design_full,
+            )
+
     def test_placebo_full_design_raises_on_undersupplied_stratum(
         self, sdid_survey_data_full_design
     ):