
Commit f039e2f

igerber and claude committed
Address PR #365 R4 P1 + P3: Case D guard for exact-count placebo strata; non-degenerate test fixture
P1 (Methodology, degenerate exact-count placebo strata): The Case B / Case C front-door guards rejected ``n_c_h == 0`` and ``n_c_h < n_t_h`` respectively, but allowed ``n_c_h == n_t_h``. For the stratified-permutation allocator, the per-stratum support is ``C(n_c_h, n_t_h)``. When every treated-containing stratum has ``n_c_h == n_t_h``, the only allocation is to pick all ``n_c_h`` controls as pseudo-treated on every draw. All placebo draws then produce the same pseudo-treated set, the placebo null collapses to a single point, and the SE is pure FP noise (~1e-16) from order dependence in the np.average call. A naïve ``result.se > 0`` check spuriously passes.

Concretely, ``sdid_survey_data`` (stratum 0: 5 treated + 5 controls; stratum 1: 10 controls, 0 treated) would return SE ≈ 3.79e-16 from placebo. The R2/R3-era ``test_full_design_placebo_succeeds`` test was passing only because of that sub-ULP noise: its assertion ``result.se > 0`` is satisfied even when the semantic SE is zero.

Fix: add a Case D fit-time guard that rejects the design when every treated-containing stratum has exactly ``n_c_h == n_t_h``. At least one treated stratum must have ``n_c_h > n_t_h`` for the overall permutation support (``∏_h C(n_c_h, n_t_h)``) to be ≥2; any such stratum contributes a factor ``C(n_c_h, n_t_h) ≥ n_c_h ≥ 2``, while an exact-count stratum contributes ``C(n, n) = 1``. The ValueError message enumerates the per-stratum (n_c, n_t) counts and points to ``variance_method='bootstrap'`` as the unconstrained alternative.

Test changes:
* ``test_full_design_placebo_succeeds`` switched from ``sdid_survey_data`` (degenerate exact-count) to ``sdid_survey_data_full_design`` (stratum 0: 5 treated + 10 controls → ``C(10, 5) = 252`` distinct allocations). Tightened the SE assertion from ``> 0`` to ``> 1e-6`` so that any future regression back to sub-ULP-noise SE fails loudly.
* New ``test_placebo_full_design_raises_on_exact_count_stratum`` asserts that the Case D ValueError fires on the old ``sdid_survey_data`` fixture (the regression target that surfaced this issue).
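The support argument above can be checked numerically. The sketch below assumes a hypothetical input shape (a dict mapping each treated-containing stratum to its `(n_c_h, n_t_h)` counts, with Case B and Case C already ruled out) and uses the standard-library `math.comb` and `math.prod` to evaluate `∏_h C(n_c_h, n_t_h)` for the old and new fixtures. `placebo_support_size` and `is_degenerate` are illustrative names, not part of diff_diff.

```python
from math import comb, prod


def placebo_support_size(strata_counts):
    """Number of distinct stratified-permutation allocations.

    strata_counts maps each treated-containing stratum to (n_c_h, n_t_h).
    Each stratum contributes C(n_c_h, n_t_h) ways to choose n_t_h
    pseudo-treated units from its n_c_h controls; strata are independent,
    so the overall support is the product.
    """
    return prod(comb(n_c, n_t) for n_c, n_t in strata_counts.values())


def is_degenerate(strata_counts):
    """Case D shape: every treated stratum has n_c_h == n_t_h.

    Assumes Case B (n_c_h == 0) and Case C (n_c_h < n_t_h) were already
    rejected, in which case this is equivalent to a support size of 1.
    """
    return all(n_c == n_t for n_c, n_t in strata_counts.values())


old_fixture = {0: (5, 5)}   # sdid_survey_data: 5 controls, 5 treated
new_fixture = {0: (10, 5)}  # sdid_survey_data_full_design: 10 controls, 5 treated

print(placebo_support_size(old_fixture), is_degenerate(old_fixture))  # 1 True
print(placebo_support_size(new_fixture), is_degenerate(new_fixture))  # 252 False
```

Strata without treated units (stratum 1 in both fixtures) are left out of the dict because they contribute no pseudo-treated draws and hence no factor to the product.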
P3 (Documentation, remaining bootstrap-only stragglers):
* ``docs/methodology/survey-theory.md`` §"Estimator survey variance dispatch": the table row for SyntheticDiD still said "Bootstrap only". Updated it to "Bootstrap / permutation / PSU-LOO", with a note that all three variance methods support full strata/PSU/FPC designs.
* ``tests/test_methodology_sdid.py::TestCoverageMCArtifact``: the comment described ``stratified_survey`` as "bootstrap-only — placebo and jackknife reject strata/PSU/FPC at fit-time". Updated it to reflect the current state: bootstrap is the validation gate, jackknife is reported with an anti-conservatism caveat, and placebo is skipped because this DGP packs all treated units into a stratum with no never-treated units (DGP-specific Case B at fit-time).

Verification: 90 passed (1 new Case D regression test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 473c6d7 commit f039e2f

4 files changed

Lines changed: 86 additions & 10 deletions


diff_diff/synthetic_did.py

Lines changed: 34 additions & 0 deletions
@@ -834,6 +834,7 @@ def fit( # type: ignore[override]
 unique_treated_strata, treated_counts = np.unique(
 _strata_treated_eff, return_counts=True
 )
+has_nondegenerate_stratum = False
 for h, n_t_h in zip(unique_treated_strata, treated_counts):
 n_c_h = int(np.sum(_strata_control_eff == h))
 if n_c_h == 0:
@@ -859,6 +860,39 @@ def fit( # type: ignore[override]
 "same full survey design via weighted-FW + Rao-Wu "
 "without a permutation-feasibility constraint)."
 )
+if n_c_h > int(n_t_h):
+has_nondegenerate_stratum = True
+# Case D: every treated stratum is exact-count
+# (``n_c_h == n_t_h``). The stratified permutation support
+# collapses to a single allocation — every placebo draw
+# reproduces the same pseudo-treated set, giving a degenerate
+# null (SE ≈ 0 up to FP noise, no meaningful sampling
+# distribution). Reject at fit-time rather than silently
+# reporting a near-zero SE; the overall permutation support is
+# ``∏_h C(n_c_h, n_t_h)``, so at least one treated stratum must
+# satisfy ``n_c_h > n_t_h`` for the test to have ≥2 distinct
+# allocations.
+if not has_nondegenerate_stratum:
+detail = ", ".join(
+f"stratum {h}: n_c={int(np.sum(_strata_control_eff == h))}, "
+f"n_t={int(n_t_h)}"
+for h, n_t_h in zip(unique_treated_strata, treated_counts)
+)
+raise ValueError(
+"Stratified-permutation placebo support is degenerate: "
+"every treated-containing stratum has exactly "
+"n_controls == n_treated, so the within-stratum "
+"permutation yields a single allocation across all "
+f"draws ({detail}). The resulting placebo distribution "
+"collapses to one point and SE is not a meaningful "
+"null estimate. At least one treated stratum must "
+"have n_controls > n_treated for the permutation to "
+"have ≥2 distinct allocations. Either rebalance the "
+"panel, or use variance_method='bootstrap' (which "
+"supports the same full survey design via weighted-FW "
+"+ Rao-Wu without a permutation-feasibility "
+"constraint)."
+)
 
 # Compute standard errors on normalized Y, rescale to original units.
 # Variance procedures resample / permute indices (independent of Y
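To exercise the guard logic outside `SyntheticDiD.fit`, here is a minimal standalone sketch of the Case B / C / D checks. The Case D branch follows this hunk; the Case B and Case C conditions are taken from the commit message, since their full bodies are not visible here. The function name `check_placebo_strata` and its shortened error messages are hypothetical, and the two arrays stand in for `_strata_treated_eff` and `_strata_control_eff`.

```python
import numpy as np


def check_placebo_strata(strata_treated, strata_control):
    """Hypothetical standalone mirror of the Case B / C / D guards.

    strata_treated / strata_control: stratum label of each treated /
    control unit (stand-ins for _strata_treated_eff / _strata_control_eff).
    Returns None when the placebo permutation has >= 2 allocations.
    """
    strata_control = np.asarray(strata_control)
    unique_treated_strata, treated_counts = np.unique(
        np.asarray(strata_treated), return_counts=True
    )
    has_nondegenerate_stratum = False
    for h, n_t_h in zip(unique_treated_strata, treated_counts):
        n_c_h = int(np.sum(strata_control == h))
        if n_c_h == 0:
            # Case B: treated stratum with no controls to permute.
            raise ValueError(f"Case B: stratum {h} has no controls")
        if n_c_h < int(n_t_h):
            # Case C: fewer controls than treated units in the stratum.
            raise ValueError(f"Case C: stratum {h} has n_c={n_c_h} < n_t={int(n_t_h)}")
        if n_c_h > int(n_t_h):
            has_nondegenerate_stratum = True
    if not has_nondegenerate_stratum:
        # Case D: every treated stratum is exact-count, so each factor
        # C(n_c_h, n_t_h) is 1 and the support is a single allocation.
        detail = ", ".join(
            f"stratum {h}: n_c={int(np.sum(strata_control == h))}, n_t={int(n_t_h)}"
            for h, n_t_h in zip(unique_treated_strata, treated_counts)
        )
        raise ValueError(f"Case D: degenerate placebo support ({detail})")
```

On the old fixture's shape (5 treated and 5 controls in stratum 0, 10 controls in stratum 1) the call raises the Case D error naming `stratum 0: n_c=5, n_t=5`; with 10 controls in stratum 0 it returns `None`. The flag is only set after the Case B / C checks pass, so Case D fires only when every treated stratum survived those checks with exactly `n_c_h == n_t_h`.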

docs/methodology/survey-theory.md

Lines changed: 1 addition & 1 deletion
@@ -700,7 +700,7 @@ Each estimator uses one of three variance strategies under survey designs:
 | EfficientDiD | TSL on EIFs | all weight types |
 | ContinuousDiD | TSL sandwich | all weight types |
 | StackedDiD | TSL sandwich | pweight only |
-| SyntheticDiD | Bootstrap only | Not IF-amenable (Section 4.2a) |
+| SyntheticDiD | Bootstrap / permutation / PSU-LOO | Not IF-amenable (Section 4.2a); all three variance methods support full strata/PSU/FPC designs |
 | TROP | Bootstrap only | Not IF-amenable (Section 4.2a) |
 | BaconDecomposition | Diagnostic only | Weighted descriptives, no inference |
 

tests/test_methodology_sdid.py

Lines changed: 6 additions & 3 deletions
@@ -3497,9 +3497,12 @@ def test_coverage_artifacts_present(self):
 f"missing alpha {alpha_key} in {dgp}/{method} rejection_rate"
 )
 
-# PR #352: stratified_survey is bootstrap-only — placebo and
-# jackknife reject strata/PSU/FPC at fit-time, so their blocks
-# report n_successful_fits=0. Bootstrap must have the full 500
+# Post-PR #365: stratified_survey runs bootstrap (validation
+# gate) + jackknife (anti-conservative but reported for
+# transparency); placebo is skipped on this DGP because its
+# cohort packs all treated into stratum 1 which has 0 never-
+# treated units (Case B at fit-time), so its block reports
+# n_successful_fits=0. Bootstrap must have the full 500
 # successful fits + finite rejection rate at α=0.05 inside the
 # calibration gate [0.02, 0.10].
 survey_block = payload["per_dgp"]["stratified_survey"]
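The calibration requirement in the updated comment (500 successful fits and a finite rejection rate at α=0.05 inside [0.02, 0.10]) can be expressed as a small check. This sketch assumes one p-value per Monte Carlo replication, with non-finite values marking failed fits; `passes_calibration_gate` and that input layout are hypothetical, since the artifact's payload schema beyond `payload["per_dgp"]["stratified_survey"]` is not shown in this diff.

```python
import numpy as np


def passes_calibration_gate(p_values, n_expected=500, alpha=0.05, gate=(0.02, 0.10)):
    """Hypothetical check mirroring the comment's requirements.

    p_values: one p-value per Monte Carlo replication under the null
    (assumed layout); NaN or inf marks a failed fit.
    """
    p = np.asarray(p_values, dtype=float)
    p = p[np.isfinite(p)]  # keep only successful fits
    if p.size != n_expected:
        return False  # not every replication produced a usable fit
    rejection_rate = float(np.mean(p < alpha))
    return gate[0] <= rejection_rate <= gate[1]
```

Evenly spaced p-values, e.g. `np.linspace(0.001, 0.999, 500)` (an idealized calibrated null), give a rejection rate of 0.05 and pass; dropping a single replication fails the 500-fit requirement, and p-values concentrated below α fail the upper bound of the gate.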

tests/test_survey_phase5.py

Lines changed: 45 additions & 6 deletions
@@ -208,27 +208,36 @@ def test_full_design_bootstrap_succeeds(self, sdid_survey_data, survey_design_fu
 assert "Survey Design" in summary
 assert "Bootstrap replications" in summary
 
-def test_full_design_placebo_succeeds(self, sdid_survey_data, survey_design_full):
+def test_full_design_placebo_succeeds(self, sdid_survey_data_full_design):
 """Placebo variance with full design now succeeds (restored capability).
 
 Stratified-permutation allocator draws pseudo-treated indices
 within each stratum containing treated units; weighted-FW
-re-estimates ω and λ per draw on the pseudo-panel. See REGISTRY
-§SyntheticDiD "Note (survey + placebo composition)".
+re-estimates ω and λ per draw on the pseudo-panel. Uses the
+non-degenerate full-design fixture (stratum 0 has 5 treated +
+10 controls, so the within-stratum permutation has ``C(10, 5) =
+252`` distinct allocations — SE reflects a genuine null
+distribution, not FP noise from a single-allocation collapse).
+See REGISTRY §SyntheticDiD "Note (survey + placebo composition)".
 """
+sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
 est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42)
 result = est.fit(
-sdid_survey_data,
+sdid_survey_data_full_design,
 outcome="outcome",
 treatment="treated",
 unit="unit",
 time="time",
 post_periods=[6, 7, 8, 9],
-survey_design=survey_design_full,
+survey_design=sd,
 )
 assert np.isfinite(result.att)
 assert np.isfinite(result.se)
-assert result.se > 0
+# SE must be materially positive, not sub-ULP FP noise from a
+# degenerate single-allocation permutation (R4 P1 regression —
+# the prior fixture had n_c == n_t in stratum 0, yielding
+# SE ≈ 1e-16; the Case D guard below rejects that shape).
+assert result.se > 1e-6
 assert result.variance_method == "placebo"
 assert result.survey_metadata is not None
 assert result.survey_metadata.n_strata is not None
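The tightened `> 1e-6` threshold targets the collapse described in the commit message. As a simplified single-stratum model (an assumption: the real allocator draws across all treated-containing strata and re-fits weighted-FW per draw), the sketch below draws pseudo-treated sets with NumPy's `Generator.choice` and counts how many distinct sets appear. `distinct_allocations` is a hypothetical helper, not part of diff_diff.

```python
import numpy as np


def distinct_allocations(n_c, n_t, n_draws=50, seed=42):
    """Count distinct pseudo-treated sets across n_draws placebo draws.

    Simplified single-stratum model: each draw picks n_t of the n_c
    control indices without replacement.
    """
    rng = np.random.default_rng(seed)
    controls = np.arange(n_c)
    seen = {
        frozenset(rng.choice(controls, size=n_t, replace=False).tolist())
        for _ in range(n_draws)
    }
    return len(seen)


# Exact-count stratum (n_c == n_t): every draw selects all controls.
print(distinct_allocations(5, 5))  # 1

# n_c > n_t: C(10, 5) = 252 possible sets, so 50 draws vary.
print(distinct_allocations(10, 5))

# If every draw yields the same placebo estimate, the spread of the
# placebo distribution is zero up to floating-point rounding.
same_estimates = np.full(50, 0.123)
print(np.std(same_estimates, ddof=1) < 1e-12)  # True
```

With only one reachable allocation, any nonzero SE can come only from rounding, which is why an absolute floor well above 1e-16 separates a genuine null distribution from the degenerate case.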
@@ -791,6 +800,36 @@ def test_placebo_full_design_raises_on_zero_control_stratum(
 survey_design=sd,
 )
 
+def test_placebo_full_design_raises_on_exact_count_stratum(
+self, sdid_survey_data, survey_design_full
+):
+"""R4 P1 fix: Case D — every treated stratum has n_c == n_t.
+
+The ``sdid_survey_data`` fixture has 5 treated units + 5 controls
+in stratum 0 and 10 controls in stratum 1 (with no treated
+units). For placebo stratified permutation, the pseudo-treated
+set within stratum 0 is chosen from 5 controls, sized 5 — only
+one allocation is possible. Every placebo draw reproduces the
+same pseudo-treated set, the placebo null collapses to a
+single point, and SE = FP noise (~1e-16). The new Case D guard
+rejects this design at fit-time rather than silently reporting
+a near-zero SE that would pass a naïve ``result.se > 0`` check.
+"""
+est = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=42)
+with pytest.raises(
+ValueError,
+match=r"permutation yields a single allocation across all draws",
+):
+est.fit(
+sdid_survey_data,
+outcome="outcome",
+treatment="treated",
+unit="unit",
+time="time",
+post_periods=[6, 7, 8, 9],
+survey_design=survey_design_full,
+)
+
 def test_placebo_full_design_raises_on_undersupplied_stratum(
 self, sdid_survey_data_full_design
 ):
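The new test's `match=` pattern spans two adjacent literals in the guard's message (`"...across all "` followed by `f"draws ({detail})..."`). Because Python concatenates adjacent string literals, f-strings included, into a single string, the pattern still matches. A quick check with the standard-library `re` module, using the first part of the message and an assumed `detail` value for the old fixture; pytest's `match=` likewise applies `re.search` to the string form of the raised exception.

```python
import re

detail = "stratum 0: n_c=5, n_t=5"  # assumed per-stratum summary for the old fixture

# Adjacent literals below are joined at compile time into one message,
# mirroring how the Case D ValueError text is assembled in the diff.
message = (
    "Stratified-permutation placebo support is degenerate: "
    "every treated-containing stratum has exactly "
    "n_controls == n_treated, so the within-stratum "
    "permutation yields a single allocation across all "
    f"draws ({detail}). The resulting placebo distribution "
    "collapses to one point and SE is not a meaningful "
    "null estimate."
)

pattern = r"permutation yields a single allocation across all draws"
print(bool(re.search(pattern, message)))  # True
```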
