You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Address PR #365 R4 P1 + P3: Case D guard for exact-count placebo strata; non-degenerate test fixture
P1 (Methodology — degenerate exact-count placebo strata):
The Case B / Case C front-door guards rejected ``n_c_h == 0`` and
``n_c_h < n_t_h`` respectively, but allowed ``n_c_h == n_t_h``. For
the stratified-permutation allocator, the per-stratum support is
``C(n_c_h, n_t_h)``: when every treated-containing stratum has
``n_c_h == n_t_h``, the only allocation is to pick all ``n_c_h``
controls as pseudo-treated on every draw. All placebo draws produce
the same pseudo-treated set, the placebo null collapses to a single
point, and SE equals FP noise (~1e-16) from the np.average call
order-dependence. A naïve ``result.se > 0`` check spuriously passes.
Concretely, ``sdid_survey_data`` (stratum 0: 5 treated + 5 controls,
stratum 1: 10 controls, 0 treated) would return SE ≈ 3.79e-16 from
placebo, and the R2/R3-era ``test_full_design_placebo_succeeds``
test was passing only because of that sub-ULP noise — the test
assertion ``result.se > 0`` is satisfied even when the semantic SE
is zero.
Fix: add a Case D fit-time guard that rejects the design when every
treated-containing stratum has exactly ``n_c_h == n_t_h``. At least
one treated stratum must have ``n_c_h > n_t_h`` for the overall
permutation support (``∏_h C(n_c_h, n_t_h)``) to be ≥2.
ValueError message enumerates the per-stratum (n_c, n_t) counts
and points to ``variance_method='bootstrap'`` as the unconstrained
alternative.
Test changes:
* ``test_full_design_placebo_succeeds`` switched from
``sdid_survey_data`` (degenerate exact-count) to
``sdid_survey_data_full_design`` (stratum 0: 5 treated + 10 controls
→ ``C(10, 5) = 252`` distinct allocations). Tightened the SE
assertion from ``> 0`` to ``> 1e-6`` so future regressions back to
sub-ULP-noise SE fail loudly.
* New ``test_placebo_full_design_raises_on_exact_count_stratum``
asserts the Case D ValueError fires on the old
``sdid_survey_data`` fixture (the regression target that surfaced
this issue).
P3 (Documentation — remaining bootstrap-only stragglers):
* ``docs/methodology/survey-theory.md`` §"Estimator survey variance
dispatch" table row for SyntheticDiD still said "Bootstrap only".
Updated to "Bootstrap / permutation / PSU-LOO" with a note that
all three variance methods support full strata/PSU/FPC designs.
* ``tests/test_methodology_sdid.py::TestCoverageMCArtifact``
comment described ``stratified_survey`` as "bootstrap-only —
placebo and jackknife reject strata/PSU/FPC at fit-time". Updated
to reflect current state: bootstrap is the validation gate,
jackknife is reported with anti-conservatism caveat, placebo is
skipped due to DGP-specific Case B (all-treated-stratum packs).
Verification: 90 passed (1 new Case D regression test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0 commit comments