
Commit db377e6

igerber and claude committed
Loosen TestScaleEquivariance[placebo] SE tolerance to rel=1e-7
The placebo SE warm-start that landed in PR #369 threads ``unit_weights`` (a fit-time FW output that carries sub-ULP BLAS reduction-order divergence) into each per-draw FW init. Across 200 placebo draws with path-dependent sparsification in the 100-iter pre-sparsify pass, that ULP-level input difference accumulates to ~1e-9 SE divergence between Apple Accelerate (macOS) and OpenBLAS (Linux). No single double satisfies both at the prior ``1e-12`` gate.

The placebo row's SE assertion is loosened to ``rel=1e-7`` (drift detector, not bit-identity). Bootstrap and jackknife stay at ``rel=1e-14``: bootstrap dilutes the divergence by resampling from the full unit set with replacement; jackknife uses fixed weights and no FW re-estimation.

Bit-identity protection for placebo moves to ``test_placebo_se_matches_r`` (``TestJackknifeSERParity``), which uses the ``_placebo_indices`` test seam to feed R's exact permutations through the same normalized inputs the dispatcher would, bypassing the platform-divergent fit-time path. That test asserts both aggregate SE (< 1e-8 vs R) and per-draw τ (< 1e-8 elementwise vs R), which is strictly stronger than the prior ``1e-12`` capture-vs-capture gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
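The reduction-order divergence mentioned in the commit message comes from floating-point addition not being associative: two BLAS libraries that sum the same terms in a different order can disagree in the last bit. A minimal Python sketch of that effect, not taken from the commit (the values are arbitrary illustrations):

```python
# Floating-point addition is not associative, so the grouping of a
# reduction can change the last bit of the result.
left_to_right = (0.1 + 0.2) + 0.3   # one possible reduction order
right_to_left = 0.1 + (0.2 + 0.3)   # another possible reduction order

print(left_to_right == right_to_left)      # False
print(abs(left_to_right - right_to_left))  # on the order of one ULP of 0.6
```

A difference this small is harmless on its own; the commit message attributes the larger ~1e-9 SE gap to such differences feeding into path-dependent sparsification across 200 draws.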
1 parent fd96d08 commit db377e6

1 file changed

Lines changed: 19 additions & 13 deletions


tests/test_methodology_sdid.py

@@ -2995,20 +2995,19 @@ class TestScaleEquivariance:
  # weights$omega[ind[1:N0_placebo]])`` warm-start. Drift from
  # the cold-start capture (0.29385822261006445) is the same
  # finite-iter convergence-pattern shift as the bootstrap warm-
- # start landed in PR #349 — strict-convexity guarantees the
- # converged answer is unique, but the 100-iter pre-sparsify
- # pass produces different sparsification under uniform vs warm
- # init on a handful of draws. Warm-start matches R at machine
+ # start landed in PR #349. Warm-start matches R at machine
  # precision (test_placebo_se_matches_r in
- # TestJackknifeSERParity). Capture: Linux/OpenBLAS (CI runner) —
- # warm-start carries ``unit_weights`` (fit-time FW output) into
- # per-draw init, which is platform-divergent at sub-ULP from
- # BLAS reduction order; across 200 draws with path-dependent
- # sparsification the SE diverges ~1e-9 between Apple Accelerate
+ # TestJackknifeSERParity). Captured value is platform-divergent
+ # at sub-ULP — warm-start carries ``unit_weights`` (fit-time
+ # FW output) into per-draw init, and BLAS reduction-order
+ # differences accumulate across 200 draws with path-dependent
+ # sparsification to ~1e-9 SE divergence between Apple Accelerate
  # (macOS local: 0.293840360160448) and OpenBLAS (Linux CI:
- # 0.2938403592163006). Linux value pinned because CI is the
- # gating surface; macOS local fits will drift at ~1e-9 — that
- # delta is finite-iter FW path-dependence, not a numerical bug.
+ # 0.2938403592163006). The placebo row's SE assertion is
+ # therefore loosened to ``rel=1e-7`` below; bit-identity
+ # protection moves to ``test_placebo_se_matches_r``, which
+ # bypasses the platform-divergent fit-time path through a test
+ # seam. Pinned value is the Linux/OpenBLAS capture.
  "placebo": (4.603349837478791, 0.2938403592163006, 0.004975124378109453, 200),
  # bootstrap = paper-faithful refit with R-default warm-start: FW is
  # initialized with ``sum_normalize(unit_weights[boot_control_idx])``
@@ -3065,7 +3064,14 @@ def test_baseline_parity_small_scale(self, variance_method):
  warnings.simplefilter("ignore", UserWarning)
  r = self._fit(data, variance_method)
  assert r.att == pytest.approx(att0, rel=1e-14)
- assert r.se == pytest.approx(se0, rel=1e-14)
+ # Placebo SE is warm-start-driven (PR #369) and the per-draw FW
+ # init carries platform-divergent ``unit_weights``; ~1e-9 macOS-
+ # vs-Linux drift on 200 draws makes bit-identity unattainable.
+ # Bit-identity protection moves to ``test_placebo_se_matches_r``
+ # (TestJackknifeSERParity). Bootstrap and jackknife stay at the
+ # original gate; their paths don't accumulate the same way.
+ se_rel = 1e-7 if variance_method == "placebo" else 1e-14
+ assert r.se == pytest.approx(se0, rel=se_rel)
  assert r.p_value == pytest.approx(p0, rel=1e-14)
  assert len(r.placebo_effects) == n0
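
As a rough check on the new gate, the two captured SE values quoted in the diff comments can be compared directly. This is a sketch, not code from the commit: the helper names `rel_diff` and `within_rel` are hypothetical, and the comparison is a simplified relative check, not the exact rule `pytest.approx` applies.

```python
# Captured placebo SE values quoted in the diff comments.
LINUX_SE = 0.2938403592163006  # OpenBLAS (Linux CI), the pinned value
MACOS_SE = 0.293840360160448   # Apple Accelerate (macOS local)


def rel_diff(actual: float, expected: float) -> float:
    """Relative difference of `actual` from `expected` (hypothetical helper)."""
    return abs(actual - expected) / abs(expected)


def within_rel(actual: float, expected: float, rel: float) -> bool:
    """Simplified relative-tolerance check (hypothetical helper)."""
    return abs(actual - expected) <= rel * abs(expected)


drift = rel_diff(MACOS_SE, LINUX_SE)
print(f"relative drift: {drift:.2e}")

print(within_rel(MACOS_SE, LINUX_SE, 1e-7))   # True: passes the loosened gate
print(within_rel(MACOS_SE, LINUX_SE, 1e-12))  # False
print(within_rel(MACOS_SE, LINUX_SE, 1e-14))  # False
```

Under these assumptions the macOS capture sits a few parts in 1e-9 from the pinned Linux value, which leaves roughly an order of magnitude of headroom under `rel=1e-7` while still flagging larger drift.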
