Skip to content

Commit 5799d42

Browse files
igerberclaude
andcommitted
Address CI AI review: boundary check, stage guards, docstring refresh
P1 #1 (methodology): mse_optimal_bandwidth now rejects boundary > d.min() with a clear ValueError. The Phase 1b wrapper is scoped to the HAD lower-boundary case (Design 1' with d_0 = 0 or Design 1 continuous-near- d_lower with d_0 = min D_2). Interior or upper-boundary inputs would silently run the boundary selector with a symmetric kernel and return a bandwidth incompatible with the one-sided fitter. The port remains available for interior / broader surface via _nprobust_port.lpbwselect_mse_dpi. P1 #2 (code quality): lprobust_bw validates in-window observation counts at each of the three local-poly fits before calling qrXXinv: - variance: n_V >= o+1 - B1: n_B1 >= o_B+1 - B2: n_B2 >= o_B+2 Each guard raises a targeted ValueError naming the failing stage, the bandwidth, and suggested remediation. Previously these failed with opaque LinAlgError from Cholesky on under-determined designs. P3 (doc): local_linear.py module docstring updated to say Phase 1b "ships" instead of "will add"; tiny-sample test now asserts the new ValueError contract instead of accepting any non-IndexError failure. New behavioral tests: - test_interior_boundary_rejected: boundary=0.5 on U(0,1) rejected - test_upper_boundary_rejected: boundary=d.max() rejected - test_boundary_equal_to_min_d_accepted: boundary=min(d) accepted (Design 1 continuous-near-d_lower path) - test_boundary_below_min_d_accepted: boundary=0 with d.min()>0 accepted (Design 1' path) - test_bwcheck_none_on_tiny_sample_raises_valueerror: upgraded from "catch anything non-IndexError" to pytest.raises(ValueError, match="lprobust_bw"). 153 tests pass (up from 149). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c125e7c commit 5799d42

3 files changed

Lines changed: 108 additions & 16 deletions

File tree

diff_diff/_nprobust_port.py

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -371,6 +371,15 @@ def lprobust_bw(
371371
``(V, B1, B2, R, bw)``. Called four times from ``lpbwselect_mse_dpi``.
372372
373373
Parameters match the R signature argument-for-argument.
374+
375+
Raises
376+
------
377+
ValueError
378+
If any of the three local-polynomial fits has fewer in-window
379+
observations than its required column count. Catches opaque
380+
``LinAlgError`` failures from downstream Cholesky inversion in
381+
tiny-sample or mispositioned-boundary settings and surfaces a
382+
targeted error naming the failing stage.
374383
"""
375384
N = X.shape[0]
376385
eC: Optional[np.ndarray] = None
@@ -383,6 +392,14 @@ def lprobust_bw(
383392
eX = X[ind_V]
384393
eW = w[ind_V]
385394
n_V = int(np.sum(ind_V))
395+
if n_V < o + 1:
396+
raise ValueError(
397+
f"lprobust_bw: variance stage has n_V={n_V} in-window "
398+
f"observations at h_V={h_V:.6g} (boundary={c}, kernel={kernel!r}), "
399+
f"but needs at least o+1={o + 1}. Increase sample size, choose "
400+
f"a valid lower boundary with sufficient data to the right, "
401+
f"or disable bwcheck=None-driven narrow windows."
402+
)
386403
# Design matrix R.V in R; rename to R_V to avoid Python builtin conflict.
387404
R_V = np.empty((n_V, o + 1), dtype=np.float64)
388405
for j in range(o + 1):
@@ -443,6 +460,13 @@ def lprobust_bw(
443460
eX1 = X[ind1]
444461
eW1 = w1[ind1]
445462
n_B1 = int(np.sum(ind1))
463+
if n_B1 < o_B + 1:
464+
raise ValueError(
465+
f"lprobust_bw: B1 stage has n_B1={n_B1} in-window observations "
466+
f"at h_B1={h_B1:.6g} (boundary={c}, kernel={kernel!r}), but "
467+
f"needs at least o_B+1={o_B + 1}. Increase sample size or "
468+
f"widen the pilot bandwidth."
469+
)
446470
R_B1 = np.empty((n_B1, o_B + 1), dtype=np.float64)
447471
for j in range(o_B + 1):
448472
R_B1[:, j] = (eX1 - c) ** j
@@ -493,6 +517,13 @@ def lprobust_bw(
493517
eX2 = X[ind2]
494518
eW2 = w2[ind2]
495519
n_B2 = int(np.sum(ind2))
520+
if n_B2 < o_B + 2:
521+
raise ValueError(
522+
f"lprobust_bw: B2 stage has n_B2={n_B2} in-window observations "
523+
f"at h_B2={h_B2:.6g} (boundary={c}, kernel={kernel!r}), but "
524+
f"needs at least o_B+2={o_B + 2}. Increase sample size or "
525+
f"widen the pilot bandwidth."
526+
)
496527
R_B2 = np.empty((n_B2, o_B + 2), dtype=np.float64)
497528
for j in range(o_B + 2):
498529
R_B2[:, j] = (eX2 - c) ** j

diff_diff/local_linear.py

Lines changed: 33 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -13,11 +13,13 @@
1313
the conditional mean ``m(d0) := E[Y | D = d0]`` at the boundary of ``D``'s
1414
support via kernel-weighted OLS.
1515
16-
This module is used by future :class:`HeterogeneousAdoptionDiD` phases:
16+
This module is used by the :class:`HeterogeneousAdoptionDiD` phases:
1717
1818
- Phase 1a ships the kernels and fitter (this module).
19-
- Phase 1b will add an MSE-optimal bandwidth selector (Calonico-Cattaneo-Farrell
20-
2018) built on top of the fitter.
19+
- Phase 1b ships the MSE-optimal bandwidth selector
20+
``mse_optimal_bandwidth`` / ``BandwidthResult`` (this module), a thin
21+
wrapper over the Calonico-Cattaneo-Farrell (2018) plug-in selector
22+
ported in ``diff_diff/_nprobust_port.py``.
2123
- Phase 1c will add the bias-corrected confidence interval per Equation 8 of
2224
de Chaisemartin, Ciccia, D'Haultfoeuille & Knau (2026, arXiv:2405.04465v6).
2325
@@ -603,6 +605,34 @@ def mse_optimal_bandwidth(
603605
if not np.isfinite(boundary):
604606
raise ValueError(f"boundary must be finite; got {boundary}")
605607

608+
# Boundary-applicability check (Phase 1b scope).
609+
# The exported wrapper is scoped to the HAD LOWER-boundary case:
610+
# - Design 1' (d_0 = 0 with support infimum at 0), or
611+
# - Design 1 continuous-near-d_lower (d_0 = d_lower = min(D_2)).
612+
# In both cases ``boundary`` must lie at or below the sample minimum:
613+
# no observation should fall to the left of the evaluation point,
614+
# because the downstream ``local_linear_fit`` uses a one-sided kernel
615+
# restricted to ``d >= d_0``. An interior or upper-boundary
616+
# evaluation would silently run the boundary selector branch with a
617+
# symmetric kernel (see port's ``kernel_W``) and produce a bandwidth
618+
# incompatible with the fitter. Reject those inputs here until an
619+
# interior-point API is documented (Phase 2+).
620+
d_min = float(d.min())
621+
# Allow a small numerical tolerance so boundary = min(d) passes even
622+
# under floating-point noise (e.g. ``d.min()`` equals ``boundary``
623+
# within 1e-12 but not bit-exact).
624+
_boundary_tol = 1e-12 * max(1.0, abs(d_min), abs(boundary))
625+
if boundary > d_min + _boundary_tol:
626+
raise ValueError(
627+
f"boundary={boundary!r} exceeds the sample minimum "
628+
f"d.min()={d_min!r}. The Phase 1b wrapper is scoped to the "
629+
f"HAD lower-boundary case (d_0 <= min(d)). For interior or "
630+
f"upper-boundary evaluation, use "
631+
f"diff_diff._nprobust_port.lpbwselect_mse_dpi directly with "
632+
f"interior=True; note that the non-boundary branches are not "
633+
f"separately parity-tested against nprobust."
634+
)
635+
606636
# Defer heavy import to call time to avoid import-cycle risk.
607637
from diff_diff._nprobust_port import lpbwselect_mse_dpi
608638

tests/test_bandwidth_selector.py

Lines changed: 44 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -253,24 +253,55 @@ def test_public_wrapper_fixes_vce_nn_nnmatch_3(self):
253253
assert via_wrapper.b_mse == via_port_nn.b_mse_dpi
254254
assert via_wrapper.c_bw == via_port_nn.c_bw
255255

256-
def test_bwcheck_none_on_tiny_sample_does_not_index_error(self):
257-
"""bwcheck=None should skip the NN floor.
258-
259-
The selector may still fail inside lprobust.bw on very small
260-
samples (rank-deficient design), but any failure must be a
261-
linear-algebra error, not an indexing crash in the bwcheck
262-
clipping logic.
263-
"""
256+
def test_bwcheck_none_on_tiny_sample_raises_valueerror(self):
257+
"""bwcheck=None on a tiny sample must raise a clear ValueError
258+
from the per-stage support/rank guard in lprobust_bw, NOT an
259+
opaque LinAlgError or IndexError."""
264260
from diff_diff._nprobust_port import lpbwselect_mse_dpi
265261

266262
d = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
267263
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
268-
try:
264+
with pytest.raises(ValueError, match="lprobust_bw"):
269265
lpbwselect_mse_dpi(y, d, eval_point=0.0, bwcheck=None)
270-
except IndexError as exc: # pragma: no cover - regression sentinel
271-
pytest.fail(f"Unexpected IndexError from bwcheck path: {exc}")
272-
except Exception:
273-
pass
266+
267+
def test_interior_boundary_rejected(self):
268+
"""boundary strictly inside the support must be rejected by
269+
the Phase 1b wrapper. Running the boundary selector at an
270+
interior point would silently use a symmetric kernel and
271+
produce a bandwidth incompatible with the one-sided fitter."""
272+
rng = np.random.default_rng(2026)
273+
d = rng.uniform(0.0, 1.0, size=500)
274+
y = d + d**2 + rng.normal(0, 0.5, size=500)
275+
with pytest.raises(ValueError, match="boundary"):
276+
mse_optimal_bandwidth(d, y, boundary=0.5)
277+
278+
def test_upper_boundary_rejected(self):
279+
"""boundary at d.max() (upper support edge) must be rejected."""
280+
rng = np.random.default_rng(2026)
281+
d = rng.uniform(0.0, 1.0, size=500)
282+
y = d + d**2 + rng.normal(0, 0.5, size=500)
283+
with pytest.raises(ValueError, match="boundary"):
284+
mse_optimal_bandwidth(d, y, boundary=float(d.max()))
285+
286+
def test_boundary_equal_to_min_d_accepted(self):
287+
"""Design 1 continuous-near-d_lower uses boundary = min(d)
288+
exactly; this must pass the applicability check."""
289+
rng = np.random.default_rng(20260419)
290+
d = rng.uniform(1.0, 2.0, size=1500)
291+
y = 3.0 + 0.5 * (d - 1.0) ** 2 + rng.normal(0, 0.3, size=1500)
292+
h = mse_optimal_bandwidth(d, y, boundary=float(d.min()))
293+
assert np.isfinite(h)
294+
assert h > 0.0
295+
296+
def test_boundary_below_min_d_accepted(self):
297+
"""Design 1' uses boundary = 0 when all D_2 > 0; boundary
298+
below d.min() must pass."""
299+
rng = np.random.default_rng(20260419)
300+
d = rng.uniform(0.01, 1.0, size=1500) # d.min() > 0
301+
y = d + d**2 + rng.normal(0, 0.5, size=1500)
302+
h = mse_optimal_bandwidth(d, y, boundary=0.0)
303+
assert np.isfinite(h)
304+
assert h > 0.0
274305

275306

276307
class TestKernelDispatch:

0 commit comments

Comments
 (0)