Skip to content

Commit 12a8efe

Browse files
igerberclaude
andcommitted
Clarify z-critical-value doc claims (audit export, not direct consumption)
P3 doc-accuracy note from AI review: the registry, R generator, and local_linear.py comments said the Python side "consumes R's z directly" but the wrapper actually computes its own z via scipy.stats.norm.ppf inside safe_inference(). The golden JSON's z field is an audit/reference export so a reviewer can confirm R's qnorm and Python's scipy.stats.norm.ppf agree to machine precision. The parity harness compares Python-computed CI bounds to R-computed CI bounds (not R's z fed into Python's arithmetic). Behavior is unchanged; only doc phrasing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7672f8d commit 12a8efe

3 files changed

Lines changed: 13 additions & 8 deletions

File tree

benchmarks/R/generate_nprobust_lprobust_golden.R

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -123,8 +123,11 @@ golden <- list(
123123
generator = "benchmarks/R/generate_nprobust_lprobust_golden.R",
124124
algorithm = paste("nprobust::lprobust(..., bwselect='mse-dpi') at a single",
125125
"eval point, p=1, deriv=0, kernel='epa', vce='nn'",
126-
"unless noted. z = qnorm(1 - alpha/2) exported so the",
127-
"Python side consumes R's critical value directly.")
126+
"unless noted. The Python wrapper computes its own",
127+
"z_{1-alpha/2} via scipy.stats.norm.ppf inside",
128+
"safe_inference(); R's z is exported here for audit",
129+
"so a reviewer can verify the two critical values",
130+
"agree to machine precision.")
128131
),
129132
dgp1 = c(list(n = G, d = d1, y = y1, kernel = "epa", vce = "nn",
130133
description = "Uniform(0,1), polynomial m(d) = d + d^2"),

diff_diff/local_linear.py

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1191,11 +1191,13 @@ def bias_corrected_local_linear(
11911191
# cases where the residual vector collapses), ALL inference fields —
11921192
# including the CI — must return NaN. This enforces the repo-wide
11931193
# inference contract (CLAUDE.md Key Design Pattern #6; CI review
1194-
# PR #340 P0) rather than returning a misleading zero-width or infinite
1195-
# CI. safe_inference also handles the R z = qnorm(1 - alpha/2) critical
1196-
# value via scipy.stats.norm.ppf (the golden JSON stores R's z so
1197-
# parity tests consume R's value directly and drift is pure
1198-
# tau.bc + z * se.rb arithmetic).
1194+
# PR #340 P0) rather than returning a misleading zero-width or
1195+
# infinite CI. safe_inference computes the critical value z_{1-α/2}
1196+
# via scipy.stats.norm.ppf; the parity tests compare Python's
1197+
# scipy-computed ci_low/ci_high to R's qnorm-computed ci_low/ci_high
1198+
# stored in the golden JSON. The golden JSON also exports R's raw
1199+
# `z` value for audit/reference so a reviewer can verify the two
1200+
# critical values agree to machine precision.
11991201
from diff_diff.utils import safe_inference
12001202

12011203
_, _, (ci_low, ci_high) = safe_inference(

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2314,7 +2314,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step
23142314
- [x] Phase 1a: `clubSandwich::vcovCR(..., type="CR2")` parity harness committed: R script at `benchmarks/R/generate_clubsandwich_golden.R` plus a regression-anchor JSON at `benchmarks/data/clubsandwich_cr2_golden.json`. **Note:** the committed JSON currently has `"source": "python_self_reference"` and pins numerical stability only; authoritative R-produced values are generated by running the R script, which the TODO.md row under Methodology/Correctness tracks. The parity test at `tests/test_linalg_hc2_bm.py::TestCR2BMCluster::test_cr2_parity_with_golden` runs at 1e-6 tolerance (Phase 1a plan commits 6-digit parity once R regen completes).
23152315
- [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4.
23162316
- [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246).
2317-
- [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is tiered: `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb` (the `Q.q` outer-product + broadcast-multiply path drifts slightly from the Phase 1b bit-parity primitives), `atol=1e-13` on CI bounds (R's `z_{1-α/2}` is stored in the golden JSON so Python consumes it directly rather than calling `scipy.stats.norm.ppf`). The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs happen to be sorted; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Deviation from R:** nprobust silently accepts `vce="hc2"`/`vce="hc3"` combined with `cluster=`; this wrapper raises `ValueError` because the combination is not a well-defined estimator (hc2/hc3 assume independent observations). **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug.
2317+
- [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is tiered: `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb` (the `Q.q` outer-product + broadcast-multiply path drifts slightly from the Phase 1b bit-parity primitives), `atol=1e-13` on CI bounds (the Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement). The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs happen to be sorted; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Deviation from R:** nprobust silently accepts `vce="hc2"`/`vce="hc3"` combined with `cluster=`; this wrapper raises `ValueError` because the combination is not a well-defined estimator (hc2/hc3 assume independent observations). **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug.
23182318
- [ ] Phase 2: `HeterogeneousAdoptionDiD` class with separate code paths for Design 1', Design 1 mass-point, and Design 1 continuous-near-``.
23192319
- [ ] Phase 2: `design="auto"` detection rule (`min_g D_{g,2} < 0.01 · median_g D_{g,2}` → continuous_at_zero; modal-min fraction > 2% → mass_point; else continuous_near_lower).
23202320
- [ ] Phase 2: Panel validator verifies `D_{g,1} = 0` for all units; error on staggered timing without last-cohort subgroup.

0 commit comments

Comments
 (0)