Align Phase 1c parity-contract narrative across wrapper, JSON, tests

igerber · claude · igerber · commit 7eeb42ac95be · 2026-04-20T07:13:21.000-04:00
P3 follow-up from AI review. Three small inconsistencies to resolve:

1. The `bias_corrected_local_linear` docstring still described tau_cl/se_cl
   as bit-parity and said Python consumes R's z directly. The actual
   contract is atol=1e-12 on all four scalars (tau_cl, tau_bc, se_cl,
   se_rb) for DGPs 1-3, and the wrapper computes its own z via
   scipy.stats.norm.ppf; R's qnorm value is stored in the JSON for audit
   only. Docstring updated to match.

2. Committed golden JSON metadata still had the old "consume R's
   critical value directly" string because the generator was edited
   without regenerating. Regenerated so JSON metadata matches the
   corrected audit-export wording in the R script.

3. Parity tests for DGP 4 and DGP 5 did not assert CI bounds. Added
   ci_low / ci_high assertions at the same tolerance as the
   corresponding se_rb assertion (bit-parity, atol=1e-14, for DGP 4;
   atol=1e-12 for DGP 5), so the audit surface matches what the
   registry states.
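
A minimal sketch of the contract in items 1 and 3, assuming numpy and
scipy are installed. `ci_from_bias_corrected`, `check_ci_parity`,
`CI_ATOL`, and the dict keys are hypothetical stand-ins, not the
wrapper's or the test suite's actual API, and the numeric inputs are
illustrative rather than taken from the committed golden JSON. It shows
z being computed locally, R's z used only for an audit comparison, and
the CI bounds checked at the per-DGP tolerance.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical tolerance tiers for CI-bound assertions, as stated in the
# docstring and commit message (DGP 1-3: 1e-13, DGP 4: 1e-14, DGP 5: 1e-12).
CI_ATOL = {"dgp1": 1e-13, "dgp2": 1e-13, "dgp3": 1e-13,
           "dgp4": 1e-14, "dgp5": 1e-12}


def ci_from_bias_corrected(tau_bc, se_rb, alpha=0.05):
    """Return (ci_low, ci_high, z) for tau_bc +/- z_{1-alpha/2} * se_rb.

    z is computed locally with scipy.stats.norm.ppf instead of being read
    from the golden JSON (hypothetical helper, not the wrapper's API).
    """
    z = norm.ppf(1.0 - alpha / 2.0)
    return tau_bc - z * se_rb, tau_bc + z * se_rb, z


def check_ci_parity(fit, golden, dgp):
    """Assert Python CI bounds match R's pre-computed bounds for one DGP."""
    # Audit only: R's exported critical value should match scipy's to ~ULP.
    np.testing.assert_allclose(fit["z"], golden["z"], atol=1e-14, rtol=0)
    tol = CI_ATOL[dgp]
    for key in ("ci_low", "ci_high"):
        np.testing.assert_allclose(fit[key], golden[key], atol=tol, rtol=tol)


# Illustrative inputs only (not values from the committed golden JSON).
tau_bc, se_rb = 0.5, 0.1
lo, hi, z = ci_from_bias_corrected(tau_bc, se_rb)
fit = {"z": z, "ci_low": lo, "ci_high": hi}

# Stand-in for R's export: the standard-normal 0.975 quantile to 16
# significant digits, with R-side bounds formed from that value.
r_z = 1.959963984540054
golden = {"z": r_z,
          "ci_low": tau_bc - r_z * se_rb,
          "ci_high": tau_bc + r_z * se_rb}

check_ci_parity(fit, golden, "dgp4")  # raises AssertionError on mismatch
```

Because both sides start from critical values that agree to within an
ULP, any remaining CI drift comes only from the `tau_bc +/- z * se_rb`
arithmetic, which is why the CI tolerance can track the se_rb tolerance.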

Behavior unchanged; tests strengthened and docs aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/benchmarks/data/nprobust_lprobust_golden.json b/benchmarks/data/nprobust_lprobust_golden.json
@@ -10,7 +10,7 @@
       "dgp5": 20260421
     },
     "generator": "benchmarks/R/generate_nprobust_lprobust_golden.R",
-    "algorithm": "nprobust::lprobust(..., bwselect='mse-dpi') at a single eval point, p=1, deriv=0, kernel='epa', vce='nn' unless noted. z = qnorm(1 - alpha/2) exported so the Python side consumes R's critical value directly."
+    "algorithm": "nprobust::lprobust(..., bwselect='mse-dpi') at a single eval point, p=1, deriv=0, kernel='epa', vce='nn' unless noted. The Python wrapper computes its own z_{1-alpha/2} via scipy.stats.norm.ppf inside safe_inference(); R's z is exported here for audit so a reviewer can verify the two critical values agree to machine precision."
   },
   "dgp1": {
     "n": 2000,
diff --git a/diff_diff/local_linear.py b/diff_diff/local_linear.py
@@ -1020,12 +1020,17 @@ def bias_corrected_local_linear(
     Notes
     -----
     Parity against ``nprobust::lprobust(..., bwselect="mse-dpi")`` is tiered
-    (see ``docs/methodology/REGISTRY.md``): bit-parity on ``tau_cl``/``se_cl``
-    (same arithmetic path as Phase 1b's bit-parity-verified primitives);
-    ``atol=1e-12`` on ``tau_bc``/``se_rb`` (new outer-product step); and
-    ``atol=1e-13`` on CI bounds (R's ``z_{1-alpha/2}`` is stored in the
-    golden JSON so Python consumes it directly rather than calling
-    ``scipy.stats.norm.ppf``).
+    (see ``docs/methodology/REGISTRY.md``): ``atol=1e-12`` on ``tau_cl``,
+    ``tau_bc``, ``se_cl``, and ``se_rb`` across the three unclustered
+    golden DGPs; ``atol=1e-13`` on CI bounds. The Python wrapper computes
+    its own ``z_{1-alpha/2}`` via ``scipy.stats.norm.ppf`` inside
+    ``safe_inference()``; R's ``qnorm`` value is stored in the golden JSON
+    for audit, and the parity harness compares Python's CI bounds to R's
+    pre-computed CI bounds, so any residual drift is purely the
+    floating-point arithmetic in ``tau.bc +/- z * se.rb``, not a
+    critical-value disagreement. Clustered DGP 4 achieves bit-parity
+    (``atol=1e-14``) when cluster IDs happen to be in first-appearance
+    order; otherwise BLAS reduction ordering can drift to ``atol=1e-10``.
     """
     if weights is not None:
         raise NotImplementedError(
diff --git a/tests/test_bias_corrected_lprobust.py b/tests/test_bias_corrected_lprobust.py
@@ -192,6 +192,13 @@ def test_clustered_parity_dgp_4(self, golden):
                                    atol=1e-14, rtol=1e-14)
         np.testing.assert_allclose(fit.se_robust, g["se_rb"],
                                    atol=1e-14, rtol=1e-14)
+        # CI bounds at bit-parity too (Python's scipy ppf and R's qnorm
+        # agree to ULP; the remaining drift is pure tau.bc +/- z * se.rb
+        # floating-point arithmetic).
+        np.testing.assert_allclose(fit.ci_low, g["ci_low"],
+                                   atol=1e-14, rtol=1e-14)
+        np.testing.assert_allclose(fit.ci_high, g["ci_high"],
+                                   atol=1e-14, rtol=1e-14)
 
     def test_shifted_boundary_parity_dgp_5(self, golden):
         """Design 1 continuous-near-d_lower: boundary = d.min() > 0."""
@@ -210,6 +217,12 @@ def test_shifted_boundary_parity_dgp_5(self, golden):
                                    atol=1e-12, rtol=1e-12)
         np.testing.assert_allclose(fit.se_robust, g["se_rb"],
                                    atol=1e-12, rtol=1e-12)
+        # CI bounds at the same tolerance (Python scipy ppf vs R qnorm
+        # agree to ULP; tau.bc +/- z * se.rb inherits se_rb's drift).
+        np.testing.assert_allclose(fit.ci_low, g["ci_low"],
+                                   atol=1e-12, rtol=1e-12)
+        np.testing.assert_allclose(fit.ci_high, g["ci_high"],
+                                   atol=1e-12, rtol=1e-12)
 
 
 # =============================================================================