Commit 189f560

igerber and claude committed
Phase 1c: Bias-corrected local-linear CI (CCT 2014)
Port nprobust::lprobust's single-eval-point path (lprobust.R:177-246) as the
foundation for paper Equation 8:

- diff_diff/_nprobust_port.py: add `lprobust()` + `LprobustResult`. Uses the
  Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` to
  produce classical and bias-corrected point estimates along with naive and
  robust (CCT 2014) standard errors in a single pass.
- diff_diff/local_linear.py: add `bias_corrected_local_linear()` +
  `BiasCorrectedFit`. Public wrapper returns the mu-scale CI
  `[tau.bc +/- z_{1-alpha/2} * se.rb]`. Auto-bandwidth path delegates to
  `mse_optimal_bandwidth` and honors nprobust's rho=1 default (b = h). Also
  extracts shared `_validate_had_inputs` helper from Phase 1b.
- benchmarks/R/generate_nprobust_lprobust_golden.R + golden JSON: 5 DGPs
  (Uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's
  z = qnorm(1-alpha/2) exported so Python skips ppf and matches bit-wise on
  CI arithmetic.
- Tests: TestLprobustSingleEval (8 port-level) + test_bias_corrected_lprobust
  (29 wrapper-level). Tiered tolerances per plan: 1e-12 on tau/se, 1e-13 on
  CI bounds; clustered DGP 4 hits bit-parity (1e-14).

Known deviations from nprobust (in REGISTRY.md): hc2/hc3 + cluster raises
(nprobust silently accepts); clustered DGP 4 uses manual h=b=0.3 to sidestep
an nprobust-internal singleton-cluster bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2e9447e commit 189f560
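To make the ported estimator concrete: the single-eval-point path is, at its core, a kernel-weighted least-squares fit whose intercept is the fitted value at the evaluation point. Below is a minimal sketch of just the classical (`tau.cl`) component, written independently of the actual port's internals; the bias-corrected `tau.bc` and the CCT robust SE additionally require the higher-order design matrix `Q.q` and are omitted here.

```python
import numpy as np

def local_linear_at(x: np.ndarray, y: np.ndarray, c: float, h: float) -> float:
    """Classical (uncorrected) local-linear estimate of m(c).

    Sketch only: Epanechnikov weights within bandwidth h, then weighted
    least squares of y on (1, x - c). The intercept of that fit is the
    estimated regression function at c, including one-sided boundary points.
    """
    u = (x - c) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
    X = np.column_stack([np.ones_like(x), x - c])
    # Weighted normal equations: beta = (X'WX)^{-1} X'Wy.
    XtWX = X.T @ (X * w[:, None])
    XtWy = X.T @ (w * y)
    beta = np.linalg.solve(XtWX, XtWy)
    return float(beta[0])

# On exactly linear data the local-linear fit is exact, even at the boundary.
x = np.linspace(0.0, 1.0, 101)
y = 2.0 + 3.0 * x
est_mid = local_linear_at(x, y, c=0.5, h=0.2)       # ~= 3.5
est_boundary = local_linear_at(x, y, c=0.0, h=0.2)  # ~= 2.0
```

This is only the first of the two point estimates the port produces per pass; the bias correction subtracts an estimated leading-bias term computed with the pilot bandwidth `b`.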

9 files changed

Lines changed: 1751 additions & 112 deletions

File tree

TODO.md

Lines changed: 5 additions & 0 deletions
@@ -86,6 +86,11 @@ Deferred items from PR reviews that were not addressed before merge.
 | `compute_synthetic_weights` backend algorithm mismatch: Rust path uses Frank-Wolfe (`_rust_synthetic_weights` in `utils.py:1184`); Python fallback uses projected gradient descent (`_compute_synthetic_weights_numpy` in `utils.py:1228`). Both solve the same constrained QP but converge to different simplex vertices on near-degenerate / extreme-scale inputs (e.g. `Y~1e9`, or near-singular `Y'Y`). Unified backend (one algorithm) would close the parity gap surfaced by audit finding #22. Two `@pytest.mark.xfail(strict=True)` tests in `tests/test_rust_backend.py::TestSyntheticWeightsBackendParity` baseline the divergence so we notice when/if the algorithms align. | `utils.py`, `rust/` | follow-up | Medium |
 | TROP Rust vs Python grid-search divergence on rank-deficient Y: on two near-parallel control units, LOOCV grid-search ATT diverges ~6% between Rust (`trop_global.py:688`) and Python fallback (`trop_global.py:753`). Either grid-winner ties are broken differently or the per-λ solver reaches different stationary points under rank deficiency. Audit finding #23 flagged this surface. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_grid_search_rank_deficient_Y` baselines the gap. | `trop_global.py`, `rust/` | follow-up | Medium |
 | TROP Rust vs Python bootstrap SE divergence under fixed seed: `seed=42` on a tiny panel produces ~28% bootstrap-SE gap. Root cause: Rust bootstrap uses its own RNG (`rand` crate) while Python uses `numpy.random.default_rng`; same seed value maps to different bytestreams across backends. Audit axis-H (RNG/seed) adjacent. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility` baselines the gap. Unifying RNG (threading a numpy-generated seed-sequence into Rust, or porting Python to ChaCha) would close it. | `trop_global.py`, `rust/` | follow-up | Medium |
+| `bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
+| `bias_corrected_local_linear`: extend golden parity to `vce in {"hc0", "hc1", "hc2", "hc3"}`. The port supports all four via `lprobust_res`'s hc branches but Phase 1c golden-tested only `vce="nn"`. | `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py` | Phase 1c | Low |
+| `bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. | `diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Medium |
+| `bias_corrected_local_linear`: support multi-eval grid (`neval > 1`) with cross-covariance (`covgrid=TRUE` branch of `lprobust.R:253-378`). Not needed for HAD but useful for multi-dose diagnostics. | `diff_diff/_nprobust_port.py::lprobust` | Phase 1c | Low |
+| Clustered-DGP parity: Phase 1c's DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster bug in `lpbwselect.mse.dpi`'s pilot fits. Once nprobust ships a fix (or we derive one independently), add a clustered-auto-bandwidth parity test. | `benchmarks/R/generate_nprobust_lprobust_golden.R` | Phase 1c | Low |
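On the RNG-unification item above: one way to thread a numpy-derived seed into a second backend is to expand the user's seed through `numpy.random.SeedSequence` and hand a single 64-bit word across the FFI boundary. The helper name below is hypothetical, not part of the codebase, and this step alone does not make the bytestreams identical — that additionally requires the same bit generator on both sides (e.g. porting the Python path to ChaCha, as the table notes).

```python
import numpy as np

def derive_backend_seed(user_seed: int) -> int:
    # Hypothetical helper: mix the user's seed through numpy's SeedSequence
    # and emit one u64 to seed the other backend (e.g. the Rust `rand` RNG).
    # Both backends then start from the same entropy-mixing step, which is a
    # prerequisite for cross-backend reproducibility, not a guarantee of it.
    ss = np.random.SeedSequence(user_seed)
    return int(ss.generate_state(1, dtype=np.uint64)[0])

seed_a = derive_backend_seed(42)
seed_b = derive_backend_seed(42)
```

The same derived word would also seed the Python-side `default_rng`, so a single user-facing `seed=` argument stays the only entry point.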

#### Performance

benchmarks/R/generate_nprobust_lprobust_golden.R

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# Generate nprobust lprobust golden values for the Phase 1c parity suite.
#
# This script calls nprobust::lprobust() at a single eval point with
# bwselect="mse-dpi" on five deterministic DGPs and records:
#   - tau.cl, tau.bc (point estimates)
#   - se.cl, se.rb (standard errors)
#   - h, b (bandwidths chosen by the mse-dpi selector)
#   - N (observations in the selected kernel window)
#   - z = qnorm(1 - alpha/2)
#   - ci_low, ci_high = tau.bc +/- z * se.rb
#
# DGPs 1-3 reuse the same seed + shape as benchmarks/R/generate_nprobust_golden.R
# so the selected (h, b) are identical; Phase 1c parity is therefore isolated
# to the point-estimate + variance computation. DGP 4 adds cluster IDs for
# cluster-robust SE parity; DGP 5 shifts the support to test a non-zero
# boundary (Design 1 continuous-near-d_lower).
#
# Usage:
#   Rscript benchmarks/R/generate_nprobust_lprobust_golden.R
#
# Requirements:
#   nprobust (CRAN), jsonlite
#
# Output:
#   benchmarks/data/nprobust_lprobust_golden.json
#
# Phase 1c of the HeterogeneousAdoptionDiD implementation (de Chaisemartin,
# Ciccia, D'Haultfoeuille & Knau 2026, arXiv:2405.04465v6). Python tests at
# tests/test_bias_corrected_lprobust.py and tests/test_nprobust_port.py load
# this JSON and check agreement to tiered tolerances (1e-14 on tau_cl/se_cl,
# 1e-12 on tau_bc/se_rb, 1e-13 on CI bounds; see Phase 1c plan).

library(nprobust)
library(jsonlite)

stopifnot(packageVersion("nprobust") == "0.5.0")

extract_lprobust_single_eval <- function(d, y, eval_point = 0.0,
                                         kernel = "epa", vce = "nn",
                                         cluster = NULL, alpha = 0.05,
                                         h = NULL, b = NULL) {
  # If h (and optionally b) are passed, bypass the mse-dpi selector and
  # call lprobust() with those bandwidths directly. This is used for
  # clustered DGPs where nprobust's internal lpbwselect.mse.dpi hits a
  # singleton-cluster shape bug in lprobust.vce during the order-q+1/q+2
  # pilot fits. For unclustered DGPs, h=NULL triggers bwselect="mse-dpi".
  if (is.null(h)) {
    fit <- lprobust(y = y, x = d, eval = eval_point,
                    p = 1L, deriv = 0L, kernel = kernel,
                    bwselect = "mse-dpi", vce = vce, cluster = cluster,
                    bwcheck = 21L, bwregul = 1, nnmatch = 3L)
  } else {
    # When b is unspecified, nprobust defaults to b = h / rho with rho=1.
    fit <- lprobust(y = y, x = d, eval = eval_point,
                    p = 1L, deriv = 0L, kernel = kernel,
                    h = h, b = if (is.null(b)) h else b,
                    vce = vce, cluster = cluster,
                    bwcheck = 21L, nnmatch = 3L)
  }
  est <- fit$Estimate[1, ]
  z <- qnorm(1 - alpha / 2)
  ci_low <- as.numeric(est["tau.bc"] - z * est["se.rb"])
  ci_high <- as.numeric(est["tau.bc"] + z * est["se.rb"])

  list(
    eval_point = as.numeric(eval_point),
    h = as.numeric(est["h"]),
    b = as.numeric(est["b"]),
    n_used = as.integer(est["N"]),
    tau_cl = as.numeric(est["tau.us"]),
    tau_bc = as.numeric(est["tau.bc"]),
    se_cl = as.numeric(est["se.us"]),
    se_rb = as.numeric(est["se.rb"]),
    ci_low = ci_low,
    ci_high = ci_high,
    alpha = as.numeric(alpha),
    z = as.numeric(z)
  )
}

set.seed(20260419)

# DGP 1: d ~ Uniform(0, 1), y = d + d^2 + N(0, 0.5)
G <- 2000L
d1 <- runif(G, 0, 1)
y1 <- d1 + d1^2 + rnorm(G, 0, 0.5)

# DGP 2: d ~ Beta(2, 2), y = d + d^2 + N(0, 0.5) (f(0) vanishes at boundary)
d2 <- rbeta(G, 2, 2)
y2 <- d2 + d2^2 + rnorm(G, 0, 0.5)

# DGP 3: Half-normal d, y = 0.5 * d^2 + N(0, 1)
d3 <- abs(rnorm(G, 0, 1))
y3 <- 0.5 * d3^2 + rnorm(G, 0, 1)

# DGP 4: Uniform(0, 1) with 50 clusters of 40 obs (cluster-robust SE parity).
# Fewer, larger clusters avoid an nprobust-internal singleton-cluster shape
# bug in lprobust.vce that fires if a kernel window retains only one obs per
# cluster. 50 clusters x 40 obs => the mse-dpi pilot windows near the
# boundary keep enough obs per cluster to stay well-conditioned.
set.seed(20260420)
G4 <- 2000L
d4 <- runif(G4, 0, 1)
cluster4 <- rep(1:50, each = 40)[1:G4]
# Introduce within-cluster correlation in y via a cluster effect.
cluster_effect <- rnorm(50, 0, 0.3)[cluster4]
y4 <- d4 + d4^2 + cluster_effect + rnorm(G4, 0, 0.3)

# DGP 5: Uniform(0.2, 1.0) — Design 1 continuous-near-d_lower at
# boundary = d.min() > 0. Different seed to avoid aliasing DGP 1.
set.seed(20260421)
G5 <- 2000L
d5 <- runif(G5, 0.2, 1.0)
y5 <- (d5 - 0.2) + (d5 - 0.2)^2 + rnorm(G5, 0, 0.5)
eval5 <- min(d5)  # Design 1 continuous: evaluate at the realized minimum.

golden <- list(
  metadata = list(
    nprobust_version = as.character(packageVersion("nprobust")),
    nprobust_sha = "36e4e532d2f7d23d4dc6e162575cca79e0927cda",
    seeds = list(dgp1 = 20260419L, dgp2 = 20260419L, dgp3 = 20260419L,
                 dgp4 = 20260420L, dgp5 = 20260421L),
    generator = "benchmarks/R/generate_nprobust_lprobust_golden.R",
    algorithm = paste("nprobust::lprobust(..., bwselect='mse-dpi') at a single",
                      "eval point, p=1, deriv=0, kernel='epa', vce='nn'",
                      "unless noted. z = qnorm(1 - alpha/2) exported so the",
                      "Python side consumes R's critical value directly.")
  ),
  dgp1 = c(list(n = G, d = d1, y = y1, kernel = "epa", vce = "nn",
                description = "Uniform(0,1), polynomial m(d) = d + d^2"),
           extract_lprobust_single_eval(d1, y1, kernel = "epa", vce = "nn")),
  dgp2 = c(list(n = G, d = d2, y = y2, kernel = "epa", vce = "nn",
                description = "Beta(2,2) - boundary density vanishes at 0"),
           extract_lprobust_single_eval(d2, y2, kernel = "epa", vce = "nn")),
  dgp3 = c(list(n = G, d = d3, y = y3, kernel = "epa", vce = "nn",
                description = "Half-normal d, quadratic m(d) with unit noise"),
           extract_lprobust_single_eval(d3, y3, kernel = "epa", vce = "nn")),
  dgp4 = c(list(n = G4, d = d4, y = y4, cluster = cluster4,
                kernel = "epa", vce = "nn",
                h_manual = 0.3, b_manual = 0.3,
                description = paste("Clustered (50 groups of 40) Uniform(0,1);",
                                    "manual h=b=0.3 to sidestep nprobust's",
                                    "singleton-cluster bug in the mse-dpi",
                                    "pilot fits.")),
           extract_lprobust_single_eval(d4, y4, kernel = "epa", vce = "nn",
                                        cluster = cluster4,
                                        h = 0.3, b = 0.3)),
  dgp5 = c(list(n = G5, d = d5, y = y5, eval_point_override = eval5,
                kernel = "epa", vce = "nn",
                description = "Uniform(0.2, 1.0), Design 1 boundary = d.min()"),
           extract_lprobust_single_eval(d5, y5, eval_point = eval5,
                                        kernel = "epa", vce = "nn"))
)

out_path <- "benchmarks/data/nprobust_lprobust_golden.json"
dir.create("benchmarks/data", recursive = TRUE, showWarnings = FALSE)
write_json(golden, out_path, auto_unbox = TRUE, pretty = TRUE, digits = 14)
cat("Golden values written to", out_path, "\n")
for (name in c("dgp1", "dgp2", "dgp3", "dgp4", "dgp5")) {
  cat(sprintf("%s: tau.bc = %.6f, se.rb = %.6f, h = %.6f, b = %.6f\n",
              name, golden[[name]]$tau_bc, golden[[name]]$se_rb,
              golden[[name]]$h, golden[[name]]$b))
}
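On the Python side, a parity test can then consume these records without ever computing a critical value itself. A minimal sketch, using an inline stand-in record (key names follow the generator above; the numeric values are illustrative, not from the real golden file):

```python
import json

# Stand-in for one golden record; keys mirror what the R script writes.
record = json.loads(
    '{"tau_bc": 0.0123, "se_rb": 0.0456, "z": 1.959963984540054}'
)

# Reuse R's exported z = qnorm(1 - alpha/2) directly (no scipy ppf on the
# Python side), so reconstructing the CI is the same two floating-point
# operations in both languages and can match bit-wise.
ci_low = record["tau_bc"] - record["z"] * record["se_rb"]
ci_high = record["tau_bc"] + record["z"] * record["se_rb"]
```

In the real suite these bounds would be compared against the JSON's `ci_low`/`ci_high` at the 1e-13 tier noted in the script header.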

benchmarks/data/nprobust_lprobust_golden.json

Lines changed: 119 additions & 0 deletions

diff_diff/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -46,7 +46,9 @@
 from diff_diff.local_linear import (
     KERNELS,
     BandwidthResult,
+    BiasCorrectedFit,
     LocalLinearFit,
+    bias_corrected_local_linear,
     epanechnikov_kernel,
     kernel_moments,
     local_linear_fit,
@@ -427,6 +429,9 @@
     # MSE-optimal bandwidth selector (Phase 1b for HeterogeneousAdoptionDiD)
     "BandwidthResult",
     "mse_optimal_bandwidth",
+    # Bias-corrected local-linear (Phase 1c for HeterogeneousAdoptionDiD)
+    "BiasCorrectedFit",
+    "bias_corrected_local_linear",
     # Datasets
     "load_card_krueger",
     "load_castle_doctrine",
