Commit 8cdf75e

Merge pull request #369 from igerber/sdid-placebo-r-parity
Close SDID placebo R-parity gap: warm-start + R-anchored fixture + test seam
2 parents 47d3e02 + db377e6 commit 8cdf75e

6 files changed

Lines changed: 357 additions & 14 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [3.3.0] - 2026-04-25
### Fixed
- **`SyntheticDiD(variance_method="placebo")` SE now uses the R-default warm start**, matching `synthdid:::placebo_se`. R's placebo loop seeds Frank-Wolfe on each draw with `weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])` (the fit-time ω, subsetted and renormalized) and with the fit-time `weights$lambda`. Python previously used a uniform cold start, so on a handful of draws the solver stopped at a slightly different finite-iteration iterate than R, and the SE drifted from R's reference value. New `_placebo_variance_se` kwargs `init_omega` / `init_lambda` thread the fit-time weights through the existing two-pass FW dispatcher. At the global FW optimum the values are init-independent (strictly convex objective), so the change is a finite-iteration parity fix, not a methodology change. Existing placebo SE values shift by sub-percent on most panels; the bit-identity baseline pin in `TestScaleEquivariance::test_baseline_parity_small_scale[placebo]` was rebased from `0.29385822261006445` to `0.293840360160448`. New R-parity test `tests/test_methodology_sdid.py::TestJackknifeSERParity::test_placebo_se_matches_r` asserts that the SE matches R's `vcov(method="placebo")` to within `1e-8` using R's exact permutation sequence (recorded by `benchmarks/R/generate_sdid_placebo_parity_fixture.R` into `tests/data/sdid_placebo_indices_r.json`). The `_placebo_indices` kwarg on `_placebo_variance_se` is the test seam and is not part of the public API.
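The warm start described in this entry can be sketched in a few lines of Python. This is a minimal illustration, not the library's code: `sum_normalize` and `warm_start_omega` are hypothetical helper names, the fallback to uniform weights when the subset sums to zero is an assumption about how R's `sum_normalize` behaves, and the toy ω and permutation are made up.

```python
from typing import List, Sequence


def sum_normalize(x: Sequence[float]) -> List[float]:
    """Rescale x to sum to 1; fall back to uniform weights if the sum is 0.

    The zero-sum fallback is an assumption about R's helper, not a verified port.
    """
    total = sum(x)
    if total != 0:
        return [v / total for v in x]
    return [1.0 / len(x)] * len(x)


def warm_start_omega(
    fit_omega: Sequence[float], perm: Sequence[int], n_pseudo_control: int
) -> List[float]:
    """Build the per-draw Frank-Wolfe initializer from fit-time unit weights.

    perm is a 0-indexed permutation of the control units; its first
    n_pseudo_control entries play the role of controls in the placebo draw.
    """
    subset = [fit_omega[i] for i in perm[:n_pseudo_control]]
    return sum_normalize(subset)


# Toy example: 5 controls, 2 of them reassigned as pseudo-treated.
fit_omega = [0.5, 0.0, 0.25, 0.25, 0.0]
perm = [3, 0, 4, 1, 2]
init_omega = warm_start_omega(fit_omega, perm, n_pseudo_control=3)
print(init_omega)
```

A cold start would instead pass `[1/3, 1/3, 1/3]` here. Both initializers target the same optimum; they only differ in where a solver that stops after a finite number of iterations ends up.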
### Added
- **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based). Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). 
New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work.
- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. 
On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. 
See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
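The claim that the sup-t critical value collapses to the normal quantile at H=1 can be checked with a toy multiplier bootstrap. The sketch below is a deliberate simplification and not the library's `_sup_t_multiplier_bootstrap`: it uses unit-level Rademacher multipliers with no strata, PSUs, FPC, or small-sample correction, it normalizes by `sqrt(sum(psi²))` rather than the analytical HC1 SE, and the function name and simulated influence functions are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sup_t_critical_value(psi_by_horizon, n_bootstrap=4000, alpha=0.05, seed=0):
    """(1 - alpha)-quantile of max_h |xi @ psi_h| / se_h over Rademacher draws xi."""
    rng = random.Random(seed)
    n_units = len(psi_by_horizon[0])
    # Exact standard deviation of a Rademacher-weighted sum of psi_h.
    ses = [math.sqrt(sum(p * p for p in psi)) for psi in psi_by_horizon]
    sup_stats = []
    for _ in range(n_bootstrap):
        xi = [rng.choice((-1.0, 1.0)) for _ in range(n_units)]
        sup_stats.append(
            max(
                abs(sum(w * p for w, p in zip(xi, psi))) / se
                for psi, se in zip(psi_by_horizon, ses)
            )
        )
    sup_stats.sort()
    idx = min(n_bootstrap - 1, math.ceil((1 - alpha) * n_bootstrap) - 1)
    return sup_stats[idx]


# Simulated per-unit influence functions for three horizons (made-up data).
data_rng = random.Random(1)
psi_h = [[data_rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(3)]

crit_h1 = sup_t_critical_value(psi_h[:1])   # single horizon
crit_h3 = sup_t_critical_value(psi_h)       # three horizons, same multiplier draws
normal_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
print(crit_h1, crit_h3, normal_crit)
```

With one horizon the bootstrap statistic is approximately |N(0, 1)|, so the critical value sits near 1.96 up to Monte Carlo noise. Adding horizons can only raise the supremum, which is why the simultaneous band is wider than the pointwise one.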
benchmarks/R/generate_sdid_placebo_parity_fixture.R

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
#!/usr/bin/env Rscript
# Generate a fixture pinning R's `synthdid::vcov(method="placebo")` SE plus
# the per-replication permutations R consumed, so the Python R-parity test
# can feed those exact permutations through `_placebo_variance_se` and
# assert SE match at machine precision.
#
# Usage:
#   Rscript benchmarks/R/generate_sdid_placebo_parity_fixture.R
#
# Output:
#   tests/data/sdid_placebo_indices_r.json
#
# Symmetric with the existing jackknife R-parity test
# (TestJackknifeSERParity in tests/test_methodology_sdid.py:1410). Reuses
# the same Y matrix and (N0, N1, T0, T1) shape so the placebo + jackknife
# parity tests share an anchor panel.
#
# R version: 4.5.2; synthdid version: 0.0.9.

library(synthdid)
library(jsonlite)

# Reconstruct R's panel exactly as TestJackknifeSERParity does (set.seed(42),
# 23 units × 8 periods, treated = i > N0 with effect 5 in t > T0).
set.seed(42)
N0 <- 20
N1 <- 3
T0 <- 5
T1 <- 3
N <- N0 + N1
T <- T0 + T1
Y <- matrix(0, nrow = N, ncol = T)
for (i in 1:N) {
  unit_fe <- rnorm(1, sd = 2)
  for (t in 1:T) {
    Y[i, t] <- 10 + unit_fe + (t - 1) * 0.3 + rnorm(1, sd = 0.5)
    if (i > N0 && t > T0) Y[i, t] <- Y[i, t] + 5.0
  }
}

# Fit-time ATT (sanity check: must match TestJackknifeSERParity.R_ATT).
tau_hat <- synthdid_estimate(Y, N0, T0)
r_att <- as.numeric(tau_hat)

# Reproduce R's placebo_se loop exactly so we can record permutations and
# the per-rep tau alongside the resulting SE. Mirrors `synthdid:::placebo_se`
# (R/vcov.R), including the warm-start weights pass-through:
#
#   theta = function(ind) {
#     N0 = length(ind) - N1
#     weights.boot = weights
#     weights.boot$omega = sum_normalize(weights$omega[ind[1:N0]])
#     do.call(synthdid_estimate, c(list(Y = setup$Y[ind, ],
#       N0 = N0, T0 = setup$T0, X = setup$X[ind, , ],
#       weights = weights.boot), opts))
#   }
#
# The warm-start `weights.boot$omega` differs from a fresh uniform init
# at finite FW iterations and is what `vcov(method="placebo")` actually
# consumes, so reproducing it here is required for bit-identical SE.
opts_used <- attr(tau_hat, "opts")
fit_weights <- attr(tau_hat, "weights")
fit_setup <- attr(tau_hat, "setup")
replications <- 200

# Re-seed before the placebo loop so the recorded permutations do not depend
# on how much RNG state the panel construction and fit consumed. Python
# consumes the recorded permutations directly (no RNG-state matching needed).
set.seed(42)
perms <- vector("list", replications)
taus <- numeric(replications)

for (r in 1:replications) {
  ind <- sample(1:N0, N0)
  perms[[r]] <- ind
  N0_placebo <- N0 - N1
  weights_boot <- fit_weights
  weights_boot$omega <- synthdid:::sum_normalize(fit_weights$omega[ind[1:N0_placebo]])
  # IMPORTANT: R's `placebo_se` uses ONLY the N0 controls (subdivided into
  # N0-N1 pseudo-controls + N1 pseudo-treated). Real treated rows are NOT
  # included in the placebo Y matrix; that is what makes the placebo a
  # null-distribution test. ``Y = setup$Y[ind, ]`` is N0 rows; appending
  # the real treated rows (i.e., ``setup$Y[c(ind, (N0+1):N), ]``) would
  # change the test entirely (and produce an SE of ~0.132 instead of R's
  # 0.226, roughly a 1.7× gap on this fixture).
  est_placebo <- do.call(
    synthdid_estimate,
    c(list(
      Y = fit_setup$Y[ind, ],
      N0 = N0_placebo,
      T0 = T0,
      X = fit_setup$X[ind, , ],
      weights = weights_boot
    ), opts_used)
  )
  taus[r] <- as.numeric(est_placebo)
}

r_placebo_se <- sqrt((replications - 1) / replications) * sd(taus)

# Sanity check against R's vcov() entry point. With the warm-start pattern
# applied explicitly above, the manual loop and `vcov()` should produce
# the same SE up to MC noise on the seed sequence. Match isn't required
# for the parity test (we use `r_placebo_se` from our recorded
# permutations); both values are kept for transparency.
set.seed(42)
r_placebo_se_via_vcov <- sqrt(vcov(tau_hat, method = "placebo", replications = replications)[1, 1])

cat(sprintf("R ATT: %.15f\n", r_att))
cat(sprintf("R placebo SE (manual loop): %.15f\n", r_placebo_se))
cat(sprintf("R placebo SE (via vcov): %.15f\n", r_placebo_se_via_vcov))
cat(sprintf("Replications: %d\n", replications))

# Convert permutations to 0-indexed for Python (R uses 1-indexed).
perms_0indexed <- lapply(perms, function(p) as.integer(p - 1L))

payload <- list(
  metadata = list(
    R_version = paste(R.version$major, R.version$minor, sep = "."),
    synthdid_version = as.character(packageVersion("synthdid")),
    seed = 42L,
    replications = as.integer(replications),
    note = paste(
      "Permutations are 0-indexed for direct numpy consumption.",
      "R ATT, R placebo SE (manual loop), and per-rep taus are pinned",
      "for downstream Python parity assertion."
    )
  ),
  N0 = as.integer(N0),
  N1 = as.integer(N1),
  T0 = as.integer(T0),
  T1 = as.integer(T1),
  R_ATT = r_att,
  R_PLACEBO_SE = r_placebo_se,
  R_PLACEBO_SE_VIA_VCOV = r_placebo_se_via_vcov,
  R_PLACEBO_TAUS = as.numeric(taus),
  R_PERMUTATIONS = perms_0indexed
)

out_path <- "tests/data/sdid_placebo_indices_r.json"
write_json(payload, out_path, auto_unbox = TRUE, digits = 17)
cat(sprintf("\nWrote %s\n", out_path))
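On the Python side, a consumer of this fixture would typically validate its shape before feeding the permutations to the estimator. The following is a hedged sketch of such a check using only the standard library. The field names come from the payload written above; `placebo_se_from_taus` and `check_fixture` are hypothetical helpers, the toy payload is made up so the example runs without the real JSON file, and the call into `_placebo_variance_se` is omitted because its signature is not shown in this diff.

```python
import json
import math
import statistics


def placebo_se_from_taus(taus):
    """Same formula as the R script: sqrt((r - 1) / r) * sd(taus)."""
    r = len(taus)
    return math.sqrt((r - 1) / r) * statistics.stdev(taus)


def check_fixture(payload):
    """Validate permutation shape and recompute the pinned SE from the taus."""
    n0 = payload["N0"]
    perms = payload["R_PERMUTATIONS"]
    assert len(perms) == payload["metadata"]["replications"]
    for perm in perms:
        # Each entry must be a 0-indexed permutation of the N0 control units.
        assert sorted(perm) == list(range(n0))
    recomputed = placebo_se_from_taus(payload["R_PLACEBO_TAUS"])
    assert math.isclose(recomputed, payload["R_PLACEBO_SE"], rel_tol=1e-12)
    return recomputed


# Toy payload standing in for tests/data/sdid_placebo_indices_r.json.
toy_taus = [0.10, -0.20, 0.05, 0.30]
toy_payload = json.loads(json.dumps({
    "metadata": {"replications": 4},
    "N0": 4,
    "N1": 1,
    "R_PLACEBO_TAUS": toy_taus,
    "R_PLACEBO_SE": placebo_se_from_taus(toy_taus),
    "R_PERMUTATIONS": [[0, 1, 2, 3], [3, 2, 1, 0], [1, 0, 3, 2], [2, 3, 0, 1]],
}))

se = check_fixture(toy_payload)
print(se)
```

The `sqrt((r - 1) / r)` factor turns the sample standard deviation into the population standard deviation of the placebo estimates, so the result equals `statistics.pstdev(taus)`.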
