igerber
diff --git a/‎CHANGELOG.md‎
Lines changed: 3 additions & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 3 additions & 0 deletions
diff --git a/‎benchmarks/R/generate_sdid_placebo_parity_fixture.R‎
Lines changed: 142 additions & 0 deletions b/‎benchmarks/R/generate_sdid_placebo_parity_fixture.R‎
Lines changed: 142 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [3.3.0] - 2026-04-25
 
+### Fixed
+- **`SyntheticDiD(variance_method="placebo")` SE now uses R-default warm-start** matching `synthdid:::placebo_se`. R's placebo loop seeds Frank-Wolfe per draw with `weights.boot$omega = sum_normalize(weights$omega[ind[1:N0_placebo]])` (fit-time ω subsetted + renormalized) and the fit-time `weights$lambda` — Python previously used uniform cold-start, producing finite-iter convergence-pattern drift on a handful of draws relative to R's reference SE. New `_placebo_variance_se` kwargs `init_omega` / `init_lambda` thread fit-time weights through the existing two-pass FW dispatcher; on the global FW optimum the values are init-independent (strictly convex objective), so the change is a finite-iter parity fix, not a methodology change. Existing placebo SE values shift by sub-percent on most panels; the bit-identity baseline pin in `TestScaleEquivariance::test_baseline_parity_small_scale[placebo]` was rebased from `0.29385822261006445` to `0.293840360160448`. New R-parity test `tests/test_methodology_sdid.py::TestJackknifeSERParity::test_placebo_se_matches_r` asserts SE matches R's `vcov(method="placebo")` to within `< 1e-8` using R's exact permutation sequence (recorded by `benchmarks/R/generate_sdid_placebo_parity_fixture.R` into `tests/data/sdid_placebo_indices_r.json`). The `_placebo_indices` kwarg on `_placebo_variance_se` is the test seam; not part of the public API.
+
 ### Added
 - **`qug_test` and `did_had_pretest_workflow` survey-aware NotImplementedError gates (Phase 4.5 C0 decision gate).** `qug_test(d, *, survey=None, weights=None)` and `did_had_pretest_workflow(..., *, survey=None, weights=None)` now accept the two kwargs as keyword-only with default `None`. Passing either non-`None` raises `NotImplementedError` with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on `survey=` + `weights=` mirrors `HeterogeneousAdoptionDiD.fit()` at `had.py:2890`. **QUG-under-survey is permanently deferred** — the test statistic uses extreme order statistics `D_{(1)}, D_{(2)}` which are NOT smooth functionals of the empirical CDF, so standard survey machinery (Binder-TSL linearization, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the `Exp(1)/Exp(1)` limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests. The workflow's gate is **temporary** — Phase 4.5 C will close it for the linearity-family pretests with mechanism varying by test: Rao-Wu rescaled bootstrap for `stute_test` and the joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); weighted OLS residuals + weighted variance estimator for `yatchew_hr_test` (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based). Sister pretests (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`) keep their closed signatures in this release — Phase 4.5 C will add kwargs and implementation together to avoid API churn. Unweighted `qug_test(d)` and `did_had_pretest_workflow(...)` calls are bit-exact pre-PR (kwargs are keyword-only after `*`; positional path unchanged). New tests at `tests/test_had_pretests.py::TestQUGTest` (5 rejection / mutex / message / regression tests) and the new `TestHADPretestWorkflowSurveyGuards` class (6 tests covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C0) for the full methodology rationale plus a sketch of the (out-of-scope) theoretical bridge that combines endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) — publishable methodology research, not engineering work.
 - **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
 
@@ -0,0 +1,142 @@
+#!/usr/bin/env Rscript
+# Generate a fixture pinning R's `synthdid::vcov(method="placebo")` SE plus
+# the per-replication permutations R consumed, so the Python R-parity test
+# can feed those exact permutations through `_placebo_variance_se` and
+# assert SE match at machine precision.
+#
+# Usage:
+#   Rscript benchmarks/R/generate_sdid_placebo_parity_fixture.R
+#
+# Output:
+#   tests/data/sdid_placebo_indices_r.json
+#
+# Symmetric with the existing jackknife R-parity test
+# (TestJackknifeSERParity in tests/test_methodology_sdid.py:1410). Reuses
+# the same Y matrix and (N0, N1, T0, T1) shape so the placebo + jackknife
+# parity tests share an anchor panel.
+#
+# R version: 4.5.2; synthdid version: 0.0.9.
+
+library(synthdid)
+library(jsonlite)
+
+# Reconstruct R's panel exactly as TestJackknifeSERParity does (set.seed(42),
+# 23 units × 8 periods, treated = i > N0 with effect 5 in t > T0).
+set.seed(42)
+N0 <- 20
+N1 <- 3
+T0 <- 5
+T1 <- 3
+N <- N0 + N1
+T <- T0 + T1
+Y <- matrix(0, nrow = N, ncol = T)
+for (i in 1:N) {
+  unit_fe <- rnorm(1, sd = 2)
+  for (t in 1:T) {
+    Y[i, t] <- 10 + unit_fe + (t - 1) * 0.3 + rnorm(1, sd = 0.5)
+    if (i > N0 && t > T0) Y[i, t] <- Y[i, t] + 5.0
+  }
+}
+
+# Fit-time ATT (sanity check — must match TestJackknifeSERParity.R_ATT).
+tau_hat <- synthdid_estimate(Y, N0, T0)
+r_att <- as.numeric(tau_hat)
+
+# Reproduce R's placebo_se loop exactly so we can record permutations and
+# the per-rep tau alongside the resulting SE. Mirrors `synthdid:::placebo_se`
+# (R/vcov.R), including the warm-start weights pass-through:
+#
+#   theta = function(ind) {
+#     N0 = length(ind) - N1
+#     weights.boot = weights
+#     weights.boot$omega = sum_normalize(weights$omega[ind[1:N0]])
+#     do.call(synthdid_estimate, c(list(Y = setup$Y[ind, ],
+#         N0 = N0, T0 = setup$T0, X = setup$X[ind, , ],
+#         weights = weights.boot), opts))
+#   }
+#
+# The warm-start `weights.boot$omega` differs from a fresh uniform init
+# at finite FW iterations and is what `vcov(method="placebo")` actually
+# consumes — so reproducing it here is required for bit-identical SE.
+opts_used <- attr(tau_hat, "opts")
+fit_weights <- attr(tau_hat, "weights")
+fit_setup <- attr(tau_hat, "setup")
+replications <- 200
+
+# Use a fresh seed for the placebo loop so the recorded permutations are
+# independent of the fit-time RNG state. Python consumes the recorded
+# permutations directly (no RNG-state matching needed).
+set.seed(42)
+perms <- vector("list", replications)
+taus <- numeric(replications)
+
+for (r in 1:replications) {
+  ind <- sample(1:N0, N0)
+  perms[[r]] <- ind
+  N0_placebo <- N0 - N1
+  weights_boot <- fit_weights
+  weights_boot$omega <- synthdid:::sum_normalize(fit_weights$omega[ind[1:N0_placebo]])
+  # IMPORTANT: R's `placebo_se` uses ONLY the N0 controls (subdivided into
+  # N0-N1 pseudo-controls + N1 pseudo-treated). Real treated rows are NOT
+  # included in the placebo Y matrix — that's what makes the placebo a
+  # null-distribution test. ``Y = setup$Y[ind, ]`` is N0 rows; appending
+  # the real treated rows (i.e., ``setup$Y[c(ind, (N0+1):N), ]``) would
+  # change the test entirely (and produces SE ~0.132 instead of R's 0.226
+  # — a 2× drift on this fixture).
+  est_placebo <- do.call(
+    synthdid_estimate,
+    c(list(
+      Y = fit_setup$Y[ind, ],
+      N0 = N0_placebo,
+      T0 = T0,
+      X = fit_setup$X[ind, , ],
+      weights = weights_boot
+    ), opts_used)
+  )
+  taus[r] <- as.numeric(est_placebo)
+}
+
+r_placebo_se <- sqrt((replications - 1) / replications) * sd(taus)
+
+# Sanity check against R's vcov() entry point. With the warm-start pattern
+# applied explicitly above, the manual loop and `vcov()` should produce
+# the same SE up to MC noise on the seed sequence. Match isn't required
+# for the parity test (we use `r_placebo_se` from our recorded
+# permutations); both values are kept for transparency.
+set.seed(42)
+r_placebo_se_via_vcov <- sqrt(vcov(tau_hat, method = "placebo", replications = replications)[1, 1])
+
+cat(sprintf("R ATT:                       %.15f\n", r_att))
+cat(sprintf("R placebo SE (manual loop):  %.15f\n", r_placebo_se))
+cat(sprintf("R placebo SE (via vcov):     %.15f\n", r_placebo_se_via_vcov))
+cat(sprintf("Replications:                %d\n", replications))
+
+# Convert permutations to 0-indexed for Python (R uses 1-indexed).
+perms_0indexed <- lapply(perms, function(p) as.integer(p - 1L))
+
+payload <- list(
+  metadata = list(
+    R_version = paste(R.version$major, R.version$minor, sep = "."),
+    synthdid_version = as.character(packageVersion("synthdid")),
+    seed = 42L,
+    replications = as.integer(replications),
+    note = paste(
+      "Permutations are 0-indexed for direct numpy consumption.",
+      "R ATT, R placebo SE (manual loop), and per-rep taus are pinned",
+      "for downstream Python parity assertion."
+    )
+  ),
+  N0 = as.integer(N0),
+  N1 = as.integer(N1),
+  T0 = as.integer(T0),
+  T1 = as.integer(T1),
+  R_ATT = r_att,
+  R_PLACEBO_SE = r_placebo_se,
+  R_PLACEBO_SE_VIA_VCOV = r_placebo_se_via_vcov,
+  R_PLACEBO_TAUS = as.numeric(taus),
+  R_PERMUTATIONS = perms_0indexed
+)
+
+out_path <- "tests/data/sdid_placebo_indices_r.json"
+write_json(payload, out_path, auto_unbox = TRUE, digits = 17)
+cat(sprintf("\nWrote %s\n", out_path))