Skip to content

Commit 37bb13e

Browse files
igerberclaude
andcommitted
Extend PR #312 Y-normalization contract into SDID diagnostic methods
PR #312 centered and scaled Y once in SyntheticDiD.fit() to avoid catastrophic cancellation in the SDID double-difference at extreme Y. The fix did not reach the two public diagnostic methods on SyntheticDiDResults: in_time_placebo() and sensitivity_to_zeta_omega() both re-run Frank-Wolfe on the original-scale fit snapshot with original-scale zetas, so at Y ~ 1e9 they reproduce the same silent failure the main path was fixed for. Capture Y_shift and Y_scale on the fit snapshot, apply the same (Y - shift) / scale normalization inside the two diagnostic methods, pass zeta / Y_scale and min_decrease derived from the normalized noise level to the FW weight solvers, and rescale att / pre_fit_rmse back to original-Y units before reporting. Unit-weight diagnostics (max_unit_weight, effective_n) are scale-invariant and reported without rescaling. Regression coverage: - TestDiagnosticScaleParity: att / pre_fit_rmse must scale by |a| across (Y -> a*Y + b) for both diagnostic methods; a no-effect DGP at Y ~ 1e9 must produce finite placebo atts landing within 5 * noise_level. - TestHeterogeneousAndRampingScale: cross-unit heterogeneous scale (units spanning 1e6 to 1e9) and cross-period ramping (trend growing 4 orders of magnitude across periods) must leave the fit and both diagnostic surfaces finite. These are the heterogeneity pathways the original TestScaleEquivariance (affine-only) suite did not cover. Covered by audit axis A (numerical precision / scale fragility). Findings D-4 and D-4b from docs/audits/silent-failures-findings.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 61aea00 commit 37bb13e

4 files changed

Lines changed: 284 additions & 45 deletions

File tree

diff_diff/results.py

Lines changed: 73 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -669,6 +669,12 @@ class _SyntheticDiDFitSnapshot:
669669
post_periods: List[Any]
670670
w_control: Optional[np.ndarray] = None
671671
w_treated: Optional[np.ndarray] = None
672+
# Normalization constants captured during fit() so diagnostic methods can
673+
# reproduce the main path's centering+scaling and avoid catastrophic
674+
# cancellation on extreme-Y panels. Defaults preserve behavior for
675+
# snapshots built before these fields existed.
676+
Y_shift: float = 0.0
677+
Y_scale: float = 1.0
672678

673679
def __post_init__(self):
674680
for arr in (
@@ -1144,8 +1150,17 @@ def in_time_placebo(
11441150
"in_time_placebo() needs zeta_omega and zeta_lambda from the "
11451151
"original fit. Expected on the results object but found None."
11461152
)
1153+
# Reproduce the main fit path's Y normalization (Y → (Y - shift) / scale)
1154+
# so Frank-Wolfe sees the same ~O(1) inputs it saw during fit() and the
1155+
# SDID double-difference does not suffer ~6-digit cancellation at
1156+
# extreme Y. See SyntheticDiD.fit() and REGISTRY.md §SyntheticDiD.
1157+
Y_shift = snap.Y_shift
1158+
Y_scale = snap.Y_scale
1159+
zeta_omega_n = zeta_omega / Y_scale
1160+
zeta_lambda_n = zeta_lambda / Y_scale
11471161
noise_level = self.noise_level if self.noise_level is not None else 0.0
1148-
min_decrease = 1e-5 * noise_level if noise_level > 0 else 1e-5
1162+
noise_level_n = noise_level / Y_scale
1163+
min_decrease = 1e-5 * noise_level_n if noise_level_n > 0 else 1e-5
11491164

11501165
# Build the list of (fake_period, position) pairs to iterate.
11511166
period_to_idx = {p: i for i, p in enumerate(pre_periods)}
@@ -1195,29 +1210,29 @@ def in_time_placebo(
11951210
rows.append(row)
11961211
continue
11971212

1198-
Y_pre_c = snap.Y_pre_control[:i, :]
1199-
Y_post_c = snap.Y_pre_control[i:, :]
1200-
Y_pre_t = snap.Y_pre_treated[:i, :]
1201-
Y_post_t = snap.Y_pre_treated[i:, :]
1213+
Y_pre_c_n = (snap.Y_pre_control[:i, :] - Y_shift) / Y_scale
1214+
Y_post_c_n = (snap.Y_pre_control[i:, :] - Y_shift) / Y_scale
1215+
Y_pre_t_n = (snap.Y_pre_treated[:i, :] - Y_shift) / Y_scale
1216+
Y_post_t_n = (snap.Y_pre_treated[i:, :] - Y_shift) / Y_scale
12021217

12031218
if snap.w_treated is not None:
12041219
w_t = snap.w_treated
1205-
y_pre_t_mean = np.average(Y_pre_t, axis=1, weights=w_t)
1206-
y_post_t_mean = np.average(Y_post_t, axis=1, weights=w_t)
1220+
y_pre_t_mean_n = np.average(Y_pre_t_n, axis=1, weights=w_t)
1221+
y_post_t_mean_n = np.average(Y_post_t_n, axis=1, weights=w_t)
12071222
else:
1208-
y_pre_t_mean = np.mean(Y_pre_t, axis=1)
1209-
y_post_t_mean = np.mean(Y_post_t, axis=1)
1223+
y_pre_t_mean_n = np.mean(Y_pre_t_n, axis=1)
1224+
y_post_t_mean_n = np.mean(Y_post_t_n, axis=1)
12101225

12111226
omega_fake = compute_sdid_unit_weights(
1212-
Y_pre_c,
1213-
y_pre_t_mean,
1214-
zeta_omega=zeta_omega,
1227+
Y_pre_c_n,
1228+
y_pre_t_mean_n,
1229+
zeta_omega=zeta_omega_n,
12151230
min_decrease=min_decrease,
12161231
)
12171232
lambda_fake = compute_time_weights(
1218-
Y_pre_c,
1219-
Y_post_c,
1220-
zeta_lambda=zeta_lambda,
1233+
Y_pre_c_n,
1234+
Y_post_c_n,
1235+
zeta_lambda=zeta_lambda_n,
12211236
min_decrease=min_decrease,
12221237
)
12231238

@@ -1231,20 +1246,22 @@ def in_time_placebo(
12311246
else:
12321247
omega_eff_fake = omega_fake
12331248

1234-
att_fake = compute_sdid_estimator(
1235-
Y_pre_c,
1236-
Y_post_c,
1237-
y_pre_t_mean,
1238-
y_post_t_mean,
1249+
att_fake_n = compute_sdid_estimator(
1250+
Y_pre_c_n,
1251+
Y_post_c_n,
1252+
y_pre_t_mean_n,
1253+
y_post_t_mean_n,
12391254
omega_eff_fake,
12401255
lambda_fake,
12411256
)
1242-
synthetic_pre_fake = Y_pre_c @ omega_eff_fake
1243-
pre_fit = float(
1244-
np.sqrt(np.mean((y_pre_t_mean - synthetic_pre_fake) ** 2))
1257+
synthetic_pre_fake_n = Y_pre_c_n @ omega_eff_fake
1258+
pre_fit_n = float(
1259+
np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2))
12451260
)
1246-
row["att"] = float(att_fake)
1247-
row["pre_fit_rmse"] = pre_fit
1261+
# ATT is scale-equivariant and shift-invariant in Y; RMSE is
1262+
# scale-equivariant. Rescale back to original-Y units.
1263+
row["att"] = float(att_fake_n * Y_scale)
1264+
row["pre_fit_rmse"] = pre_fit_n * Y_scale
12481265
rows.append(row)
12491266

12501267
return pd.DataFrame(rows)
@@ -1320,19 +1337,29 @@ def sensitivity_to_zeta_omega(
13201337
else:
13211338
zeta_values = [float(z) for z in zeta_grid]
13221339

1340+
# Reproduce the main fit path's Y normalization so FW sees ~O(1)
1341+
# inputs; see in_time_placebo() for the same pattern.
1342+
Y_shift = snap.Y_shift
1343+
Y_scale = snap.Y_scale
13231344
noise_level = self.noise_level if self.noise_level is not None else 0.0
1324-
min_decrease = 1e-5 * noise_level if noise_level > 0 else 1e-5
1345+
noise_level_n = noise_level / Y_scale
1346+
min_decrease = 1e-5 * noise_level_n if noise_level_n > 0 else 1e-5
1347+
1348+
Y_pre_control_n = (snap.Y_pre_control - Y_shift) / Y_scale
1349+
Y_post_control_n = (snap.Y_post_control - Y_shift) / Y_scale
1350+
Y_pre_treated_n = (snap.Y_pre_treated - Y_shift) / Y_scale
1351+
Y_post_treated_n = (snap.Y_post_treated - Y_shift) / Y_scale
13251352

13261353
if snap.w_treated is not None:
1327-
y_pre_t_mean = np.average(
1328-
snap.Y_pre_treated, axis=1, weights=snap.w_treated
1354+
y_pre_t_mean_n = np.average(
1355+
Y_pre_treated_n, axis=1, weights=snap.w_treated
13291356
)
1330-
y_post_t_mean = np.average(
1331-
snap.Y_post_treated, axis=1, weights=snap.w_treated
1357+
y_post_t_mean_n = np.average(
1358+
Y_post_treated_n, axis=1, weights=snap.w_treated
13321359
)
13331360
else:
1334-
y_pre_t_mean = np.mean(snap.Y_pre_treated, axis=1)
1335-
y_post_t_mean = np.mean(snap.Y_post_treated, axis=1)
1361+
y_pre_t_mean_n = np.mean(Y_pre_treated_n, axis=1)
1362+
y_post_t_mean_n = np.mean(Y_post_treated_n, axis=1)
13361363

13371364
columns = [
13381365
"zeta_omega",
@@ -1348,9 +1375,9 @@ def sensitivity_to_zeta_omega(
13481375
rows: List[Dict[str, Any]] = []
13491376
for z in zeta_values:
13501377
omega_fake = compute_sdid_unit_weights(
1351-
snap.Y_pre_control,
1352-
y_pre_t_mean,
1353-
zeta_omega=z,
1378+
Y_pre_control_n,
1379+
y_pre_t_mean_n,
1380+
zeta_omega=z / Y_scale,
13541381
min_decrease=min_decrease,
13551382
)
13561383
if snap.w_control is not None:
@@ -1371,22 +1398,24 @@ def sensitivity_to_zeta_omega(
13711398
else:
13721399
omega_eff = omega_fake
13731400

1374-
att = compute_sdid_estimator(
1375-
snap.Y_pre_control,
1376-
snap.Y_post_control,
1377-
y_pre_t_mean,
1378-
y_post_t_mean,
1401+
att_n = compute_sdid_estimator(
1402+
Y_pre_control_n,
1403+
Y_post_control_n,
1404+
y_pre_t_mean_n,
1405+
y_post_t_mean_n,
13791406
omega_eff,
13801407
time_weights,
13811408
)
1382-
synthetic_pre = snap.Y_pre_control @ omega_eff
1383-
pre_fit = float(np.sqrt(np.mean((y_pre_t_mean - synthetic_pre) ** 2)))
1409+
synthetic_pre_n = Y_pre_control_n @ omega_eff
1410+
pre_fit_n = float(np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_n) ** 2)))
13841411
herf = float(np.sum(omega_eff ** 2))
13851412
rows.append(
13861413
{
13871414
"zeta_omega": z,
1388-
"att": float(att),
1389-
"pre_fit_rmse": pre_fit,
1415+
# Unit weights are scale-invariant; ATT and RMSE are
1416+
# scale-equivariant. Report original-Y units.
1417+
"att": float(att_n * Y_scale),
1418+
"pre_fit_rmse": pre_fit_n * Y_scale,
13901419
"max_unit_weight": float(np.max(omega_eff)),
13911420
"effective_n": float("nan") if herf == 0 else 1.0 / herf,
13921421
}

diff_diff/synthetic_did.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -667,6 +667,8 @@ def fit( # type: ignore[override]
667667
post_periods=list(post_periods),
668668
w_control=w_control,
669669
w_treated=w_treated,
670+
Y_shift=Y_shift,
671+
Y_scale=Y_scale,
670672
)
671673

672674
# Freeze the public diagnostic arrays so mutation via the results

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1519,7 +1519,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
15191519
- **Jackknife with non-finite LOO estimate**: Returns NaN SE. Unlike bootstrap/placebo, jackknife is deterministic and cannot skip failed iterations; NaN propagates through `var()` (matches R behavior).
15201520
- **Jackknife with survey weights**: Guards on effective positive support (omega * w_control > 0 and w_treated > 0) after composition, not raw FW counts. Returns NaN SE if fewer than 2 effective controls or 2 positive-weight treated units. Per-iteration zero-sum guards return NaN for individual LOO iterations when remaining composed weights sum to zero.
15211521
- **Note:** Survey support: weights, strata, PSU, and FPC are all supported. Full-design surveys use Rao-Wu rescaled bootstrap (Phase 6); non-bootstrap variance methods (`variance_method="placebo"` or `"jackknife"`) require weights-only (strata/PSU/FPC require bootstrap). Both sides weighted per WLS regression interpretation: treated-side means are survey-weighted (Frank-Wolfe target and ATT formula); control-side synthetic weights are composed with survey weights post-optimization (ω_eff = ω * w_co, renormalized). Frank-Wolfe optimization itself is unweighted — survey importance enters after trajectory-matching. Covariate residualization uses WLS with survey weights. Placebo, jackknife, and bootstrap SE preserve survey weights on both sides.
1522-
- **Note:** Internal Y normalization. Before weight optimization, the estimator, and variance procedures, `fit()` centers Y by `mean(Y_pre_control)` and scales by `std(Y_pre_control)`; `Y_scale` falls back to `1.0` when std is non-finite or below `1e-12 * max(|mean|, 1)`. Auto-regularization and `noise_level` are computed on normalized Y; user-supplied `zeta_omega` / `zeta_lambda` are divided by `Y_scale` internally for Frank-Wolfe. τ, SE, CI, the placebo/bootstrap/jackknife effect vectors, `results_.noise_level`, and `results_.zeta_omega` / `results_.zeta_lambda` are all reported on the user's original outcome scale (user-supplied zetas are echoed back exactly to avoid float roundoff). Mathematically a no-op — τ is location-invariant and scale-equivariant, and FW weights are invariant under `(Y, ζ) → (Y/s, ζ/s)` — but prevents catastrophic cancellation in the SDID double-difference when outcomes span millions-to-billions (see synth-inference/synthdid#71 for the R-package version of this issue). Normalization constants are derived from controls' pre-period only so the reference is unaffected by treatment. Scope: `in_time_placebo()` and `sensitivity_to_zeta_omega()` continue to run on the stored original-scale snapshot with original-scale zetas (preserving pre-fix behavior); extending normalization into those diagnostic paths is tracked separately.
1522+
- **Note:** Internal Y normalization. Before weight optimization, the estimator, and variance procedures, `fit()` centers Y by `mean(Y_pre_control)` and scales by `std(Y_pre_control)`; `Y_scale` falls back to `1.0` when std is non-finite or below `1e-12 * max(|mean|, 1)`. Auto-regularization and `noise_level` are computed on normalized Y; user-supplied `zeta_omega` / `zeta_lambda` are divided by `Y_scale` internally for Frank-Wolfe. τ, SE, CI, the placebo/bootstrap/jackknife effect vectors, `results_.noise_level`, and `results_.zeta_omega` / `results_.zeta_lambda` are all reported on the user's original outcome scale (user-supplied zetas are echoed back exactly to avoid float roundoff). Mathematically a no-op — τ is location-invariant and scale-equivariant, and FW weights are invariant under `(Y, ζ) → (Y/s, ζ/s)` — but prevents catastrophic cancellation in the SDID double-difference when outcomes span millions-to-billions (see synth-inference/synthdid#71 for the R-package version of this issue). Normalization constants are derived from controls' pre-period only so the reference is unaffected by treatment. `in_time_placebo()` and `sensitivity_to_zeta_omega()` reuse the exact same `Y_shift` / `Y_scale` captured on the fit snapshot: they normalize the re-sliced arrays before re-running Frank-Wolfe, pass `zeta / Y_scale` to the weight solvers, and rescale the returned `att` and `pre_fit_rmse` by `Y_scale` before reporting; unit-weight diagnostics (`max_unit_weight`, `effective_n`) are scale-invariant and reported directly.
15231523

15241524
*Validation diagnostics (post-fit methods on `SyntheticDiDResults`):*
15251525

0 commit comments

Comments
 (0)