Extend PR #312 Y-normalization contract into SDID diagnostic methods

igerber · claude · igerber · commit 37bb13ee0257 · 2026-04-19T06:27:12.000-04:00
PR #312 centered and scaled Y once in SyntheticDiD.fit() to avoid catastrophic cancellation in the SDID double-difference at extreme Y. The fix did not reach the two public diagnostic methods on SyntheticDiDResults: in_time_placebo() and sensitivity_to_zeta_omega() both re-run Frank-Wolfe on the original-scale fit snapshot with original-scale zetas, so at Y ~ 1e9 they reproduce the same silent failure the main path was fixed for. Capture Y_shift and Y_scale on the fit snapshot, apply the same (Y - shift) / scale normalization inside the two diagnostic methods, pass zeta / Y_scale and min_decrease derived from the normalized noise level to the FW weight solvers, and rescale att / pre_fit_rmse back to original-Y units before reporting. Unit-weight diagnostics (max_unit_weight, effective_n) are scale-invariant and reported without rescaling. Regression coverage: - TestDiagnosticScaleParity: att / pre_fit_rmse must scale by |a| across (Y -> a*Y + b) for both diagnostic methods; a no-effect DGP at Y ~ 1e9 must produce finite placebo atts landing within 5 * noise_level. - TestHeterogeneousAndRampingScale: cross-unit heterogeneous scale (units spanning 1e6 to 1e9) and cross-period ramping (trend growing 4 orders of magnitude across periods) must leave the fit and both diagnostic surfaces finite. These are the heterogeneity pathways the original TestScaleEquivariance (affine-only) suite did not cover. Covered by audit axis A (numerical precision / scale fragility). Findings D-4 and D-4b from docs/audits/silent-failures-findings.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -669,6 +669,12 @@ class _SyntheticDiDFitSnapshot:
     post_periods: List[Any]
     w_control: Optional[np.ndarray] = None
     w_treated: Optional[np.ndarray] = None
+    # Normalization constants captured during fit() so diagnostic methods can
+    # reproduce the main path's centering+scaling and avoid catastrophic
+    # cancellation on extreme-Y panels. Defaults preserve behavior for
+    # snapshots built before these fields existed.
+    Y_shift: float = 0.0
+    Y_scale: float = 1.0
 
     def __post_init__(self):
         for arr in (
@@ -1144,8 +1150,17 @@ def in_time_placebo(
                 "in_time_placebo() needs zeta_omega and zeta_lambda from the "
                 "original fit. Expected on the results object but found None."
             )
+        # Reproduce the main fit path's Y normalization (Y → (Y - shift) / scale)
+        # so Frank-Wolfe sees the same ~O(1) inputs it saw during fit() and the
+        # SDID double-difference does not suffer ~6-digit cancellation at
+        # extreme Y. See SyntheticDiD.fit() and REGISTRY.md §SyntheticDiD.
+        Y_shift = snap.Y_shift
+        Y_scale = snap.Y_scale
+        zeta_omega_n = zeta_omega / Y_scale
+        zeta_lambda_n = zeta_lambda / Y_scale
         noise_level = self.noise_level if self.noise_level is not None else 0.0
-        min_decrease = 1e-5 * noise_level if noise_level > 0 else 1e-5
+        noise_level_n = noise_level / Y_scale
+        min_decrease = 1e-5 * noise_level_n if noise_level_n > 0 else 1e-5
 
         # Build the list of (fake_period, position) pairs to iterate.
         period_to_idx = {p: i for i, p in enumerate(pre_periods)}
@@ -1195,29 +1210,29 @@ def in_time_placebo(
                 rows.append(row)
                 continue
 
-            Y_pre_c = snap.Y_pre_control[:i, :]
-            Y_post_c = snap.Y_pre_control[i:, :]
-            Y_pre_t = snap.Y_pre_treated[:i, :]
-            Y_post_t = snap.Y_pre_treated[i:, :]
+            Y_pre_c_n = (snap.Y_pre_control[:i, :] - Y_shift) / Y_scale
+            Y_post_c_n = (snap.Y_pre_control[i:, :] - Y_shift) / Y_scale
+            Y_pre_t_n = (snap.Y_pre_treated[:i, :] - Y_shift) / Y_scale
+            Y_post_t_n = (snap.Y_pre_treated[i:, :] - Y_shift) / Y_scale
 
             if snap.w_treated is not None:
                 w_t = snap.w_treated
-                y_pre_t_mean = np.average(Y_pre_t, axis=1, weights=w_t)
-                y_post_t_mean = np.average(Y_post_t, axis=1, weights=w_t)
+                y_pre_t_mean_n = np.average(Y_pre_t_n, axis=1, weights=w_t)
+                y_post_t_mean_n = np.average(Y_post_t_n, axis=1, weights=w_t)
             else:
-                y_pre_t_mean = np.mean(Y_pre_t, axis=1)
-                y_post_t_mean = np.mean(Y_post_t, axis=1)
+                y_pre_t_mean_n = np.mean(Y_pre_t_n, axis=1)
+                y_post_t_mean_n = np.mean(Y_post_t_n, axis=1)
 
             omega_fake = compute_sdid_unit_weights(
-                Y_pre_c,
-                y_pre_t_mean,
-                zeta_omega=zeta_omega,
+                Y_pre_c_n,
+                y_pre_t_mean_n,
+                zeta_omega=zeta_omega_n,
                 min_decrease=min_decrease,
             )
             lambda_fake = compute_time_weights(
-                Y_pre_c,
-                Y_post_c,
-                zeta_lambda=zeta_lambda,
+                Y_pre_c_n,
+                Y_post_c_n,
+                zeta_lambda=zeta_lambda_n,
                 min_decrease=min_decrease,
             )
 
@@ -1231,20 +1246,22 @@ def in_time_placebo(
             else:
                 omega_eff_fake = omega_fake
 
-            att_fake = compute_sdid_estimator(
-                Y_pre_c,
-                Y_post_c,
-                y_pre_t_mean,
-                y_post_t_mean,
+            att_fake_n = compute_sdid_estimator(
+                Y_pre_c_n,
+                Y_post_c_n,
+                y_pre_t_mean_n,
+                y_post_t_mean_n,
                 omega_eff_fake,
                 lambda_fake,
             )
-            synthetic_pre_fake = Y_pre_c @ omega_eff_fake
-            pre_fit = float(
-                np.sqrt(np.mean((y_pre_t_mean - synthetic_pre_fake) ** 2))
+            synthetic_pre_fake_n = Y_pre_c_n @ omega_eff_fake
+            pre_fit_n = float(
+                np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2))
             )
-            row["att"] = float(att_fake)
-            row["pre_fit_rmse"] = pre_fit
+            # ATT is scale-equivariant and shift-invariant in Y; RMSE is
+            # scale-equivariant. Rescale back to original-Y units.
+            row["att"] = float(att_fake_n * Y_scale)
+            row["pre_fit_rmse"] = pre_fit_n * Y_scale
             rows.append(row)
 
         return pd.DataFrame(rows)
@@ -1320,19 +1337,29 @@ def sensitivity_to_zeta_omega(
         else:
             zeta_values = [float(z) for z in zeta_grid]
 
+        # Reproduce the main fit path's Y normalization so FW sees ~O(1)
+        # inputs; see in_time_placebo() for the same pattern.
+        Y_shift = snap.Y_shift
+        Y_scale = snap.Y_scale
         noise_level = self.noise_level if self.noise_level is not None else 0.0
-        min_decrease = 1e-5 * noise_level if noise_level > 0 else 1e-5
+        noise_level_n = noise_level / Y_scale
+        min_decrease = 1e-5 * noise_level_n if noise_level_n > 0 else 1e-5
+
+        Y_pre_control_n = (snap.Y_pre_control - Y_shift) / Y_scale
+        Y_post_control_n = (snap.Y_post_control - Y_shift) / Y_scale
+        Y_pre_treated_n = (snap.Y_pre_treated - Y_shift) / Y_scale
+        Y_post_treated_n = (snap.Y_post_treated - Y_shift) / Y_scale
 
         if snap.w_treated is not None:
-            y_pre_t_mean = np.average(
-                snap.Y_pre_treated, axis=1, weights=snap.w_treated
+            y_pre_t_mean_n = np.average(
+                Y_pre_treated_n, axis=1, weights=snap.w_treated
             )
-            y_post_t_mean = np.average(
-                snap.Y_post_treated, axis=1, weights=snap.w_treated
+            y_post_t_mean_n = np.average(
+                Y_post_treated_n, axis=1, weights=snap.w_treated
             )
         else:
-            y_pre_t_mean = np.mean(snap.Y_pre_treated, axis=1)
-            y_post_t_mean = np.mean(snap.Y_post_treated, axis=1)
+            y_pre_t_mean_n = np.mean(Y_pre_treated_n, axis=1)
+            y_post_t_mean_n = np.mean(Y_post_treated_n, axis=1)
 
         columns = [
             "zeta_omega",
@@ -1348,9 +1375,9 @@ def sensitivity_to_zeta_omega(
         rows: List[Dict[str, Any]] = []
         for z in zeta_values:
             omega_fake = compute_sdid_unit_weights(
-                snap.Y_pre_control,
-                y_pre_t_mean,
-                zeta_omega=z,
+                Y_pre_control_n,
+                y_pre_t_mean_n,
+                zeta_omega=z / Y_scale,
                 min_decrease=min_decrease,
             )
             if snap.w_control is not None:
@@ -1371,22 +1398,24 @@ def sensitivity_to_zeta_omega(
             else:
                 omega_eff = omega_fake
 
-            att = compute_sdid_estimator(
-                snap.Y_pre_control,
-                snap.Y_post_control,
-                y_pre_t_mean,
-                y_post_t_mean,
+            att_n = compute_sdid_estimator(
+                Y_pre_control_n,
+                Y_post_control_n,
+                y_pre_t_mean_n,
+                y_post_t_mean_n,
                 omega_eff,
                 time_weights,
             )
-            synthetic_pre = snap.Y_pre_control @ omega_eff
-            pre_fit = float(np.sqrt(np.mean((y_pre_t_mean - synthetic_pre) ** 2)))
+            synthetic_pre_n = Y_pre_control_n @ omega_eff
+            pre_fit_n = float(np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_n) ** 2)))
             herf = float(np.sum(omega_eff ** 2))
             rows.append(
                 {
                     "zeta_omega": z,
-                    "att": float(att),
-                    "pre_fit_rmse": pre_fit,
+                    # Unit weights are scale-invariant; ATT and RMSE are
+                    # scale-equivariant. Report original-Y units.
+                    "att": float(att_n * Y_scale),
+                    "pre_fit_rmse": pre_fit_n * Y_scale,
                     "max_unit_weight": float(np.max(omega_eff)),
                     "effective_n": float("nan") if herf == 0 else 1.0 / herf,
                 }
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
@@ -667,6 +667,8 @@ def fit(  # type: ignore[override]
             post_periods=list(post_periods),
             w_control=w_control,
             w_treated=w_treated,
+            Y_shift=Y_shift,
+            Y_scale=Y_scale,
         )
 
         # Freeze the public diagnostic arrays so mutation via the results
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -1519,7 +1519,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 - **Jackknife with non-finite LOO estimate**: Returns NaN SE. Unlike bootstrap/placebo, jackknife is deterministic and cannot skip failed iterations; NaN propagates through `var()` (matches R behavior).
 - **Jackknife with survey weights**: Guards on effective positive support (omega * w_control > 0 and w_treated > 0) after composition, not raw FW counts. Returns NaN SE if fewer than 2 effective controls or 2 positive-weight treated units. Per-iteration zero-sum guards return NaN for individual LOO iterations when remaining composed weights sum to zero.
 - **Note:** Survey support: weights, strata, PSU, and FPC are all supported. Full-design surveys use Rao-Wu rescaled bootstrap (Phase 6); non-bootstrap variance methods (`variance_method="placebo"` or `"jackknife"`) require weights-only (strata/PSU/FPC require bootstrap). Both sides weighted per WLS regression interpretation: treated-side means are survey-weighted (Frank-Wolfe target and ATT formula); control-side synthetic weights are composed with survey weights post-optimization (ω_eff = ω * w_co, renormalized). Frank-Wolfe optimization itself is unweighted — survey importance enters after trajectory-matching. Covariate residualization uses WLS with survey weights. Placebo, jackknife, and bootstrap SE preserve survey weights on both sides.
-- **Note:** Internal Y normalization. Before weight optimization, the estimator, and variance procedures, `fit()` centers Y by `mean(Y_pre_control)` and scales by `std(Y_pre_control)`; `Y_scale` falls back to `1.0` when std is non-finite or below `1e-12 * max(|mean|, 1)`. Auto-regularization and `noise_level` are computed on normalized Y; user-supplied `zeta_omega` / `zeta_lambda` are divided by `Y_scale` internally for Frank-Wolfe. τ, SE, CI, the placebo/bootstrap/jackknife effect vectors, `results_.noise_level`, and `results_.zeta_omega` / `results_.zeta_lambda` are all reported on the user's original outcome scale (user-supplied zetas are echoed back exactly to avoid float roundoff). Mathematically a no-op — τ is location-invariant and scale-equivariant, and FW weights are invariant under `(Y, ζ) → (Y/s, ζ/s)` — but prevents catastrophic cancellation in the SDID double-difference when outcomes span millions-to-billions (see synth-inference/synthdid#71 for the R-package version of this issue). Normalization constants are derived from controls' pre-period only so the reference is unaffected by treatment. Scope: `in_time_placebo()` and `sensitivity_to_zeta_omega()` continue to run on the stored original-scale snapshot with original-scale zetas (preserving pre-fix behavior); extending normalization into those diagnostic paths is tracked separately.
+- **Note:** Internal Y normalization. Before weight optimization, the estimator, and variance procedures, `fit()` centers Y by `mean(Y_pre_control)` and scales by `std(Y_pre_control)`; `Y_scale` falls back to `1.0` when std is non-finite or below `1e-12 * max(|mean|, 1)`. Auto-regularization and `noise_level` are computed on normalized Y; user-supplied `zeta_omega` / `zeta_lambda` are divided by `Y_scale` internally for Frank-Wolfe. τ, SE, CI, the placebo/bootstrap/jackknife effect vectors, `results_.noise_level`, and `results_.zeta_omega` / `results_.zeta_lambda` are all reported on the user's original outcome scale (user-supplied zetas are echoed back exactly to avoid float roundoff). Mathematically a no-op — τ is location-invariant and scale-equivariant, and FW weights are invariant under `(Y, ζ) → (Y/s, ζ/s)` — but prevents catastrophic cancellation in the SDID double-difference when outcomes span millions-to-billions (see synth-inference/synthdid#71 for the R-package version of this issue). Normalization constants are derived from controls' pre-period only so the reference is unaffected by treatment. `in_time_placebo()` and `sensitivity_to_zeta_omega()` reuse the exact same `Y_shift` / `Y_scale` captured on the fit snapshot: they normalize the re-sliced arrays before re-running Frank-Wolfe, pass `zeta / Y_scale` to the weight solvers, and rescale the returned `att` and `pre_fit_rmse` by `Y_scale` before reporting; unit-weight diagnostics (`max_unit_weight`, `effective_n`) are scale-invariant and reported directly.
 
 *Validation diagnostics (post-fit methods on `SyntheticDiDResults`):*
 
diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py

Original file line number	Diff line number	Diff line change
`@@ -667,6 +667,8 @@ def fit( # type: ignore[override]`
`667`	`667`	`post_periods=list(post_periods),`
`668`	`668`	`w_control=w_control,`
`669`	`669`	`w_treated=w_treated,`
	`670`	`+ Y_shift=Y_shift,`
	`671`	`+ Y_scale=Y_scale,`
`670`	`672`	`)`
`671`	`673`
`672`	`674`	`# Freeze the public diagnostic arrays so mutation via the results`