Match R exactly: H = X'WX (no /n), asy_rep = score @ inv(H) (no /n)

igerber · claude · igerber · commit 6789e37101ce · 2026-03-30T10:27:06.000-04:00
Drop the H/n, asy_rep/n convention from all three panel PS correction
sites and use R's direct formulation: H = X'WX, asy_rep = score @ inv(H),
M2 = colMeans (sum over control terms / n_all). The two /n factors
canceled algebraically (inv(H/n) / n = inv(H)), so asy_rep is unchanged,
but they confused the static reviewer. In _ipw_estimation, M2 now divides
by n_all instead of n_c.
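
The cancellation can be checked numerically. The following is a minimal sketch, not the library code: the data are synthetic, `np.linalg.inv` stands in for the repo's `_safe_inv`, and `control_terms` is a hypothetical placeholder for the weighted control residual terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_c, k = 40, 60, 3
n_all = n_t + n_c

# Synthetic design matrix with intercept and propensity scores in (0, 1).
X = np.column_stack([np.ones(n_all), rng.normal(size=(n_all, k))])
pscore = rng.uniform(0.1, 0.9, size=n_all)
D = np.concatenate([np.ones(n_t), np.zeros(n_c)])

W = pscore * (1 - pscore)
score = (D - pscore)[:, None] * X

# Old convention: H = X'WX / n, asy_rep = score @ inv(H) / n.
H_old = X.T @ (W[:, None] * X) / n_all
asy_old = score @ np.linalg.inv(H_old) / n_all

# New convention: H = X'WX, asy_rep = score @ inv(H).
H_new = X.T @ (W[:, None] * X)
asy_new = score @ np.linalg.inv(H_new)

same_asy_rep = np.allclose(asy_old, asy_new)

# M2: summing control rows and dividing by n_all equals a column mean
# over all n rows in which the treated rows are zero.
control_terms = rng.normal(size=n_c)[:, None] * X[n_t:]  # hypothetical terms
M2_sum = control_terms.sum(axis=0) / n_all
padded = np.vstack([np.zeros((n_t, X.shape[1])), control_terms])
M2_colmeans = padded.mean(axis=0)
same_M2 = np.allclose(M2_sum, M2_colmeans)

# The previous n_c-denominator mean differs by the factor n_c / n_all.
M2_old = control_terms.mean(axis=0)
```

Under these assumptions, `asy_old` and `asy_new` agree to floating-point tolerance, while `M2_old` and `M2_sum` differ by the constant factor `n_c / n_all`.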

Also: add duplicate unit-ID check for panel=False.
Update REGISTRY note to reflect the simpler H = X'WX convention.
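
A minimal sketch of how the duplicate unit-ID check behaves, assuming pandas input. The helper name `check_unique_units` and the toy frames are hypothetical; in the diff the check lives inline in `fit()`.

```python
import pandas as pd


def check_unique_units(data: pd.DataFrame, unit: str) -> None:
    """Raise if any unit ID appears more than once (hypothetical helper)."""
    if data[unit].duplicated().any():
        raise ValueError(
            "panel=False requires unique unit IDs (one observation per unit). "
            "Found duplicate unit IDs. If your data is a panel, use panel=True."
        )


# One row per unit: the check passes silently.
cross_section = pd.DataFrame({"unit": [1, 2, 3], "y": [0.5, 1.2, 0.9]})
check_unique_units(cross_section, "unit")

# Unit 1 appears twice: the check raises.
panel_like = pd.DataFrame({"unit": [1, 1, 2], "y": [0.5, 0.7, 0.9]})
try:
    check_unique_units(panel_like, "unit")
    raised = False
except ValueError:
    raised = True
```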

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py
@@ -1336,6 +1336,14 @@ def fit(
                 stacklevel=2,
             )
 
+        # Validate unique unit IDs for panel=False
+        if not self.panel:
+            if data[unit].duplicated().any():
+                raise ValueError(
+                    "panel=False requires unique unit IDs (one observation per unit). "
+                    "Found duplicate unit IDs. If your data is a panel, use panel=True."
+                )
+
         # Normalize empty covariates list to None
         if covariates is not None and len(covariates) == 0:
             covariates = None
@@ -2048,27 +2056,29 @@ def _ipw_estimation(
                 X_all_int = np.column_stack([np.ones(n_t + n_c), X_all])
                 pscore_all = np.concatenate([pscore_treated, pscore_control])
 
-                # PS IF correction — matches R's std_ipw_did_panel convention:
-                # H = X'WX / n, asy_lin_rep = score @ solve(H) / n, M2 = colMeans
+                # PS IF correction — R convention: H = X'WX, M2 = colMeans
                 n_all_panel = n_t + n_c
                 W_ps = pscore_all * (1 - pscore_all)
                 if sw_all is not None:
                     W_ps = W_ps * sw_all
-                H = X_all_int.T @ (W_ps[:, None] * X_all_int) / n_all_panel
+                H = X_all_int.T @ (W_ps[:, None] * X_all_int)
                 H_inv = _safe_inv(H)
 
                 D_all = np.concatenate([np.ones(n_t), np.zeros(n_c)])
                 score_ps = (D_all - pscore_all)[:, None] * X_all_int
                 if sw_all is not None:
                     score_ps = score_ps * sw_all[:, None]
-                asy_lin_rep_ps = score_ps @ H_inv / n_all_panel
+                asy_lin_rep_ps = score_ps @ H_inv
 
                 att_control_weighted = np.sum(weights_control_norm * control_change)
-                # R: colMeans over control rows (n_c denominator, not n_all)
-                M2 = np.mean(
-                    (weights_control_norm * (control_change - att_control_weighted))[:, None]
-                    * X_all_int[n_t:],
-                    axis=0,
+                # R: colMeans over ALL n obs (treated rows contribute zero)
+                M2 = (
+                    np.sum(
+                        (weights_control_norm * (control_change - att_control_weighted))[:, None]
+                        * X_all_int[n_t:],
+                        axis=0,
+                    )
+                    / n_all_panel
                 )
 
                 inf_func = inf_func + asy_lin_rep_ps @ M2
@@ -2306,19 +2316,19 @@ def _doubly_robust(
                     )
                     pscore_all = np.concatenate([pscore_treated_clipped, pscore_control])
 
-                    # PS IF correction — R convention: H/n, asy_rep/n, colMeans
+                    # PS IF correction — R convention: H = X'WX, M2 = colMeans
                     n_all_panel = n_t + n_c
                     W_ps = pscore_all * (1 - pscore_all)
                     if sw_all is not None:
                         W_ps = W_ps * sw_all
-                    H_ps = X_all_int.T @ (W_ps[:, None] * X_all_int) / n_all_panel
+                    H_ps = X_all_int.T @ (W_ps[:, None] * X_all_int)
                     H_ps_inv = _safe_inv(H_ps)
 
                     D_all = np.concatenate([np.ones(n_t), np.zeros(n_c)])
                     score_ps = (D_all - pscore_all)[:, None] * X_all_int
                     if sw_all is not None:
                         score_ps = score_ps * sw_all[:, None]
-                    asy_lin_rep_ps = score_ps @ H_ps_inv / n_all_panel
+                    asy_lin_rep_ps = score_ps @ H_ps_inv
 
                     dr_resid_control = m_control - control_change
                     M2_dr = np.mean(
@@ -2375,12 +2385,12 @@ def _doubly_robust(
                     pscore_all = np.concatenate([pscore_treated_clipped, pscore_control])
 
                     W_ps = pscore_all * (1 - pscore_all)
-                    H_ps = X_all_int.T @ (W_ps[:, None] * X_all_int) / n_all_panel
+                    H_ps = X_all_int.T @ (W_ps[:, None] * X_all_int)
                     H_ps_inv = _safe_inv(H_ps)
 
                     D_all = np.concatenate([np.ones(n_t), np.zeros(n_c)])
                     score_ps = (D_all - pscore_all)[:, None] * X_all_int
-                    asy_lin_rep_ps = score_ps @ H_ps_inv / n_all_panel
+                    asy_lin_rep_ps = score_ps @ H_ps_inv
 
                     dr_resid_control = m_control - control_change
                     M2_dr = np.mean(
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -418,7 +418,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
     not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
 - **Note:** CallawaySantAnna survey support: weights, strata, PSU, and FPC are all supported for all estimation methods (reg, ipw, dr) with or without covariates. Analytical (`n_bootstrap=0`): aggregated SEs use design-based variance via `compute_survey_if_variance()`. Bootstrap (`n_bootstrap>0`): PSU-level multiplier weights replace analytical SEs for aggregated quantities. IPW and DR with covariates use DRDID panel nuisance IF corrections (Phase 7a: PS IF correction via survey-weighted Hessian/score, OR IF correction via WLS bread and gradient; Sant'Anna & Zhao 2020, Theorem 3.1). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Per-unit survey weights are extracted via `groupby(unit).first()` from the panel-normalized pweight array; on unbalanced panels the pweight normalization (`w * n_obs / sum(w)`) preserves relative unit weights since all IF/WIF formulas use weight ratios (`sw_i / sum(sw)`) where the normalization constant cancels. Scale-invariance tests pass on both balanced and unbalanced panels.
 - **Note (deviation from R):** Panel DR control augmentation is normalized by treated mass (`sw_t_sum` or `n_t`) rather than control IPW mass (`sum(w_cont)`). R's `DRDID::drdid_panel` uses `mean(w.cont)` as the control normalizer. Both are consistent asymptotically (under correct model specification, `E[w_cont] = E[D]` so the normalizers converge), but they differ in finite samples when IPW reweighting doesn't perfectly balance. The treated-mass normalization is simpler and matches the `did::att_gt` convention where ATT is defined per treated unit. Aligning to `DRDID::drdid_panel`'s exact `w.cont` normalization is deferred.
-- **Note:** Panel and RC nuisance IF corrections use `H = X'WX/n`, `asy_lin_rep = score @ solve(H) / n`, `M2 = colMeans(control_terms)` (mean over control rows, n_c denominator). This matches R's DRDID convention: `solve(crossprod(X)/n)` for the Hessian, `colMeans(...)` for gradients.
+- **Note:** Panel and RC nuisance IF corrections use R's DRDID convention: `H = X'WX`, `asy_lin_rep = score @ solve(H)`, `M2 = colMeans(w_cont * stuff * X)` where colMeans is over all n observations (treated rows contribute zero, so denominator is n not n_c). This directly matches R's `solve(crossprod(X * sqrt(W)))` and `colMeans(...)` formulation.
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
 - **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. When strata/PSU/FPC are present, analytical aggregated SEs (`n_bootstrap=0`) use `compute_survey_if_variance()` on the combined IF/WIF; bootstrap aggregated SEs (`n_bootstrap>0`) use PSU-level multiplier weights.