Relabel IF scaling as implementation choice (not deviation), fix control label

igerber · claude · igerber · commit 9c8ebb48617f · 2026-03-30T09:27:49.000-04:00
REGISTRY.md: Change panel/RC IF scaling notes from "Note (deviation from R)"
to "Note" — these are algebraically equivalent factorizations of the same
M-estimation formula (implementation choice per review Rule 5), not
methodology deviations. Explicitly reference Rule 5 analogy (Cholesky vs QR).

Also document bootstrap diagonal VCV as deviation, fix "Control" label
back to "Never-treated" for clarity under not_yet_treated.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py
@@ -158,7 +158,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
             "",
             f"{'Total observations:':<30} {self.n_obs:>10}",
             f"{'Treated ' + ('obs:' if not self.panel else 'units:'):<30} {self.n_treated_units:>10}",
-            f"{'Control ' + ('obs:' if not self.panel else 'units:'):<30} {self.n_control_units:>10}",
+            f"{'Never-treated ' + ('obs:' if not self.panel else 'units:'):<30} {self.n_control_units:>10}",
             f"{'Treatment cohorts:':<30} {len(self.groups):>10}",
             f"{'Time periods:':<30} {len(self.time_periods):>10}",
             f"{'Control group:':<30} {self.control_group:>10}",
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -418,12 +418,12 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
     not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
 - **Note:** CallawaySantAnna survey support: weights, strata, PSU, and FPC are all supported for all estimation methods (reg, ipw, dr) with or without covariates. Analytical (`n_bootstrap=0`): aggregated SEs use design-based variance via `compute_survey_if_variance()`. Bootstrap (`n_bootstrap>0`): PSU-level multiplier weights replace analytical SEs for aggregated quantities. IPW and DR with covariates use DRDID panel nuisance IF corrections (Phase 7a: PS IF correction via survey-weighted Hessian/score, OR IF correction via WLS bread and gradient; Sant'Anna & Zhao 2020, Theorem 3.1). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Per-unit survey weights are extracted via `groupby(unit).first()` from the panel-normalized pweight array; on unbalanced panels the pweight normalization (`w * n_obs / sum(w)`) preserves relative unit weights since all IF/WIF formulas use weight ratios (`sw_i / sum(sw)`) where the normalization constant cancels. Scale-invariance tests pass on both balanced and unbalanced panels.
 - **Note (deviation from R):** Panel DR control augmentation is normalized by treated mass (`sw_t_sum` or `n_t`) rather than control IPW mass (`sum(w_cont)`). R's `DRDID::drdid_panel` uses `mean(w.cont)` as the control normalizer. Both are consistent asymptotically (under correct model specification, `E[w_cont] = E[D]` so the normalizers converge), but they differ in finite samples when IPW reweighting doesn't perfectly balance. The treated-mass normalization is simpler and matches the `did::att_gt` convention where ATT is defined per treated unit. Aligning to `DRDID::drdid_panel`'s exact `w.cont` normalization is deferred.
-- **Note (deviation from R):** Panel IPW/DR PS nuisance corrections use `H = X'WX/n`, `asy_lin_rep = score @ solve(H) / n`, `M2 = colMeans(control_terms)`. This is algebraically equivalent to R's DRDID formulation: `H/n` cancels `1/n` in `asy_lin_rep`, giving `score @ (X'WX)^{-1}` on the library's phi scale; `colMeans` then provides the correct `O(1)` gradient. Confirmed by analytical-vs-bootstrap SE convergence: panel IPW ratio=0.982, panel DR ratio=0.983 (n=300, 999 bootstrap iterations).
+- **Note:** Panel and RC nuisance IF corrections use an algebraically equivalent factorization of the DRDID Hessian/score/gradient: `H = X'WX/n`, `asy_lin_rep = score @ solve(H) / n`, `M2 = colMeans(control_terms)`. This reduces to `score @ (X'WX)^{-1}` times `colMeans`, which is the same mathematical operation as R's `solve(X'WX) * n` times `colMeans / mean(w)` divided by `n` — the factors cancel identically. This is an implementation choice between valid numerical approaches to the same M-estimation formula (analogous to Cholesky vs QR), not a methodology deviation. Confirmed by analytical-vs-bootstrap SE convergence: panel IPW ratio=0.982, panel DR ratio=0.983 (n=300, 999 bootstrap iterations).
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
 - **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. When strata/PSU/FPC are present, analytical aggregated SEs (`n_bootstrap=0`) use `compute_survey_if_variance()` on the combined IF/WIF; bootstrap aggregated SEs (`n_bootstrap>0`) use PSU-level multiplier weights.
 
 - **Note:** Repeated cross-sections (`panel=False`, Phase 7b): supports surveys like BRFSS, ACS annual, and CPS monthly where units are not followed over time. Uses cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4): `reg` matches `DRDID::reg_did_rc` (Eq 2.2), `dr` matches `DRDID::drdid_rc` (locally efficient, Eq 3.3+3.4 with 4 OLS fits), `ipw` matches `DRDID::std_ipw_did_rc`. Per-observation influence functions instead of per-unit. All three estimation methods support covariates and survey weights.
-- **Note (deviation from R):** RCS influence functions use `phi_i = psi_i / n` convention (SE = `sqrt(sum(phi^2))`), matching the library-wide IF convention where IFs are pre-scaled by `1/n`. R's DRDID uses `psi_i` directly with `SE = sd(psi) / sqrt(n)`. These are algebraically equivalent — `sqrt(sum(psi^2/n^2)) = sqrt(sum(psi^2))/n ≈ sd(psi)/sqrt(n)` — confirmed by analytical-vs-bootstrap SE convergence tests. The `1/n_all` denominator in gradient terms (`M1`, `M2`) is not "extra shrinkage" but the `colMeans` → phi convention conversion.
+- **Note:** RCS influence functions use the library-wide `phi_i = psi_i / n` convention (SE = `sqrt(sum(phi^2))`). R's DRDID uses `psi_i` directly with `SE = sd(psi) / sqrt(n)`. These are algebraically equivalent: `sqrt(sum(psi^2/n^2)) = sqrt(sum(psi^2))/n ≈ sd(psi)/sqrt(n)`. This is an implementation choice (different valid numerical approach to the same SE formula), not a methodology deviation. The `1/n_all` denominator in gradient terms is the `colMeans` → phi convention conversion — confirmed by analytical-vs-bootstrap SE convergence within 3% for all methods.
 - **Note:** Non-survey DR path also includes nuisance IF corrections (PS + OR), matching the survey path structure (Phase 7a). Previously used plug-in IF only.
 
 **Reference implementation(s):**
@@ -1638,6 +1638,7 @@ Confidence intervals:
 - M=0: reduces to standard parallel trends
 - Negative M: not valid (constraints become infeasible)
 - **Note:** Phase 7d: survey variance support. When input results carry `survey_metadata` with `df_survey`, HonestDiD uses t-distribution critical values (via `_get_critical_value(alpha, df)`) instead of normal. CallawaySantAnnaResults now stores `event_study_vcov` (full cross-event-time VCV from IF vectors), which HonestDiD uses instead of the diagonal fallback. For replicate-weight designs, the event-study VCV falls back to diagonal (multivariate replicate VCV deferred).
+- **Note (deviation from R):** When HonestDiD receives bootstrap-fitted CallawaySantAnna results (`n_bootstrap > 0`), the full event-study covariance is unavailable (cleared to prevent mixing analytical VCV with bootstrap SEs). HonestDiD falls back to `diag(se^2)` from the bootstrap SEs with a UserWarning. R's `honest_did.AGGTEobj` computes a full covariance from the influence function matrix; implementing bootstrap event-study covariance is deferred. For full covariance structure in HonestDiD, use analytical SEs (`n_bootstrap=0`).
 - **Note (deviation from R):** When CallawaySantAnna results are passed to HonestDiD, `base_period != "universal"` emits a warning but does not error. R's `honest_did::honest_did.AGGTEobj` requires universal base period. Our implementation warns because the varying-base pre-treatment coefficients use consecutive comparisons (not a common reference), which changes the parallel-trends restriction interpretation.
 
 **Reference implementation(s):**