Skip to content

Commit e3d51db

Browse files
igerberclaude
andcommitted
Fix CI review Round 9: suppress normalized_effects under trends, Inf guard
P0: normalized_effects suppressed under trends_linear (was normalizing second-differences DID^{fd}_l instead of level effects). REGISTRY documents that normalized_effects and cost_benefit_delta are both unavailable under trends_linear. P1: Non-finite (Inf) control values now rejected with ValueError during DID^X validation (was silently collapsing first-stage OLS). P3: summary() event study header and row labels now use _horizon_label() (DID^X_l, DID^{fd}_l, DID^{X,fd}_l matching to_dataframe). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d087f21 commit e3d51db

3 files changed

Lines changed: 21 additions & 12 deletions

File tree

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 17 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -659,6 +659,12 @@ def fit(
659659
f"Control column {c!r} contains {n_nan} NaN value(s). "
660660
"Drop or impute missing covariates before fitting."
661661
)
662+
n_inf = int(np.isinf(data_controls[c].to_numpy()).sum())
663+
if n_inf > 0:
664+
raise ValueError(
665+
f"Control column {c!r} contains {n_inf} Inf value(s). "
666+
"Remove or replace non-finite covariates before fitting."
667+
)
662668
# Aggregate covariates to cell means (same groupby as treatment/outcome).
663669
# Use the coerced copy joined with group/time from original data.
664670
x_agg_input = data[[group, time]].copy()
@@ -1514,14 +1520,17 @@ def fit(
15141520
"n_obs": pl_data["N_pl_l"],
15151521
}
15161522

1517-
# Normalized effects DID^n_l
1518-
normalized_effects_dict = _compute_normalized_effects(
1519-
multi_horizon_dids=multi_horizon_dids,
1520-
D_mat=D_mat,
1521-
baselines=baselines,
1522-
first_switch_idx=first_switch_idx_arr,
1523-
L_max=L_max,
1524-
)
1523+
# Normalized effects DID^n_l (suppressed under trends_linear
1524+
# because event_study_effects holds second-differences DID^{fd}_l,
1525+
# not level effects - normalizing second-differences is wrong)
1526+
if not _is_trends_linear:
1527+
normalized_effects_dict = _compute_normalized_effects(
1528+
multi_horizon_dids=multi_horizon_dids,
1529+
D_mat=D_mat,
1530+
baselines=baselines,
1531+
first_switch_idx=first_switch_idx_arr,
1532+
L_max=L_max,
1533+
)
15251534

15261535
# Cost-benefit delta (only meaningful when L_max >= 2)
15271536
if L_max >= 2:

diff_diff/chaisemartin_dhaultfoeuille_results.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -418,7 +418,7 @@ class ChaisemartinDHaultfoeuilleResults:
418418
# Repr / properties
419419
# ------------------------------------------------------------------
420420

421-
def _horizon_label(self, h: int) -> str:
421+
def _horizon_label(self, h) -> str:
422422
"""Return per-horizon estimand label for event study rows."""
423423
has_controls = self.covariate_residuals is not None
424424
has_trends = self.linear_trends_effects is not None
@@ -728,7 +728,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
728728
lines.extend(
729729
[
730730
thin,
731-
f"Event Study (DID_l, l = 1..{self.L_max})".center(width),
731+
f"Event Study ({self._horizon_label('l')}, l = 1..{self.L_max})".center(width),
732732
thin,
733733
header_row,
734734
thin,
@@ -738,7 +738,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
738738
entry = self.event_study_effects[l_h]
739739
lines.append(
740740
_format_inference_row(
741-
f"DID_{l_h}",
741+
self._horizon_label(l_h),
742742
entry["effect"],
743743
entry["se"],
744744
entry["t_stat"],

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -611,7 +611,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
611611

612612
- **Note (Phase 3 DID^X covariate adjustment):** When `controls` is set, `per_period_effects` (the Phase 1 per-period DID_M decomposition) remains **unadjusted** (computed on raw outcomes). The covariate residualization applies only to the per-group `DID_{g,l}` path (`L_max >= 1`), which produces `event_study_effects` and `overall_att`. This means `per_period_effects` and `event_study_effects[1]` may diverge when controls are active - by design (the per-period path uses binary joiner/leaver categorization and is not part of the DID^X contract). Implements the residualization-style covariate adjustment from Web Appendix Section 1.2 (Assumption 11). For each baseline treatment value `d`, estimates `theta_hat_d` via OLS of first-differenced outcomes on first-differenced covariates with time FEs, restricted to not-yet-treated observations. Residualizes at levels: `Y_tilde[g,t] = Y[g,t] - X[g,t] @ theta_hat_d`. All downstream DID computations use residualized outcomes. This is NOT doubly-robust, NOT IPW, NOT Callaway-Sant'Anna-style. Plug-in IF (treating `theta_hat` as fixed) is valid by FWL theorem. **Deviation from R `DIDmultiplegtDYN`:** The first-stage OLS uses equal cell weights (one observation per `(g,t)` cell), consistent with the library's cell-count weighting convention documented in Phase 1. R weights by `N_gt` (observation count per cell). On panels with 1 observation per cell (the common case), results are identical. When baseline-specific first stages fail (`n_obs = 0` or `n_obs < n_params`), the affected strata are excluded from the estimation (outcomes set to NaN) rather than retained unadjusted - matching R's "drop failed strata" behavior. Requires `L_max >= 1`. Activated via `controls=["col1", "col2"]` in `fit()`.
613613

614-
- **Note (Phase 3 DID^{fd} linear trends):** Implements group-specific linear trends from Web Appendix Section 1.3 (Assumption 12, Lemma 6). Uses the Z_mat transformation: `Z[g,t] = Y[g,t] - Y[g,t-1]` (first-differenced outcomes). Since `DID_{g,l}(Z) = DID^{fd}_{g,l}` algebraically, the existing multi-horizon DID code produces trend-adjusted estimates when fed Z_mat. Requires F_g >= 3 (at least 2 pre-switch periods); groups with F_g < 3 are excluded with a `UserWarning`. Cumulated level effects `delta^{fd}_l = sum_{l'=1}^l DID^{fd}_{l'}` stored in `results.linear_trends_effects`. Cumulated SE uses conservative upper bound (sum of per-horizon SEs); cross-horizon covariance from IF vectors is a library extension (paper proves Theorem 1 per-horizon, not cross-horizon). When combined with DID^X, residualization is applied first, then first-differencing (per paper assumption ordering). Activated via `trends_linear=True` in `fit()`.
614+
- **Note (Phase 3 DID^{fd} linear trends):** Implements group-specific linear trends from Web Appendix Section 1.3 (Assumption 12, Lemma 6). Uses the Z_mat transformation: `Z[g,t] = Y[g,t] - Y[g,t-1]` (first-differenced outcomes). Since `DID_{g,l}(Z) = DID^{fd}_{g,l}` algebraically, the existing multi-horizon DID code produces trend-adjusted estimates when fed Z_mat. Requires F_g >= 3 (at least 2 pre-switch periods); groups with F_g < 3 are excluded with a `UserWarning`. Cumulated level effects `delta^{fd}_l = sum_{l'=1}^l DID^{fd}_{l'}` stored in `results.linear_trends_effects`. Cumulated SE uses conservative upper bound (sum of per-horizon SEs); cross-horizon covariance from IF vectors is a library extension (paper proves Theorem 1 per-horizon, not cross-horizon). When combined with DID^X, residualization is applied first, then first-differencing (per paper assumption ordering). **Suppressed surfaces under `trends_linear`:** `normalized_effects` (`DID^n_l`) and `cost_benefit_delta` are suppressed because they would operate on second-differences rather than level effects. Users should access cumulated level effects via `linear_trends_effects`. Activated via `trends_linear=True` in `fit()`.
615615

616616
- **Note (Phase 3 state-set trends):** Implements state-set-specific trends from Web Appendix Section 1.4 (Assumptions 13-14). Restricts the control pool for each switcher to groups in the same set (e.g., same state in county-level data). The restriction applies in BOTH `_compute_multi_horizon_dids()` (point estimates) and `_compute_per_group_if_multi_horizon()` (influence functions) to ensure IF consistency. Cohort structure stays as `(D_{g,1}, F_g, S_g)` triples (does not incorporate set membership). Set membership must be time-invariant per group. Activated via `trends_nonparam="state_column"` in `fit()`.
617617

0 commit comments

Comments
 (0)