
Commit bc1dab7

igerber and claude committed
Fix CI review Round 5: partial theta_hat, coarser-than-group, het docs
P1: DID^X rank-deficiency now residualizes with finite subset of theta_hat (zeroing NaN coefficients) instead of skipping entirely.
P1: trends_nonparam now rejects set definitions that are not coarser than group (singleton sets have no within-set controls).
P1: heterogeneity restrictions on trends_linear and trends_nonparam now documented in REGISTRY.md and fit() docstring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 357a551 commit bc1dab7

3 files changed

Lines changed: 38 additions & 9 deletions


diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 27 additions & 8 deletions
@@ -544,7 +544,8 @@ def fit(
     heterogeneous effects (Web Appendix Section 1.5, Lemma 7).
     Partial implementation: post-treatment regressions only
     (no placebo regressions or joint null test). Cannot be
-    combined with ``controls``. Requires ``L_max >= 1``.
+    combined with ``controls``, ``trends_linear``, or
+    ``trends_nonparam``. Requires ``L_max >= 1``.
 design2 : bool, default=False
     If ``True``, identify and report switch-in/switch-out
     (Design-2) groups. Convenience wrapper (descriptive summary,
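The revised docstring lists three options that cannot be combined with ``heterogeneity`` and keeps the ``L_max >= 1`` requirement. A minimal sketch of an up-front guard enforcing those rules, assuming they arrive as plain keyword arguments; `validate_heterogeneity_options`, its signature, and its error messages are hypothetical and not the estimator's actual code.

```python
from typing import Optional, Sequence


def validate_heterogeneity_options(
    heterogeneity: Optional[str],
    L_max: Optional[int] = None,
    controls: Optional[Sequence[str]] = None,
    trends_linear: bool = False,
    trends_nonparam: Optional[str] = None,
) -> None:
    """Hypothetical guard mirroring the documented heterogeneity restrictions."""
    if heterogeneity is None:
        return  # heterogeneity test not requested: nothing to check
    # Collect every incompatible option so the error names all of them.
    conflicts = []
    if controls:
        conflicts.append("controls")
    if trends_linear:
        conflicts.append("trends_linear")
    if trends_nonparam is not None:
        conflicts.append("trends_nonparam")
    if conflicts:
        raise ValueError(
            "heterogeneity cannot be combined with: " + ", ".join(conflicts)
        )
    # Post-treatment regressions need at least one horizon.
    if L_max is None or L_max < 1:
        raise ValueError("heterogeneity requires L_max >= 1")


validate_heterogeneity_options("x_cov", L_max=1)  # accepted
try:
    validate_heterogeneity_options("x_cov", L_max=1, trends_nonparam="state")
except ValueError as err:
    print(err)  # heterogeneity cannot be combined with: trends_nonparam
```

Collecting all conflicts before raising is a design choice for the sketch: a user who passes several incompatible options sees them all in one error rather than fixing them one run at a time.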
@@ -1075,6 +1076,20 @@ def fit(
         f"{len(time_varying)} group(s) have varying values. "
         f"Examples: {time_varying.index.tolist()[:5]}"
     )
+# Set partition must be coarser than group (multiple groups
+# per set). A group-level partition creates singleton sets
+# with no within-set controls available.
+set_map_check = data.groupby(group)[set_col].first()
+n_sets = set_map_check.nunique()
+n_groups_total = len(set_map_check)
+if n_sets >= n_groups_total:
+    raise ValueError(
+        f"trends_nonparam column {set_col!r} defines "
+        f"{n_sets} distinct sets for {n_groups_total} "
+        f"groups. The set partition must be coarser than "
+        f"group (multiple groups per set) to provide "
+        f"within-set controls."
+    )
 # Extract set membership per group aligned with all_groups
 set_map = data.groupby(group)[set_col].first()
 set_ids_arr = np.array(
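The new guard reduces to comparing the number of distinct sets with the number of groups. A minimal standalone sketch of the same check, assuming a long-format pandas panel with one row per group and period; `check_set_partition` and the toy columns are hypothetical and not part of the library's API.

```python
import pandas as pd


def check_set_partition(data: pd.DataFrame, group: str, set_col: str) -> None:
    """Hypothetical helper: raise unless set_col is coarser than group."""
    # One set label per group (set membership assumed time-invariant).
    set_per_group = data.groupby(group)[set_col].first()
    n_sets = set_per_group.nunique()
    n_groups = len(set_per_group)
    # As many sets as groups means every set is a singleton.
    if n_sets >= n_groups:
        raise ValueError(
            f"{set_col!r} defines {n_sets} sets for {n_groups} groups; "
            "the set partition must be coarser than group."
        )


# Toy panel: four counties (groups) nested in two states (sets), two periods.
panel = pd.DataFrame({
    "group": [1, 1, 2, 2, 3, 3, 4, 4],
    "state": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "period": [1, 2] * 4,
})

check_set_partition(panel, "group", "state")  # passes: 2 sets for 4 groups
try:
    check_set_partition(panel, "group", "group")  # each group is its own set
except ValueError as err:
    print(err)
```

Because each group maps to exactly one set, the number of sets can never exceed the number of groups, so as written the condition fires only when every group is its own set. A partition that mixes singleton sets with multi-group sets passes, even though its singleton groups still have no within-set controls.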
@@ -2848,18 +2863,22 @@ def _compute_covariate_residualization(
     "r_squared": r_squared,
 }
 
-# Guard: if any control coefficient is NaN (rank-deficient OLS
-# dropped a collinear control), skip residualization for this
-# baseline to prevent NaN propagation through Y_resid.
-if not np.all(np.isfinite(theta_hat)):
+# Guard: if some control coefficients are NaN (rank-deficient
+# OLS dropped collinear controls), residualize with only the
+# finite subset. Replace NaN coefficients with 0 so einsum
+# only uses the identified controls.
+nan_mask = ~np.isfinite(theta_hat)
+if nan_mask.any():
+    n_dropped = int(nan_mask.sum())
     warnings.warn(
         f"DID^X: rank-deficient first-stage OLS for baseline "
-        f"d={d_val} produced NaN coefficients. Outcomes for "
-        f"groups with this baseline are not residualized.",
+        f"d={d_val} dropped {n_dropped} collinear control(s). "
+        f"Residualization uses the {n_covariates - n_dropped} "
+        f"identified control(s).",
         UserWarning,
         stacklevel=3,
     )
-    continue
+    theta_hat = np.where(np.isfinite(theta_hat), theta_hat, 0.0)
 
 # Residualize Y at levels for all groups with this baseline.
 # Vectorized level residualization: Y_tilde[g, t] = Y[g, t] - X[g, t] @ theta_hat
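Zeroing a NaN coefficient is arithmetically the same as dropping that control from the residualization Y_tilde[g, t] = Y[g, t] - X[g, t] @ theta_hat. A small NumPy sketch of that equivalence, assuming controls are stored as a (group, period, covariate) array; the array layout and the `einsum` subscripts are illustrative assumptions, not the library's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_periods, n_covariates = 5, 4, 3

# Toy outcome panel Y[g, t] and time-varying controls X[g, t, k].
Y = rng.normal(size=(n_groups, n_periods))
X = rng.normal(size=(n_groups, n_periods, n_covariates))

# First-stage coefficients for one baseline; assume the second
# control was collinear and its coefficient came back as NaN.
theta_hat = np.array([0.5, np.nan, -1.2])

# Without any guard, the NaN poisons every residualized outcome.
Y_naive = Y - np.einsum("gtk,k->gt", X, theta_hat)

# Zero the NaN coefficients, then residualize as usual.
nan_mask = ~np.isfinite(theta_hat)
theta_filled = np.where(nan_mask, 0.0, theta_hat)
Y_tilde = Y - np.einsum("gtk,k->gt", X, theta_filled)

# Same numbers as residualizing on the identified controls only.
keep = ~nan_mask
Y_subset = Y - np.einsum("gtk,k->gt", X[:, :, keep], theta_hat[keep])

print(np.isnan(Y_naive).all())         # True
print(np.allclose(Y_tilde, Y_subset))  # True
```

The equivalence shown is purely arithmetic. Whether the finite entries of `theta_hat` match the coefficients of a first stage refit on only the identified controls depends on how the first-stage solver handles the rank deficiency.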

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -615,7 +615,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param

 - **Note (Phase 3 state-set trends):** Implements state-set-specific trends from Web Appendix Section 1.4 (Assumptions 13-14). Restricts the control pool for each switcher to groups in the same set (e.g., same state in county-level data). The restriction applies in BOTH `_compute_multi_horizon_dids()` (point estimates) and `_compute_per_group_if_multi_horizon()` (influence functions) to ensure IF consistency. Cohort structure stays as `(D_{g,1}, F_g, S_g)` triples (does not incorporate set membership). Set membership must be time-invariant per group. Activated via `trends_nonparam="state_column"` in `fit()`.

-- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. Combination with `controls` is rejected (matching R). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`.
+- **Note (Phase 3 heterogeneity testing - partial implementation):** Partial implementation of the heterogeneity test from Web Appendix Section 1.5 (Assumption 15, Lemma 7). Computes post-treatment saturated OLS regressions of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on a time-invariant covariate `X_g` plus cohort indicator dummies. Standard OLS inference is valid (paper shows no DID error correction needed). **Deviation from R `predict_het`:** R's full `predict_het` option additionally computes placebo regressions and a joint null test, and disallows combination with `controls`. This implementation provides only post-treatment regressions. **Rejected combinations:** `controls` (matching R), `trends_linear` (heterogeneity test uses raw level changes, incompatible with second-differenced outcomes), and `trends_nonparam` (heterogeneity test does not thread state-set control-pool restrictions). Results stored in `results.heterogeneity_effects`. Activated via `heterogeneity="covariate_column"` in `fit()`.

 - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. The paper notes Design-2 can be implemented by "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups.
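The heterogeneity note describes an OLS regression of `S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})` on `X_g` plus cohort indicator dummies. A simplified NumPy sketch for a single horizon `l`, assuming cohorts are already encoded as integer labels and using noise-free toy data so the covariate slope is recovered exactly; it omits standard errors and is not the library's implementation. The dependent variable is a raw level change, which is the reason the note gives for rejecting `trends_linear`.

```python
import numpy as np

# Hypothetical toy data: one row per switching group g, one horizon l.
n_groups = 30
rng = np.random.default_rng(7)
cohort = np.arange(n_groups) % 3                          # assumed integer cohort labels
X_g = rng.normal(size=n_groups)                           # time-invariant covariate
S_g = np.where(np.arange(n_groups) % 2 == 0, 1.0, -1.0)   # switch direction (+1 / -1)

# Outcome change Y_{g,F_g-1+l} - Y_{g,F_g-1}, built so that
# S_g * change = 2.0 * X_g + a cohort-specific level (no noise).
cohort_level = np.array([0.5, -1.0, 2.0])
delta_y = S_g * (2.0 * X_g + cohort_level[cohort])

# Left-hand side and saturated design: covariate plus one dummy per
# cohort (the full dummy set replaces an intercept).
lhs = S_g * delta_y
cohort_dummies = (cohort[:, None] == np.arange(3)).astype(float)
design = np.column_stack([X_g, cohort_dummies])

coef, *_ = np.linalg.lstsq(design, lhs, rcond=None)
beta_x = coef[0]  # heterogeneity coefficient on X_g
print(round(float(beta_x), 6))  # 2.0
```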

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 10 additions & 0 deletions
@@ -2672,6 +2672,16 @@ def test_missing_set_column_raises(self):
             L_max=1, trends_nonparam="nonexistent",
         )
 
+    def test_group_level_set_rejected(self):
+        """Set partition at group level (not coarser) raises ValueError."""
+        df = self._make_panel_with_sets()
+        # Use group column itself as set (each group is its own set)
+        with pytest.raises(ValueError, match="coarser than group"):
+            ChaisemartinDHaultfoeuille(seed=1).fit(
+                df, "outcome", "group", "period", "treatment",
+                L_max=1, trends_nonparam="group",
+            )
+
     def test_nonparam_with_covariates(self):
         """Combined state-set trends + covariates."""
         df = self._make_panel_with_sets()
