Skip to content

Commit cf17cb2

Browse files
igerberclaude
andcommitted
Fix CI review R2: zero-weight row drops, weighted covariates, df_survey propagation
- P1-A: Drop zero-weight rows entirely from cell DataFrame (instead of just zeroing n_gt) so ragged-panel validator doesn't see them - P1-B: Survey-weighted covariate aggregation in DID^X path (sum(w*x)/sum(w) instead of unweighted mean) - P1-C: Thread _df_survey to all remaining safe_inference() calls: bootstrap t-stats, normalized effects, cost-benefit delta, placebo bootstrap t-stats - P3: Fix REGISTRY overview paragraph (was still saying survey deferred) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1ed49bc commit cf17cb2

2 files changed

Lines changed: 27 additions & 12 deletions

File tree

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 26 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -227,12 +227,11 @@ def _validate_and_aggregate_to_cells(
227227
)
228228
cell["y_gt"] = cell["_wy_sum"] / cell["w_gt"]
229229
cell = cell.drop(columns=["_wy_sum"])
230-
# Zero-weight cells: treat as absent so downstream presence
231-
# logic (N_mat > 0) correctly excludes them.
230+
# Zero-weight cells: drop entirely so downstream validators
231+
# (ragged-panel, baseline requirement) don't see them.
232232
zero_w_mask = cell["w_gt"] <= 0
233233
if zero_w_mask.any():
234-
cell.loc[zero_w_mask, "n_gt"] = 0
235-
cell.loc[zero_w_mask, "y_gt"] = 0.0
234+
cell = cell[~zero_w_mask].reset_index(drop=True)
236235
df.drop(columns=["_w_", "_wy_"], inplace=True)
237236
else:
238237
cell = df.groupby([group, time], as_index=False).agg(
@@ -744,7 +743,23 @@ def fit(
744743
# Use the coerced copy joined with group/time from original data.
745744
x_agg_input = data[[group, time]].copy()
746745
x_agg_input[controls] = data_controls[controls].values
747-
x_cell_agg = x_agg_input.groupby([group, time], as_index=False)[controls].mean()
746+
if survey_weights is not None:
747+
# Survey-weighted covariate cell means: sum(w*x)/sum(w)
748+
x_agg_input["_w_"] = survey_weights
749+
for c in controls:
750+
x_agg_input[f"_wx_{c}"] = survey_weights * x_agg_input[c].values
751+
wx_cols = [f"_wx_{c}" for c in controls]
752+
g_agg = x_agg_input.groupby([group, time], as_index=False).agg(
753+
{**{wc: "sum" for wc in wx_cols}, "_w_": "sum"}
754+
)
755+
for c in controls:
756+
w_safe = g_agg["_w_"].replace(0, 1)
757+
g_agg[c] = g_agg[f"_wx_{c}"] / w_safe
758+
x_cell_agg = g_agg[[group, time] + controls]
759+
else:
760+
x_cell_agg = x_agg_input.groupby(
761+
[group, time], as_index=False
762+
)[controls].mean()
748763
cell = cell.merge(x_cell_agg, on=[group, time], how="left")
749764

750765
# ------------------------------------------------------------------
@@ -1959,17 +1974,17 @@ def fit(
19591974
overall_se = br.overall_se
19601975
overall_p = br.overall_p_value if br.overall_p_value is not None else np.nan
19611976
overall_ci = br.overall_ci if br.overall_ci is not None else (np.nan, np.nan)
1962-
overall_t = safe_inference(overall_att, overall_se, alpha=self.alpha, df=None)[0]
1977+
overall_t = safe_inference(overall_att, overall_se, alpha=self.alpha, df=_df_survey)[0]
19631978
if joiners_available and br.joiners_se is not None and np.isfinite(br.joiners_se):
19641979
joiners_se = br.joiners_se
19651980
joiners_p = br.joiners_p_value if br.joiners_p_value is not None else np.nan
19661981
joiners_ci = br.joiners_ci if br.joiners_ci is not None else (np.nan, np.nan)
1967-
joiners_t = safe_inference(joiners_att, joiners_se, alpha=self.alpha, df=None)[0]
1982+
joiners_t = safe_inference(joiners_att, joiners_se, alpha=self.alpha, df=_df_survey)[0]
19681983
if leavers_available and br.leavers_se is not None and np.isfinite(br.leavers_se):
19691984
leavers_se = br.leavers_se
19701985
leavers_p = br.leavers_p_value if br.leavers_p_value is not None else np.nan
19711986
leavers_ci = br.leavers_ci if br.leavers_ci is not None else (np.nan, np.nan)
1972-
leavers_t = safe_inference(leavers_att, leavers_se, alpha=self.alpha, df=None)[0]
1987+
leavers_t = safe_inference(leavers_att, leavers_se, alpha=self.alpha, df=_df_survey)[0]
19731988

19741989
# ------------------------------------------------------------------
19751990
# Step 20: Build the results dataclass
@@ -2216,7 +2231,7 @@ def fit(
22162231
bs_ci if bs_ci is not None else (np.nan, np.nan)
22172232
)
22182233
placebo_event_study_dict[neg_key]["t_stat"] = safe_inference(
2219-
eff, bs_se, alpha=self.alpha, df=None
2234+
eff, bs_se, alpha=self.alpha, df=_df_survey
22202235
)[0]
22212236

22222237
# Phase 2: build normalized_effects with SE
@@ -2229,7 +2244,7 @@ def fit(
22292244
# SE via delta method: SE(DID^n_l) = SE(DID_l) / delta^D_l
22302245
se_did_l = multi_horizon_se.get(l_h, float("nan"))
22312246
se_norm = se_did_l / denom if np.isfinite(denom) and denom > 0 else float("nan")
2232-
t_n, p_n, ci_n = safe_inference(eff, se_norm, alpha=self.alpha, df=None)
2247+
t_n, p_n, ci_n = safe_inference(eff, se_norm, alpha=self.alpha, df=_df_survey)
22332248
normalized_effects_out[l_h] = {
22342249
"effect": eff,
22352250
"se": se_norm,
@@ -2288,7 +2303,7 @@ def fit(
22882303
else:
22892304
running_se_ub = float("nan")
22902305
cum_t, cum_p, cum_ci = safe_inference(
2291-
cum_effect, running_se_ub, alpha=self.alpha, df=None
2306+
cum_effect, running_se_ub, alpha=self.alpha, df=_df_survey
22922307
)
22932308
cumulated[l_h] = {
22942309
"effect": cum_effect,

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -463,7 +463,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
463463
- [de Chaisemartin, C. & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. *American Economic Review*, 110(9), 2964-2996.](https://doi.org/10.1257/aer.20181169)
464464
- [de Chaisemartin, C. & D'Haultfœuille, X. (2022, revised 2024). Difference-in-Differences Estimators of Intertemporal Treatment Effects. NBER Working Paper 29873.](https://www.nber.org/papers/w29873) — Web Appendix Section 3.7.3 contains the cohort-recentered plug-in variance formula implemented here.
465465

466-
**Phase 1-2 scope:** Ships the contemporaneous-switch estimator `DID_M` (= `DID_1` at horizon `l = 1`) from the AER 2020 paper **plus** the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper. Phase 2 adds: per-group `DID_{g,l}` building block (Equation 3), dynamic placebos `DID^{pl}_l`, normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, sup-t simultaneous confidence bands, and `plot_event_study()` integration. Phase 3 adds covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, and HonestDiD integration. Survey design support is deferred to a separate effort after all phases ship. **This is the only modern staggered estimator in the library that handles non-absorbing (reversible) treatments** - treatment can switch on AND off over time, making it the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles.
466+
**Phase 1-2 scope:** Ships the contemporaneous-switch estimator `DID_M` (= `DID_1` at horizon `l = 1`) from the AER 2020 paper **plus** the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper. Phase 2 adds: per-group `DID_{g,l}` building block (Equation 3), dynamic placebos `DID^{pl}_l`, normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, sup-t simultaneous confidence bands, and `plot_event_study()` integration. Phase 3 adds covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, and HonestDiD integration. Survey design supports pweight with strata/PSU/FPC via Taylor Series Linearization; replicate weights and PSU-level bootstrap are deferred. **This is the only modern staggered estimator in the library that handles non-absorbing (reversible) treatments** - treatment can switch on AND off over time, making it the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles.
467467

468468
**Key implementation requirements:**
469469

0 commit comments

Comments
 (0)