Fix CI review Round 7: suppress joiner/leaver for all L_max>=1, no-switcher guard

igerber · claude · igerber · commit 79a01abb7909 · 2026-04-12T21:16:03.000-04:00
P1: Suppress joiner/leaver decomposition for ALL L_max >= 1 (not just
    non-binary). The decomposition is a per-period DID_M concept that
    can differ from the per-group DID_1 estimand on mixed panels.
P1: Add no-switcher guard after multi-horizon computation - raise
    ValueError if N_l == 0 at horizon 1 (catches constant-treatment
    non-binary panels).
P3: Update REGISTRY SE parity tolerance note (10%/15% for multi-horizon,
    5% for single-horizon). Fix stale Step 12c comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -1048,7 +1048,7 @@ def fit(
         )
 
         # ------------------------------------------------------------------
-        # Step 12c: Multi-horizon computation (Phase 2, only when L_max>=2)
+        # Step 12c: Multi-horizon per-group computation (L_max >= 1)
         # ------------------------------------------------------------------
         multi_horizon_dids: Optional[Dict[int, Dict[str, Any]]] = None
         multi_horizon_if: Optional[Dict[int, np.ndarray]] = None
@@ -1080,6 +1080,15 @@ def fit(
                     stacklevel=2,
                 )
 
+            # Guard: if no eligible switchers at horizon 1 (e.g., all
+            # groups have constant treatment), raise ValueError.
+            if 1 in multi_horizon_dids and multi_horizon_dids[1]["N_l"] == 0:
+                raise ValueError(
+                    "No switching groups found at horizon 1 after filtering. "
+                    "dCDH requires at least one group whose treatment changes "
+                    "from the baseline period."
+                )
+
             multi_horizon_if = _compute_per_group_if_multi_horizon(
                 D_mat=D_mat,
                 Y_mat=Y_mat,
@@ -1831,28 +1840,32 @@ def fit(
             twfe_sigma_fe = twfe_diagnostic_payload.sigma_fe
             twfe_beta_fe = twfe_diagnostic_payload.beta_fe
 
-        # When L_max >= 1 on non-binary data, the binary-only metadata
-        # (N_S, joiner/leaver counts, n_treated_obs) doesn't match the
-        # per-group DID_1 estimand. Use per-group metadata instead and
-        # suppress the joiner/leaver decomposition.
+        # When L_max >= 1, the overall estimand is per-group DID_1
+        # (not per-period DID_M). The joiner/leaver decomposition is a
+        # per-period DID_M concept and can differ from DID_1 on mixed
+        # panels, so it's suppressed for all L_max >= 1 cases. N_S and
+        # n_treated_obs are updated from the per-group path.
         effective_N_S = N_S
         effective_n_treated = n_treated_obs_post
         effective_joiners_available = joiners_available
         effective_leavers_available = leavers_available
         if (
-            not is_binary
-            and L_max is not None
+            L_max is not None
             and L_max >= 1
             and multi_horizon_dids is not None
             and 1 in multi_horizon_dids
         ):
             # Use horizon-1 eligible switcher count as the effective N_S
             effective_N_S = multi_horizon_dids[1]["N_l"]
-            # Count all observations where treatment differs from baseline
-            effective_n_treated = int(
-                N_mat[D_mat != D_mat[:, 0:1]].sum()
-            ) if D_mat.shape[1] > 1 else 0
-            # Suppress joiner/leaver decomposition for non-binary
+            if not is_binary:
+                # For non-binary: count all observations where treatment
+                # differs from baseline
+                effective_n_treated = int(
+                    N_mat[D_mat != D_mat[:, 0:1]].sum()
+                ) if D_mat.shape[1] > 1 else 0
+            # Suppress joiner/leaver decomposition for all L_max >= 1
+            # (the decomposition is a per-period DID_M concept, not
+            # applicable to the per-group DID_1 estimand)
             effective_joiners_available = False
             effective_leavers_available = False
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -601,7 +601,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 
 - **Note:** Groups whose baseline treatment value `D_{g,1}` is unique in the post-drop panel (not shared by any other group) are excluded from the **variance computation only** per footnote 15 of the dynamic companion paper. They have no cohort peer for the cohort-recentered plug-in formula. They are **retained in the point-estimate sample** as period-based stable controls (Python's documented period-vs-cohort interpretation). The dropped count is stored on `results.n_groups_dropped_singleton_baseline`, a warning lists example group IDs, and the warning text explicitly states "VARIANCE computation only" so users know the filter does not change `DID_M`.
 
-- **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, a 5% rtol for pure-direction SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance).
+- **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, 10% rtol for multi-horizon SEs (15% for L_max=5 long panels where the cell-count weighting deviation compounds), 5% rtol for single-horizon SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance).
 
 - **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly.
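
The "drop late-entry groups before fitting" workaround in the note above could look roughly like the sketch below. It is a minimal illustration, not part of the library: the helper name `drop_late_entry_groups`, the column names `group` and `period`, and the long-format pandas input are all assumptions made for the example.

```python
import pandas as pd


def drop_late_entry_groups(df, group_col="group", time_col="period"):
    """Keep only groups observed at the first global period.

    Hypothetical helper: the column names are assumptions and should be
    adapted to the panel being fitted.
    """
    first_period = df[time_col].min()
    # Groups that have at least one row at the first global period.
    at_baseline = set(df.loc[df[time_col] == first_period, group_col])
    return df[df[group_col].isin(at_baseline)].copy()


# Toy long-format panel: group "b" enters late (first observed at period 2).
panel = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "c", "c", "c"],
        "period": [1, 2, 3, 2, 3, 1, 2, 3],
        "treatment": [0, 0, 1, 0, 1, 0, 0, 0],
    }
)

balanced = drop_late_entry_groups(panel)
kept_groups = sorted(balanced["group"].unique())
```

On the toy panel this keeps groups `a` and `c` and drops `b`. Interior gaps are not handled here; per the note, `fit()` drops those groups itself with a `UserWarning`.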