Round 13: honor rank_deficient_action='error' on fitted TWFE path

igerber · claude · igerber · commit b5e184781eef · 2026-04-12T06:30:53.000-04:00
P1: the blanket except Exception around the TWFE diagnostic call in
fit() Step 5a swallowed ALL exceptions, including the ValueError that
solve_ols raises when rank_deficient_action="error". This broke the
public-parameter contract: a user requesting strict failure on a
rank-deficient TWFE design would get a successful fit with the
diagnostic silently omitted instead. The standalone twowayfeweights()
already honored the parameter correctly; only the fitted path was
broken.

Fix: in the except block, re-raise ValueError when
rank_deficient_action=="error" so the user's strict-failure request
is honored. Other exceptions (genuinely non-fatal diagnostic failures)
are still downgraded to a warning + skipped diagnostic.
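
A minimal sketch of the selective re-raise, for illustration only. The
names run_diagnostic_guarded and underdetermined are hypothetical
stand-ins, not part of the library; the sketch assumes, as the fix does,
that a ValueError under rank_deficient_action="error" is the
rank-deficiency signal:

```python
import warnings


def run_diagnostic_guarded(diagnostic, rank_deficient_action="warn"):
    """Run ``diagnostic()``; return its result, or None if it was skipped."""
    try:
        return diagnostic()
    except Exception as exc:  # broad on purpose: the diagnostic is optional
        # Strict mode: a ValueError is treated as the rank-deficiency
        # signal and propagates unchanged to the caller.
        if rank_deficient_action == "error" and isinstance(exc, ValueError):
            raise
        # Every other failure is downgraded to a warning and skipped.
        warnings.warn(
            f"diagnostic failed: {exc}. Skipping diagnostic.",
            UserWarning,
            stacklevel=2,
        )
        return None


def underdetermined():
    # Hypothetical stand-in for the solver failing on a tiny design.
    raise ValueError("Fewer observations (2) than parameters (3).")
```

A bare raise inside the except block preserves the original traceback,
which is why the sketch uses it instead of raising a new exception.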

P3: fixed two stale docstrings that still described the old "warning
+ majority rounding" behavior for fractional within-cell treatment
(the _validate_and_aggregate_to_cells step 5 docstring and the
fit() group parameter docstring). Both now correctly describe the
Phase 1 ValueError rejection.
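
The rejection rule the docstrings now describe can be sketched in plain
Python. This is an illustration, not the library's pandas-based
implementation: aggregate_cells is a hypothetical helper, and it checks
for more than one distinct treatment value per cell, which for a binary
treatment is equivalent to a fractional cell mean:

```python
from collections import defaultdict


def aggregate_cells(rows):
    """Aggregate (group, time, treatment, outcome) rows to cell level.

    Returns {(group, time): {"y_gt": mean outcome, "d_gt": treatment,
    "n_gt": observation count}} and raises ValueError if treatment is
    not constant within a cell.
    """
    buckets = defaultdict(list)
    for group, time, treatment, outcome in rows:
        buckets[(group, time)].append((treatment, outcome))

    cells = {}
    for key, obs in buckets.items():
        d_values = {d for d, _ in obs}
        if len(d_values) > 1:
            # Within-cell-varying treatment: reject instead of rounding.
            raise ValueError(
                f"Treatment varies within cell {key}: {sorted(d_values)}"
            )
        n = len(obs)
        cells[key] = {
            "y_gt": sum(y for _, y in obs) / n,
            "d_gt": obs[0][0],
            "n_gt": n,
        }
    return cells
```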

Regression test: test_rank_deficient_action_error_raises_on_fitted_twfe
in TestForwardCompatGates. Uses a minimal 1-group 2-period panel
(2 cells < 3 FE columns) to trigger the underdetermined-system
ValueError. Asserts rank_deficient_action="error" raises, and
rank_deficient_action="warn" does NOT raise the TWFE-specific error.
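
The cell-versus-column count behind the test panel can be checked with a
small sketch. The column layout here is an assumption, not taken from the
library: an intercept, G-1 group dummies, T-1 period dummies, and one
treatment column. Under that assumption the 1-group, 2-period panel has 2
rows and 3 columns, matching the counts quoted above. twfe_design_shape
and is_underdetermined are hypothetical helper names:

```python
def twfe_design_shape(cells):
    """Return (n_rows, n_cols) for a dummy-coded two-way FE design.

    ``cells`` is an iterable of (group, period) pairs, one per cell.
    Assumed columns: intercept + (G - 1) group dummies
    + (T - 1) period dummies + 1 treatment regressor.
    """
    cells = list(cells)
    groups = {g for g, _ in cells}
    periods = {t for _, t in cells}
    n_rows = len(set(cells))
    n_cols = 1 + (len(groups) - 1) + (len(periods) - 1) + 1
    return n_rows, n_cols


def is_underdetermined(cells):
    """True when the design has fewer rows (cells) than columns."""
    n_rows, n_cols = twfe_design_shape(cells)
    return n_rows < n_cols
```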

Test counts: 112 -> 113. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -120,8 +120,8 @@ def _validate_and_aggregate_to_cells(
        and raises ``ValueError``.
     5. **Cell aggregation** via ``groupby([group, time]).agg(...)``
        producing ``y_gt`` (cell mean of ``outcome``), ``d_gt`` (cell
-       mean of ``treatment``, then majority-rounded), and ``n_gt``
-       (count of original observations in the cell).
+       mean of ``treatment``), and ``n_gt`` (count of original
+       observations in the cell).
     6. **Within-cell-varying treatment** (any cell with fractional
        ``d_gt``) raises ``ValueError``. Phase 1 requires treatment to
        be constant within each ``(group, time)`` cell; fuzzy DiD is
@@ -477,10 +477,11 @@ def fit(
         outcome : str
             Outcome variable column name.
         group : str
-            Group identifier column name. Treatment is assumed constant
-            within each ``(group, time)`` cell after aggregation; a
-            warning is emitted and the cell-level treatment is rounded to
-            majority if any cell has fractional treatment after grouping.
+            Group identifier column name. Treatment must be constant
+            within each ``(group, time)`` cell after aggregation;
+            ``ValueError`` is raised if any cell has fractional
+            treatment after grouping (within-cell-varying treatment
+            indicates a fuzzy design not supported in Phase 1).
         time : str
             Time period column name. Must be sortable.
         treatment : str
@@ -588,6 +589,14 @@ def fit(
                     rank_deficient_action=self.rank_deficient_action,
                 )
             except Exception as exc:  # noqa: BLE001
+                # Honor rank_deficient_action="error": if the user
+                # explicitly requested strict failure on rank-deficient
+                # designs, re-raise instead of downgrading to a warning.
+                # Only genuinely non-fatal failures (e.g., numerical
+                # issues unrelated to rank deficiency) should be
+                # swallowed as warnings.
+                if self.rank_deficient_action == "error" and isinstance(exc, ValueError):
+                    raise
                 warnings.warn(
                     f"TWFE decomposition diagnostic failed: {exc}. "
                     "Skipping diagnostic; main estimation continues.",
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -370,6 +370,74 @@ def test_cluster_parameter_raises_not_implemented(self, data):
                 cluster="state",
             )
 
+    def test_rank_deficient_action_error_raises_on_fitted_twfe(self):
+        """
+        Per Round 13: rank_deficient_action="error" must be honored on
+        the fitted TWFE diagnostic path, not swallowed by the blanket
+        try/except. The standalone twowayfeweights() always honors it;
+        the fitted path must too.
+
+        Uses a minimal panel (1 group, 2 periods, 1 obs per cell =
+        2 cells total) where the FE design has more columns (3) than
+        cells and triggers the underdetermined-system ValueError from
+        solve_ols.
+        """
+        # Balanced panels such as 2 groups x 2 periods or 3 groups x 3
+        # periods have more cells than FE columns, so they do not
+        # trigger the error. The confirmed trigger is 1 group with 2
+        # periods: 2 cells < 3 columns in the FE design, and solve_ols
+        # raises "Fewer observations (2) than parameters (3)."
+        #
+        # fit() can also raise on this panel for a missing baseline or
+        # insufficient groups; the test relies on the TWFE diagnostic
+        # running before those checks, which it does (Step 5a).
+        df = pd.DataFrame(
+            {
+                "group": [1, 1],
+                "period": [0, 1],
+                "treatment": [0, 1],
+                "outcome": [10.0, 12.0],
+            }
+        )
+        # rank_deficient_action="error" should propagate through
+        est = ChaisemartinDHaultfoeuille(twfe_diagnostic=True, rank_deficient_action="error")
+        with pytest.raises(ValueError, match="Fewer observations"):
+            est.fit(
+                df,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+            )
+
+        # rank_deficient_action="warn" should NOT raise on the same panel
+        # (the diagnostic fails gracefully and main estimation continues)
+        est_warn = ChaisemartinDHaultfoeuille(twfe_diagnostic=True, rank_deficient_action="warn")
+        with warnings.catch_warnings(record=True):
+            warnings.simplefilter("always")
+            # The estimation may still raise for other reasons (e.g.,
+            # no switching cells after the 1-group panel has no controls).
+            # What we're testing is that the TWFE diagnostic does NOT
+            # raise. If the main estimation raises, that's fine — the
+            # test goal is that rank_deficient_action="warn" doesn't
+            # propagate the ValueError.
+            try:
+                est_warn.fit(
+                    df,
+                    outcome="outcome",
+                    group="group",
+                    time="period",
+                    treatment="treatment",
+                )
+            except ValueError as exc:
+                # Acceptable if the error is from main estimation
+                # (not from the TWFE diagnostic)
+                assert "Fewer observations" not in str(exc), (
+                    "rank_deficient_action='warn' should not raise the "
+                    "TWFE rank-deficiency error"
+                )
+
 
 # =============================================================================
 # drop_larger_lower (Critical #1)