Skip to content

Commit b5e1847

Browse files
igerberclaude
andcommitted
Round 13: honor rank_deficient_action='error' on fitted TWFE path
P1: the blanket except Exception around the TWFE diagnostic call in fit() Step 5a swallowed ALL exceptions, including the ValueError that solve_ols raises when rank_deficient_action="error". This broke the public-parameter contract: a user requesting strict failure on a rank-deficient TWFE design would get a successful fit with the diagnostic silently omitted instead. The standalone twowayfeweights() already honored the parameter correctly; only the fitted path was broken. Fix: in the except block, re-raise ValueError when rank_deficient_action=="error" so the user's strict-failure request is honored. Other exceptions (genuinely non-fatal diagnostic failures) are still downgraded to a warning + skipped diagnostic. P3: fixed two stale docstrings that still described the old "warning + majority rounding" behavior for fractional within-cell treatment (the _validate_and_aggregate_to_cells step 5 docstring and the fit() group parameter docstring). Both now correctly describe the Phase 1 ValueError rejection. Regression test: test_rank_deficient_action_error_raises_on_fitted_twfe in TestForwardCompatGates. Uses a minimal 1-group 2-period panel (2 cells < 3 FE columns) to trigger the underdetermined-system ValueError. Asserts rank_deficient_action="error" raises, and rank_deficient_action="warn" does NOT raise the TWFE-specific error. Test counts: 112 -> 113. Black, ruff clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a8b161c commit b5e1847

2 files changed

Lines changed: 95 additions & 6 deletions

File tree

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 15 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -120,8 +120,8 @@ def _validate_and_aggregate_to_cells(
120120
and raises ``ValueError``.
121121
5. **Cell aggregation** via ``groupby([group, time]).agg(...)``
122122
producing ``y_gt`` (cell mean of ``outcome``), ``d_gt`` (cell
123-
mean of ``treatment``, then majority-rounded), and ``n_gt``
124-
(count of original observations in the cell).
123+
mean of ``treatment``), and ``n_gt`` (count of original
124+
observations in the cell).
125125
6. **Within-cell-varying treatment** (any cell with fractional
126126
``d_gt``) raises ``ValueError``. Phase 1 requires treatment to
127127
be constant within each ``(group, time)`` cell; fuzzy DiD is
@@ -477,10 +477,11 @@ def fit(
477477
outcome : str
478478
Outcome variable column name.
479479
group : str
480-
Group identifier column name. Treatment is assumed constant
481-
within each ``(group, time)`` cell after aggregation; a
482-
warning is emitted and the cell-level treatment is rounded to
483-
majority if any cell has fractional treatment after grouping.
480+
Group identifier column name. Treatment must be constant
481+
within each ``(group, time)`` cell after aggregation;
482+
``ValueError`` is raised if any cell has fractional
483+
treatment after grouping (within-cell-varying treatment
484+
indicates a fuzzy design not supported in Phase 1).
484485
time : str
485486
Time period column name. Must be sortable.
486487
treatment : str
@@ -588,6 +589,14 @@ def fit(
588589
rank_deficient_action=self.rank_deficient_action,
589590
)
590591
except Exception as exc: # noqa: BLE001
592+
# Honor rank_deficient_action="error": if the user
593+
# explicitly requested strict failure on rank-deficient
594+
# designs, re-raise instead of downgrading to a warning.
595+
# Only genuinely non-fatal failures (e.g., numerical
596+
# issues unrelated to rank deficiency) should be
597+
# swallowed as warnings.
598+
if self.rank_deficient_action == "error" and isinstance(exc, ValueError):
599+
raise
591600
warnings.warn(
592601
f"TWFE decomposition diagnostic failed: {exc}. "
593602
"Skipping diagnostic; main estimation continues.",

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -370,6 +370,86 @@ def test_cluster_parameter_raises_not_implemented(self, data):
370370
cluster="state",
371371
)
372372

373+
def test_rank_deficient_action_error_raises_on_fitted_twfe(self):
374+
"""
375+
Per Round 13: rank_deficient_action="error" must be honored on
376+
the fitted TWFE diagnostic path, not swallowed by the blanket
377+
try/except. The standalone twowayfeweights() always honors it;
378+
the fitted path must too.
379+
380+
Uses a minimal panel (1 joiner group + 1 control group, 3
381+
periods, 1 obs per cell = 6 cells total) where the FE design
382+
has more columns than cells and triggers the underdetermined-
383+
system ValueError from solve_ols.
384+
"""
385+
# 2 groups, 3 periods: 6 cells but the FE design has
386+
# (2-1) + (3-1) + 1 = 4 columns. That's fine.
387+
# To trigger rank-deficient: use a panel so small that the
388+
# number of cells equals the number of FE dummies.
389+
# With 3 groups, 3 periods: 9 cells, (3-1) + (3-1) + 1 = 5 columns. Not rank-deficient.
390+
# With 2 groups, 2 periods: 4 cells, (2-1) + (2-1) + 1 = 3 columns. Not rank-deficient.
391+
# Trigger via an unbalanced panel: 3 groups, 3 periods, but
392+
# group 3 only has period 0 (terminal missingness), giving
393+
# 7 cells with 3+3-1 = 5 columns. Not rank-deficient.
394+
#
395+
# Simplest route: a single-group joiner panel (1 group, 2
396+
# periods = 2 cells, but group+time dummies need 3 columns).
397+
# This also needs a control group. Use 2 groups, but one
398+
# is a singleton-period (contributing 1 cell to 1 period only).
399+
# Actually, the easiest verified trigger: 1 group, 2 periods.
400+
# solve_ols raises "Fewer observations (2) than parameters (3)."
401+
# But fit() will also raise for missing-baseline or insufficient
402+
# groups BEFORE reaching the TWFE diagnostic — so the TWFE
403+
# diagnostic must run first (it does: Step 5a).
404+
#
405+
# Use the confirmed trigger: 1 group, 2 periods, which has
406+
# 2 cells < 3 columns in the FE design.
407+
df = pd.DataFrame(
408+
{
409+
"group": [1, 1],
410+
"period": [0, 1],
411+
"treatment": [0, 1],
412+
"outcome": [10.0, 12.0],
413+
}
414+
)
415+
# rank_deficient_action="error" should propagate through
416+
est = ChaisemartinDHaultfoeuille(twfe_diagnostic=True, rank_deficient_action="error")
417+
with pytest.raises(ValueError, match="Fewer observations"):
418+
est.fit(
419+
df,
420+
outcome="outcome",
421+
group="group",
422+
time="period",
423+
treatment="treatment",
424+
)
425+
426+
# rank_deficient_action="warn" should NOT raise on the same panel
427+
# (the diagnostic fails gracefully and main estimation continues)
428+
est_warn = ChaisemartinDHaultfoeuille(twfe_diagnostic=True, rank_deficient_action="warn")
429+
with warnings.catch_warnings(record=True):
430+
warnings.simplefilter("always")
431+
# The estimation may still raise for other reasons (e.g.,
432+
# no switching cells after the 1-group panel has no controls).
433+
# What we're testing is that the TWFE diagnostic does NOT
434+
# raise. If the main estimation raises, that's fine — the
435+
# test goal is that rank_deficient_action="warn" doesn't
436+
# propagate the ValueError.
437+
try:
438+
est_warn.fit(
439+
df,
440+
outcome="outcome",
441+
group="group",
442+
time="period",
443+
treatment="treatment",
444+
)
445+
except ValueError as exc:
446+
# Acceptable if the error is from main estimation
447+
# (not from the TWFE diagnostic)
448+
assert "Fewer observations" not in str(exc), (
449+
"rank_deficient_action='warn' should not raise the "
450+
"TWFE rank-deficiency error"
451+
)
452+
373453

374454
# =============================================================================
375455
# drop_larger_lower (Critical #1)

0 commit comments

Comments
 (0)