Skip to content

Commit 83cc093

Browse files
igerberclaude
andcommitted
Round 7: cluster gate + TWFE diagnostic order + singleton-baseline language
Resolves three findings from the prior CI review: P1 — cluster parameter was a public no-op. The constructor exposed `cluster: Optional[str] = None` and stored it on `self.cluster`, but neither `fit()` nor `_compute_dcdh_bootstrap()` ever read it, so `cluster="state"` silently produced the same group-level inference as `cluster=None`. The Phase 1 contract is now: dCDH always clusters at the group level (via the cohort-recentered influence function and the multiplier bootstrap), and any non-None cluster value raises `NotImplementedError` at construction time. The same gate fires from `set_params`. Added `test_cluster_parameter_raises_not_implemented` covering all four entry points (`__init__`, `set_params`, the `cluster=None` happy path, and the convenience function). P3 — TWFE diagnostic was running on the post-Step-5a sample but the inline comment claimed "FULL pre-filter cell dataset" and the standalone `twowayfeweights()` function used the truly pre-filter cell. On ragged panels with interior gaps, the two paths diverged. Fixed by reordering: the TWFE diagnostic now runs as Step 5a (was 5b) BEFORE the ragged-panel validation, which is now Step 5b (was 5a). The blocks are independent — the diagnostic just reads `cell` while the ragged-panel block mutates it. Both API surfaces now operate on the same `_validate_and_aggregate_to_cells()` output. P3 — API rst step 3 said "Filters singleton-baseline groups" which read as a point-estimate sample drop. After Round 3, singleton- baseline groups are excluded from the variance computation only (they remain in the point-estimate sample as period-based stable controls). Fixed the language to match. Documentation: - REGISTRY.md gets a new `**Note (Phase 1 cluster contract):**` block in the dCDH section explaining the always-group-level semantics and the construction-time NotImplementedError gate. - API rst step 3 reflects the variance-only singleton-baseline scope. - The dCDH `cluster` parameter docstring now describes the Phase 1 contract instead of "Currently unused — analytical SEs are always at the group level via the cohort-recentered plug-in." Test counts: 106 -> 107 (one new cluster gate regression test). Files modified: - diff_diff/chaisemartin_dhaultfoeuille.py (cluster gate in __init__ and set_params, docstring update, TWFE diagnostic block reorder with renumbered Step 5a/5b labels) - docs/methodology/REGISTRY.md (cluster contract Note + rename Step 5a -> Step 5b in the ragged-panel deviation Note) - docs/api/chaisemartin_dhaultfoeuille.rst (singleton-baseline language fix in step 3 of the overview) - tests/test_chaisemartin_dhaultfoeuille.py (test_cluster_parameter_raises_not_implemented in TestForwardCompatGates; rename Step 5a -> Step 5b in three ragged-panel test docstrings) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent cf639d4 commit 83cc093

4 files changed

Lines changed: 112 additions & 33 deletions

File tree

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 58 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -246,10 +246,15 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
246246
----------
247247
alpha : float, default=0.05
248248
Significance level for confidence intervals.
249-
cluster : str, optional
250-
Reserved for future cluster-robust SE customization. Currently
251-
unused — analytical SEs are always at the group level via the
252-
cohort-recentered plug-in.
249+
cluster : str, optional, default=None
250+
**Phase 1 contract:** ``cluster`` must be ``None`` (the default).
251+
dCDH always clusters at the group level via the cohort-recentered
252+
influence-function plug-in (analytical SEs) and the multiplier
253+
bootstrap (also grouped at the ``group`` column). Passing any
254+
non-``None`` value raises ``NotImplementedError`` with a Phase 1
255+
pointer. Custom clustering at a coarser or finer level than the
256+
group is reserved for a future phase. See REGISTRY.md
257+
``ChaisemartinDHaultfoeuille`` section for the full contract.
253258
n_bootstrap : int, default=0
254259
Number of multiplier-bootstrap iterations. ``0`` (default) uses
255260
only the analytical SE. Set to ``999`` or higher for stable
@@ -344,6 +349,18 @@ def __init__(
344349
raise ValueError(f"alpha must be in (0, 1), got {alpha}")
345350
if n_bootstrap < 0:
346351
raise ValueError(f"n_bootstrap must be non-negative, got {n_bootstrap}")
352+
if cluster is not None:
353+
raise NotImplementedError(
354+
f"cluster={cluster!r}: custom clustering is not supported in "
355+
f"Phase 1 of ChaisemartinDHaultfoeuille. dCDH always clusters "
356+
f"at the group level via the cohort-recentered influence-"
357+
f"function plug-in (analytical SEs) and the multiplier "
358+
f"bootstrap (also grouped at the group column). To use the "
359+
f"supported group-level clustering, pass cluster=None (the "
360+
f"default). Custom clustering is reserved for a future "
361+
f"phase. See REGISTRY.md ChaisemartinDHaultfoeuille section "
362+
f"for the full contract."
363+
)
347364

348365
self.alpha = alpha
349366
self.cluster = cluster
@@ -403,6 +420,15 @@ def set_params(self, **params: Any) -> "ChaisemartinDHaultfoeuille":
403420
raise ValueError(f"alpha must be in (0, 1), got {self.alpha}")
404421
if self.n_bootstrap < 0:
405422
raise ValueError(f"n_bootstrap must be non-negative, got {self.n_bootstrap}")
423+
if self.cluster is not None:
424+
raise NotImplementedError(
425+
f"cluster={self.cluster!r}: custom clustering is not supported "
426+
f"in Phase 1 of ChaisemartinDHaultfoeuille. dCDH always clusters "
427+
f"at the group level. To use the supported group-level "
428+
f"clustering, pass cluster=None (the default). Custom clustering "
429+
f"is reserved for a future phase. See REGISTRY.md "
430+
f"ChaisemartinDHaultfoeuille section for the full contract."
431+
)
406432
return self
407433

408434
# ------------------------------------------------------------------
@@ -531,7 +557,34 @@ def fit(
531557
)
532558

533559
# ------------------------------------------------------------------
534-
# Step 5a: Ragged panel validation
560+
# Step 5a: Compute the TWFE diagnostic on the FULL pre-filter cell
561+
# dataset, so the diagnostic reflects the data the user
562+
# actually passed in. This MUST run BEFORE Step 5b (the
563+
# ragged-panel filter) so that the fitted diagnostic and
564+
# the standalone twowayfeweights() function produce
565+
# identical results on ragged panels — both operate on
566+
# the same _validate_and_aggregate_to_cells() output.
567+
# ------------------------------------------------------------------
568+
twfe_diagnostic_payload = None
569+
if self.twfe_diagnostic:
570+
try:
571+
twfe_diagnostic_payload = _compute_twfe_diagnostic(
572+
cell=cell,
573+
group_col=group,
574+
time_col=time,
575+
rank_deficient_action=self.rank_deficient_action,
576+
)
577+
except Exception as exc: # noqa: BLE001
578+
warnings.warn(
579+
f"TWFE decomposition diagnostic failed: {exc}. "
580+
"Skipping diagnostic; main estimation continues.",
581+
UserWarning,
582+
stacklevel=2,
583+
)
584+
twfe_diagnostic_payload = None
585+
586+
# ------------------------------------------------------------------
587+
# Step 5b: Ragged panel validation
535588
#
536589
# The cohort/variance path treats D_{g,1} as the canonical
537590
# baseline and walks adjacent observed periods to detect first
@@ -613,29 +666,6 @@ def fit(
613666
f"got {len(all_periods_pre_drop)}"
614667
)
615668

616-
# ------------------------------------------------------------------
617-
# Step 5b: Compute the TWFE diagnostic on the FULL pre-filter cell
618-
# dataset, so the diagnostic reflects the data the user
619-
# actually passed in (per the plan).
620-
# ------------------------------------------------------------------
621-
twfe_diagnostic_payload = None
622-
if self.twfe_diagnostic:
623-
try:
624-
twfe_diagnostic_payload = _compute_twfe_diagnostic(
625-
cell=cell,
626-
group_col=group,
627-
time_col=time,
628-
rank_deficient_action=self.rank_deficient_action,
629-
)
630-
except Exception as exc: # noqa: BLE001
631-
warnings.warn(
632-
f"TWFE decomposition diagnostic failed: {exc}. "
633-
"Skipping diagnostic; main estimation continues.",
634-
UserWarning,
635-
stacklevel=2,
636-
)
637-
twfe_diagnostic_payload = None
638-
639669
# ------------------------------------------------------------------
640670
# Step 6: Drop A5-violating (multi-switch) cells per drop_larger_lower
641671
# ------------------------------------------------------------------

docs/api/chaisemartin_dhaultfoeuille.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ The estimator:
1919

2020
1. Aggregates individual-level panel data to ``(group, time)`` cells
2121
2. Drops multi-switch groups by default (matches R ``DIDmultiplegtDYN``)
22-
3. Filters singleton-baseline groups (footnote 15 of the dynamic paper)
22+
3. Excludes singleton-baseline groups from the variance computation only (footnote 15 of the dynamic paper)
2323
4. Computes per-period joiner (``DID_{+,t}``) and leaver (``DID_{-,t}``)
2424
contributions via Theorem 3 of the AER 2020 paper
2525
5. Aggregates them into ``DID_M``, the joiners-only ``DID_+``, and the

docs/methodology/REGISTRY.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -556,6 +556,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
556556

557557
- **Note:** When every variance-eligible group forms its own `(D_{g,1}, F_g, S_g)` cohort (a degenerate small-panel case where the cohort framework has zero degrees of freedom), the cohort-recentered plug-in formula is unidentified: cohort recentering subtracts the cohort mean from each group's `U^G_g`, and for singleton cohorts the centered value is exactly zero, so the centered influence function vector collapses to all zeros. The estimator returns `overall_se = NaN` with a `UserWarning` rather than silently collapsing to `0.0` (which would falsely imply infinite precision). The `DID_M` point estimate remains well-defined. The bootstrap path inherits the same degeneracy on these panels — the multiplier weights act on an all-zero vector, so the bootstrap distribution is also degenerate. **Deviation from R `DIDmultiplegtDYN`:** R returns a non-zero SE on the canonical 4-group worked example via small-sample sandwich machinery that Python does not implement. Both responses are valid for a degenerate case; Python's `NaN`+warning is the safer default. To get a non-degenerate SE, include more groups so cohorts have peers (real-world panels typically have `G >> K`).
558558

559+
- **Note (Phase 1 cluster contract):** `ChaisemartinDHaultfoeuille` always clusters at the group level. The cohort-recentered analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group); the multiplier bootstrap generates one weight per group; both inference paths cluster at the user's `group` column with no other option. The constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` with a Phase 1 pointer at construction time (and the same gate fires from `set_params`). Custom clustering at a coarser or finer level than the group is reserved for a future phase. The matching test is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`.
560+
559561
- **Note:** Placebo Assumption 11 violations (placebo joiners exist but no 3-period stable_0 controls, or symmetric for leavers/stable_1) trigger zero-retention in the placebo numerator AND emit a consolidated `Placebo (DID_M^pl) Assumption 11 violations` warning from `fit()`, mirroring the main DID path's contract documented above. The zeroed placebo periods retain their switcher counts in the placebo `N_S^pl` denominator, biasing `DID_M^pl` toward zero in the offending direction (matching the Theorem 4 paper convention).
560562

561563
- **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula (Web Appendix Section 3.7.3 of the dynamic paper, which assumes Assumption 5 / no-crossing) to be consistent with the AER 2020 Theorem 3 point estimate. Both formulas operate on the same post-drop dataset. Setting `drop_larger_lower=False` is supported for diagnostic comparison but produces an inconsistent estimator-variance pairing for any multi-switch groups present, and emits an explicit warning.
@@ -566,7 +568,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
566568

567569
- **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, a 5% rtol for pure-direction SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance).
568570

569-
- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5a validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5a `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly.
571+
- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly.
570572

571573
**Reference implementation(s):**
572574
- R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`.

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 50 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -323,6 +323,53 @@ def test_survey_design_raises_not_implemented(self, data):
323323
survey_design=object(),
324324
)
325325

326+
def test_cluster_parameter_raises_not_implemented(self, data):
327+
"""
328+
Per Phase 1 cluster contract: dCDH always clusters at the
329+
group level via the cohort-recentered influence function
330+
(analytical SEs) and the multiplier bootstrap (also grouped at
331+
the group column). Custom clustering is not supported in
332+
Phase 1.
333+
334+
The reviewer flagged that ``cluster`` was previously accepted
335+
on ``__init__`` and stored on ``self.cluster`` but never
336+
actually read by ``fit()`` or ``_compute_dcdh_bootstrap()``,
337+
making it a silent no-op. This test pins the new contract: any
338+
non-None cluster value raises ``NotImplementedError`` at
339+
construction time with a message naming the offending value
340+
and pointing at the Phase 1 reservation. The same gate fires
341+
from ``set_params``.
342+
343+
See REGISTRY.md ``Note (Phase 1 cluster contract)``.
344+
"""
345+
# __init__ rejects any non-None cluster
346+
with pytest.raises(NotImplementedError, match=r"cluster.*Phase 1"):
347+
ChaisemartinDHaultfoeuille(cluster="state")
348+
with pytest.raises(NotImplementedError, match=r"cluster.*Phase 1"):
349+
ChaisemartinDHaultfoeuille(cluster="unit")
350+
351+
# set_params after construction also rejects
352+
est = ChaisemartinDHaultfoeuille()
353+
with pytest.raises(NotImplementedError, match=r"cluster.*Phase 1"):
354+
est.set_params(cluster="state")
355+
356+
# cluster=None still works (the only supported value)
357+
est_default = ChaisemartinDHaultfoeuille(cluster=None)
358+
assert est_default.cluster is None
359+
assert est_default.get_params()["cluster"] is None
360+
361+
# The convenience function also rejects (forward-compat gate
362+
# propagates through the wrapper at __init__ time)
363+
with pytest.raises(NotImplementedError, match=r"cluster.*Phase 1"):
364+
chaisemartin_dhaultfoeuille(
365+
data,
366+
outcome="outcome",
367+
group="group",
368+
time="period",
369+
treatment="treatment",
370+
cluster="state",
371+
)
372+
326373

327374
# =============================================================================
328375
# drop_larger_lower (Critical #1)
@@ -461,7 +508,7 @@ def test_singleton_baseline_filter_variance_only(self):
461508

462509
def test_missing_baseline_period_raises_value_error(self):
463510
"""
464-
Per fit() Step 5a: groups missing the first global period have
511+
Per fit() Step 5b: groups missing the first global period have
465512
an undefined baseline D_{g,1} and must be rejected with a clear
466513
error rather than crashing the cohort enumeration with NaN.
467514
"""
@@ -480,7 +527,7 @@ def test_missing_baseline_period_raises_value_error(self):
480527

481528
def test_interior_gap_drops_group_with_warning(self):
482529
"""
483-
Per fit() Step 5a: groups with missing intermediate periods
530+
Per fit() Step 5b: groups with missing intermediate periods
484531
(interior gaps between their first and last observed period)
485532
are dropped with an explicit warning. The cohort/variance path
486533
requires consecutive observed periods to detect first switches
@@ -508,7 +555,7 @@ def test_interior_gap_drops_group_with_warning(self):
508555

509556
def test_terminal_missingness_retained(self):
510557
"""
511-
Per fit() Step 5a contract: groups observed at the baseline but
558+
Per fit() Step 5b contract: groups observed at the baseline but
512559
missing one or more LATER periods (terminal missingness / early
513560
exit / right-censoring) are RETAINED. The group contributes from
514561
its observed periods only, masked out of missing transitions by

0 commit comments

Comments
 (0)