You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Round 7: cluster gate + TWFE diagnostic order + singleton-baseline language
Resolves three findings from the prior CI review:
P1 — cluster parameter was a public no-op. The constructor exposed
`cluster: Optional[str] = None` and stored it on `self.cluster`, but
neither `fit()` nor `_compute_dcdh_bootstrap()` ever read it, so
`cluster="state"` silently produced the same group-level inference as
`cluster=None`. The Phase 1 contract is now: dCDH always clusters at
the group level (via the cohort-recentered influence function and the
multiplier bootstrap), and any non-None cluster value raises
`NotImplementedError` at construction time. The same gate fires from
`set_params`. Added `test_cluster_parameter_raises_not_implemented`
covering all four entry points (`__init__`, `set_params`, the
`cluster=None` happy path, and the convenience function).
P3 — TWFE diagnostic was running on the post-Step-5a sample but the
inline comment claimed "FULL pre-filter cell dataset" and the
standalone `twowayfeweights()` function used the truly pre-filter
cell. On ragged panels with interior gaps, the two paths diverged.
Fixed by reordering: the TWFE diagnostic now runs as Step 5a (was
5b) BEFORE the ragged-panel validation, which is now Step 5b (was
5a). The blocks are independent — the diagnostic just reads `cell`
while the ragged-panel block mutates it. Both API surfaces now
operate on the same `_validate_and_aggregate_to_cells()` output.
P3 — API rst step 3 said "Filters singleton-baseline groups" which
read as a point-estimate sample drop. After Round 3, singleton-
baseline groups are excluded from the variance computation only
(they remain in the point-estimate sample as period-based stable
controls). Fixed the language to match.
Documentation:
- REGISTRY.md gets a new `**Note (Phase 1 cluster contract):**`
block in the dCDH section explaining the always-group-level
semantics and the construction-time NotImplementedError gate.
- API rst step 3 reflects the variance-only singleton-baseline scope.
- The dCDH `cluster` parameter docstring now describes the Phase 1
contract instead of "Currently unused — analytical SEs are always
at the group level via the cohort-recentered plug-in."
Test counts: 106 -> 107 (one new cluster gate regression test).
Files modified:
- diff_diff/chaisemartin_dhaultfoeuille.py (cluster gate in __init__
and set_params, docstring update, TWFE diagnostic block reorder
with renumbered Step 5a/5b labels)
- docs/methodology/REGISTRY.md (cluster contract Note + rename
Step 5a -> Step 5b in the ragged-panel deviation Note)
- docs/api/chaisemartin_dhaultfoeuille.rst (singleton-baseline
language fix in step 3 of the overview)
- tests/test_chaisemartin_dhaultfoeuille.py
(test_cluster_parameter_raises_not_implemented in
TestForwardCompatGates; rename Step 5a -> Step 5b in three
ragged-panel test docstrings)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: docs/methodology/REGISTRY.md
+3-1Lines changed: 3 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -556,6 +556,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
556
556
557
557
- **Note:** When every variance-eligible group forms its own `(D_{g,1}, F_g, S_g)` cohort (a degenerate small-panel case where the cohort framework has zero degrees of freedom), the cohort-recentered plug-in formula is unidentified: cohort recentering subtracts the cohort mean from each group's `U^G_g`, and for singleton cohorts the centered value is exactly zero, so the centered influence function vector collapses to all zeros. The estimator returns `overall_se = NaN` with a `UserWarning` rather than silently collapsing to `0.0` (which would falsely imply infinite precision). The `DID_M` point estimate remains well-defined. The bootstrap path inherits the same degeneracy on these panels — the multiplier weights act on an all-zero vector, so the bootstrap distribution is also degenerate. **Deviation from R `DIDmultiplegtDYN`:** R returns a non-zero SE on the canonical 4-group worked example via small-sample sandwich machinery that Python does not implement. Both responses are valid for a degenerate case; Python's `NaN`+warning is the safer default. To get a non-degenerate SE, include more groups so cohorts have peers (real-world panels typically have `G >> K`).
558
558
559
+
-**Note (Phase 1 cluster contract):**`ChaisemartinDHaultfoeuille` always clusters at the group level. The cohort-recentered analytical SE plug-in operates on per-group influence-function values (one `U^G_g` per group); the multiplier bootstrap generates one weight per group; both inference paths cluster at the user's `group` column with no other option. The constructor accepts `cluster=None` (the default and only supported value); passing any non-`None` value raises `NotImplementedError` with a Phase 1 pointer at construction time (and the same gate fires from `set_params`). Custom clustering at a coarser or finer level than the group is reserved for a future phase. The matching test is `test_cluster_parameter_raises_not_implemented` in `tests/test_chaisemartin_dhaultfoeuille.py::TestForwardCompatGates`.
560
+
559
561
-**Note:** Placebo Assumption 11 violations (placebo joiners exist but no 3-period stable_0 controls, or symmetric for leavers/stable_1) trigger zero-retention in the placebo numerator AND emit a consolidated `Placebo (DID_M^pl) Assumption 11 violations` warning from `fit()`, mirroring the main DID path's contract documented above. The zeroed placebo periods retain their switcher counts in the placebo `N_S^pl` denominator, biasing `DID_M^pl` toward zero in the offending direction (matching the Theorem 4 paper convention).
560
562
561
563
-**Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula (Web Appendix Section 3.7.3 of the dynamic paper, which assumes Assumption 5 / no-crossing) to be consistent with the AER 2020 Theorem 3 point estimate. Both formulas operate on the same post-drop dataset. Setting `drop_larger_lower=False` is supported for diagnostic comparison but produces an inconsistent estimator-variance pairing for any multi-switch groups present, and emits an explicit warning.
@@ -566,7 +568,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
566
568
567
569
- **Note (deviation from R DIDmultiplegtDYN):** Python uses **period-based** stable-control sets — `stable_0(t)` is any cell with `D_{g,t-1} = D_{g,t} = 0` regardless of baseline `D_{g,1}`, and similarly for `stable_1(t)`. R `DIDmultiplegtDYN` uses **cohort-based** stable-control sets that additionally require `D_{g,1}` to match the side. Python's definition matches the AER 2020 Theorem 3 cell-count notation `N_{0,0,t}` and `N_{1,1,t}` literally; R's definition matches the dynamic companion paper's cohort `(D_{g,1}, F_g, S_g)` framework. The two definitions agree exactly on (a) panels containing only joiners, (b) panels containing only leavers, (c) the hand-calculable 4-group worked example, or (d) any panel where no joiner's post-switch state overlaps a period when leavers are switching. They disagree by O(1%) on the **point estimate** when both joiners and leavers exist AND some joiners' post-switch cells could serve as leavers' controls (or vice versa). After the Round 2 fix that implemented the full `Lambda^G_{g,l=1}` influence function, the **standard error** parity gap on pure-direction scenarios narrowed from ~18% to ~3%. The R parity tests in `tests/test_chaisemartin_dhaultfoeuille_parity.py` use a tight `1e-4` tolerance for pure-direction point estimates, a 5% rtol for pure-direction SEs, and a 2.5% tolerance for mixed-direction point estimates (with the SE check skipped on mixed scenarios because the period-vs-cohort point-estimate deviation cascades into the variance).
568
570
569
-
- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5a validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5a `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly.
571
+
- **Note (deviation from R DIDmultiplegtDYN):** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. The Step 5b validation in `fit()` enforces this contract: groups missing the baseline raise `ValueError`; groups with interior gaps are dropped with a `UserWarning`; groups with **terminal missingness** (early exit / right-censoring — observed at the baseline but missing one or more later periods) are retained and contribute from their observed periods only. R `DIDmultiplegtDYN` accepts unbalanced panels with documented missing-treatment-before-first-switch handling. Python's restriction is a Phase 1 limitation: the cohort enumeration uses `D_{g,1}` as the canonical baseline (so the baseline observation must exist) and the first-switch detection walks adjacent observed periods (so interior gaps create ambiguous transition counts). Terminal missingness is supported because the per-period `present = (N_mat[:, t] > 0) & (N_mat[:, t-1] > 0)` guard appears at three sites in the variance computation (`_compute_per_period_dids`, `_compute_full_per_group_contributions`, `_compute_cohort_recentered_inputs`) and cleanly masks out missing transitions without propagating NaN into the arithmetic. **Workaround for unbalanced panels:** pre-process your data to back-fill the baseline (or drop late-entry groups before fitting), or use R `DIDmultiplegtDYN` until a future phase lifts the restriction. The Step 5b `ValueError` and `UserWarning` messages name the offending group IDs so you can locate them quickly.
570
572
571
573
**Reference implementation(s):**
572
574
- R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`.
0 commit comments