Documents the dCDH TWFE diagnostic sample contract that Round 7's
swap left implicit. The fitted results.twfe_* values are computed on
the FULL pre-filter cell sample (matching the standalone
twowayfeweights() function), NOT on the post-filter estimation
sample used by overall_att / results.groups / inference fields. The
existing user-facing wording said "TWFE on the same data" /
"diagnostic from the same fit" — phrases that naturally read as
"same data as overall_att" — which contradicted the post-Round-7
behavior. This commit:
1. Adds a new `**Note (TWFE diagnostic sample contract):**` block in
REGISTRY.md enumerating all three sample-shaping filters
(interior-gap, multi-switch, singleton-baseline) and explicitly
carving out singleton-baseline as variance-only (it creates no
fitted-vs-overall_att mismatch, so it triggers no warning).
2. Rewrites the `twfe_diagnostic` parameter docstring in
chaisemartin_dhaultfoeuille.py to describe the pre-filter contract
and the divergence warning.
3. Rewrites the twfe_weights / twfe_fraction_negative / twfe_sigma_fe
/ twfe_beta_fe field docstrings in the results dataclass to clarify
they describe the FULL pre-filter cell sample, with a pointer to
the REGISTRY contract Note.
4. Adds a `UserWarning` from `fit()` whenever the user requested the
TWFE diagnostic AND either the interior-gap or the multi-switch
filter dropped groups. The warning explains the divergence with
explicit counts and points to REGISTRY for the rationale. It fires
regardless of whether the diagnostic itself succeeded or hit the
rank-deficient fallback: the plan review correctly flagged that a
`twfe_diagnostic_payload is not None` guard would suppress the
warning in the rare case where a rank-deficient diagnostic coincides
with a filtered panel, so that guard was dropped.
5. Updates docs/api/chaisemartin_dhaultfoeuille.rst and
docs/choosing_estimator.rst to replace "from the same fit" with
"computed on the data you pass in (pre-filter)".
6. Adds three regression tests in TestTwowayFeweightsHelper:
   - test_twfe_pre_filter_contract_with_interior_gap_drop: panel with
     a dropped interior-gap group, asserts fitted twfe_* matches
     standalone, estimation sample is smaller, and the divergence
     warning fires with the expected counts.
   - test_twfe_pre_filter_contract_with_multi_switch_drop: panel with
     an injected multi-switch crosser, similar assertions.
   - test_twfe_no_divergence_warning_on_clean_panel: negative test
     asserting NO divergence warning fires on a clean panel
     (hard-codes pattern="single_switch" to close a future footgun).
7. Fixes the stale "Step 5a guarantees..." comment at line 712 to
"Step 5b guarantees..." (post-Round-7 the ragged-panel validation
is Step 5b, not Step 5a). Independent cleanup; bundled because
it's in the same file and the same topic.
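
The warning condition described in item 4 can be sketched roughly as follows. This is a minimal illustration, not the code in `fit()`: the helper name `maybe_warn_twfe_divergence`, its parameters, and the message text are all hypothetical. The point it shows is that the check depends only on whether the diagnostic was requested and on the two drop counts, and never on whether a diagnostic payload was produced.

```python
import warnings


def maybe_warn_twfe_divergence(twfe_requested, n_dropped_interior_gap, n_dropped_multi_switch):
    """Warn when the TWFE diagnostic sample and the estimation sample differ.

    Hypothetical helper: names and message text are illustrative only.
    It deliberately does NOT check whether a diagnostic payload exists, so
    the warning still fires if the diagnostic hit a rank-deficient fallback.
    Returns True if a warning was emitted, False otherwise.
    """
    n_dropped = n_dropped_interior_gap + n_dropped_multi_switch
    if not twfe_requested or n_dropped == 0:
        return False
    warnings.warn(
        "TWFE diagnostic is computed on the pre-filter sample, but the "
        f"estimation sample dropped {n_dropped_interior_gap} interior-gap "
        f"and {n_dropped_multi_switch} multi-switch group(s).",
        UserWarning,
        stacklevel=2,
    )
    return True
```

A negative test in the spirit of item 6 would record warnings with `warnings.catch_warnings(record=True)` and assert that none were captured when both counts are zero.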
This resolution preserves Round 7's standalone-vs-fitted parity
(both APIs use the pre-filter cell sample) and addresses Round 9's
P1 about the documentation contract. Both reviewers' concerns are
now addressed: the standalone and fitted APIs produce identical
numbers on the same input, AND users see an explicit warning when
filters make the TWFE diagnostic sample diverge from the dCDH
estimation sample.
Test counts: 107 -> 110 (three new sample-contract regression
tests). Black, ruff clean.
Files modified:
- docs/methodology/REGISTRY.md
  (new TWFE sample contract Note enumerating all three filters)
- diff_diff/chaisemartin_dhaultfoeuille.py
  (twfe_diagnostic param docstring, n_groups_dropped_interior_gap
  tracking, divergence warning at Step 6b, stale comment fix)
- diff_diff/chaisemartin_dhaultfoeuille_results.py
  (twfe_weights / twfe_fraction_negative / twfe_sigma_fe /
  twfe_beta_fe field docstrings)
- docs/api/chaisemartin_dhaultfoeuille.rst (wording fix)
- docs/choosing_estimator.rst (wording fix)
- tests/test_chaisemartin_dhaultfoeuille.py (3 new tests + 1
  parity test comment update)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/methodology/REGISTRY.md
2 lines changed: 2 additions & 0 deletions
@@ -560,6 +560,8 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
-**Note:** Placebo Assumption 11 violations (placebo joiners exist but no 3-period stable_0 controls, or symmetric for leavers/stable_1) trigger zero-retention in the placebo numerator AND emit a consolidated `Placebo (DID_M^pl) Assumption 11 violations` warning from `fit()`, mirroring the main DID path's contract documented above. The zeroed placebo periods retain their switcher counts in the placebo `N_S^pl` denominator, biasing `DID_M^pl` toward zero in the offending direction (matching the Theorem 4 paper convention).
+
- **Note (TWFE diagnostic sample contract):** The fitted `results.twfe_weights` / `results.twfe_fraction_negative` / `results.twfe_sigma_fe` / `results.twfe_beta_fe` are computed on the **FULL pre-filter cell sample** — the data the user passed in, after `_validate_and_aggregate_to_cells()` runs but **before** the ragged-panel validation (Step 5b) and the multi-switch filter (`drop_larger_lower`, Step 6). They do NOT describe the post-filter estimation sample used by `overall_att`, `results.groups`, and the inference fields. `fit()` has three sample-shaping filters in total: (1) interior-gap drops in Step 5b, (2) multi-switch drops in Step 6, and (3) the singleton-baseline filter in Step 7. Filters (1) and (2) actually shrink the point-estimate sample, so when either fires, the fitted TWFE diagnostic and `overall_att` describe **different samples** and the estimator emits a `UserWarning` explaining the divergence with explicit counts. Filter (3) is **variance-only** — singleton-baseline groups remain in the point-estimate sample as period-based stable controls (see the singleton-baseline Note above) — so it does NOT create a fitted-vs-`overall_att` mismatch and does NOT trigger the divergence warning. Rationale for the pre-filter design: the TWFE diagnostic answers "what would the plain TWFE estimator say on the data you passed in?" — not "what would TWFE say on the data dCDH actually used after filtering?" — so users comparing TWFE vs dCDH on a fixed input can do so without an interaction effect from the dCDH-specific filters. The standalone `twowayfeweights()` function uses the same pre-filter sample, so the fitted and standalone APIs always produce identical numbers on the same input. To reproduce the dCDH estimation sample for an external TWFE comparison, pre-process your data to drop the multi-switch and interior-gap groups before fitting (the warning lists offending IDs). 
The matching tests are `test_twfe_pre_filter_contract_with_interior_gap_drop` and `test_twfe_pre_filter_contract_with_multi_switch_drop` in `tests/test_chaisemartin_dhaultfoeuille.py`.
+
-**Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula (Web Appendix Section 3.7.3 of the dynamic paper, which assumes Assumption 5 / no-crossing) to be consistent with the AER 2020 Theorem 3 point estimate. Both formulas operate on the same post-drop dataset. Setting `drop_larger_lower=False` is supported for diagnostic comparison but produces an inconsistent estimator-variance pairing for any multi-switch groups present, and emits an explicit warning.
-**Note:** When Assumption 11 (existence of stable controls) is violated for some period `t` — i.e., joiners exist but no stable-untreated controls, or leavers exist but no stable-treated controls — `DID_{+,t}` (or `DID_{-,t}`) is set to zero by paper convention, and the period's switcher count is **retained** in the `N_S` denominator. This means the affected period contributes a zero to the numerator with a non-zero weight in the denominator, biasing `DID_M` toward zero in the offending direction. Users can detect this by inspecting `results.per_period_effects[t]['did_plus_t_a11_zeroed']` (or `did_minus_t_a11_zeroed`) or the consolidated `fit()` warning. This matches the AER 2020 Theorem 3 paper convention and the worked example arithmetic.
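
The TWFE diagnostic sample contract Note suggests pre-processing the data to drop multi-switch and interior-gap groups before an external TWFE comparison. A rough, standard-library-only sketch of that step is below. It is not the library's filter: the function name `split_groups_for_external_twfe` is hypothetical, and the definitions used here (periods are consecutive integers; an interior gap is a missing period between a group's first and last observed period; a multi-switch group changes treatment more than once) are assumptions that may differ from what `fit()` applies. In practice the IDs listed in the warning are the authoritative source.

```python
def split_groups_for_external_twfe(rows):
    """Drop groups that (under the assumptions below) fit() would filter out.

    rows: list of (group, period, treatment) cells, one per group-period,
          with integer periods.
    Returns (kept_rows, dropped), where dropped maps group id -> reason.

    ASSUMPTIONS (illustrative, not taken from the library):
      - interior gap: a period missing between a group's first and last
        observed period;
      - multi-switch: treatment changes value more than once over time.
    """
    by_group = {}
    for group, period, treatment in rows:
        by_group.setdefault(group, {})[period] = treatment

    dropped = {}
    for group, cells in by_group.items():
        periods = sorted(cells)
        # A gap-free path covers every integer from first to last period.
        if len(periods) != periods[-1] - periods[0] + 1:
            dropped[group] = "interior_gap"
            continue
        path = [cells[t] for t in periods]
        n_switches = sum(a != b for a, b in zip(path, path[1:]))
        if n_switches > 1:
            dropped[group] = "multi_switch"

    kept = [row for row in rows if row[0] not in dropped]
    return kept, dropped
```

Running an external TWFE regression on `kept` rather than on the full input would then approximate a comparison on the dCDH estimation sample, subject to the assumptions above.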