Commit df7a0a3

igerber and claude committed
Round 3: ragged panels, validation refactor, metadata fixes
Addresses two new P1s, two P2s, and two P3s from CI re-review:

P1: ragged panel handling
- Add fit() Step 5a validation: reject groups missing the first global period with a clear ValueError listing offenders, and drop groups with interior period gaps with an explicit UserWarning. Eliminates the silent NaN propagation in _compute_cohort_recentered_inputs that could crash the cohort enumeration with int(np.nan) or misclassify late-entry groups' baselines.
- Fix singleton-baseline computation in fit() Step 7 to read the validated first global period explicitly instead of using groupby.first() which returned the first OBSERVED row per group.
- Add defensive presence-gating in _compute_cohort_recentered_inputs: the helper now refuses to run if N_mat[:, 0] has any zero entries (a fit() validation regression tripwire), and the first-switch detection loop only counts transitions between adjacent OBSERVED periods. Even with fit() validation in place, the helper is now safe to call directly.
- Add 2 regression tests: test_missing_baseline_period_raises_value_error and test_interior_gap_drops_group_with_warning.

P1: twowayfeweights validation refactor
- Extract _validate_and_aggregate_to_cells helper enforcing the dCDH validation contract (NaN treatment / NaN outcome / non-binary treatment all raise ValueError; within-cell varying treatment emits UserWarning before majority rounding).
- Both fit() and twowayfeweights() now call the helper. Single source of truth for the validation rules, no drift between the two public entry points.
- Add 4 regression tests for twowayfeweights() validation: test_twowayfeweights_rejects_nan_treatment, test_twowayfeweights_rejects_nan_outcome, test_twowayfeweights_rejects_non_binary_treatment, test_twowayfeweights_warns_on_within_cell_rounding.

P2: joiner/leaver metadata
- Fix n_joiner_cells = int(n_10_t_arr.sum()) (was count_nonzero counting PERIODS, not cells). Same for n_leaver_cells.
- Compute n_joiner_obs and n_leaver_obs as actual observation counts (sum of n_gt over the joiner/leaver cells across periods), not as cell totals. For balanced one-obs-per-cell panels they equal n_*_cells; for individual-level inputs with multiple obs per cell they can be larger. Update results dataclass docstrings.

P2: parity tests run on JSON without R
- Decouple golden_values fixture from require_r_dcdh. Tests now run whenever the JSON file exists. R is only needed to regenerate the JSON via benchmarks/R/generate_dcdh_dynr_test_values.R.
- Verified by running DIFF_DIFF_R=skip pytest tests/test_chaisemartin_dhaultfoeuille_parity.py: all 5 parity scenarios PASS (was previously skipping entirely).

P3: summary() label rename
- Rename "Groups dropped before estimation:" to "Group filter / metadata counts:". Label never-switching as "(reported, not dropped)". Reflects the Round 2 change where never-switching groups participate in the variance via stable-control roles.

P3: CHANGELOG/ROADMAP consistency
- Remove the CHANGELOG bullet that claimed three paper review files were committed under docs/methodology/papers/. Replace with a reference to REGISTRY.md as the canonical methodology surface (matching the ROADMAP wording).

Tests: 103 dCDH passing (97 + 6 new). Worked example DID_M = 2.5 still exact. Pure-direction R parity tests still pass at 1e-4 / 5% rtol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
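
The ragged-panel rule in the commit message (raise on a missing first global period, warn and drop on interior gaps) can be illustrated with a small standalone sketch. This is not the code from this commit: the function name `validate_ragged_panel` and the `observed` mapping (group to set of observed periods) are hypothetical, and keeping groups that merely stop being observed early is an assumption, since the message does not say how trailing attrition is handled.

```python
import warnings


def validate_ragged_panel(observed):
    """Hypothetical sketch of the ragged-panel checks described above.

    `observed` maps each group id to the set of periods in which the
    group has at least one observation.
    """
    if not observed:
        return {}

    # Global period grid: every period seen in any group, in order.
    all_periods = sorted(set().union(*observed.values()))
    first_period = all_periods[0]
    position = {p: i for i, p in enumerate(all_periods)}

    # Check 1: a group without the first global period has no baseline.
    missing = sorted(g for g, ps in observed.items() if first_period not in ps)
    if missing:
        raise ValueError(
            f"Groups missing the first global period {first_period!r}: {missing}"
        )

    # Check 2: an interior gap means the observed positions between a
    # group's first and last period are not contiguous on the global grid.
    kept, dropped = {}, []
    for group, periods in observed.items():
        pos = sorted(position[p] for p in periods)
        if pos[-1] - pos[0] + 1 != len(pos):
            dropped.append(group)
        else:
            kept[group] = periods  # assumption: early exit is tolerated

    if dropped:
        warnings.warn(
            f"Dropping groups with interior period gaps: {sorted(dropped)}",
            UserWarning,
        )
    return kept
```

With `{"a": {1, 2, 3}, "b": {1, 3}}` the sketch keeps `a` and drops `b` with a `UserWarning`; adding a group observed only from period 2 onward raises `ValueError` instead.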
1 parent 0d70f4d commit df7a0a3

5 files changed

Lines changed: 417 additions & 108 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Validated against R `DIDmultiplegtDYN` v2.3.3 at horizon `l = 1` via `tests/test_chaisemartin_dhaultfoeuille_parity.py`
 - **`twowayfeweights()`** — standalone helper function for the TWFE decomposition diagnostic (Theorem 1 of de Chaisemartin & D'Haultfœuille 2020), available without instantiating the full estimator. Returns a `TWFEWeightsResult` with per-cell weights, fraction negative, `sigma_fe`, and `beta_fe`.
 - **`generate_reversible_did_data()`** — new generator in `diff_diff.prep` producing reversible-treatment panel data for testing and tutorials. Patterns: `single_switch` (default, A5-safe), `joiners_only`, `leavers_only`, `mixed_single_switch`, `random`, `cycles`, `marketing`. Returns columns `group`, `period`, `treatment`, `outcome`, `true_effect`, `d_lag`, `switcher_type`.
-- **Three paper reviews** committed under `docs/methodology/papers/`: AER 2020 main paper, AER 2020 online appendix, and the dynamic companion (NBER WP 29873) — the third is the source for the cohort-recentered variance formula.
+- **REGISTRY.md `## ChaisemartinDHaultfoeuille` section** — single canonical source for dCDH methodology, equations, edge cases, and all documented deviations from the R `DIDmultiplegtDYN` reference implementation. Cites the AER 2020 paper and the dynamic companion paper (NBER WP 29873) by reference; primary papers are upstream sources, not in-repo files.
 
 ## [3.0.1] - 2026-04-07
 