Commit b14aa22

igerber and claude committed
Document post-period attribution convention + invariant tests
Adds a row-sum identity test for the cell-period allocator on a hand-computed 4-group x 3-period panel, verifying:

(1) per-group IF matches the closed-form joiner/stable_0 decomposition;
(2) per-period attribution sums across time to the per-group IF exactly;
(3) post-period attribution places all mass at t=2 with zeros at earlier columns;
(4) column-wise cohort centering preserves the row-sum identity.

REGISTRY.md frames the post-period attribution as a library convention (adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP) rather than a theorem derived from the observation-level survey linearization. Inline comments in _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon mirror that framing.

Updates stale local type annotations for multi_horizon_if / placebo_horizon_if to the Tuple[np.ndarray, np.ndarray] returns.

TODO.md gains a Methodology/Correctness entry tracking the open validation question (formal derivation against observation-level survey linearization, or replacement with a covariance-aware alternative).

All 348 tests pass (slow MC coverage sim included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f524611 commit b14aa22

4 files changed

Lines changed: 105 additions & 6 deletions

TODO.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
+| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | PR 2 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 21 additions & 5 deletions
@@ -1601,7 +1601,7 @@ def fit(
         # Step 12c: Multi-horizon per-group computation (L_max >= 1)
         # ------------------------------------------------------------------
         multi_horizon_dids: Optional[Dict[int, Dict[str, Any]]] = None
-        multi_horizon_if: Optional[Dict[int, np.ndarray]] = None
+        multi_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None
         multi_horizon_se: Optional[Dict[int, float]] = None
         multi_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None

@@ -1747,7 +1747,7 @@ def fit(

         # Phase 2: placebos, normalized effects, cost-benefit delta
         multi_horizon_placebos: Optional[Dict[int, Dict[str, Any]]] = None
-        placebo_horizon_if: Optional[Dict[int, np.ndarray]] = None
+        placebo_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None
         placebo_horizon_se: Optional[Dict[int, float]] = None
         placebo_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None
         normalized_effects_dict: Optional[Dict[int, Dict[str, Any]]] = None
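The annotation change in the two hunks above reflects that each horizon now maps to a pair of arrays rather than a single array. A minimal sketch of a consumer follows, assuming the pair is ordered (per-group IF, per-period attribution) as in the `U, U_pp = ...` unpacking used by the new test. The alias `HorizonIF` and the helper `row_sums_hold` are hypothetical names for illustration, not part of the library.

```python
from typing import Dict, Optional, Tuple

import numpy as np

# Hypothetical alias for the annotated shape: horizon -> (U, U_per_period).
HorizonIF = Dict[int, Tuple[np.ndarray, np.ndarray]]


def row_sums_hold(horizon_if: Optional[HorizonIF], atol: float = 1e-12) -> bool:
    """Return True when every horizon's per-period array sums to its per-group IF."""
    if horizon_if is None:  # nothing was computed for this path
        return True
    return all(
        np.allclose(U_pp.sum(axis=1), U, atol=atol)
        for U, U_pp in horizon_if.values()
    )


# Toy data: two groups, two periods, all mass in the last column.
example: HorizonIF = {
    1: (np.array([1.0, -1.0]), np.array([[0.0, 1.0], [0.0, -1.0]])),
    2: (np.array([0.5, -0.5]), np.array([[0.0, 0.5], [0.0, -0.5]])),
}
ok = row_sums_hold(example)
```

Annotating the tuple explicitly lets a type checker flag call sites that still treat the dictionary values as bare arrays.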
@@ -4408,13 +4408,18 @@ def _compute_per_group_if_multi_horizon(
             # contribution to U_l is zero, but its count is in N_l.
             continue

-        # Switcher contribution: +S_g * (Y_{g, out} - Y_{g, ref})
+        # Switcher contribution: +S_g * (Y_{g, out} - Y_{g, ref}).
+        # Per-cell attribution convention: assign the whole contrast
+        # to the outcome cell (g, out_idx). See REGISTRY.md's Note
+        # on survey IF expansion for the rationale behind this
+        # convention (library choice, not a derived result).
         switcher_change = Y_mat[g, out_idx] - Y_mat[g, ref_idx]
         U_l[g] += S_g * switcher_change
         U_per_period_l[g, out_idx] += S_g * switcher_change

         # Control contributions: each control g' in the pool gets
-        # -S_g * (1/n_ctrl) * (Y_{g', out} - Y_{g', ref})
+        # -S_g * (1/n_ctrl) * (Y_{g', out} - Y_{g', ref}). Same
+        # post-period attribution as the switcher side.
         ctrl_changes = Y_mat[ctrl_pool, out_idx] - Y_mat[ctrl_pool, ref_idx]
         ctrl_contrib = (S_g / n_ctrl) * ctrl_changes
         U_l[ctrl_pool] -= ctrl_contrib
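The attribution pattern in this hunk can be illustrated on toy data. The sketch below is a simplified stand-in, not the library's `_compute_per_group_if_multi_horizon`: it assumes a single sign `S` shared by all switchers, one common control pool, and fixed `ref_idx` / `out_idx`, whereas the real helper resolves these per group. The function name `attribute_horizon` is hypothetical.

```python
import numpy as np


def attribute_horizon(Y_mat, switchers, ctrl_pool, ref_idx, out_idx, S=1.0):
    """Toy horizon contributions with the whole contrast placed at out_idx."""
    n_groups, n_periods = Y_mat.shape
    U_l = np.zeros(n_groups)
    U_per_period_l = np.zeros((n_groups, n_periods))
    ctrl_pool = np.asarray(ctrl_pool)
    n_ctrl = ctrl_pool.size
    for g in switchers:
        if n_ctrl == 0:
            continue
        # Switcher side: +S * (Y_out - Y_ref), attributed to the outcome cell.
        switcher_change = Y_mat[g, out_idx] - Y_mat[g, ref_idx]
        U_l[g] += S * switcher_change
        U_per_period_l[g, out_idx] += S * switcher_change
        # Control side: -S * (1/n_ctrl) * (Y_out - Y_ref), same cell.
        ctrl_changes = Y_mat[ctrl_pool, out_idx] - Y_mat[ctrl_pool, ref_idx]
        ctrl_contrib = (S / n_ctrl) * ctrl_changes
        U_l[ctrl_pool] -= ctrl_contrib
        U_per_period_l[ctrl_pool, out_idx] -= ctrl_contrib
    return U_l, U_per_period_l


# Outcome matrix from the test panel in this commit (4 groups x 3 periods).
Y_mat = np.array(
    [[1.0, 2.0, 3.0], [2.1, 3.1, 4.2], [0.5, 1.2, 5.4], [1.3, 2.4, 6.1]]
)
U_l, U_pp_l = attribute_horizon(
    Y_mat, switchers=[2, 3], ctrl_pool=[0, 1], ref_idx=1, out_idx=2
)
```

Because both sides write to the same column, `U_pp_l.sum(axis=1)` reproduces `U_l` row by row, and every column other than `out_idx` stays at zero.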
@@ -4516,7 +4521,10 @@ def _compute_per_group_if_placebo_horizon(
         if n_ctrl == 0:
             continue

-        # Switcher contribution: paper convention backward - ref
+        # Switcher contribution: paper convention backward - ref.
+        # Attribute the whole contrast to the backward cell
+        # (mirrors the multi-horizon / DID_M post-period
+        # attribution convention).
         switcher_change = Y_mat[g, backward_idx] - Y_mat[g, ref_idx]
         U_pl[g] += S_g * switcher_change
         U_per_period_pl[g, backward_idx] += S_g * switcher_change
@@ -4936,6 +4944,14 @@ def _compute_full_per_group_contributions(
     include_joiners_side = side in ("overall", "joiners")
     include_leavers_side = side in ("overall", "leavers")

+    # Per-cell attribution convention (not a derivation from the
+    # observation-level survey linearization — see REGISTRY.md
+    # ``ChaisemartinDHaultfoeuille`` Note on survey IF expansion):
+    # attribute each (Y_curr - Y_prev) transition as a single
+    # difference to its post-period cell (g, t_idx). Preserves the
+    # row-sum identity U_per_period.sum(axis=1) == U and therefore
+    # the group-sum invariance that makes the cell expansion
+    # byte-identical to the pre-allocator convention under PSU=group.
     for t_idx in range(1, n_periods):
         d_curr = D_mat[:, t_idx]
         d_prev = D_mat[:, t_idx - 1]
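To make the convention in the comment above concrete, the sketch below contrasts post-period attribution with a naive two-cell split (`+Y_curr` at `t`, `-Y_prev` at `t-1`). It is a simplified illustration that assumes joiners only, one observation per cell, and role weights of `+1` for joiners and `-(n_10 / n_00)` for stable untreated groups, which is what the hand computation in the new test implies. The function name `joiner_side_contributions` is hypothetical, and the two-cell branch is shown only for contrast; it is not the covariance-aware alternative mentioned in REGISTRY.md.

```python
import numpy as np


def joiner_side_contributions(D_mat, Y_mat, split_two_cells=False):
    """Toy joiner-side contributions under two attribution layouts."""
    n_groups, n_periods = D_mat.shape
    U = np.zeros(n_groups)
    U_per_period = np.zeros((n_groups, n_periods))
    for t_idx in range(1, n_periods):
        d_curr, d_prev = D_mat[:, t_idx], D_mat[:, t_idx - 1]
        joiners = (d_prev == 0) & (d_curr == 1)
        stable_0 = (d_prev == 0) & (d_curr == 0)
        n_10, n_00 = joiners.sum(), stable_0.sum()
        if n_10 == 0 or n_00 == 0:
            continue  # no joiner contrast is defined in this period
        role_weight = joiners.astype(float) - stable_0 * (n_10 / n_00)
        U += role_weight * (Y_mat[:, t_idx] - Y_mat[:, t_idx - 1])
        if split_two_cells:
            # Contrast case: level terms spread over the pre and post cells.
            U_per_period[:, t_idx] += role_weight * Y_mat[:, t_idx]
            U_per_period[:, t_idx - 1] -= role_weight * Y_mat[:, t_idx - 1]
        else:
            # Convention in this commit: whole difference at the post cell.
            U_per_period[:, t_idx] += role_weight * (
                Y_mat[:, t_idx] - Y_mat[:, t_idx - 1]
            )
    return U, U_per_period


D_mat = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
Y_mat = np.array(
    [[1.0, 2.0, 3.0], [2.1, 3.1, 4.2], [0.5, 1.2, 5.4], [1.3, 2.4, 6.1]]
)
U, U_post = joiner_side_contributions(D_mat, Y_mat)
_, U_split = joiner_side_contributions(D_mat, Y_mat, split_two_cells=True)
```

Both layouts satisfy the row-sum identity, so they agree whenever a PSU contains whole groups. They differ only in how mass is spread across columns, which matters once PSUs vary across the cells of a group.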

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -649,7 +649,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6)
 - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface
 - [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) **or replicate-weight variance (BRR/Fay/JK1/JKn/SDR)**, covering the main ATT surface, covariate adjustment (DID^X), heterogeneity testing, the TWFE diagnostic (fit and standalone `twowayfeweights()` helper), and HonestDiD bounds. Opt-in **PSU-level Hall-Mammen wild bootstrap** is also supported via `n_bootstrap > 0`.
-- **Note (Survey IF expansion — cell-period allocator):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The expansion decomposes the per-group IF `U[g]` into per-cell contributions `U[g, t]` (the per-period term of the sum that yields `U[g]` inside `_compute_full_per_group_contributions` and the per-horizon helpers), cohort-centers each column independently, and expands to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`, then applies the Binder (1983) stratified-PSU variance formula. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is fully supported. This is a strict relaxation of the earlier within-group constancy rule shipped before the cell-period allocator. Byte-identity on the pre-relaxation input set is guaranteed because PSU-level Binder aggregation telescopes per-cell sums to `U_centered[g]` under within-group-constant PSU, matching the previous group-level expansion up to single-ULP floating-point noise. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
+- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
 - **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Scope note (cell-period allocator):** The PSU-level bootstrap uses a group-level `group_id_to_psu_code` map and therefore requires PSU to be constant within each group. Combining `n_bootstrap > 0` with a PSU that varies within group raises `NotImplementedError`; the cell-level Hall-Mammen extension is deferred to a follow-up PR. The analytical TSL variance fully supports within-group-varying PSU via the cell-period allocator — use `n_bootstrap=0` for those designs. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design.

 ---
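The cell-level expansion formula in the REGISTRY.md note, `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`, can be sketched directly in NumPy. This is an illustration under stated assumptions rather than the library's code: `expand_to_observations` is a hypothetical name, groups and periods are assumed to be integer-coded from zero, and every observed cell is assumed to have positive total weight.

```python
import numpy as np


def expand_to_observations(U_centered_pp, g_idx, t_idx, w):
    """Spread each cell's centered IF mass over its rows by weight share."""
    W_cell = np.zeros(U_centered_pp.shape)
    # W_{g,t}: total weight per (g, t) cell; add.at accumulates repeated cells.
    np.add.at(W_cell, (g_idx, t_idx), w)
    return U_centered_pp[g_idx, t_idx] * w / W_cell[g_idx, t_idx]


# Toy panel: 2 groups x 2 periods, several observations per cell.
U_centered_pp = np.array([[0.5, 1.0], [-0.5, -1.0]])
g_idx = np.array([0, 0, 0, 1, 1, 1])
t_idx = np.array([0, 1, 1, 0, 0, 1])
w = np.array([2.0, 1.0, 3.0, 1.0, 1.0, 4.0])

psi = expand_to_observations(U_centered_pp, g_idx, t_idx, w)

# With PSU = group, PSU totals of psi collapse to the per-group centered IF.
psu_totals = np.bincount(g_idx, weights=psi)
```

Within a cell the weight shares sum to one, so each cell's `psi` total equals `U_centered_pp[g, t]`. Summing cells within a group then recovers the row sum, which is the telescoping property the note relies on for agreement with the earlier group-level expansion.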

tests/test_survey_dcdh.py

Lines changed: 82 additions & 0 deletions
@@ -1378,6 +1378,88 @@ def test_off_horizon_row_duplication_does_not_change_se(self, base_data):
                 f"{r_dup.overall_se}) — auto-inject psu=group is not active."
             )

+    def test_cell_allocator_row_sum_identity(self):
+        """Cell-period allocator contract: for every group, the per-
+        period attribution sums across time to the per-group IF
+        (before cohort centering). This is the invariant that makes
+        PSU-level Binder aggregation telescope to ``U_centered[g]``
+        under within-group-constant PSU and therefore guarantees byte-
+        identity with the legacy group-level allocator on the old
+        accepted input set. Hand-computed on a 4-group × 3-period
+        panel: two never-treated (stable_0) and two joiners switching
+        at ``t = 2``.
+        """
+        from diff_diff.chaisemartin_dhaultfoeuille import (
+            _compute_full_per_group_contributions,
+            _cohort_recenter,
+            _cohort_recenter_per_period,
+        )
+
+        # D_mat, Y_mat, N_mat shaped (n_groups=4, n_periods=3).
+        D_mat = np.array(
+            [
+                [0, 0, 0],  # G0 never-treated
+                [0, 0, 0],  # G1 never-treated
+                [0, 0, 1],  # G2 joiner at t=2
+                [0, 0, 1],  # G3 joiner at t=2
+            ],
+            dtype=float,
+        )
+        Y_mat = np.array(
+            [
+                [1.0, 2.0, 3.0],
+                [2.1, 3.1, 4.2],
+                [0.5, 1.2, 5.4],
+                [1.3, 2.4, 6.1],
+            ],
+            dtype=float,
+        )
+        N_mat = np.ones_like(D_mat, dtype=int)
+        # Per-period cell counts aligned to periods[1:]
+        # t=1: all stable_0 (4 in n_00); t=2: 2 joiners (n_10) + 2 stable_0 (n_00)
+        n_10_t_arr = np.array([0, 2], dtype=int)
+        n_00_t_arr = np.array([4, 2], dtype=int)
+        n_01_t_arr = np.array([0, 0], dtype=int)
+        n_11_t_arr = np.array([0, 0], dtype=int)
+        # A11 zeroed at t=1 (no joiners); active at t=2.
+        a11_plus_zeroed = np.array([True, False], dtype=bool)
+        a11_minus_zeroed = np.array([True, True], dtype=bool)
+
+        U, U_pp = _compute_full_per_group_contributions(
+            D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat,
+            n_10_t_arr=n_10_t_arr, n_00_t_arr=n_00_t_arr,
+            n_01_t_arr=n_01_t_arr, n_11_t_arr=n_11_t_arr,
+            a11_plus_zeroed_arr=a11_plus_zeroed,
+            a11_minus_zeroed_arr=a11_minus_zeroed,
+            side="overall",
+        )
+
+        # Hand computation at t=2 joiner side:
+        # G0: stable_0, -(2/2) * (3.0 - 2.0) = -1.0
+        # G1: stable_0, -(2/2) * (4.2 - 3.1) = -1.1
+        # G2: joiner, (5.4 - 1.2) = 4.2
+        # G3: joiner, (6.1 - 2.4) = 3.7
+        expected_U = np.array([-1.0, -1.1, 4.2, 3.7])
+        np.testing.assert_allclose(U, expected_U, atol=1e-12)
+
+        # Row-sum identity: U_per_period.sum(axis=1) == U exactly.
+        np.testing.assert_allclose(U_pp.sum(axis=1), U, atol=1e-12)
+
+        # Post-period attribution: all mass at t=2 (the transition's
+        # post cell); t=0 and t=1 columns are zero for every group.
+        np.testing.assert_array_equal(U_pp[:, 0], np.zeros(4))
+        np.testing.assert_array_equal(U_pp[:, 1], np.zeros(4))
+        np.testing.assert_allclose(U_pp[:, 2], expected_U, atol=1e-12)
+
+        # Cohort centering preserves the row-sum identity: per-period
+        # cohort centering and group-level cohort centering produce
+        # 2D and 1D arrays whose row sums agree to FP precision.
+        # Cohorts: A = {G0, G1} (never-treated), B = {G2, G3} (joiners).
+        cohort_ids = np.array([0, 0, 1, 1])
+        U_c = _cohort_recenter(U, cohort_ids)
+        U_pp_c = _cohort_recenter_per_period(U_pp, cohort_ids)
+        np.testing.assert_allclose(U_pp_c.sum(axis=1), U_c, atol=1e-12)
+
     def test_within_cell_check_excludes_zero_weight_rows(self, base_data):
         """A zero-weight row with a different PSU label from its cell
         must not trigger rejection — it is out-of-sample by the
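The last assertion of the new test (column-wise cohort centering preserves the row-sum identity) follows from linearity of the mean. A self-contained sketch is below, assuming that cohort centering means subtracting the within-cohort mean. That reading is inferred from the helper names and is an assumption: `cohort_recenter` here is a hypothetical stand-in, not the library's `_cohort_recenter` or `_cohort_recenter_per_period`.

```python
import numpy as np


def cohort_recenter(values, cohort_ids):
    """Subtract the within-cohort mean; handles 1D (groups) or 2D (groups x periods)."""
    out = np.asarray(values, dtype=float).copy()
    for c in np.unique(cohort_ids):
        mask = cohort_ids == c
        # axis=0 gives a scalar mean for 1D input and per-column means for 2D.
        out[mask] -= out[mask].mean(axis=0)
    return out


cohort_ids = np.array([0, 0, 1, 1])
# Toy per-period attributions with mass in more than one column.
U_pp = np.array(
    [
        [0.0, 0.3, -1.0],
        [0.0, -0.1, -1.1],
        [0.0, 0.2, 4.2],
        [0.0, 0.0, 3.7],
    ]
)
U = U_pp.sum(axis=1)

U_c = cohort_recenter(U, cohort_ids)
U_pp_c = cohort_recenter(U_pp, cohort_ids)
row_sum_gap = np.max(np.abs(U_pp_c.sum(axis=1) - U_c))
```

Summing the per-column cohort means over time gives the cohort mean of the row sums, so centering each column and then summing matches summing and then centering, up to floating-point rounding.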
