Document post-period attribution convention + invariant tests

igerber · claude · igerber · commit b14aa22b4d81 · 2026-04-18T17:48:55.000-04:00
Adds a row-sum identity test for the cell-period allocator on a
hand-computed 4-group x 3-period panel, verifying: (1) per-group IF
matches the closed-form joiner/stable_0 decomposition; (2) per-period
attribution sums across time to the per-group IF exactly; (3) post-
period attribution places all mass at t=2 with zeros at earlier
columns; (4) column-wise cohort centering preserves the row-sum
identity.
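
The four checks can be reproduced outside the library with a small
NumPy sketch. This is a simplified, joiners-only re-derivation that
assumes one observation per cell and leaves the contributions
unnormalized; `post_period_attribution` and `center_by_cohort` are
hypothetical names for illustration, not the library helpers. The
panel values are the ones used in the new test.

```python
import numpy as np


def post_period_attribution(D, Y):
    """Joiners-only sketch: put each (Y_t - Y_{t-1}) contrast in column t."""
    n_groups, n_periods = Y.shape
    U_pp = np.zeros((n_groups, n_periods))
    for t in range(1, n_periods):
        joiner = (D[:, t - 1] == 0) & (D[:, t] == 1)
        stable_0 = (D[:, t - 1] == 0) & (D[:, t] == 0)
        n_10, n_00 = joiner.sum(), stable_0.sum()
        if n_10 == 0 or n_00 == 0:
            continue  # no joiners or no controls: the period contributes nothing
        diff = Y[:, t] - Y[:, t - 1]
        U_pp[joiner, t] += diff[joiner]
        U_pp[stable_0, t] -= (n_10 / n_00) * diff[stable_0]
    # The per-group IF is the row sum of the per-period attribution.
    return U_pp.sum(axis=1), U_pp


def center_by_cohort(A, cohort_ids):
    """Subtract the cohort mean (column-wise when A is 2D)."""
    out = np.array(A, dtype=float)
    for c in np.unique(cohort_ids):
        members = cohort_ids == c
        out[members] -= out[members].mean(axis=0)
    return out


D = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
Y = np.array(
    [[1.0, 2.0, 3.0], [2.1, 3.1, 4.2], [0.5, 1.2, 5.4], [1.3, 2.4, 6.1]]
)
cohorts = np.array([0, 0, 1, 1])

U, U_pp = post_period_attribution(D, Y)
U_c = center_by_cohort(U, cohorts)
U_pp_c = center_by_cohort(U_pp, cohorts)

assert np.allclose(U, [-1.0, -1.1, 4.2, 3.7])   # (1) closed form
assert np.allclose(U_pp.sum(axis=1), U)          # (2) row-sum identity
assert not U_pp[:, :2].any()                     # (3) all mass at t=2
assert np.allclose(U_pp_c.sum(axis=1), U_c)      # (4) centering keeps it
```

Check (4) holds because centering is linear: subtracting each
column's cohort mean and then summing across columns subtracts the
cohort mean of the row sums.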

REGISTRY.md frames the post-period attribution as a library
convention — adopted because it preserves the group/PSU-sum
identities of the prior group-level expansion and produces
approximately nominal MC coverage on the test DGP — rather than a
theorem derived from the observation-level survey linearization.
Inline comments in _compute_full_per_group_contributions,
_compute_per_group_if_multi_horizon, and
_compute_per_group_if_placebo_horizon mirror that framing.
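
As a rough illustration of why the group/PSU-sum identity survives
the cell-level expansion, the sketch below applies the REGISTRY.md
formula psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})
to a toy panel. `expand_to_observations` is a hypothetical helper, the
numbers are made up, and every cell is assumed to have positive total
weight.

```python
import numpy as np


def expand_to_observations(U_c_pp, g_idx, t_idx, w):
    """psi_i = U_c_pp[g_i, t_i] * w_i / W_{g_i, t_i} (sketch only)."""
    n_groups, n_periods = U_c_pp.shape
    cell = g_idx * n_periods + t_idx  # flat (g, t) cell id per observation
    W_cell = np.bincount(cell, weights=w, minlength=n_groups * n_periods)
    return U_c_pp[g_idx, t_idx] * w / W_cell[cell]


# Toy centered per-period IF: 2 groups x 3 periods (illustrative values).
U_c_pp = np.array([[0.0, 0.3, -1.2], [0.0, -0.3, 1.2]])

# Two observations per cell, with weights that vary within each cell.
g_idx = np.repeat(np.arange(2), 6)
t_idx = np.tile(np.repeat(np.arange(3), 2), 2)
w = np.array([1.0, 3.0, 2.0, 2.0, 0.5, 1.5, 1.0, 1.0, 4.0, 1.0, 2.0, 3.0])

psi = expand_to_observations(U_c_pp, g_idx, t_idx, w)

# With PSU = group, each PSU total is the sum of psi over the group's
# observations. Weight shares sum to one inside every cell, so the
# total collapses to the row sum of U_c_pp.
psu_totals = np.bincount(g_idx, weights=psi, minlength=2)
assert np.allclose(psu_totals, U_c_pp.sum(axis=1))
```

The identity is purely arithmetic and says nothing about whether the
post-period cell is the right place for the mass when PSUs differ
across the cells of a group; that is the open question tracked in
TODO.md.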

Updates the stale local type annotations for multi_horizon_if /
placebo_horizon_if so their dict values are typed as the
Tuple[np.ndarray, np.ndarray] returns.

TODO.md gains a Methodology/Correctness entry tracking the open
validation question (formal derivation against observation-level
survey linearization, or replacement with a covariance-aware
alternative).

All 348 tests pass (slow MC coverage sim included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/TODO.md b/TODO.md
@@ -57,6 +57,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
+| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | PR 2 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -1601,7 +1601,7 @@ def fit(
         # Step 12c: Multi-horizon per-group computation (L_max >= 1)
         # ------------------------------------------------------------------
         multi_horizon_dids: Optional[Dict[int, Dict[str, Any]]] = None
-        multi_horizon_if: Optional[Dict[int, np.ndarray]] = None
+        multi_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None
         multi_horizon_se: Optional[Dict[int, float]] = None
         multi_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None
 
@@ -1747,7 +1747,7 @@ def fit(
 
         # Phase 2: placebos, normalized effects, cost-benefit delta
         multi_horizon_placebos: Optional[Dict[int, Dict[str, Any]]] = None
-        placebo_horizon_if: Optional[Dict[int, np.ndarray]] = None
+        placebo_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None
         placebo_horizon_se: Optional[Dict[int, float]] = None
         placebo_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None
         normalized_effects_dict: Optional[Dict[int, Dict[str, Any]]] = None
@@ -4408,13 +4408,18 @@ def _compute_per_group_if_multi_horizon(
                 # contribution to U_l is zero, but its count is in N_l.
                 continue
 
-            # Switcher contribution: +S_g * (Y_{g, out} - Y_{g, ref})
+            # Switcher contribution: +S_g * (Y_{g, out} - Y_{g, ref}).
+            # Per-cell attribution convention: assign the whole contrast
+            # to the outcome cell (g, out_idx). See REGISTRY.md's Note
+            # on survey IF expansion for the rationale behind this
+            # convention (library choice, not a derived result).
             switcher_change = Y_mat[g, out_idx] - Y_mat[g, ref_idx]
             U_l[g] += S_g * switcher_change
             U_per_period_l[g, out_idx] += S_g * switcher_change
 
             # Control contributions: each control g' in the pool gets
-            # -S_g * (1/n_ctrl) * (Y_{g', out} - Y_{g', ref})
+            # -S_g * (1/n_ctrl) * (Y_{g', out} - Y_{g', ref}). Same
+            # post-period attribution as the switcher side.
             ctrl_changes = Y_mat[ctrl_pool, out_idx] - Y_mat[ctrl_pool, ref_idx]
             ctrl_contrib = (S_g / n_ctrl) * ctrl_changes
             U_l[ctrl_pool] -= ctrl_contrib
@@ -4516,7 +4521,10 @@ def _compute_per_group_if_placebo_horizon(
             if n_ctrl == 0:
                 continue
 
-            # Switcher contribution: paper convention backward - ref
+            # Switcher contribution: paper convention backward - ref.
+            # Attribute the whole contrast to the backward cell
+            # (mirrors the multi-horizon / DID_M post-period
+            # attribution convention).
             switcher_change = Y_mat[g, backward_idx] - Y_mat[g, ref_idx]
             U_pl[g] += S_g * switcher_change
             U_per_period_pl[g, backward_idx] += S_g * switcher_change
@@ -4936,6 +4944,14 @@ def _compute_full_per_group_contributions(
     include_joiners_side = side in ("overall", "joiners")
     include_leavers_side = side in ("overall", "leavers")
 
+    # Per-cell attribution convention (not a derivation from the
+    # observation-level survey linearization — see REGISTRY.md
+    # ``ChaisemartinDHaultfoeuille`` Note on survey IF expansion):
+    # attribute each (Y_curr - Y_prev) transition as a single
+    # difference to its post-period cell (g, t_idx). Preserves the
+    # row-sum identity U_per_period.sum(axis=1) == U and therefore
+    # the group-sum invariance that makes the cell expansion
+    # byte-identical to the pre-allocator convention under PSU=group.
     for t_idx in range(1, n_periods):
         d_curr = D_mat[:, t_idx]
         d_prev = D_mat[:, t_idx - 1]
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -649,7 +649,7 @@ Alternative: Multiplier bootstrap clustered at group via the `n_bootstrap` param
 - [x] Design-2 switch-in/switch-out descriptive wrapper (Web Appendix Section 1.6)
 - [x] HonestDiD (Rambachan-Roth 2023) integration on placebo + event study surface
 - [x] Survey design support: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) **or replicate-weight variance (BRR/Fay/JK1/JKn/SDR)**, covering the main ATT surface, covariate adjustment (DID^X), heterogeneity testing, the TWFE diagnostic (fit and standalone `twowayfeweights()` helper), and HonestDiD bounds. Opt-in **PSU-level Hall-Mammen wild bootstrap** is also supported via `n_bootstrap > 0`.
-- **Note (Survey IF expansion — cell-period allocator):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The expansion decomposes the per-group IF `U[g]` into per-cell contributions `U[g, t]` (the per-period term of the sum that yields `U[g]` inside `_compute_full_per_group_contributions` and the per-horizon helpers), cohort-centers each column independently, and expands to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`, then applies the Binder (1983) stratified-PSU variance formula. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is fully supported. This is a strict relaxation of the earlier within-group constancy rule shipped before the cell-period allocator. Byte-identity on the pre-relaxation input set is guaranteed because PSU-level Binder aggregation telescopes per-cell sums to `U_centered[g]` under within-group-constant PSU, matching the previous group-level expansion up to single-ULP floating-point noise. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). 
Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
+- **Note (Survey IF expansion — library convention):** Survey IF expansion is a library extension not in the dCDH papers (the paper's plug-in variance assumes iid sampling). The library convention builds observation-level `psi_i` by proportionally distributing per-group IF mass within weight share: either at the group level (`psi_i = U_centered[g] * w_i / W_g`, the previous convention) or at the per-`(g, t)` cell level via the cell-period allocator shipped in this release. Cell-level expansion: decompose `U[g]` into per-period attributions `U[g, t]`, cohort-center each column independently, then expand to observation level as `psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})`. Binder (1983) stratified-PSU variance aggregates the resulting `psi` at PSU level. **Post-period attribution convention:** each transition term in the IF sum (of the form `role_weight * (Y_{g, t} - Y_{g, t-1})` for DID_M or `S_g * (Y_{g, out} - Y_{g, ref})` for DID_l) is attributed as a single *difference* to the POST-period cell, not split into a `+Y_post` / `-Y_pre` pair across two cells. This is a library *convention*, not a theorem — adopted because it preserves the group-sum, PSU-sum, and cohort-sum identities of the previous group-level expansion (so Binder variance coincides with the group-level variance under the auto-injected `psu=group`) and because Monte Carlo coverage at nominal 95% is empirically close to nominal on a DGP where PSUs vary across the cells of each group (see `tests/test_dcdh_cell_period_coverage.py`). A covariance-aware two-cell allocator is a plausible alternative and may be worth exploring if future designs motivate an explicit observation-level IF derivation; the method currently in the library is **not derived from the observation-level survey linearization of the contrast** and makes no stronger claim than "coverage is approximately nominal under the tested DGPs and the group-sum identity holds exactly." 
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to `U_centered[g]` and Binder variance is byte-identical (up to single-ULP floating-point noise) to the previous group-level expansion. **Strata and PSU must be constant within each `(g, t)` cell** (trivially satisfied in one-obs-per-cell panels — the canonical dCDH structure); variation **across cells of a group** is supported by the allocator. Within-group-varying **weights** are supported as before. When `survey_design.psu` is not specified, `fit()` auto-injects `psu=<group column>` so the TSL variance, `df_survey`, and t-based inference match the per-group PSU structure. Under replicate-weight designs, the same cell-level `psi_i` is aggregated via Rao-Wu weight-ratio rescaling (`compute_replicate_if_variance` at `diff_diff/survey.py:1681`) rather than the Binder TSL formula. All five methods (BRR/Fay/JK1/JKn/SDR) are supported method-agnostically through the unified helper; the effective `df_survey` is reduced to `min(n_valid) - 1` across IF sites when some replicate solves fail (matching `efficient_did.py:1133-1135` and `triple_diff.py:676-686` precedents). Under DID^X, the first-stage residualization coefficient `theta_hat` is computed once on full-sample weights and treated as fixed (FWL plug-in IF convention) — per-replicate refits of `theta_hat` are not performed. **Scope limitations (follow-up PRs):** (a) `heterogeneity=` combined with within-group-varying PSU/strata raises `NotImplementedError` — the heterogeneity WLS `psi_obs` still uses the legacy group-level expansion, to be extended in PR 3; (b) `n_bootstrap > 0` combined with within-group-varying PSU raises `NotImplementedError` — the PSU-level Hall-Mammen wild bootstrap still uses the legacy group-level PSU map, to be extended in PR 4.
 - **Note (survey + bootstrap contract):** When `survey_design` and `n_bootstrap > 0` are both active, the bootstrap uses Hall-Mammen wild multiplier weights (Rademacher/Mammen/Webb) **at the PSU level**. Under the default auto-injected `psu=group`, the PSU coincides with the group so the wild bootstrap is a clean group-level clustered bootstrap (identity-map fast path, bit-identical to the non-survey multiplier bootstrap). When the user passes an explicit strictly-coarser PSU (e.g., `psu=state` with groups at county level), the IF contributions of all groups within a PSU receive the same bootstrap multiplier — the standard Hall-Mammen wild PSU bootstrap. Strata do not participate in the bootstrap randomization (they contribute only through the analytical TSL variance); this is conservative when strata differ substantially in variance. A `UserWarning` fires only when PSU is strictly coarser than group. **Scope note (cell-period allocator):** The PSU-level bootstrap uses a group-level `group_id_to_psu_code` map and therefore requires PSU to be constant within each group. Combining `n_bootstrap > 0` with a PSU that varies within group raises `NotImplementedError`; the cell-level Hall-Mammen extension is deferred to a follow-up PR. The analytical TSL variance fully supports within-group-varying PSU via the cell-period allocator — use `n_bootstrap=0` for those designs. **Replicate-weight designs and `n_bootstrap > 0` are mutually exclusive** (replicate variance is closed-form; bootstrap would double-count variance) — the combination raises `NotImplementedError`, matching `efficient_did.py:989`, `staggered.py:1869`, `two_stage.py:251-253`. For HonestDiD bounds under replicate weights, the replicate-effective `df_survey = min(resolved_survey.df_survey, min(n_valid_across_sites) - 1)` propagates to t-critical values — capped by the design's QR-rank-based df so a rank-deficient replicate matrix never produces a larger effective df than the design supports. 
When `resolved_survey.df_survey` is undefined (QR-rank ≤ 1), the effective df stays `None` and all inference fields (including HonestDiD bounds) are NaN — per-site `n_valid` cannot rescue a rank-deficient design.
 
 ---
diff --git a/tests/test_survey_dcdh.py b/tests/test_survey_dcdh.py
@@ -1378,6 +1378,88 @@ def test_off_horizon_row_duplication_does_not_change_se(self, base_data):
                 f"{r_dup.overall_se}) — auto-inject psu=group is not active."
             )
 
+    def test_cell_allocator_row_sum_identity(self):
+        """Cell-period allocator contract: for every group, the per-
+        period attribution sums across time to the per-group IF
+        (before cohort centering). This is the invariant that makes
+        PSU-level Binder aggregation telescope to ``U_centered[g]``
+        under within-group-constant PSU and therefore guarantees byte-
+        identity with the legacy group-level allocator on the old
+        accepted input set. Hand-computed on a 4-group × 3-period
+        panel: two never-treated (stable_0) and two joiners switching
+        at ``t = 2``.
+        """
+        from diff_diff.chaisemartin_dhaultfoeuille import (
+            _compute_full_per_group_contributions,
+            _cohort_recenter,
+            _cohort_recenter_per_period,
+        )
+
+        # D_mat, Y_mat, N_mat shaped (n_groups=4, n_periods=3).
+        D_mat = np.array(
+            [
+                [0, 0, 0],  # G0 never-treated
+                [0, 0, 0],  # G1 never-treated
+                [0, 0, 1],  # G2 joiner at t=2
+                [0, 0, 1],  # G3 joiner at t=2
+            ],
+            dtype=float,
+        )
+        Y_mat = np.array(
+            [
+                [1.0, 2.0, 3.0],
+                [2.1, 3.1, 4.2],
+                [0.5, 1.2, 5.4],
+                [1.3, 2.4, 6.1],
+            ],
+            dtype=float,
+        )
+        N_mat = np.ones_like(D_mat, dtype=int)
+        # Per-period cell counts aligned to periods[1:]
+        # t=1: all stable_0 (4 in n_00); t=2: 2 joiners (n_10) + 2 stable_0 (n_00)
+        n_10_t_arr = np.array([0, 2], dtype=int)
+        n_00_t_arr = np.array([4, 2], dtype=int)
+        n_01_t_arr = np.array([0, 0], dtype=int)
+        n_11_t_arr = np.array([0, 0], dtype=int)
+        # A11 zeroed at t=1 (no joiners); active at t=2.
+        a11_plus_zeroed = np.array([True, False], dtype=bool)
+        a11_minus_zeroed = np.array([True, True], dtype=bool)
+
+        U, U_pp = _compute_full_per_group_contributions(
+            D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat,
+            n_10_t_arr=n_10_t_arr, n_00_t_arr=n_00_t_arr,
+            n_01_t_arr=n_01_t_arr, n_11_t_arr=n_11_t_arr,
+            a11_plus_zeroed_arr=a11_plus_zeroed,
+            a11_minus_zeroed_arr=a11_minus_zeroed,
+            side="overall",
+        )
+
+        # Hand computation at t=2 joiner side:
+        #   G0: stable_0, -(2/2) * (3.0 - 2.0) = -1.0
+        #   G1: stable_0, -(2/2) * (4.2 - 3.1) = -1.1
+        #   G2: joiner,  (5.4 - 1.2) = 4.2
+        #   G3: joiner,  (6.1 - 2.4) = 3.7
+        expected_U = np.array([-1.0, -1.1, 4.2, 3.7])
+        np.testing.assert_allclose(U, expected_U, atol=1e-12)
+
+        # Row-sum identity: U_per_period.sum(axis=1) == U exactly.
+        np.testing.assert_allclose(U_pp.sum(axis=1), U, atol=1e-12)
+
+        # Post-period attribution: all mass at t=2 (the transition's
+        # post cell); t=0 and t=1 columns are zero for every group.
+        np.testing.assert_array_equal(U_pp[:, 0], np.zeros(4))
+        np.testing.assert_array_equal(U_pp[:, 1], np.zeros(4))
+        np.testing.assert_allclose(U_pp[:, 2], expected_U, atol=1e-12)
+
+        # Cohort centering preserves the row-sum identity: the row
+        # sums of the per-period cohort-centered 2D array agree with
+        # the group-level cohort-centered 1D array to FP precision.
+        # Cohorts: A = {G0, G1} (never-treated), B = {G2, G3} (joiners).
+        cohort_ids = np.array([0, 0, 1, 1])
+        U_c = _cohort_recenter(U, cohort_ids)
+        U_pp_c = _cohort_recenter_per_period(U_pp, cohort_ids)
+        np.testing.assert_allclose(U_pp_c.sum(axis=1), U_c, atol=1e-12)
+
     def test_within_cell_check_excludes_zero_weight_rows(self, base_data):
         """A zero-weight row with a different PSU label from its cell
         must not trigger rejection — it is out-of-sample by the