igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 6 additions & 0 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 2 additions & 2 deletions
diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
Lines changed: 127 additions & 9 deletions
@@ -7,6 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified: `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions"; passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`); the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces: the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
+
+### Changed
+- **`ChaisemartinDHaultfoeuille.predict_het` inference: t-distribution df threading (closes TODO pilot-412).** `_compute_heterogeneity_test` now passes `df = n_obs - n_params` to `safe_inference` on the non-survey OLS path, matching R `did_multiplegt_dyn(predict_het=...)`'s t-distribution inference (`DIDmultiplegtDYN:::did_multiplegt_main` `t_stat <- qt(0.975, df.residual(model))` site). Pre-PR Python used `df=None` (normal Z critical), producing 0.1-2% rtol gaps on `p_value` and `conf_int` vs R. Parity tolerance tightened on the existing forward-horizon scenarios (`multi_path_reversible_predict_het`, `multi_path_reversible_by_path_predict_het`) from "unpinned" to `INFERENCE_RTOL=1e-4` on `p_value` and `conf_int`; `beta` / `se` / `t_stat` continue at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5`. **Rank-deficient caveat:** `n_params = design.shape[1]` is the pre-drop column count; under near-rank-deficient designs that `solve_ols` retains rather than NaN-out, the actual rank may be lower than `n_params` (R's `df.residual` uses post-drop rank). Fully rank-deficient designs are NaN-filled by the existing short-circuit; the gap only affects near-rank-deficient edge cases (tracked as a Low TODO follow-up). The Z-vs-t REGISTRY deviation note is replaced with an "R parity (post-2026-05-15 df threading)" positive-claim note.
+
 ## [3.3.3] - 2026-05-15
 
 ### Added
 
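The `out_idx < 0` eligibility guard described in the Added entry can be illustrated with a small sketch. Everything here is hypothetical: the index formula, the helper names, and the toy counts are assumptions for illustration, not the library's code. Plain Python lists are used because they share the negative-index wraparound that numpy arrays have.

```python
# Toy per-group count rows standing in for N_mat (values are made up).
n_mat = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
first_switch = [1, 2, 3]   # hypothetical first-switch period per group
horizon = -2               # second backward (placebo) horizon


def outcome_index(f_g, horizon):
    # Hypothetical mapping from a horizon to a column index of n_mat.
    return f_g - 1 + horizon


def read_unguarded(n_mat, first_switch, horizon):
    # Negative indices wrap around, so ineligible groups silently
    # return a cell from the end of the row instead of failing.
    return [row[outcome_index(f, horizon)] for row, f in zip(n_mat, first_switch)]


def read_guarded(n_mat, first_switch, horizon):
    # Groups whose index falls outside the row are marked ineligible (None).
    out = []
    for row, f in zip(n_mat, first_switch):
        idx = outcome_index(f, horizon)
        out.append(row[idx] if 0 <= idx < len(row) else None)
    return out


unguarded = read_unguarded(n_mat, first_switch, horizon)
guarded = read_guarded(n_mat, first_switch, horizon)
```

In this toy setup only the third group has enough pre-periods for horizon -2; the unguarded read still returns a value for every group, which is the silent misread the guard is meant to prevent.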
@@ -78,8 +78,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
 | dCDH by_path: negative-baseline path regression (e.g. `(-1, 0, 0, 0)`) is not yet exercised. The existing negative-D test (`test_negative_integer_D_supported`) only covers paths with negative values in non-baseline positions like `(0, -1, -1, -1)`, which does not trigger the R `substr(path, 1, 1)` bug regime (the bug needs a multi-character baseline). Add a switcher fixture with `D_{g,1} = -1` and assert the resulting path tuple key. | `tests/test_chaisemartin_dhaultfoeuille.py` | #419 | Low |
-| dCDH by_path: per-path placebo heterogeneity (`predict_het` rows for negative horizons) is currently NaN-filled in `to_dataframe(level="by_path")` `het_*` columns and unpopulated in `path_heterogeneity_effects`. R `did_multiplegt_dyn(..., by_path, predict_het)` forwards `predict_het` into each per-path `did_multiplegt_main` call alongside `placebo`, so R likely emits placebo het rows we do not yet mirror. Validate R's actual placebo predict_het output, then either implement parity or document the deviation explicitly. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `diff_diff/chaisemartin_dhaultfoeuille_results.py`, `tests/test_chaisemartin_dhaultfoeuille_parity.py` | #422 | Medium |
-| dCDH heterogeneity: `_compute_heterogeneity_test` passes `df=None` to `safe_inference`, so Python uses the normal Z critical value (~1.96) for `t_stat`-derived `p_value` and `conf_int`. R `did_multiplegt_dyn(..., predict_het)` uses the t-distribution with df = n - k from the OLS regression, producing ~0.1-2% rtol gaps on CIs and p-values vs Python. Documented as a deviation in the heterogeneity R-parity Note; parity tests pin only `beta`, `se`, `t_stat`, and `n_obs`. Either thread the OLS df into `safe_inference` to match R, or formalize a separate inference-tolerance constant for the heterogeneity surface. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `tests/test_chaisemartin_dhaultfoeuille_parity.py` | pilot-412 | Low |
+| dCDH heterogeneity: survey-aware backward-horizon heterogeneity (`placebo + predict_het + survey_design`) is gated. `fit()` emits a `UserWarning` and falls back to forward-horizon-only heterogeneity on both global and per-path surfaces; direct calls to `_compute_heterogeneity_test` raise `NotImplementedError`. The Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution, and backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
+| dCDH heterogeneity: near-rank-deficient designs use `df = n_obs - n_params` (pre-drop column count) in the t-distribution inference. The `lm` fit behind R's `predict_het` uses `df.residual = n - rank(design)` post-drop. Fully rank-deficient designs are NaN-filled by the rank-deficient short-circuit at `_compute_heterogeneity_test:5141-5150`, so the gap only affects near-rank-deficient designs where `solve_ols` retains the design. Thread actual rank from `solve_ols` to close the gap. | `diff_diff/chaisemartin_dhaultfoeuille.py` | follow-up | Low |
 | CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 
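The two degrees-of-freedom conventions in the rank-deficiency row, and the Z-vs-t gap that the df threading closes, can be compared with a standard-library sketch. The sample size, column count, rank, and t statistic below are toy assumptions, and the t-distribution p-value is computed by numerical integration rather than by whatever `safe_inference` does internally.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    # Student-t density, evaluated in log space for numerical stability.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t_stat, df, steps=2000):
    # Simpson's rule for the pdf integral over [0, |t|]; steps must be even.
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total


def z_two_sided_p(t_stat):
    # Normal-approximation p-value (the df=None behavior described above).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Toy values: rank < n_params mimics a design with one dropped column.
n_obs, n_params, rank = 200, 3, 2
t_stat = 2.0
p_z = z_two_sided_p(t_stat)
p_t_cols = t_two_sided_p(t_stat, n_obs - n_params)  # pre-drop column count
p_t_rank = t_two_sided_p(t_stat, n_obs - rank)      # post-drop rank
```

The ordering `p_z < p_t_rank < p_t_cols` holds for any nonzero `t_stat` because fewer degrees of freedom mean heavier tails; the size of the gap between the two t-based values depends on how small `n_obs` is.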
@@ -622,12 +622,30 @@ extract_dcdh_by_path <- function(res, n_effects, n_placebos = 0) {
 # res$results$predict_het, a data.frame with columns
 # {effect, covariate, Estimate, SE, t, LB, UB, N, pF}. Estimate is the
 # WLS coefficient on the heterogeneity covariate.
+#
+# `n_effects` retained for backward-compat with scenarios 20/21 callers
+# but unused: we iterate ALL rows in ph$effect and partition by sign so
+# placebo (negative-effect) rows are captured separately. Scenario 22
+# probes whether R emits negative-effect rows when called with
+# `predict_het + placebo`; resolves TODO #422.
 extract_dcdh_predict_het <- function(res, n_effects) {
   ph <- res$results$predict_het
-  horizons <- list()
-  if (is.null(ph) || nrow(ph) == 0) return(list(predict_het = horizons))
-  for (h in seq_len(min(n_effects, nrow(ph)))) {
-    horizons[[as.character(ph$effect[h])]] <- list(
+  # `structure(list(), names = character(0))` produces a named list with
+  # zero entries; jsonlite serializes it as `{}` (object) rather than
+  # `[]` (array). Plain `list()` would serialize as `[]`, which gives
+  # the JSON contract a type-unstable shape (object when populated, array
+  # when empty). Type stability matters for generic consumers — see
+  # `tests/test_chaisemartin_dhaultfoeuille_parity.py::_as_dict` for the
+  # defensive Python-side coercion that backstops this.
+  forward_horizons <- structure(list(), names = character(0))
+  placebo_horizons <- structure(list(), names = character(0))
+  if (is.null(ph) || nrow(ph) == 0) {
+    return(list(predict_het = forward_horizons,
+                placebo_predict_het = placebo_horizons))
+  }
+  for (h in seq_len(nrow(ph))) {
+    effect_val <- as.numeric(ph$effect[h])
+    entry <- list(
       beta = as.numeric(ph$Estimate[h]),
       se = as.numeric(ph$SE[h]),
       t = as.numeric(ph$t[h]),
@@ -636,8 +654,15 @@ extract_dcdh_predict_het <- function(res, n_effects) {
       n_obs = as.numeric(ph$N[h]),
       p_value = as.numeric(ph$pF[h])
     )
+    if (effect_val > 0) {
+      forward_horizons[[as.character(effect_val)]] <- entry
+    } else if (effect_val < 0) {
+      placebo_horizons[[as.character(effect_val)]] <- entry
+    }
+    # effect_val == 0: skip (not a valid event-study horizon).
   }
-  list(predict_het = horizons)
+  list(predict_het = forward_horizons,
+       placebo_predict_het = placebo_horizons)
 }
 
 # Helper: extract per-path predict_het results. Under by_path=k +
@@ -650,10 +675,16 @@ extract_dcdh_by_path_predict_het <- function(res, n_effects) {
   for (i in seq_along(by_levels)) {
     slot <- res[[paste0("by_level_", i)]]
     ph <- slot$results$predict_het
-    horizons <- list()
+    # See extract_dcdh_predict_het comment for the named-list rationale.
+    forward_horizons <- structure(list(), names = character(0))
+    placebo_horizons <- structure(list(), names = character(0))
     if (!is.null(ph) && nrow(ph) > 0) {
-      for (h in seq_len(min(n_effects, nrow(ph)))) {
-        horizons[[as.character(ph$effect[h])]] <- list(
+      # Iterate ALL rows; partition by sign so placebo (negative-effect)
+      # rows are captured under `placebo_horizons`. Scenario 22 probes
+      # whether R emits negative-effect rows on the per-path surface.
+      for (h in seq_len(nrow(ph))) {
+        effect_val <- as.numeric(ph$effect[h])
+        entry <- list(
           beta = as.numeric(ph$Estimate[h]),
           se = as.numeric(ph$SE[h]),
           t = as.numeric(ph$t[h]),
@@ -662,12 +693,18 @@ extract_dcdh_by_path_predict_het <- function(res, n_effects) {
           n_obs = as.numeric(ph$N[h]),
           p_value = as.numeric(ph$pF[h])
         )
+        if (effect_val > 0) {
+          forward_horizons[[as.character(effect_val)]] <- entry
+        } else if (effect_val < 0) {
+          placebo_horizons[[as.character(effect_val)]] <- entry
+        }
       }
     }
     out[[i]] <- list(
       path = by_levels[i],
       frequency_rank = i,
-      horizons = horizons
+      horizons = forward_horizons,
+      placebo_horizons = placebo_horizons
     )
   }
   list(by_path_predict_het = out)
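On the consuming side, the `{}`-versus-`[]` concern from the named-list comment can be handled with a small coercion step. The sketch below is an assumption about what such a backstop could look like; it is not the actual `_as_dict` helper from the parity tests, and the payload is invented for illustration.

```python
import json


def as_dict_sketch(value):
    # Accept a JSON object as-is and treat an empty array (or null) as an
    # empty object, so callers always receive a dict.
    if isinstance(value, dict):
        return value
    if value is None or (isinstance(value, list) and not value):
        return {}
    raise TypeError(
        f"expected a JSON object or an empty array, got {type(value).__name__}"
    )


def horizons_by_int(value):
    # JSON object keys are strings ("1", "-2"); convert them to signed ints
    # so forward horizons are positive and placebo horizons negative.
    return {int(key): entry for key, entry in as_dict_sketch(value).items()}


# Invented payload: a populated forward slot and an empty placebo slot that
# was serialized as an array.
payload = json.loads(
    '{"horizons": {"1": {"beta": 0.5}, "2": {"beta": 0.4}},'
    ' "placebo_horizons": []}'
)
forward = horizons_by_int(payload["horizons"])
placebo = horizons_by_int(payload["placebo_horizons"])
```

With the R-side `structure(list(), names = character(0))` in place the empty-array branch should not be needed, but keeping it lets older fixture files load without special cases.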
@@ -1201,6 +1238,87 @@ cat("  Scenarios 20/21: multi_path_reversible_predict_het + by_path version\n")
                   dont_drop_larger_lower = TRUE),
     results = extract_dcdh_by_path_predict_het(res21, n_effects = 3)
   )
+
+  # Scenario 23: GLOBAL predict_het + placebo (no by_path). Mirrors
+  # scenario 22's syntax minus by_path so we have a parity anchor for
+  # the GLOBAL `results.heterogeneity_effects` surface emitting both
+  # forward and backward (placebo) horizons. Resolves codex R1 P1 #2:
+  # the Phase 1A change extended the global heterogeneity loop to
+  # cover backward horizons, so a global-surface parity test was
+  # required to lock that contract independently of the per-path
+  # dispatcher. Same `c(-1)` sentinel as scenario 22 (computes ALL
+  # forward + ALL placebo positions); reuses `d20` for DGP parity.
+  res23 <- did_multiplegt_dyn(
+    df = d20, outcome = "outcome", group = "group", time = "period",
+    treatment = "treatment", effects = 3, placebo = 2,
+    dont_drop_larger_lower = TRUE,
+    predict_het = list("het_x", c(-1)),
+    ci_level = 95, graph_off = TRUE
+  )
+  scenarios$multi_path_reversible_predict_het_with_placebo_global <- list(
+    data = list(
+      group = as.numeric(d20$group),
+      period = as.numeric(d20$period),
+      treatment = as.numeric(d20$treatment),
+      outcome = as.numeric(d20$outcome),
+      het_x = as.numeric(d20$het_x)
+    ),
+    params = list(pattern = "multi_path_reversible_predict_het_with_placebo_global",
+                  n_switchers = n_switchers20, n_controls = n_controls20,
+                  n_groups = n_groups20, n_periods = n_periods20,
+                  seed = 120L, effects = 3, placebo = 2,
+                  predict_het_var = "het_x",
+                  predict_het_horizons = c(-1),
+                  ci_level = 95,
+                  dont_drop_larger_lower = TRUE),
+    results = extract_dcdh_predict_het(res23, n_effects = 3)
+  )
+
+  # Scenario 22: by_path + predict_het + placebo (probes TODO #422). Reuses
+  # d20 from scenarios 20/21 for DGP parity. Tests whether R's
+  # did_multiplegt_dyn(by_path=k, predict_het, placebo=N) per-by_level
+  # dispatcher emits predict_het rows on backward (placebo) horizons.
+  #
+  # R's predict_het syntax with `placebo > 0` (per the
+  # `DIDmultiplegtDYN:::did_multiplegt_main` source, lines 1907 / 2030):
+  # the SAME horizon vector is used for BOTH forward effects AND placebo
+  # positions. Passing `c(1, 2, 3)` with `placebo = 2` errors because
+  # `max(c(1, 2, 3)) = 3` exceeds `placebo = 2`. The `c(-1)` sentinel
+  # triggers "compute heterogeneity for ALL forward (1..effects) AND ALL
+  # placebo (1..placebo) positions" by replacing `het_effects` with
+  # `1:l_XX` in the forward block and `1:l_placebo_XX` in the placebo
+  # block. Forward rows are emitted with positive `effect` values
+  # (1, 2, 3); placebo rows with NEGATIVE values (-1, -2) per
+  # `effect = matrix(-i, ...)` at the placebo block's rbind site.
+  #
+  # The extended extract_dcdh_by_path_predict_het partitions the per-path
+  # predict_het table by `effect` sign: forward into `horizons`, placebo
+  # into `placebo_horizons`.
+  res22 <- did_multiplegt_dyn(
+    df = d20, outcome = "outcome", group = "group", time = "period",
+    treatment = "treatment", effects = 3, placebo = 2, by_path = 3,
+    dont_drop_larger_lower = TRUE,
+    predict_het = list("het_x", c(-1)),
+    ci_level = 95, graph_off = TRUE
+  )
+  scenarios$multi_path_reversible_predict_het_with_placebo <- list(
+    data = list(
+      group = as.numeric(d20$group),
+      period = as.numeric(d20$period),
+      treatment = as.numeric(d20$treatment),
+      outcome = as.numeric(d20$outcome),
+      het_x = as.numeric(d20$het_x)
+    ),
+    params = list(pattern = "multi_path_reversible_predict_het_with_placebo",
+                  n_switchers = n_switchers20, n_controls = n_controls20,
+                  n_groups = n_groups20, n_periods = n_periods20,
+                  seed = 120L, effects = 3, placebo = 2, by_path = 3,
+                  predict_het_var = "het_x",
+                  predict_het_horizons = c(-1),
+                  ci_level = 95,
+                  dont_drop_larger_lower = TRUE),
+    results = extract_dcdh_by_path_predict_het(res22, n_effects = 3)
+  )
 }
 
 # ---------------------------------------------------------------------------
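The horizon semantics described in the scenario 22 comment can be modeled in a few lines. This is one reading of that comment written as a hypothetical Python helper; the function name and the error text are assumptions, and it is not a port of the R source.

```python
def resolve_het_horizons(requested, effects, placebo):
    """Return (forward, backward) horizon lists for a predict_het request.

    Models the comment's description: the c(-1) sentinel expands to every
    forward and every placebo position, while an explicit vector is reused
    for both blocks and must not exceed the number of placebos.
    """
    requested = list(requested)
    if requested == [-1]:
        forward = list(range(1, effects + 1))
        backward = [-i for i in range(1, placebo + 1)]
        return forward, backward
    if placebo > 0 and max(requested) > placebo:
        raise ValueError("requested predict_het horizons exceed the number of placebos")
    backward = [-h for h in requested] if placebo > 0 else []
    return requested, backward


# Scenario 22/23 settings: effects = 3, placebo = 2, sentinel horizon vector.
forward, backward = resolve_het_horizons([-1], effects=3, placebo=2)
```

Under these settings the sentinel yields five horizons per surface, which matches the "3 forward + 2 placebo" count used for the parity assertions.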