Commit 1d64724

igerber and claude committed
dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest
Bundles two adjacent extensions to ChaisemartinDHaultfoeuille.by_path:

1. Non-binary integer treatment (Wave 3 #8): replace the `NotImplementedError` gate at chaisemartin_dhaultfoeuille.py:1870 with a `ValueError` for continuous D (e.g. 1.5) per the no-silent-failures contract; integer-coded D in Z (e.g. ordinal {0, 1, 2}) is now supported and produces integer-state path tuples (e.g. (0, 2, 2, 2)). Validated against R `did_multiplegt_dyn(..., by_path)` for D in {0, 1, 2} via the new `multi_path_reversible_by_path_non_binary` golden-value scenario: per-path point estimates match R bit-exactly (rtol ~1e-9 events, rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol; SE_RTOL=0.15 envelope).

2. paths_of_interest kwarg (Wave 3 #9): new estimator parameter accepting a list of int tuples for explicit user-specified path selection. Mutually exclusive with by_path=k (raises ValueError at __init__ and set_params time). Each tuple length validated at __init__ for uniformity, vs L_max+1 at fit-time. Bool / np.bool_ rejected; np.integer accepted and canonicalized to Python int. Duplicates emit a UserWarning and dedupe; unobserved paths emit a UserWarning and are omitted from path_effects. Composes with non-binary D and all downstream by_path surfaces (bootstrap, placebos, sup-t bands, controls, trends_linear, trends_nonparam). Python-only API (R has no list-based selection in by_path).

Threading: paths_of_interest added to 6 path-enumeration helper signatures (`_enumerate_treatment_paths`, `_compute_path_effects`, `_compute_path_cumulated_event_study`, `_collect_path_bootstrap_inputs`, `_compute_path_placebos`, `_collect_path_placebo_bootstrap_inputs`). The `:1118` preconditions gate (drop_larger_lower / L_max / mutex with heterogeneity / design2 / honest_did / survey_design) and the 11 `self.by_path is not None` activation branches in fit() rerouted to fire under either selector.

For D >= 10, R's `did_multiplegt_by_path` derives the per-path baseline via `substr(path_index$path, 1, 1)` which captures only the first character of the comma-separated path string (e.g. captures "1" instead of "12" for path = "12,12,..."), mis-allocating R's per-path control-pool subset. Python's tuple-key matching is correct; the parity scenario stays in D in {0, 1, 2} to avoid the R bug.

Adds:
- `_validate_paths_of_interest` module-level helper for canonicalization
- `TestByPathNonBinary` (10 tests, 3 slow)
- `TestPathsOfInterest` (27 tests, 7 slow)
- `TestDCDHDynRParityByPathNonBinary` (1 parity test)
- R script Scenario 19 + regenerated dcdh_dynr_golden_values.json
- REGISTRY.md sub-paragraphs for non-binary + paths_of_interest
- CHANGELOG.md Unreleased > Added entry
- llms-full.txt kwarg listing for paths_of_interest

Removes: the now-stale `test_forbids_non_binary_treatment` gate-firing test from TestByPathCoverageFitGates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 33afb6a commit 1d64724
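
The paths_of_interest validation rules in the commit message (bool rejected, integer-like values canonicalized to Python int, uniform tuple length, duplicates warned and dropped) can be sketched as below. This is a minimal illustration and not the shipped `_validate_paths_of_interest`: the function name `validate_paths_of_interest`, the error messages, and the type-name check used to reject NumPy booleans without importing NumPy are all assumptions. The fit-time check against L_max + 1 and the unobserved-path warning are left out.

```python
import operator
import warnings


def validate_paths_of_interest(paths):
    """Hypothetical stand-in for the commit's validation helper.

    Returns a list of unique tuples of Python ints, in first-seen order.
    """
    canonical = []
    for path in paths:
        if not isinstance(path, (list, tuple)):
            raise ValueError(f"path must be a list or tuple, got {type(path).__name__}")
        states = []
        for value in path:
            # bool is a subclass of int, so it has to be rejected explicitly.
            # Matching on the type name is an assumption of this sketch; it
            # also catches NumPy booleans ("bool_" or "bool") without NumPy.
            if isinstance(value, bool) or type(value).__name__ in ("bool", "bool_"):
                raise ValueError("boolean path states are not allowed")
            try:
                # operator.index accepts int and integer-like scalars
                # (for example np.int64) and rejects floats such as 1.5.
                states.append(int(operator.index(value)))
            except TypeError:
                raise ValueError(f"path state {value!r} is not an integer") from None
        canonical.append(tuple(states))

    # Uniformity check: every path must have the same length.
    if len({len(path) for path in canonical}) > 1:
        raise ValueError("all paths must have the same length")

    # dict.fromkeys keeps the user-specified order while dropping duplicates.
    deduped = list(dict.fromkeys(canonical))
    if len(deduped) != len(canonical):
        warnings.warn("duplicate paths were removed", UserWarning, stacklevel=2)
    return deduped
```

Canonicalizing to plain tuples of Python int keeps dictionary lookups consistent regardless of whether the caller passed lists, tuples, or NumPy integer scalars.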

8 files changed

Lines changed: 1365 additions & 106 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- **`ChaisemartinDHaultfoeuille.by_path` + non-binary integer treatment** — `by_path=k` now accepts integer-coded discrete treatment (D in Z, e.g. ordinal `{0, 1, 2}`); path tuples become integer-state tuples like `(0, 2, 2, 2)`. The previous `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1870` is replaced by a `ValueError` for continuous D (e.g. `D=1.5`) at fit-time per the no-silent-failures contract — the existing `int(round(float(v)))` cast in `_enumerate_treatment_paths` is now defensive (no-op for integer-coded D). Validated against R `did_multiplegt_dyn(..., by_path)` for D in `{0, 1, 2}` via the new `multi_path_reversible_by_path_non_binary` golden-value scenario (78 switchers, 3 paths, single-baseline custom DGP, F_g >= 4): per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol observed; SE_RTOL=0.15 envelope). **Deviation from R for D >= 10:** R's `did_multiplegt_by_path` derives the per-path baseline via `path_index$baseline_XX <- substr(path_index$path, 1, 1)`, which captures only the first character of the comma-separated path string (e.g. for `path = "12,12,..."` it captures `"1"` instead of `"12"`); this mis-allocates R's per-path control-pool subset for D >= 10. Python's tuple-key matching is correct in this regime — the per-path point estimates we compute are correct; R's per-path subset for the same path is buggy. The shipped parity scenario stays in `D in {0, 1, 2}` to avoid the R bug. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary`; cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary`.
- **New `paths_of_interest` kwarg on `ChaisemartinDHaultfoeuille`** for user-specified treatment-path subsets, alternative to `by_path=k`'s top-k automatic ranking. Mutually exclusive with `by_path`; setting both raises `ValueError` at `__init__` and `set_params` time. Each path tuple must be a list/tuple of `int` of length `L_max + 1` (uniformity validated at `__init__`; length match against `L_max + 1` validated at fit-time); `bool` and `np.bool_` are explicitly rejected, `np.integer` accepted and canonicalized to Python `int` for tuple-key consistency. Duplicates emit a `UserWarning` and are deduplicated; paths not observed in the panel emit a `UserWarning` and are omitted from `path_effects`. Paths appear in `results.path_effects` in the user-specified order, modulo deduplication and unobserved-path filtering. Composes with non-binary D and all downstream `by_path` surfaces (bootstrap, per-path placebos, per-path joint sup-t bands, `controls`, `trends_linear`, `trends_nonparam`) — mechanical filter on observed paths via the same `_enumerate_treatment_paths` call site, no methodology change. **Python-only API extension; no R equivalent** — R's `did_multiplegt_dyn(..., by_path=k)` only accepts a positive int (top-k) or `-1` (all paths). The `by_path` precondition gate at `chaisemartin_dhaultfoeuille.py:1118` (drop_larger_lower / L_max / `heterogeneity` / `design2` / `honest_did` / `survey_design` mutex) and the 11 `self.by_path is not None` activation branches in `fit()` were rerouted to fire under either selector. Validation + behavior + cross-feature regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest`.

## [3.3.2] - 2026-04-26

### Added
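
The D >= 10 deviation described in the first changelog entry comes down to reading a baseline from the first character of a comma-separated string versus reading it from the first element of a tuple. The sketch below mimics both in Python. The helper names and the example path (12, 12, 13, 13) are made up for illustration, and the string layout is assumed from the `"12,12,..."` example in the entry.

```python
def to_path_string(path):
    """Join integer states with commas, as in the "12,12,..." example."""
    return ",".join(str(state) for state in path)


def baseline_first_char(path_string):
    """Mimic R's substr(path, 1, 1): keep only the first character."""
    return path_string[:1]


def baseline_tuple_key(path):
    """Read the baseline as the first element of an integer path tuple."""
    return path[0]


for path in [(0, 2, 2, 2), (12, 12, 13, 13)]:
    text = to_path_string(path)
    print(text, "first-char:", baseline_first_char(text), "tuple:", baseline_tuple_key(path))
```

For single-digit states the two readings agree; for the second path the first-character reading yields "1" while the tuple reading yields 12, which is the mismatch the entry describes.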

benchmarks/R/generate_dcdh_dynr_test_values.R

Lines changed: 91 additions & 0 deletions
@@ -927,6 +927,97 @@ scenarios$multi_path_reversible_by_path_trends_nonparam <- list(
  results = extract_dcdh_by_path(res18, n_effects = 3, n_placebos = 1)
)

# Scenario 19: by_path + non-binary integer treatment (D in {0, 1, 2}).
# Phase 3 Wave 3 #8 lift. Custom inline DGP (mirror Scenario 17 structure)
# with 3 single-baseline non-binary paths: low-dose sustained
# (0, 1, 1, 1), high-dose sustained (0, 2, 2, 2), and ramp-up
# (0, 1, 2, 2). All F_g >= 4 (defensive: avoids any pre-window boundary
# edge cases under future trends_lin combinations and matches Scenario 17).
# 78 switchers + 20 never-treated (D=0) + 20 always-treated (D=2) controls.
# n_periods=13, L_max=3.
#
# R's substr(path, 1, 1) baseline-derivation in did_multiplegt_by_path
# is correct for D in {0..9} (single-digit decimal); we stay in {0, 1, 2}
# so no R bug interferes. Python's tuple-key matching is correct
# regardless of D range.
cat(" Scenario 19: multi_path_reversible_by_path_non_binary\n")
{
  set.seed(119)
  n_periods19 <- 13
  L_max19 <- 3
  target_paths19 <- list(
    c(0L, 1L, 1L, 1L), # path 1, low-dose sustained (rank 1)
    c(0L, 2L, 2L, 2L), # path 2, high-dose sustained (rank 2)
    c(0L, 1L, 2L, 2L) # path 3, ramp-up (rank 3)
  )
  fg_path_counts19 <- list(
    list(F_g = 4L, path_idx = 1L, count = 18L),
    list(F_g = 5L, path_idx = 1L, count = 14L),
    list(F_g = 6L, path_idx = 2L, count = 14L),
    list(F_g = 7L, path_idx = 2L, count = 12L),
    list(F_g = 8L, path_idx = 3L, count = 12L),
    list(F_g = 9L, path_idx = 3L, count = 8L)
  )
  n_switchers19 <- sum(sapply(fg_path_counts19, function(x) x$count))
  stopifnot(n_switchers19 == 78L)
  D19 <- matrix(0L, nrow = n_switchers19, ncol = n_periods19)
  g19 <- 1L
  for (entry in fg_path_counts19) {
    F_g <- entry$F_g
    target <- target_paths19[[entry$path_idx]]
    n_here <- entry$count
    for (k in seq_len(n_here)) {
      if (F_g >= 3L) D19[g19, 1:(F_g - 2L)] <- 0L
      for (j in 0:L_max19) D19[g19, F_g - 1L + j] <- target[j + 1L]
      if (F_g + L_max19 <= n_periods19) {
        D19[g19, (F_g + L_max19):n_periods19] <- target[L_max19 + 1L]
      }
      g19 <- g19 + 1L
    }
  }
  # Append 20 never-treated (D=0) and 20 always-treated (D=2) controls
  D19 <- rbind(
    D19,
    matrix(0L, nrow = 20L, ncol = n_periods19),
    matrix(2L, nrow = 20L, ncol = n_periods19)
  )
  n_total19 <- nrow(D19)
  set.seed(119L)
  group_fe19 <- rnorm(n_total19, 0, 2.0)
  noise19 <- matrix(rnorm(n_total19 * n_periods19, 0, 0.5),
                    nrow = n_total19, ncol = n_periods19)
  period_arr19 <- 0:(n_periods19 - 1L)
  Y19 <- 10.0 +
    matrix(group_fe19, nrow = n_total19, ncol = n_periods19) +
    matrix(0.1 * period_arr19, nrow = n_total19, ncol = n_periods19, byrow = TRUE) +
    1.5 * D19 +
    noise19
  d19 <- data.frame(
    group = rep(seq_len(n_total19) - 1L, each = n_periods19),
    period = rep(period_arr19, n_total19),
    treatment = as.vector(t(D19)),
    outcome = as.vector(t(Y19))
  )
  res19 <- did_multiplegt_dyn(
    df = d19, outcome = "outcome", group = "group", time = "period",
    treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
    ci_level = 95
  )
  scenarios$multi_path_reversible_by_path_non_binary <- list(
    data = list(
      group = as.numeric(d19$group),
      period = as.numeric(d19$period),
      treatment = as.numeric(d19$treatment),
      outcome = as.numeric(d19$outcome)
    ),
    params = list(pattern = "single_baseline_multi_path_non_binary",
                  n_switcher_groups = 78L, n_realized_groups = 118L,
                  n_periods = 13L, seed = 119L, effects = 3, placebo = 1,
                  by_path = 3, ci_level = 95),
    results = extract_dcdh_by_path(res19, n_effects = 3, n_placebos = 1)
  )
}

# ---------------------------------------------------------------------------
# Write output
# ---------------------------------------------------------------------------
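
For readers working on the Python side, the treatment matrix built by Scenario 19 can be mirrored with plain lists to check the path counts by hand. This is only a sketch of the treatment layout (outcomes and the estimator call are omitted). The constants are copied from the R script, while `build_treatment_rows` and `observed_path` are hypothetical helpers; in particular, `observed_path` assumes a path is the window of L_max + 1 treatment values starting one period before the first switch, which matches how the R loop places each target path but is not taken from the library code.

```python
from collections import Counter

# Path templates and (F_g, path index, count) cells copied from Scenario 19.
# Path indices are shifted to 0-based for Python.
TARGET_PATHS = [(0, 1, 1, 1), (0, 2, 2, 2), (0, 1, 2, 2)]
FG_PATH_COUNTS = [(4, 0, 18), (5, 0, 14), (6, 1, 14), (7, 1, 12), (8, 2, 12), (9, 2, 8)]
N_PERIODS = 13
L_MAX = 3
N_CONTROLS = 20


def build_treatment_rows():
    """Return one list of N_PERIODS treatment values per group."""
    rows = []
    for f_g, path_idx, count in FG_PATH_COUNTS:
        target = TARGET_PATHS[path_idx]
        row = [0] * N_PERIODS
        # R writes the path into 1-based columns F_g-1 .. F_g-1+L_max,
        # which is 0-based position F_g-2 onward.
        start = f_g - 2
        for j, state in enumerate(target):
            row[start + j] = state
        # Hold the terminal state through the end of the panel.
        for col in range(start + L_MAX + 1, N_PERIODS):
            row[col] = target[-1]
        rows.extend(list(row) for _ in range(count))
    # 20 never-treated (D=0) and 20 always-treated (D=2) controls.
    rows.extend([0] * N_PERIODS for _ in range(N_CONTROLS))
    rows.extend([2] * N_PERIODS for _ in range(N_CONTROLS))
    return rows


def observed_path(row):
    """Assumed path definition: baseline plus the next L_MAX values."""
    for t in range(1, len(row)):
        if row[t] != row[0]:
            return tuple(row[t - 1 : t + L_MAX])
    return None  # the group never switches


rows = build_treatment_rows()
path_counts = Counter(observed_path(row) for row in rows)
print(len(rows), path_counts)
```

Under these assumptions the sketch yields 118 groups: 32 on (0, 1, 1, 1), 26 on (0, 2, 2, 2), 20 on (0, 1, 2, 2), and 40 non-switching controls, consistent with the rank order noted in the R comments.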
