Skip to content

Commit c6cb864

Browse files
authored
Merge pull request #393 from igerber/dcdh-by-path-trends
dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)
2 parents a71d8b5 + 41d54b9 commit c6cb864

8 files changed

Lines changed: 2525 additions & 37 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11+
- **`ChaisemartinDHaultfoeuille.by_path` is now compatible with `trends_linear` (DID^{fd} group-specific linear trends) and `trends_nonparam` (state-set trends).** For `trends_linear`, the first-differencing transform runs once globally before path enumeration, so per-path raw second-differences `DID^{fd}_{path, l}` surface on `path_effects[path]["horizons"][l]` automatically. Per-path **cumulated level effects** `delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}` (the quantity R returns under `did_multiplegt_dyn(..., by_path, trends_lin)`) surface on the new `results.path_cumulated_event_study[path][l]` field, mirroring the global `linear_trends_effects` cumulation. `to_dataframe(level="by_path")` exposes `cumulated_effect` / `cumulated_se` columns (always present, NaN-when-None — mirrors the `cband_*` convention from PR #374); `summary()` renders a "Cumulated Level Effects (DID^{fd}, trends_linear)" sub-section under each per-path block. SE on the cumulated layer is the conservative upper bound (sum of per-horizon component SEs, NaN-consistent), matching the global `linear_trends_effects` convention. Path enumeration runs on the post-first-differenced `N_mat_fd`: switchers with `F_g==2` fail the window-eligibility check and are dropped from path enumeration entirely (the existing global `F_g >= 3` warning still surfaces the issue), so a path whose switchers all have `F_g < 3` is silently absent from `path_effects` rather than present-with-NaN. Placebo under `trends_linear` returns RAW per-horizon values — there is no per-path placebo cumulation surface in either Python or R. For `trends_nonparam`, the set membership column is validated and stored once globally as `set_ids_arr`; the `set_ids` parameter is now threaded through the four per-path IF helpers (`_compute_path_effects`, `_compute_path_placebos`, `_collect_path_bootstrap_inputs`, `_collect_path_placebo_bootstrap_inputs`) so per-path analytical SE, bootstrap, placebos, and sup-t bands all consume the set-restricted control pool automatically. Per-period effects remain unadjusted under both extensions, consistent with the existing per-period DID contract. Validated against R via two new golden-value scenarios: `single_baseline_multi_path_by_path_trends_lin` (n_periods=13, F_g >= 4, cohort-single-path; per-path cumulated point estimates match R bit-exactly with `POINT_RTOL=1e-9`, cumulated SE within `CUM_SE_RTOL=0.20`) and `multi_path_reversible_by_path_trends_nonparam` (per-path point estimates AND placebos match R bit-exactly with `POINT_RTOL=1e-9`, per-path SE within `SE_RTOL=0.15`). **F_g=3 boundary-case divergence (`by_path + trends_linear`):** `F_g=3` switchers have only 1 valid pre-window Z value after first-differencing, triggering 30%+ relative divergence between Python and R per-path point estimates on paths whose switchers include `F_g=3`. A targeted `UserWarning` fires at fit-time on this regime; R parity is asserted only on the `F_g >= 4` parity fixture. Placebo parity for `trends_linear` is intentionally skipped (R's per-path placebo computation re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture surfaces; placebo + `trends_linear` is exercised via internal regression only). Cross-path cohort-sharing SE deviation from R documented for `path_effects` is inherited unchanged. Gates at `chaisemartin_dhaultfoeuille.py:1014-1023` removed; `by_path` docstring updated to add the two new compatibility paragraphs and remove `trends_linear` / `trends_nonparam` from the incompatible list. R-parity tests at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathTrendsLinear` and `::TestDCDHDynRParityByPathTrendsNonparam`; cross-surface regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathTrendsLinear` and `::TestByPathTrendsNonparam`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path linear-trends DID^{fd}" and "Per-path state-set trends" for the full contract.
1112
- **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg).
1213
- **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`) on the **`design="continuous_at_zero"` (Design 1') surface**. New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall, event-study, placebo, yatchew, trends_lin). The harness explicitly forces `HeterogeneousAdoptionDiD(design="continuous_at_zero")` because R `did_had` always evaluates the local-linear at `d=0` regardless of dose distribution; our default `design="auto"` may legitimately choose `continuous_near_d_lower` or `mass_point` on dose distributions with boundary density bounded away from zero (e.g., Beta(2,2)) and thereby diverge from R numerically — that divergence is methodologically defensible but out of scope for this parity test. Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately — our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention (parity is bit-exact after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up.
1314

benchmarks/R/generate_dcdh_dynr_test_values.R

Lines changed: 174 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -753,6 +753,180 @@ scenarios$multi_path_reversible_by_path_controls <- list(
753753
results = extract_dcdh_by_path(res16, n_effects = 3)
754754
)
755755

756+
# Scenario 17: single-baseline multi-path + by_path=3 + trends_lin=TRUE
757+
# (Phase 3 Wave 3 #6: by_path + DID^{fd} group-specific linear trends).
758+
# Custom inline single-baseline DGP — `multi_path_reversible` (Scenarios
759+
# 13-16) concentrates each path on 1-2 F_g values, which after R's
760+
# per-path subset + trends_lin's F_g==2 filter collapses to a single F_g,
761+
# violating the dCDH staggered design restriction inside R's per-path
762+
# `did_multiplegt_main` call. `mixed_single_switch` (Scenario 17's first
763+
# attempt) is MULTI-baseline (joiners + leavers), which triggers the
764+
# documented multi-baseline divergence between Python's global-then-
765+
# disaggregate architecture and R's per-path full-pipeline call (same
766+
# divergence pattern as `controls`; rel diffs of 7-19% on point estimates
767+
# observed). To get clean single-baseline parity AND avoid the trends_lin
768+
# F_g concentration trap, this scenario uses a custom DGP: all groups
769+
# start at D=0 (single baseline), 3 paths span F_g ∈ {3,4,5} (all >= 3
770+
# so trends_lin's F_g==2 filter is a no-op), per-path counts unequal so
771+
# top-k ranking is deterministic. **R returns the cumulated level effect
772+
# delta_l per horizon, NOT the raw second-difference DID^{fd}_l** —
773+
# verified empirically against the existing `joiners_only_trends_lin`
774+
# parity test (`tests/test_chaisemartin_dhaultfoeuille_parity.py:403-409`
775+
# documents this convention). The Python parity test compares Python's
776+
# `path_cumulated_event_study[path][l]` against R's per-path Effect_l.
777+
# Placebos under trends_lin remain RAW per-horizon (no cumulated placebo
778+
# surface in R either), so the Python parity test compares
779+
# `path_placebo_event_study[path][-l]` against R's per-path Placebo_l
780+
# directly. Cumulated SE_RTOL is widened (~0.20 vs 0.12 used for
781+
# non-cumulated by_path parity) because the conservative upper-bound SE
782+
# (sum of per-horizon component SEs) compounds the cross-path
783+
# cohort-sharing deviation under summation.
784+
cat(" Scenario 17: single_baseline_multi_path_by_path_trends_lin\n")
785+
{
786+
# Custom DGP: 80 switchers across 3 paths × 2 distinct F_g per path,
787+
# all single-baseline (D_{g,1}=0). Each F_g maps to exactly ONE path
788+
# (cohort-single-path, eliminating the cross-path cohort-sharing
789+
# deviation from R that PR #360 documented for path_effects). All
790+
# F_g >= 3 so trends_lin's F_g==2 filter is a no-op. Two distinct
791+
# F_g per path satisfies R's staggered-design requirement inside the
792+
# per-path `did_multiplegt_main` call. n_periods=11, L_max=3 gives
793+
# F_g in [2,8] = 7 values; we use F_g in {3..8} = 6 values, two per
794+
# path. Plus 20 never-treated + 20 always-treated controls
795+
# (n_realized_groups = 120).
796+
set.seed(117)
797+
n_periods17 <- 13
798+
L_max17 <- 3
799+
target_paths17 <- list(
800+
c(0L, 1L, 1L, 1L), # path 1, sustained on (rank 1)
801+
c(0L, 1L, 1L, 0L), # path 2, on-then-off (rank 2)
802+
c(0L, 1L, 0L, 0L) # path 3, on briefly (rank 3)
803+
)
804+
# F_g-to-path mapping (each F_g unique to one path).
805+
# All F_g >= 4 to avoid the F_g=3 boundary case under trends_lin —
806+
# F_g=3 leaves only 1 valid pre-window Z value after the time==1
807+
# filter, causing R's per-path call (which re-runs the full pipeline
808+
# on the path subset) to handle pre-window/control eligibility
809+
# differently from Python's global first-differencing. F_g >= 4 gives
810+
# both implementations 2+ valid pre-window Z values, eliminating the
811+
# boundary-case divergence.
812+
# F_g=4 -> path 1, F_g=5 -> path 1 (path 1 = 38)
813+
# F_g=6 -> path 2, F_g=7 -> path 2 (path 2 = 24)
814+
# F_g=8 -> path 3, F_g=9 -> path 3 (path 3 = 18)
815+
# Total switchers = 80
816+
fg_path_counts17 <- list(
817+
list(F_g = 4L, path_idx = 1L, count = 20L),
818+
list(F_g = 5L, path_idx = 1L, count = 18L),
819+
list(F_g = 6L, path_idx = 2L, count = 13L),
820+
list(F_g = 7L, path_idx = 2L, count = 11L),
821+
list(F_g = 8L, path_idx = 3L, count = 11L),
822+
list(F_g = 9L, path_idx = 3L, count = 7L)
823+
)
824+
n_switchers17 <- sum(sapply(fg_path_counts17, function(x) x$count))
825+
stopifnot(n_switchers17 == 80L)
826+
D17 <- matrix(0L, nrow = n_switchers17, ncol = n_periods17)
827+
g17 <- 1L
828+
for (entry in fg_path_counts17) {
829+
F_g <- entry$F_g
830+
target <- target_paths17[[entry$path_idx]]
831+
n_here <- entry$count
832+
for (k in seq_len(n_here)) {
833+
# Pre-baseline [1..F_g-2]: D=0 (single-baseline contract)
834+
if (F_g >= 3L) D17[g17, 1:(F_g - 2L)] <- 0L
835+
# Window [F_g-1..F_g-1+L_max]: target path
836+
for (j in 0:L_max17) D17[g17, F_g - 1L + j] <- target[j + 1L]
837+
# Post-window: stable at path[L_max+1]
838+
if (F_g + L_max17 <= n_periods17) {
839+
D17[g17, (F_g + L_max17):n_periods17] <- target[L_max17 + 1L]
840+
}
841+
g17 <- g17 + 1L
842+
}
843+
}
844+
# Append 20 never-treated and 20 always-treated controls
845+
D17 <- rbind(D17,
846+
matrix(0L, nrow = 20L, ncol = n_periods17),
847+
matrix(1L, nrow = 20L, ncol = n_periods17))
848+
n_total17 <- nrow(D17)
849+
# Generate fixed effects, treatment effects, outcomes (mirror gen_reversible
850+
# parameters: group_fe_sd=2.0, treatment_effect=2.0, time_trend=0.1, noise_sd=0.5)
851+
set.seed(117L)
852+
group_fe17 <- rnorm(n_total17, 0, 2.0)
853+
noise17 <- matrix(rnorm(n_total17 * n_periods17, 0, 0.5),
854+
nrow = n_total17, ncol = n_periods17)
855+
period_arr17 <- 0:(n_periods17 - 1L)
856+
Y17 <- 10.0 +
857+
matrix(group_fe17, nrow = n_total17, ncol = n_periods17) +
858+
matrix(0.1 * period_arr17, nrow = n_total17, ncol = n_periods17, byrow = TRUE) +
859+
2.0 * D17 +
860+
noise17
861+
# Build long data frame
862+
d17 <- data.frame(
863+
group = rep(seq_len(n_total17) - 1L, each = n_periods17),
864+
period = rep(period_arr17, n_total17),
865+
treatment = as.vector(t(D17)),
866+
outcome = as.vector(t(Y17))
867+
)
868+
# Inject per-group linear trends (Scenario 11 pattern)
869+
set.seed(217L)
870+
groups17 <- sort(unique(d17$group))
871+
g_trends17 <- setNames(rnorm(length(groups17), 0, 0.5),
872+
as.character(groups17))
873+
d17$outcome <- d17$outcome +
874+
g_trends17[as.character(d17$group)] * d17$period
875+
res17 <- did_multiplegt_dyn(
876+
df = d17, outcome = "outcome", group = "group", time = "period",
877+
treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
878+
trends_lin = TRUE, ci_level = 95
879+
)
880+
scenarios$single_baseline_multi_path_by_path_trends_lin <- list(
881+
data = list(
882+
group = as.numeric(d17$group),
883+
period = as.numeric(d17$period),
884+
treatment = as.numeric(d17$treatment),
885+
outcome = as.numeric(d17$outcome)
886+
),
887+
params = list(pattern = "single_baseline_multi_path",
888+
n_switcher_groups = 80L, n_realized_groups = 120L,
889+
n_periods = 13L, seed = 117L, effects = 3, placebo = 1,
890+
by_path = 3, trends_lin = TRUE, ci_level = 95),
891+
results = extract_dcdh_by_path(res17, n_effects = 3, n_placebos = 1)
892+
)
893+
}
894+
895+
# Scenario 18: multi_path_reversible + by_path=3 + trends_nonparam="state"
896+
# (Phase 3 Wave 3 #7: by_path + state-set trends). Same deterministic DGP
897+
# and n_periods=10 as Scenarios 16/17, with a 3-state column added
898+
# (deterministic per-group assignment via `((group - 1) %% 3) + 1` so
899+
# within-set controls are guaranteed to exist for each path). **R does
900+
# NOT cumulate or first-difference under trends_nonparam** — Effect_l
901+
# per horizon is a normal DID with set-restricted control pool. The
902+
# Python parity test compares per-path raw DID per (path, l) directly
903+
# against R's per-path Effect_l. Placebos likewise are raw per-horizon.
904+
# Per-path R parity matches exactly on single-baseline panels.
905+
cat(" Scenario 18: multi_path_reversible_by_path_trends_nonparam\n")
906+
d18 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
907+
pattern = "multi_path_reversible", seed = 118,
908+
L_max = 3)
909+
d18$state <- ((d18$group - 1) %% 3) + 1
910+
res18 <- did_multiplegt_dyn(
911+
df = d18, outcome = "outcome", group = "group", time = "period",
912+
treatment = "treatment", effects = 3, placebo = 1, by_path = 3,
913+
trends_nonparam = "state", ci_level = 95
914+
)
915+
scenarios$multi_path_reversible_by_path_trends_nonparam <- list(
916+
data = list(
917+
group = as.numeric(d18$group),
918+
period = as.numeric(d18$period),
919+
treatment = as.numeric(d18$treatment),
920+
outcome = as.numeric(d18$outcome),
921+
state = as.numeric(d18$state)
922+
),
923+
params = list(pattern = "multi_path_reversible",
924+
n_switcher_groups = N_GOLDEN, n_realized_groups = N_GOLDEN + 40L,
925+
n_periods = 10, seed = 118, effects = 3, placebo = 1,
926+
by_path = 3, trends_nonparam = "state", ci_level = 95),
927+
results = extract_dcdh_by_path(res18, n_effects = 3, n_placebos = 1)
928+
)
929+
756930
# ---------------------------------------------------------------------------
757931
# Write output
758932
# ---------------------------------------------------------------------------

0 commit comments

Comments
 (0)