Commit 028e7e1

igerber and claude committed
Address PR #378 R0 P1: multi-baseline R-deviation warning + REGISTRY note
CI reviewer flagged that the new by_path + controls combination silently produces point-estimate divergence from R on multi-baseline switcher panels (R re-runs per-baseline residualization on each path's restricted subsample; we residualize once globally). The parity test docstring documented the deviation but REGISTRY.md and the runtime did not.

Fixes:
- Emit UserWarning in fit() when by_path + controls is used on a panel with multiple switcher D_{g,1} values (chaisemartin_dhaultfoeuille.py inside the controls residualization block, after _compute_group_switch_metadata)
- Update the by_path docstring with an explicit "Deviation from R on multi-baseline switcher panels" paragraph
- Update REGISTRY.md "Per-path covariate residualization (DID^X)" paragraph to document the point-estimate deviation alongside the existing SE deviation
- Update CHANGELOG entry to call out the multi-baseline deviation
- Update R-generator scenario 16 comment to correctly describe R's per-path re-residualization (the prior comment misstated R's behavior as "residualize once globally")
- Update parity test class docstring to be precise about R's per-path call site (R/R/did_multiplegt_dyn.R lines 393-411)
- Add two regression tests:
  * test_multi_baseline_panel_emits_r_deviation_warning: joiner + leaver + always-treated + never-treated panel triggers the warning
  * test_single_baseline_panel_does_not_emit_r_deviation_warning: standard 3-path joiners-only fixture does NOT trigger the warning

The single-baseline R-parity scenario (multi_path_reversible_by_path_controls) remains exact-match (rtol ~1e-11) because all switchers in the DGP share D_{g,1}=0 and R's per-path control pool reduces to the global control pool we use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 963a409 commit 028e7e1

6 files changed

Lines changed: 195 additions & 30 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
-- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract.
+- **`ChaisemartinDHaultfoeuille.by_path` + `controls`** (DID^X residualization) — the per-baseline OLS residualization (Web Appendix Section 1.2) is now compatible with `by_path=k`. The residualization runs once on the first-differenced outcome BEFORE path enumeration, so all four downstream surfaces (analytical per-path SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized `Y_mat` automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing `controls` + per-period DID contract (per-period DID does not support residualization). Failed-stratum baselines (rank-deficient X) zero out `N_mat` for affected groups, which the path enumeration treats as ineligible per its existing convention. **Deviation from R on multi-baseline switcher panels (point estimates):** R `did_multiplegt_dyn(..., by_path, controls)` re-runs the per-baseline OLS residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls), so its residualization coefficients vary per path when switchers have different baseline values. Our global-residualization architecture coincides with R on single-baseline switcher panels (every switcher shares the same `D_{g,1}`) — per-path point estimates match R exactly there. On multi-baseline panels, point estimates can diverge; the estimator emits a `UserWarning` at fit-time when this configuration is detected so practitioners do not silently consume estimates that disagree with R. **SE inherits the cross-path cohort-sharing SE deviation from R** documented for `path_effects` — bootstrap SE, placebo SE, and sup-t crit are Monte Carlo / joint-distribution analogs of the same residualized analytical IF and carry the same deviation. R-parity confirmed against `did_multiplegt_dyn(..., by_path=3, controls="X1")` via the new `multi_path_reversible_by_path_controls` single-baseline golden-value scenario (per-path point estimates exact match — measured rtol ~1e-11 across all path × horizon cells; per-path SE within ~6.5% of R, well inside the Phase 2 multi-horizon envelope). Gate at `chaisemartin_dhaultfoeuille.py:988-992` removed; `by_path` docstring updated to add the new compatibility paragraph (with the multi-baseline caveat) and remove `controls` from the incompatible list. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathControls`; cross-surface inheritance + multi-baseline `UserWarning` regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathControls` (analytical + bootstrap + placebo + sup-t + `to_dataframe(level="by_path")` cband columns + multi-baseline warning). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path covariate residualization (DID^X)" for the full contract.
- **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. 
Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. 
**Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
- **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
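To see why the two residualization strategies described in the `by_path` + `controls` entry can disagree, here is a minimal sketch, assuming a single covariate and plain OLS with an intercept. The helper `ols_slope` and the toy numbers are hypothetical and are not taken from the library or from R. The only point is that a coefficient fitted on a restricted subsample need not equal the one fitted on the full sample, while the two coincide whenever the subsample is the full sample.

```python
def ols_slope(x, y):
    """Return the OLS slope of y on x (regression with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical first-differenced covariate and outcome for four groups.
delta_x = [1.0, 2.0, 3.0, 4.0]
delta_y = [1.0, 2.0, 3.0, 8.0]

# "Global" strategy: one coefficient fitted on every group.
theta_global = ols_slope(delta_x, delta_y)

# "Per-path" strategy: coefficient refitted on a restricted subsample
# (here, arbitrarily, the first three groups).
theta_restricted = ols_slope(delta_x[:3], delta_y[:3])

# Covariate-adjusted outcome of the first group under each strategy.
adjusted_global = delta_y[0] - theta_global * delta_x[0]
adjusted_restricted = delta_y[0] - theta_restricted * delta_x[0]

print("coefficients:", theta_global, theta_restricted)
print("adjusted outcome, group 0:", adjusted_global, adjusted_restricted)
```

Because the adjusted outcomes differ, any estimate built from them differs too, which is the kind of point-estimate divergence the warning is meant to surface.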

benchmarks/R/generate_dcdh_dynr_test_values.R

Lines changed: 19 additions & 10 deletions
@@ -703,16 +703,25 @@ scenarios$multi_path_reversible_by_path_placebo <- list(
 # Wave 3 #5: by_path + DID^X residualization). Same deterministic DGP
 # and n_periods=10 as scenarios 14/15, with a confounding covariate X1
 # added via the same `add_covariate` helper used by scenario 10's
-# `joiners_only_controls`. Per-baseline OLS residualization runs once
-# globally before path enumeration on both Python and R sides
-# (verified against `chaisemartinPackages/did_multiplegt_dyn` source —
-# `did_multiplegt_by_path` calls `did_multiplegt_main()` once with the
-# global controls residualization, then disaggregates per-path through
-# aggregation). Per-path event-study point estimates and switcher
-# counts must match R exactly; per-path SE within the documented Phase
-# 2 envelope and inherits the cross-path cohort-sharing deviation from
-# R documented for `path_effects`. Single covariate keeps the scenario
-# tight; multi-covariate is exercised via internal regression tests.
+# `joiners_only_controls`. **R re-runs `did_multiplegt_main()` per path**
+# with a path-restricted subsample (path's switchers + same-baseline
+# not-yet-treated controls), so its per-baseline OLS residualization
+# coefficients can vary per path (verified against
+# `chaisemartinPackages/did_multiplegt_dyn` source —
+# `R/R/did_multiplegt_dyn.R` lines 393-411 dispatch the per-path loop;
+# `did_multiplegt_by_path` is a path-classifier preprocessor only).
+# Python residualizes once on the full panel before path enumeration,
+# then disaggregates per path. **The two strategies coincide on
+# single-baseline switcher panels** (every switcher shares D_{g,1}=0)
+# because R's per-path control pool then equals the global control pool
+# — `multi_path_reversible` is built precisely for this property, so
+# per-path event-study point estimates and switcher counts must match R
+# exactly. Per-path SE inherits the documented cross-path cohort-sharing
+# deviation from R for `path_effects`. On multi-baseline switcher panels
+# the residualization coefficients can diverge per path between Python
+# and R; the production fit emits a `UserWarning` in that configuration.
+# Single covariate keeps the scenario tight; multi-covariate is
+# exercised via internal regression tests.
 cat(" Scenario 16: multi_path_reversible_by_path_controls\n")
 d16 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
                       pattern = "multi_path_reversible", seed = 116,

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 48 additions & 3 deletions
@@ -419,9 +419,21 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         bootstrap SE, per-path placebos, and per-path sup-t bands all
         consume the residualized ``Y_mat`` automatically (Frisch-
         Waugh-Lovell). Per-period effects remain unadjusted, consistent
-        with the existing ``controls`` + per-period DID contract. The
-        cross-path cohort-sharing SE deviation from R documented for
-        ``path_effects`` is inherited unchanged.
+        with the existing ``controls`` + per-period DID contract.
+
+        **Deviation from R on multi-baseline switcher panels:** R
+        ``did_multiplegt_dyn(..., by_path, controls)`` re-runs the
+        per-baseline residualization on each path's restricted
+        subsample (path's switchers + same-baseline not-yet-treated
+        controls), so its residualization coefficients vary per path
+        when switchers have different baseline values. Our global-
+        residualization architecture coincides with R on single-
+        baseline panels (every switcher shares the same ``D_{g,1}``)
+        and per-path point estimates match exactly. On multi-baseline
+        panels, point estimates can diverge — a ``UserWarning`` is
+        emitted at fit-time when this configuration is detected.
+        SE inherits the cross-path cohort-sharing deviation from R
+        documented for ``path_effects``.

         Compatible with ``n_bootstrap > 0`` -- the top-k paths are
         enumerated once on the observed data (paths held fixed across
@@ -1478,6 +1490,39 @@ def fit(
             )
             _switch_metadata_computed = True

+            # by_path + controls multi-baseline deviation from R: R re-runs
+            # the per-baseline OLS residualization on each path's restricted
+            # subsample (path's switchers + same-baseline not-yet-treated
+            # controls), so its residualization coefficients can differ per
+            # path. We residualize once on the full panel before path
+            # enumeration. On single-baseline switcher panels (every
+            # switcher has the same D_{g,1}) the two strategies coincide
+            # and per-path point estimates match R exactly. On multi-
+            # baseline switcher panels they can diverge — warn the user
+            # explicitly so they don't silently consume estimates that
+            # disagree with R. SE inheritance (cross-path cohort-sharing)
+            # is documented separately in REGISTRY.md.
+            if self.by_path is not None:
+                _switcher_mask = first_switch_idx_arr >= 0
+                if _switcher_mask.any():
+                    _switcher_baselines = baselines[_switcher_mask]
+                    if np.unique(_switcher_baselines).size > 1:
+                        warnings.warn(
+                            "by_path + controls: switcher baselines D_{g,1} "
+                            "take multiple values in this panel. Python "
+                            "residualizes once on the full panel before path "
+                            "enumeration; R `did_multiplegt_dyn(..., by_path, "
+                            "controls)` re-runs residualization per path on "
+                            "the path-restricted subsample, so per-path point "
+                            "estimates can diverge between Python and R on "
+                            "this panel. See `docs/methodology/REGISTRY.md` "
+                            "(`Note (Phase 3 by_path ...)` -> Per-path "
+                            "covariate residualization) for the full "
+                            "deviation contract.",
+                            UserWarning,
+                            stacklevel=2,
+                        )
+
             Y_mat_residualized, covariate_diagnostics, _failed_baselines = (
                 _compute_covariate_residualization(
                     Y_mat=Y_mat,
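The detection rule added to `fit()` above can be exercised in isolation. What follows is a minimal sketch in plain Python rather than the library's code: the function name `warn_if_multi_baseline`, the shortened message, and the convention that a negative first-switch index marks a group that never switches are assumptions made for illustration (the diff only shows that switchers satisfy `first_switch_idx_arr >= 0`).

```python
import warnings


def warn_if_multi_baseline(baselines, first_switch_idx):
    """Warn when switcher groups have more than one distinct baseline.

    baselines[g] is the period-1 treatment D_{g,1} of group g;
    first_switch_idx[g] >= 0 marks a switcher (assumed convention:
    a negative value means the group never switches).
    Returns True when the warning was emitted.
    """
    switcher_baselines = {
        b for b, idx in zip(baselines, first_switch_idx) if idx >= 0
    }
    is_multi = len(switcher_baselines) > 1
    if is_multi:
        warnings.warn(
            "by_path + controls: switcher baselines take multiple values; "
            "per-path point estimates can diverge from R.",
            UserWarning,
            stacklevel=2,
        )
    return is_multi


# Joiner (baseline 0), leaver (baseline 1), always-treated, never-treated:
# the two switchers have different baselines, so the warning fires.
with warnings.catch_warnings(record=True) as caught_multi:
    warnings.simplefilter("always")
    multi_flag = warn_if_multi_baseline([0, 1, 1, 0], [2, 3, -1, -1])

# Joiners-only panel: every switcher starts at baseline 0, so no warning.
with warnings.catch_warnings(record=True) as caught_single:
    warnings.simplefilter("always")
    single_flag = warn_if_multi_baseline([0, 0, 0, 0], [2, 3, 4, -1])

print(multi_flag, len(caught_multi), single_flag, len(caught_single))
```

The two calls mirror the shape of the regression tests named in the commit message: a mixed joiner/leaver panel that should warn, and a joiners-only panel that should stay silent. Non-switchers are ignored on purpose, so an always-treated group with baseline 1 does not trigger the warning by itself.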
