Skip to content

Commit 41d54b9

Browse files
igerberclaude
andcommitted
Address PR #393 R5 P1: F_g=3 boundary divergence warning + narrowed contract
CI reviewer flagged that the by_path + trends_linear public contract overstated R parity. The shipped contract said "single-baseline panels" broadly, but the parity scenario was restricted to F_g >= 4 because F_g=3 switchers trigger a 30%+ point-estimate divergence between Python's global-then-disaggregate architecture and R's per-path full- pipeline call (only 1 valid pre-window Z value remains after first-differencing + time==1 filter, triggering different control- eligibility treatment). Fix: add a targeted UserWarning fired at fit-time when by_path + trends_linear is set AND any switcher has F_g==3 (predicate: first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion warning but for the boundary case rather than the exclusion case. Documentation: docstring at the by_path paragraph now narrows the parity claim to "single-baseline panels with sufficient pre-window depth (F_g >= 4 for every selected-path switcher)" and explicitly documents the F_g=3 boundary regime + warning. REGISTRY adds the same narrowing + warning text to the per-path linear-trends Note. CHANGELOG adds the divergence note to the Unreleased entry. Regression test test_F_g_three_boundary_case_emits_warning added in TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a single-baseline panel, asserts the new warning fires (named "by_path + trends_linear" + "F_g=3"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ad8f6be commit 41d54b9

4 files changed

Lines changed: 114 additions & 6 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11-
- **`ChaisemartinDHaultfoeuille.by_path` is now compatible with `trends_linear` (DID^{fd} group-specific linear trends) and `trends_nonparam` (state-set trends).** For `trends_linear`, the first-differencing transform runs once globally before path enumeration, so per-path raw second-differences `DID^{fd}_{path, l}` surface on `path_effects[path]["horizons"][l]` automatically. Per-path **cumulated level effects** `delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}` (the quantity R returns under `did_multiplegt_dyn(..., by_path, trends_lin)`) surface on the new `results.path_cumulated_event_study[path][l]` field, mirroring the global `linear_trends_effects` cumulation. `to_dataframe(level="by_path")` exposes `cumulated_effect` / `cumulated_se` columns (always present, NaN-when-None — mirrors the `cband_*` convention from PR #374); `summary()` renders a "Cumulated Level Effects (DID^{fd}, trends_linear)" sub-section under each per-path block. SE on the cumulated layer is the conservative upper bound (sum of per-horizon component SEs, NaN-consistent), matching the global `linear_trends_effects` convention. Path enumeration runs on the post-first-differenced `N_mat_fd`: switchers with `F_g==2` fail the window-eligibility check and are dropped from path enumeration entirely (the existing global `F_g >= 3` warning still surfaces the issue), so a path whose switchers all have `F_g < 3` is silently absent from `path_effects` rather than present-with-NaN. Placebo under `trends_linear` returns RAW per-horizon values — there is no per-path placebo cumulation surface in either Python or R. For `trends_nonparam`, the set membership column is validated and stored once globally as `set_ids_arr`; the `set_ids` parameter is now threaded through the four per-path IF helpers (`_compute_path_effects`, `_compute_path_placebos`, `_collect_path_bootstrap_inputs`, `_collect_path_placebo_bootstrap_inputs`) so per-path analytical SE, bootstrap, placebos, and sup-t bands all consume the set-restricted control pool automatically. Per-period effects remain unadjusted under both extensions, consistent with the existing per-period DID contract. Validated against R via two new golden-value scenarios: `single_baseline_multi_path_by_path_trends_lin` (n_periods=13, F_g >= 4, cohort-single-path; per-path cumulated point estimates match R bit-exactly with `POINT_RTOL=1e-9`, cumulated SE within `CUM_SE_RTOL=0.20`) and `multi_path_reversible_by_path_trends_nonparam` (per-path point estimates AND placebos match R bit-exactly with `POINT_RTOL=1e-9`, per-path SE within `SE_RTOL=0.15`). Placebo parity for `trends_linear` is intentionally skipped (R's per-path placebo computation re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture surfaces; placebo + `trends_linear` is exercised via internal regression only). Cross-path cohort-sharing SE deviation from R documented for `path_effects` is inherited unchanged. Gates at `chaisemartin_dhaultfoeuille.py:1014-1023` removed; `by_path` docstring updated to add the two new compatibility paragraphs and remove `trends_linear` / `trends_nonparam` from the incompatible list. R-parity tests at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathTrendsLinear` and `::TestDCDHDynRParityByPathTrendsNonparam`; cross-surface regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathTrendsLinear` and `::TestByPathTrendsNonparam`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path linear-trends DID^{fd}" and "Per-path state-set trends" for the full contract.
11+
- **`ChaisemartinDHaultfoeuille.by_path` is now compatible with `trends_linear` (DID^{fd} group-specific linear trends) and `trends_nonparam` (state-set trends).** For `trends_linear`, the first-differencing transform runs once globally before path enumeration, so per-path raw second-differences `DID^{fd}_{path, l}` surface on `path_effects[path]["horizons"][l]` automatically. Per-path **cumulated level effects** `delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}` (the quantity R returns under `did_multiplegt_dyn(..., by_path, trends_lin)`) surface on the new `results.path_cumulated_event_study[path][l]` field, mirroring the global `linear_trends_effects` cumulation. `to_dataframe(level="by_path")` exposes `cumulated_effect` / `cumulated_se` columns (always present, NaN-when-None — mirrors the `cband_*` convention from PR #374); `summary()` renders a "Cumulated Level Effects (DID^{fd}, trends_linear)" sub-section under each per-path block. SE on the cumulated layer is the conservative upper bound (sum of per-horizon component SEs, NaN-consistent), matching the global `linear_trends_effects` convention. Path enumeration runs on the post-first-differenced `N_mat_fd`: switchers with `F_g==2` fail the window-eligibility check and are dropped from path enumeration entirely (the existing global `F_g >= 3` warning still surfaces the issue), so a path whose switchers all have `F_g < 3` is silently absent from `path_effects` rather than present-with-NaN. Placebo under `trends_linear` returns RAW per-horizon values — there is no per-path placebo cumulation surface in either Python or R. For `trends_nonparam`, the set membership column is validated and stored once globally as `set_ids_arr`; the `set_ids` parameter is now threaded through the four per-path IF helpers (`_compute_path_effects`, `_compute_path_placebos`, `_collect_path_bootstrap_inputs`, `_collect_path_placebo_bootstrap_inputs`) so per-path analytical SE, bootstrap, placebos, and sup-t bands all consume the set-restricted control pool automatically. Per-period effects remain unadjusted under both extensions, consistent with the existing per-period DID contract. Validated against R via two new golden-value scenarios: `single_baseline_multi_path_by_path_trends_lin` (n_periods=13, F_g >= 4, cohort-single-path; per-path cumulated point estimates match R bit-exactly with `POINT_RTOL=1e-9`, cumulated SE within `CUM_SE_RTOL=0.20`) and `multi_path_reversible_by_path_trends_nonparam` (per-path point estimates AND placebos match R bit-exactly with `POINT_RTOL=1e-9`, per-path SE within `SE_RTOL=0.15`). **F_g=3 boundary-case divergence (`by_path + trends_linear`):** `F_g=3` switchers have only 1 valid pre-window Z value after first-differencing, triggering 30%+ relative divergence between Python and R per-path point estimates on paths whose switchers include `F_g=3`. A targeted `UserWarning` fires at fit-time on this regime; R parity is asserted only on the `F_g >= 4` parity fixture. Placebo parity for `trends_linear` is intentionally skipped (R's per-path placebo computation re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture surfaces; placebo + `trends_linear` is exercised via internal regression only). Cross-path cohort-sharing SE deviation from R documented for `path_effects` is inherited unchanged. Gates at `chaisemartin_dhaultfoeuille.py:1014-1023` removed; `by_path` docstring updated to add the two new compatibility paragraphs and remove `trends_linear` / `trends_nonparam` from the incompatible list. R-parity tests at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathTrendsLinear` and `::TestDCDHDynRParityByPathTrendsNonparam`; cross-surface regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathTrendsLinear` and `::TestByPathTrendsNonparam`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path linear-trends DID^{fd}" and "Per-path state-set trends" for the full contract.
1212
- **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg).
1313
- **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`) on the **`design="continuous_at_zero"` (Design 1') surface**. New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall, event-study, placebo, yatchew, trends_lin). The harness explicitly forces `HeterogeneousAdoptionDiD(design="continuous_at_zero")` because R `did_had` always evaluates the local-linear at `d=0` regardless of dose distribution; our default `design="auto"` may legitimately choose `continuous_near_d_lower` or `mass_point` on dose distributions with boundary density bounded away from zero (e.g., Beta(2,2)) and thereby diverge from R numerically — that divergence is methodologically defensible but out of scope for this parity test. Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately — our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention (parity is bit-exact after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up.
1414

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 53 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -458,10 +458,21 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
458458
``path_effects`` (the existing global ``F_g < 3`` warning
459459
still fires). Per-path R parity matches R
460460
``did_multiplegt_dyn(..., by_path, trends_lin)`` on per-path
461-
cumulated point estimates under single-baseline panels (R
462-
re-runs the per-path full pipeline on each path's restricted
463-
subsample; same multi-baseline divergence pattern as
464-
``controls``). **Placebo under trends_linear returns RAW
461+
cumulated point estimates under single-baseline panels with
462+
sufficient pre-window depth (``F_g >= 4`` for every selected-
463+
path switcher). R re-runs the per-path full pipeline on each
464+
path's restricted subsample; same multi-baseline divergence
465+
pattern as ``controls`` (a ``UserWarning`` fires when switcher
466+
baselines take multiple values). **F_g=3 boundary-case
467+
divergence:** `F_g=3` switchers have only 1 valid pre-window
468+
Z value after first-differencing and the ``time==1`` filter,
469+
which causes Python's global-then-disaggregate architecture
470+
to diverge from R's per-path full-pipeline call (30%+ on
471+
point estimates observed empirically). A separate
472+
``UserWarning`` fires at fit-time when the panel includes any
473+
`F_g=3` switchers and `by_path + trends_linear` is set, so
474+
practitioners hitting this boundary regime see the divergence
475+
flag explicitly. **Placebo under trends_linear returns RAW
465476
per-horizon values, not cumulated** -- there is no per-path
466477
placebo cumulation surface (verified empirically against R
467478
via the existing ``joiners_only_trends_lin`` parity scenario).
@@ -1689,6 +1700,44 @@ def fit(
16891700
UserWarning,
16901701
stacklevel=2,
16911702
)
1703+
# F_g=3 boundary-case divergence (`by_path + trends_linear`).
1704+
# `F_g=3` switchers have exactly 2 pre-switch periods,
1705+
# which after trends_linear's first-difference and
1706+
# `time != 1` filter leaves only 1 valid pre-window Z
1707+
# value. R re-runs the full pipeline on each path's
1708+
# restricted subsample (path's switchers + same-baseline
1709+
# yet-to-treat controls), and this single-pre-period
1710+
# regime triggers different control-eligibility
1711+
# treatment in R's per-path call than Python's global-
1712+
# then-disaggregate architecture. Empirically observed
1713+
# 30-165% rel diff on path 1 of the parity fixture's
1714+
# earlier `F_g=3` variant; the shipped parity scenario
1715+
# uses `F_g >= 4` exclusively. Fire a targeted warning
1716+
# whenever the panel contains any `F_g=3` switchers AND
1717+
# `by_path` is requested, so practitioners hitting this
1718+
# boundary regime see the divergence flag explicitly.
1719+
_f_g_three_count = int((first_switch_idx_arr == 2).sum())
1720+
if _f_g_three_count > 0:
1721+
warnings.warn(
1722+
f"by_path + trends_linear: {_f_g_three_count} "
1723+
f"switching group(s) have F_g=3 (exactly 2 "
1724+
f"pre-switch periods). After first-differencing "
1725+
f"and the time==1 filter, these groups have "
1726+
f"only 1 valid pre-window Z value, which "
1727+
f"triggers a documented boundary-case "
1728+
f"divergence between Python's global-then-"
1729+
f"disaggregate architecture and R's per-path "
1730+
f"full-pipeline call. Per-path point estimates "
1731+
f"on paths whose switchers include F_g=3 can "
1732+
f"diverge from `did_multiplegt_dyn(..., "
1733+
f"by_path, trends_lin)` by 30%+ on point "
1734+
f"estimates. See `docs/methodology/REGISTRY.md` "
1735+
f"(`Note (Phase 3 by_path ...)` -> Per-path "
1736+
f"linear-trends DID^{{fd}}) for the full "
1737+
f"deviation contract.",
1738+
UserWarning,
1739+
stacklevel=2,
1740+
)
16921741
N_mat_orig = N_mat.copy()
16931742
Y_mat, N_mat = _compute_first_differenced_matrix(Y_mat, N_mat)
16941743

0 commit comments

Comments
 (0)