Commit d24ae25

igerber and claude committed
Address PR igerber#378 R3 P1: precise parity condition + heterogeneous-F_g regression
Reviewer flagged that the parity condition documented for by_path + controls (single-baseline switcher panel) might not be sufficient, hypothesizing that R's per-path subset could exclude pre-switch rows of other-path switchers and produce a different first-stage residualization sample.

Verified the hypothesis is empirically falsified and analytically incorrect by reading R/R/did_multiplegt_dyn.R lines 401-405 line-by-line. R's per-path subset for path B includes:
- Rows where path_XX == B (path-B switchers, all rows)
- OR rows where yet_to_switch=1 AND baseline matches (pre-switch rows of any group with matching baseline, regardless of path)

So R's per-path first-stage sample equals (pre-switch rows of all switchers with matching baseline + all rows of never-switchers with matching baseline). That sample is bit-identical to our global first-stage sample under single-baseline switcher panels, regardless of how F_g varies across paths or within a path.

Empirical confirmation: scenario 16 (`multi_path_reversible_by_path_controls`) has switcher F_g spanning [0..6] across 4 distinct paths under D_{g,1}=0, and Python matches R to rtol ~1e-11 across all (path, horizon) cells.

Strengthened the contract:
- Expanded the warning code comment to spell out R's per-path subset construction (citing R source line numbers) and why single-baseline-switcher is the precise parity condition (control-pool equivalence via the OR clause), with the empirical scenario reference baked in
- Updated REGISTRY.md "Per-path covariate residualization (DID^X)" paragraph to cite R lines 401-405 and clarify that never-switcher baselines do not affect parity
- New regression test `test_single_baseline_heterogeneous_F_g_no_warning_and_matches_r` uses the golden-value scenario (single-baseline, heterogeneous F_g across paths) to assert: (a) no UserWarning fires, (b) per-path point estimates are produced and finite. The numeric R-parity is locked separately in TestDCDHDynRParityByPathControls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1e49dff commit d24ae25
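
The sample-equivalence argument in the commit message can be illustrated on a toy panel. The sketch below is not taken from the repository or from the R package: the panel, the field names, and the helpers `build_rows`, `first_stage_cells`, `per_path_subset`, and `hypothesized_subset` are all hypothetical, and it assumes that `F_g` is the first period in which a group's treatment changes and that never-switchers always count as yet to switch. Under those assumptions it compares the first-stage cells of the OR-clause subset described above with the global pre-switch cells, and contrasts them with the narrower subset the reviewer hypothesized.

```python
from itertools import product

# Hypothetical toy panel: group -> (baseline D_{g,1}, first-switch period F_g, path label).
# F_g is None and path is None for the never-switcher. Every switcher has baseline 0.
GROUPS = {
    "g1": (0, 2, "A"),
    "g2": (0, 4, "A"),
    "g3": (0, 3, "B"),
    "g4": (0, 5, "B"),
    "g5": (0, None, None),
}
PERIODS = range(1, 7)


def build_rows(groups, periods):
    """Expand the panel to one row per (g, t) with a yet_to_switch flag."""
    rows = []
    for (g, (baseline, f_g, path)), t in product(groups.items(), periods):
        # Assumption: a never-switcher is always "yet to switch".
        yet_to_switch = f_g is None or t < f_g
        rows.append({"g": g, "t": t, "baseline": baseline,
                     "path": path, "yet_to_switch": yet_to_switch})
    return rows


def first_stage_cells(rows, baseline):
    """(g, t) cells whose treatment has not changed yet, for one baseline."""
    return {(r["g"], r["t"]) for r in rows
            if r["yet_to_switch"] and r["baseline"] == baseline}


def per_path_subset(rows, path, path_baseline):
    """Subset as described in the commit: path rows OR matching-baseline pre-switch rows."""
    return [r for r in rows
            if r["path"] == path
            or (r["yet_to_switch"] and r["baseline"] == path_baseline)]


def hypothesized_subset(rows, path, path_baseline):
    """Reviewer's hypothesis: other-path switchers are dropped entirely."""
    return [r for r in rows
            if r["path"] == path
            or (r["path"] is None and r["baseline"] == path_baseline)]


rows = build_rows(GROUPS, PERIODS)
global_cells = first_stage_cells(rows, baseline=0)
or_clause_cells = {p: first_stage_cells(per_path_subset(rows, p, 0), 0) for p in ("A", "B")}
hypothesized_cells = {p: first_stage_cells(hypothesized_subset(rows, p, 0), 0) for p in ("A", "B")}
```

In this toy panel the OR-clause subset reproduces the global first-stage cells for both paths, while the hypothesized subset loses the pre-switch rows of the other path's switchers. It only illustrates the set logic; it says nothing about the regression coefficients themselves.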

3 files changed

Lines changed: 123 additions & 13 deletions

diff_diff/chaisemartin_dhaultfoeuille.py

Lines changed: 36 additions & 12 deletions
@@ -1490,18 +1490,42 @@ def fit(
         )
         _switch_metadata_computed = True
 
-        # by_path + controls multi-baseline deviation from R: R re-runs
-        # the per-baseline OLS residualization on each path's restricted
-        # subsample (path's switchers + same-baseline not-yet-treated
-        # controls), so its residualization coefficients can differ per
-        # path. We residualize once on the full panel before path
-        # enumeration. On single-baseline switcher panels (every
-        # switcher has the same D_{g,1}) the two strategies coincide
-        # and per-path point estimates match R exactly. On multi-
-        # baseline switcher panels they can diverge — warn the user
-        # explicitly so they don't silently consume estimates that
-        # disagree with R. SE inheritance (cross-path cohort-sharing)
-        # is documented separately in REGISTRY.md.
+        # by_path + controls residualization-sample deviation from R.
+        # R's `did_multiplegt_dyn(..., by_path, controls)` calls
+        # `did_multiplegt_main()` once per path with `df_main` filtered
+        # to: rows of the path's switchers OR rows where
+        # `yet_to_switch=1 AND baseline matches the path's baseline`
+        # (R/R/did_multiplegt_dyn.R lines 401-405). Inside the per-path
+        # `did_multiplegt_main()` call, the per-baseline first-stage
+        # residualization regression uses `(g, t)` cells where g's
+        # treatment hasn't changed yet at t. Critically, R's path-
+        # restricted subset INCLUDES the pre-switch rows of OTHER-path
+        # switchers via the `yet_to_switch=1 AND baseline matches`
+        # clause, so the first-stage SAMPLE that R uses for path B
+        # equals: pre-switch rows of all switchers with matching
+        # baseline + all rows of never-switchers with matching
+        # baseline. This is BIT-IDENTICAL to the first-stage sample
+        # we use under our global residualization — first-stage
+        # coefficients (and therefore residualized outcomes) coincide,
+        # and per-path point estimates match R exactly **under single-
+        # baseline switcher panels** (every switcher has the same
+        # `D_{g,1}`, regardless of how `F_g` varies across paths or
+        # within a path). Empirical confirmation: the
+        # `multi_path_reversible_by_path_controls` R-parity scenario
+        # has 4 paths with switcher `F_g` values spanning [0..6] under
+        # `D_{g,1}=0` for every switcher, and Python matches R to
+        # rtol ~1e-11 across all `(path, horizon)` cells.
+        #
+        # On MULTI-baseline switcher panels the per-baseline regression
+        # coefficients diverge per path under R (R's per-path subset
+        # for path B drops switchers whose baseline differs from B's
+        # baseline), so point estimates can diverge between Python and
+        # R — warn the user explicitly. The check filters to switcher
+        # groups only (never-switchers do not contribute to "switcher
+        # baseline" multiplicity even if they appear at multiple
+        # `D_{g,1}` values across the never-treated / always-treated
+        # control mix). SE inheritance (cross-path cohort-sharing) is
+        # documented separately in REGISTRY.md.
         if self.by_path is not None:
             _switcher_mask = first_switch_idx_arr >= 0
             if _switcher_mask.any():
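
The hunk ends before the body of the check is shown, so the rule stated in the comment (count distinct `D_{g,1}` values among switcher groups only, and warn when there is more than one) is sketched separately here. This is not the library's implementation: the function names, the warning text, and the use of `-1` as the first-switch index of a never-switcher are assumptions made for illustration. Only the `>= 0` switcher test mirrors the `_switcher_mask` line in the diff.

```python
import warnings


def switcher_baselines(baselines, first_switch_idx):
    """Distinct D_{g,1} values among switcher groups only.

    Assumption: first_switch_idx[g] >= 0 marks a switcher (as in the
    `_switcher_mask` line of the diff) and -1 marks a never-switcher.
    """
    return {b for b, idx in zip(baselines, first_switch_idx) if idx >= 0}


def warn_if_multi_baseline_switchers(baselines, first_switch_idx):
    """Emit a UserWarning when switchers start from more than one baseline.

    Hypothetical helper: returns True when the warning fired.
    """
    distinct = switcher_baselines(baselines, first_switch_idx)
    if len(distinct) > 1:
        warnings.warn(
            "by_path + controls: switchers have multiple baselines "
            f"{sorted(distinct)}; per-path point estimates can differ from R.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

With these helpers, a panel whose switchers all start at baseline 0 stays silent even if a never-switcher sits at baseline 1, whereas switchers at both 0 and 1 trigger the warning. That matches the comment's statement that never-switchers do not contribute to switcher-baseline multiplicity.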
