Skip to content

Commit 93ff96d

Browse files
authored
Merge pull request #430 from igerber/fix-audit-412-r2
Address #412 holistic re-audit residuals (R2)
2 parents c3a732c + e71027f commit 93ff96d

5 files changed

Lines changed: 125 additions & 3 deletions

File tree

TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,7 @@ Deferred items from PR reviews that were not addressed before merge.
6161
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
6262
| dCDH by_path: negative-baseline path regression (e.g. `(-1, 0, 0, 0)`) is not yet exercised. The existing negative-D test (`test_negative_integer_D_supported`) only covers paths with negative values in non-baseline positions like `(0, -1, -1, -1)`, which does not trigger the R `substr(path, 1, 1)` bug regime (the bug needs a multi-character baseline). Add a switcher fixture with `D_{g,1} = -1` and assert the resulting path tuple key. | `tests/test_chaisemartin_dhaultfoeuille.py` | #419 | Low |
6363
| dCDH by_path: per-path placebo heterogeneity (`predict_het` rows for negative horizons) is currently NaN-filled in `to_dataframe(level="by_path")` `het_*` columns and unpopulated in `path_heterogeneity_effects`. R `did_multiplegt_dyn(..., by_path, predict_het)` forwards `predict_het` into each per-path `did_multiplegt_main` call alongside `placebo`, so R likely emits placebo het rows we do not yet mirror. Validate R's actual placebo predict_het output, then either implement parity or document the deviation explicitly. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `diff_diff/chaisemartin_dhaultfoeuille_results.py`, `tests/test_chaisemartin_dhaultfoeuille_parity.py` | #422 | Medium |
64+
| dCDH heterogeneity: `_compute_heterogeneity_test` passes `df=None` to `safe_inference`, so Python uses the normal Z critical value (~1.96) for `t_stat`-derived `p_value` and `conf_int`. R `did_multiplegt_dyn(..., predict_het)` uses the t-distribution with df = n - k from the OLS regression, producing ~0.1-2% rtol gaps on CIs and p-values vs Python. Documented as a deviation in the heterogeneity R-parity Note; parity tests pin only `beta`, `se`, `t_stat`, and `n_obs`. Either thread the OLS df into `safe_inference` to match R, or formalize a separate inference-tolerance constant for the heterogeneity surface. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `tests/test_chaisemartin_dhaultfoeuille_parity.py` | pilot-412 | Low |
6465
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
6566
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
6667
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |

diff_diff/chaisemartin_dhaultfoeuille_results.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -430,8 +430,9 @@ class ChaisemartinDHaultfoeuilleResults:
430430
``paths_of_interest=[(...), ...]``) is set. Inner dict keyed by
431431
horizon directly (no ``"horizons"`` wrapper); each entry holds
432432
``{"beta", "se", "t_stat", "p_value", "conf_int", "n_obs"}``,
433-
where ``beta`` is the WLS coefficient on the heterogeneity
434-
covariate on the path-restricted switcher subsample. Cohort
433+
where ``beta`` is the heterogeneity coefficient on the path-
434+
restricted switcher subsample - plain OLS on the non-survey
435+
path, WLS-on-pweights under ``survey_design``. Cohort
435436
dummies in the design matrix absorb baseline by construction.
436437
Empty-state contract mirrors ``path_effects``: ``None`` when not
437438
requested; ``{}`` when requested but no path has eligible

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2867,6 +2867,44 @@ def test_heterogeneity_multi_horizon(self):
28672867
assert 1 in r.heterogeneity_effects
28682868
assert 2 in r.heterogeneity_effects
28692869

2870+
def test_heterogeneity_inference_matches_safe_inference(self):
2871+
"""Local invariant: non-survey heterogeneity `t_stat` / `p_value` /
2872+
`conf_int` must equal ``safe_inference(beta, se, df=None)`` on every
2873+
populated horizon. R parity for these fields is intentionally
2874+
skipped (Python uses normal Z, R uses finite-df t — documented in
2875+
REGISTRY); without this local invariant a regression isolated to
2876+
the inference extraction or `_refresh_path_inference` ordering
2877+
could silently drop / mis-route the SE-derived fields while
2878+
beta / se still pass R parity.
2879+
"""
2880+
from diff_diff.utils import safe_inference
2881+
2882+
df = self._make_panel_with_het()
2883+
r = ChaisemartinDHaultfoeuille(seed=1).fit(
2884+
df, "outcome", "group", "period", "treatment",
2885+
L_max=2, heterogeneity="het_x",
2886+
)
2887+
assert r.heterogeneity_effects is not None
2888+
checked = 0
2889+
for l_h, het in r.heterogeneity_effects.items():
2890+
if not (np.isfinite(het["beta"]) and np.isfinite(het["se"])):
2891+
continue
2892+
expected_t, expected_p, expected_ci = safe_inference(
2893+
het["beta"], het["se"], df=None
2894+
)
2895+
assert het["t_stat"] == pytest.approx(expected_t, rel=1e-12), (
2896+
f"l={l_h} t_stat: stored={het['t_stat']} vs "
2897+
f"safe_inference={expected_t}"
2898+
)
2899+
assert het["p_value"] == pytest.approx(expected_p, rel=1e-12), (
2900+
f"l={l_h} p_value: stored={het['p_value']} vs "
2901+
f"safe_inference={expected_p}"
2902+
)
2903+
assert het["conf_int"][0] == pytest.approx(expected_ci[0], rel=1e-12)
2904+
assert het["conf_int"][1] == pytest.approx(expected_ci[1], rel=1e-12)
2905+
checked += 1
2906+
assert checked >= 1, "Expected at least one populated heterogeneity horizon"
2907+
28702908
def test_heterogeneity_missing_column(self):
28712909
df = self._make_panel_with_het()
28722910
with pytest.raises(ValueError, match="not found"):
@@ -10079,6 +10117,56 @@ def test_per_path_heterogeneity_finite_under_known_signal(self):
1007910117
f"(DGP: 5 + 3*het_x), got {horizons[1]['beta']}"
1008010118
)
1008110119

10120+
def test_per_path_heterogeneity_inference_matches_safe_inference(self):
10121+
"""Local invariant: non-survey per-path heterogeneity `t_stat` /
10122+
`p_value` / `conf_int` must equal ``safe_inference(beta, se,
10123+
df=None)`` on every populated (path, horizon) entry. R parity
10124+
for these fields is intentionally skipped (Python uses normal Z,
10125+
R uses finite-df t — documented in REGISTRY); without this local
10126+
invariant a regression isolated to the inference extraction or
10127+
`_refresh_path_inference` ordering could silently drop or
10128+
mis-route the SE-derived fields while beta / se still pass R
10129+
parity. Mirrors the global heterogeneity test of the same name
10130+
in TestHeterogeneityTesting.
10131+
"""
10132+
from diff_diff.utils import safe_inference
10133+
10134+
df = _by_path_het_data()
10135+
est = ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=3)
10136+
with warnings.catch_warnings():
10137+
warnings.simplefilter("ignore", UserWarning)
10138+
res = est.fit(
10139+
df, outcome="outcome", group="group", time="period",
10140+
treatment="treatment", L_max=3, heterogeneity="het_x",
10141+
)
10142+
assert res.path_heterogeneity_effects
10143+
checked = 0
10144+
for path, horizons in res.path_heterogeneity_effects.items():
10145+
for l_h, het in horizons.items():
10146+
if not (np.isfinite(het["beta"]) and np.isfinite(het["se"])):
10147+
continue
10148+
expected_t, expected_p, expected_ci = safe_inference(
10149+
het["beta"], het["se"], df=None
10150+
)
10151+
assert het["t_stat"] == pytest.approx(expected_t, rel=1e-12), (
10152+
f"path={path} l={l_h} t_stat: stored={het['t_stat']} vs "
10153+
f"safe_inference={expected_t}"
10154+
)
10155+
assert het["p_value"] == pytest.approx(expected_p, rel=1e-12), (
10156+
f"path={path} l={l_h} p_value: stored={het['p_value']} vs "
10157+
f"safe_inference={expected_p}"
10158+
)
10159+
assert het["conf_int"][0] == pytest.approx(
10160+
expected_ci[0], rel=1e-12
10161+
)
10162+
assert het["conf_int"][1] == pytest.approx(
10163+
expected_ci[1], rel=1e-12
10164+
)
10165+
checked += 1
10166+
assert checked >= 1, (
10167+
"Expected at least one populated (path, horizon) heterogeneity entry"
10168+
)
10169+
1008210170
def test_per_path_heterogeneity_telescope_to_global_on_single_path(self):
1008310171
"""On a single-path panel, per-path == global heterogeneity.
1008410172
Plain OLS path: bit-exact via path_groups identity."""

tests/test_chaisemartin_dhaultfoeuille_parity.py

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1382,6 +1382,23 @@ def test_parity_multi_path_reversible_predict_het(self, golden_values):
13821382
assert py_h["se"] == pytest.approx(r_h["se"], rel=self.SE_RTOL), (
13831383
f"h={h} se: py={py_h['se']:.6f} vs r={r_h['se']:.6f}"
13841384
)
1385+
# `t_stat = beta / se` is invariant to the Wald-test
1386+
# critical-value distribution; pin it at SE_RTOL so a
1387+
# regression in beta or se surfaces here too.
1388+
assert py_h["t_stat"] == pytest.approx(r_h["t"], rel=self.SE_RTOL), (
1389+
f"h={h} t_stat: py={py_h['t_stat']:.6f} vs r={r_h['t']:.6f}"
1390+
)
1391+
assert int(py_h["n_obs"]) == int(r_h["n_obs"]), (
1392+
f"h={h} n_obs: py={py_h['n_obs']} vs r={r_h['n_obs']}"
1393+
)
1394+
# NOTE: `p_value` and `conf_int` are NOT pinned to R here. Python's
1395+
# `safe_inference(..., df=None)` uses the normal Z critical value
1396+
# (~1.96 at alpha=0.05); R `did_multiplegt_dyn(..., predict_het)`
1397+
# uses t-distribution with df = n - k from the OLS regression.
1398+
# That structural deviation produces ~0.1-2% rtol gaps on CI
1399+
# bounds and p-values - tracked separately rather than masked by
1400+
# a loose parity tolerance. See REGISTRY Phase 3 heterogeneity
1401+
# Note "Deviation from R (heterogeneity inference critical value)".
13851402

13861403

13871404
class TestDCDHDynRParityByPathHeterogeneity:
@@ -1471,3 +1488,18 @@ def test_parity_multi_path_reversible_by_path_predict_het(
14711488
f"path={path_key} h={h} se: "
14721489
f"py={py_h['se']:.6f} vs r={r_h['se']:.6f}"
14731490
)
1491+
# `t_stat = beta / se` is invariant to the Wald-test
1492+
# critical-value distribution; pin it at SE_RTOL so a
1493+
# regression in beta or se surfaces here too. p_value
1494+
# and conf_int are not pinned - see the global parity
1495+
# class for the Z-vs-t deviation note.
1496+
assert py_h["t_stat"] == pytest.approx(
1497+
r_h["t"], rel=self.SE_RTOL
1498+
), (
1499+
f"path={path_key} h={h} t_stat: "
1500+
f"py={py_h['t_stat']:.6f} vs r={r_h['t']:.6f}"
1501+
)
1502+
assert int(py_h["n_obs"]) == int(r_h["n_obs"]), (
1503+
f"path={path_key} h={h} n_obs: "
1504+
f"py={py_h['n_obs']} vs r={r_h['n_obs']}"
1505+
)

0 commit comments

Comments
 (0)