Skip to content

Commit 71ee0b9

Browse files
igerberclaude
andcommitted
Address residual P1 + P2s from re-audit of PR #412
The restored CI reviewer surfaced findings the degraded reviewer missed across all 5 prior rounds on PR #412: P1 (REGISTRY + code comment): the claim that "R does not ship per-path predict_het on placebos either, so parity is preserved by deferral" contradicts what R's `did_multiplegt_dyn(..., by_path, predict_het)` dispatcher actually does - it forwards `predict_het` into each per-path `did_multiplegt_main` call along with `placebo`, so R may emit per-path placebo heterogeneity rows we do not yet mirror. Rewrite both surfaces (chaisemartin_dhaultfoeuille_results.py code comment and REGISTRY.md DataFrame-integration paragraph) as an explicit Python- side deferral rather than a verified R-parity. Add a TODO row to track validating R's actual placebo predict_het output and either implementing parity or documenting the deviation explicitly. P2 (REGISTRY rtol claim): the per-path heterogeneity R-parity paragraph claimed "rtol ~1e-6 on point estimates AND SE", but the parity tests use BETA_RTOL=1e-6 and SE_RTOL=1e-5 (one decade looser on SE). Split the claim into the two separate tolerances and note the WLS-denominator/cohort-recentering numerical drift that motivates the looser SE bound. P2 (replicate-weight df_survey refresh): the existing test only checked finite SE; it would have passed if the new dedicated heterogeneity refresh loop failed to recompute t_stat / p_value / conf_int at the final df_survey. Strengthen the test to call `safe_inference(beta, se, df=df_survey)` on the first finite entry and assert the stored inference fields match - this anti-regression covers the dedicated post-call refresh added for path_heterogeneity_ effects. P2 (paths_of_interest survey gap): the documented composability of `paths_of_interest + heterogeneity + survey_design` was not regression- locked - all existing survey-specific tests used `by_path=k`. Add test_paths_of_interest_heterogeneity_survey_design_analytical (verify analytical Binder TSL fits, selector ordering preserved, finite SE per populated (path, l)) and test_paths_of_interest_heterogeneity_ survey_n_bootstrap_gate (verify the multiplier-bootstrap gate applies under paths_of_interest too). No estimator behavior, weighting, variance/SE, identification, or default statistical surface changed in source - documentation accuracy plus expanded regression coverage only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d8b0f67 commit 71ee0b9

4 files changed

Lines changed: 137 additions & 4 deletions

File tree

TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,7 @@ Deferred items from PR reviews that were not addressed before merge.
6060
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | PR 2 | Medium |
6161
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
6262
| dCDH by_path: negative-baseline path regression (e.g. `(-1, 0, 0, 0)`) is not yet exercised. The existing negative-D test (`test_negative_integer_D_supported`) only covers paths with negative values in non-baseline positions like `(0, -1, -1, -1)`, which does not trigger the R `substr(path, 1, 1)` bug regime (the bug needs a multi-character baseline). Add a switcher fixture with `D_{g,1} = -1` and assert the resulting path tuple key. | `tests/test_chaisemartin_dhaultfoeuille.py` | #419 | Low |
63+
| dCDH by_path: per-path placebo heterogeneity (`predict_het` rows for negative horizons) is currently NaN-filled in `to_dataframe(level="by_path")` `het_*` columns and unpopulated in `path_heterogeneity_effects`. R `did_multiplegt_dyn(..., by_path, predict_het)` forwards `predict_het` into each per-path `did_multiplegt_main` call alongside `placebo`, so R likely emits placebo het rows we do not yet mirror. Validate R's actual placebo predict_het output, then either implement parity or document the deviation explicitly. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `diff_diff/chaisemartin_dhaultfoeuille_results.py`, `tests/test_chaisemartin_dhaultfoeuille_parity.py` | #421 | Medium |
6364
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
6465
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
6566
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |

diff_diff/chaisemartin_dhaultfoeuille_results.py

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1887,9 +1887,13 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
18871887
"cband_upper": ph_cband[1] if ph_cband else np.nan,
18881888
"cumulated_effect": np.nan,
18891889
"cumulated_se": np.nan,
1890-
# Heterogeneity is forward-only (R doesn't ship
1891-
# per-path predict_het on placebos); placebo
1892-
# rows always emit NaN here.
1890+
# Heterogeneity is forward-only in this release.
1891+
# Per-path placebo heterogeneity is not exposed
1892+
# yet; R may emit placebo het rows under
1893+
# did_multiplegt_dyn(..., by_path, predict_het)
1894+
# but R-parity for that surface has not been
1895+
# validated, so we emit NaN on placebo rows
1896+
# rather than claim parity. See REGISTRY note.
18931897
"het_beta": np.nan,
18941898
"het_se": np.nan,
18951899
"het_t_stat": np.nan,

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10574,6 +10574,134 @@ def test_per_path_heterogeneity_replicate_weights_propagates_n_valid(self):
1057410574
f"path={path} l={l_h}: replicate SE non-finite"
1057510575
)
1057610576

10577+
# Verify the final df_survey is actually USED to refresh the
10578+
# inference fields on path_heterogeneity_effects (not the
10579+
# compute-time snapshot). Pick the first finite entry, recompute
10580+
# safe_inference at the final df, and require the stored fields
10581+
# to match. Anti-regression for the dedicated refresh loop at
10582+
# chaisemartin_dhaultfoeuille.py R2 P1b: a regression in that
10583+
# loop would leave stale t_stat / p_value / conf_int derived
10584+
# from an earlier (likely larger) df.
10585+
from diff_diff.utils import safe_inference
10586+
10587+
df_final = res.survey_metadata.df_survey
10588+
checked = False
10589+
for path, horizons in res.path_heterogeneity_effects.items():
10590+
for l_h, vals in horizons.items():
10591+
if vals["n_obs"] >= 3 and np.isfinite(vals["se"]):
10592+
expected_t, expected_p, expected_ci = safe_inference(
10593+
vals["beta"], vals["se"], df=df_final
10594+
)
10595+
assert vals["t_stat"] == pytest.approx(
10596+
expected_t, rel=1e-12, nan_ok=True
10597+
), (
10598+
f"path={path} l={l_h}: t_stat not refreshed at "
10599+
f"df={df_final} (have {vals['t_stat']}, expected "
10600+
f"{expected_t})"
10601+
)
10602+
assert vals["p_value"] == pytest.approx(
10603+
expected_p, rel=1e-12, nan_ok=True
10604+
), (
10605+
f"path={path} l={l_h}: p_value not refreshed at "
10606+
f"df={df_final} (have {vals['p_value']}, expected "
10607+
f"{expected_p})"
10608+
)
10609+
assert vals["conf_int"][0] == pytest.approx(
10610+
expected_ci[0], rel=1e-12, nan_ok=True
10611+
)
10612+
assert vals["conf_int"][1] == pytest.approx(
10613+
expected_ci[1], rel=1e-12, nan_ok=True
10614+
)
10615+
checked = True
10616+
break
10617+
if checked:
10618+
break
10619+
assert checked, (
10620+
"Expected at least one finite (path, l) entry to refresh-"
10621+
"check; fixture is degenerate."
10622+
)
10623+
10624+
@pytest.mark.slow
10625+
def test_paths_of_interest_heterogeneity_survey_design_analytical(self):
10626+
"""Mirror of the by_path+heterogeneity+survey_design analytical
10627+
path using paths_of_interest. Anti-regression: the docs claim
10628+
both selectors compose with heterogeneity under survey_design,
10629+
but the existing TestByPathHeterogeneity survey tests only
10630+
exercise by_path=. This test pins the reciprocal selector under
10631+
analytical Binder TSL.
10632+
"""
10633+
from diff_diff.survey import SurveyDesign
10634+
10635+
df = self._by_path_het_data_with_survey()
10636+
sd = SurveyDesign(weights="survey_weights", strata="strata", psu="psu")
10637+
# Three observed paths in the fixture; pick two in non-frequency
10638+
# order so we can verify selector ordering is preserved.
10639+
est = ChaisemartinDHaultfoeuille(
10640+
drop_larger_lower=False,
10641+
paths_of_interest=[(0, 1, 1, 0), (0, 1, 1, 1)],
10642+
)
10643+
with warnings.catch_warnings():
10644+
warnings.simplefilter("ignore", UserWarning)
10645+
res = est.fit(
10646+
df,
10647+
outcome="outcome",
10648+
group="group",
10649+
time="period",
10650+
treatment="treatment",
10651+
L_max=3,
10652+
heterogeneity="het_x",
10653+
survey_design=sd,
10654+
)
10655+
assert res.path_heterogeneity_effects, (
10656+
"paths_of_interest + heterogeneity + survey_design must "
10657+
"populate path_heterogeneity_effects"
10658+
)
10659+
# Selector keys are preserved in the user-specified order
10660+
# (not frequency-ranked like by_path).
10661+
keys = list(res.path_heterogeneity_effects.keys())
10662+
assert keys == [(0, 1, 1, 0), (0, 1, 1, 1)], (
10663+
f"paths_of_interest order not preserved: got {keys}"
10664+
)
10665+
# Every populated (path, l) entry yields finite analytical SE.
10666+
for path, horizons in res.path_heterogeneity_effects.items():
10667+
for l_h, vals in horizons.items():
10668+
if vals["n_obs"] >= 3:
10669+
assert np.isfinite(vals["se"]), (
10670+
f"path={path} l={l_h}: analytical survey SE "
10671+
f"non-finite under paths_of_interest"
10672+
)
10673+
10674+
@pytest.mark.slow
10675+
def test_paths_of_interest_heterogeneity_survey_n_bootstrap_gate(self):
10676+
"""The by_path + survey_design + n_bootstrap > 0 gate (PR #408)
10677+
also fires under paths_of_interest + heterogeneity. Anti-
10678+
regression: the multiplier-bootstrap-survey gate must apply to
10679+
both selectors.
10680+
"""
10681+
from diff_diff.survey import SurveyDesign
10682+
10683+
df = self._by_path_het_data_with_survey()
10684+
sd = SurveyDesign(weights="survey_weights", strata="strata", psu="psu")
10685+
est = ChaisemartinDHaultfoeuille(
10686+
drop_larger_lower=False,
10687+
paths_of_interest=[(0, 1, 1, 1)],
10688+
n_bootstrap=10,
10689+
seed=1,
10690+
)
10691+
with warnings.catch_warnings():
10692+
warnings.simplefilter("ignore", UserWarning)
10693+
with pytest.raises(NotImplementedError, match="multiplier"):
10694+
est.fit(
10695+
df,
10696+
outcome="outcome",
10697+
group="group",
10698+
time="period",
10699+
treatment="treatment",
10700+
L_max=3,
10701+
heterogeneity="het_x",
10702+
survey_design=sd,
10703+
)
10704+
1057710705
@pytest.mark.slow
1057810706
def test_survey_design_plus_n_bootstrap_with_heterogeneity_still_raises(
1057910707
self,

0 commit comments

Comments
 (0)