Skip to content

Commit e304c10

Browse files
igerberclaude
andcommitted
Address PR #408 R2 review (1 P2 strengthen replicate-df regression)
R2 P2: the previous regression test asserted per-path inference matches `safe_inference(..., df=results.survey_metadata.df_survey)`, but under uniform-valid replicate fixtures every IF site reports the same `n_valid` so the snapshot df and final df happen to coincide and the assertion passes vacuously even when the bug is present (per-path uses a stale snapshot df that incidentally equals the final df). Add `test_refresh_path_inference_called_from_final_block`: wraps the helper with `mock.patch.object` to capture call_args, asserts (a) helper is invoked exactly once, (b) the `df_final` it received equals `results.survey_metadata.df_survey` — a relationship that holds by construction when invoked from the final R2 P1b block (which uses `_final_eff_df = _effective_df_survey(resolved_survey, _replicate_n_valid_list)` AFTER all appends), but can only coincide by chance when invoked from an earlier block on a fixture where snapshot equals final. Update the existing test's docstring to acknowledge it documents the contract on a uniform-valid fixture and points readers at the new mock-based test for direct call-site verification. Use `importlib.import_module` to access the dCDH module: the top-level `diff_diff` package re-exports the convenience function `chaisemartin_dhaultfoeuille`, shadowing the module of the same name in attribute lookup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 905e887 commit e304c10

1 file changed

Lines changed: 86 additions & 1 deletion

File tree

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 86 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9467,6 +9467,86 @@ def test_per_path_replicate_se_finite(self):
94679467
any_finite = True
94689468
assert any_finite
94699469

9470+
@pytest.mark.slow
9471+
def test_refresh_path_inference_called_from_final_block(self):
9472+
"""Pin the helper's call site to the final R2 P1b block.
9473+
9474+
Regression for PR #408 R1 P1: an earlier implementation
9475+
invoked ``_refresh_path_inference`` immediately after per-path
9476+
runs, BEFORE the global overall / joiners / leavers /
9477+
heterogeneity IF sites appended their ``n_valid`` contributions
9478+
— leaving per-path inference using a stale snapshot df that
9479+
could exceed the final ``survey_metadata.df_survey``.
9480+
9481+
Pure-fixture detection is unreliable: under uniform-valid
9482+
replicate weights, every IF site reports the same ``n_valid``,
9483+
so the snapshot df and the final df happen to coincide and a
9484+
match-against-final-df assertion would pass even with the bug
9485+
present. Instead we wrap the helper with ``mock.patch.object``
9486+
and assert the ``df_final`` it receives equals the final
9487+
``survey_metadata.df_survey`` — a relationship that holds by
9488+
construction when invoked from the final block (which uses
9489+
``_final_eff_df = _effective_df_survey(resolved_survey,
9490+
_replicate_n_valid_list)`` AFTER all appends), but can only
9491+
coincide by chance from an earlier block.
9492+
"""
9493+
import importlib
9494+
import unittest.mock as _mock
9495+
9496+
from diff_diff.survey import SurveyDesign
9497+
9498+
# The top-level `diff_diff` package re-exports
9499+
# `chaisemartin_dhaultfoeuille` as the convenience function,
9500+
# shadowing the module of the same name. Use importlib to
9501+
# access the module object explicitly so mock.patch.object
9502+
# operates on the correct namespace.
9503+
_cd_mod = importlib.import_module("diff_diff.chaisemartin_dhaultfoeuille")
9504+
9505+
df = _by_path_survey_data()
9506+
n_obs = len(df)
9507+
rng = np.random.default_rng(3)
9508+
rep_cols = [f"rep_{i}" for i in range(10)]
9509+
for col in rep_cols:
9510+
df[col] = df["survey_weights"] * (1.0 + 0.05 * rng.standard_normal(n_obs))
9511+
sd = SurveyDesign(
9512+
weights="survey_weights",
9513+
replicate_weights=rep_cols,
9514+
replicate_method="JK1",
9515+
replicate_scale=1.0,
9516+
)
9517+
est = ChaisemartinDHaultfoeuille(by_path=2, drop_larger_lower=False)
9518+
9519+
with _mock.patch.object(
9520+
_cd_mod,
9521+
"_refresh_path_inference",
9522+
wraps=_cd_mod._refresh_path_inference,
9523+
) as m:
9524+
with warnings.catch_warnings():
9525+
warnings.simplefilter("ignore", UserWarning)
9526+
res = est.fit(
9527+
df, outcome="outcome", group="group", time="period",
9528+
treatment="treatment", L_max=3, survey_design=sd,
9529+
)
9530+
9531+
# Helper called exactly once from the final R2 P1b block.
9532+
assert m.call_count == 1, (
9533+
f"_refresh_path_inference should be called exactly once "
9534+
f"under replicate-weight + by_path; got {m.call_count}"
9535+
)
9536+
# Under replicate variance with defined effective df,
9537+
# _inference_df returns the effective df unchanged, and
9538+
# survey_metadata.df_survey persists the same value. Equality
9539+
# proves the helper received the FINAL df, not an earlier
9540+
# snapshot taken before the global IF sites appended.
9541+
df_final_passed = m.call_args.kwargs["df_final"]
9542+
assert res.survey_metadata is not None
9543+
assert df_final_passed == res.survey_metadata.df_survey, (
9544+
f"Helper invoked with df_final={df_final_passed!r}, but "
9545+
f"results.survey_metadata.df_survey={res.survey_metadata.df_survey!r}. "
9546+
f"This indicates the helper ran from a stale earlier "
9547+
f"call site instead of the final R2 P1b block."
9548+
)
9549+
94709550
@pytest.mark.slow
94719551
def test_per_path_inference_uses_final_df_after_all_appends(self):
94729552
"""Per-path t/p/CI must use `results.survey_metadata.df_survey`.
@@ -9481,7 +9561,12 @@ def test_per_path_inference_uses_final_df_after_all_appends(self):
94819561
their ``t_stat`` / ``p_value`` / ``conf_int`` agree with
94829562
``results.survey_metadata.df_survey`` and the global event-
94839563
study / placebo surfaces (which the same final block already
9484-
refreshes). Regression for PR #408 R1 P1.
9564+
refreshes). Companion test
9565+
``test_refresh_path_inference_called_from_final_block`` pins
9566+
the helper's call site directly via mock.patch (the
9567+
match-against-final-df assertion below is satisfied vacuously
9568+
under uniform-valid replicates where snapshot df coincides
9569+
with final df). Regression for PR #408 R1 P1.
94859570
"""
94869571
from diff_diff.survey import SurveyDesign
94879572
from diff_diff.utils import safe_inference

0 commit comments

Comments
 (0)