Skip to content

Commit eb27bf5

Browse files
igerberclaude
andcommitted
Address PR #392 R5 review (3 P3, all non-blocking)
R5 was ✅ Looks good — only P3 polish remained. All addressed: P3 #1 — exact-pin nprobust: The parity contract runs through nprobust numerical paths (DIDHAD's local-linear bandwidth + bias-correction calls), so a fresh regeneration could drift if CRAN serves a newer nprobust. Pin nprobust == 0.5.0 in both the R generator's stopifnot guard and the parity test's metadata assertion alongside DIDHAD and YatchewTest. P3 #2 — workflow docstring: did_had_pretest_workflow's top-level docstring still said "Eq 18 linear-trend detrending is a Phase 4 follow-up" which contradicts the shipped trends_lin behavior. Updated to describe the forwarding contract (trends_lin → joint_pretrends_test + joint_homogeneity_test, consumed-placebo skip path on minimal panels). Same fix on the StuteJointResult class docstring. P3 #3 — parity test horizon-shape assertions: Added an explicit "missing in Python" assertion in _zip_r_python: every R-mapped event time must be present in Python's event_times (catches future horizon-shape regressions where Python silently drops a horizon R requested). Added an effects+placebo row-count sanity check in test_yatchew_t_stat_parity (uses the previously- unused effects/placebo parametrize values to catch fixture drift). Stats: 540 tests pass, 0 regressions. No estimator/methodology changes — all P3 polish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b7b7eb3 commit eb27bf5

3 files changed

Lines changed: 65 additions & 13 deletions

File tree

benchmarks/R/generate_did_had_golden.R

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -25,13 +25,16 @@ library(jsonlite)
2525
library(DIDHAD)
2626
library(YatchewTest)
2727

28-
# PR #392 R4 P3: pin exact upstream versions so future regeneration
29-
# does not silently re-anchor the goldens to a newer CRAN release
30-
# while CHANGELOG / REGISTRY / parity test still cite v2.0.0 / SHA
31-
# `edc09197`. Bump these pins (here AND in the parity test's
32-
# `test_metadata_versions_match`) when intentionally re-anchoring.
28+
# PR #392 R4 P3 / R5 P3: pin exact upstream versions so future
29+
# regeneration does not silently re-anchor the goldens to a newer
30+
# CRAN release while CHANGELOG / REGISTRY / parity test still cite
31+
# v2.0.0 / SHA `edc09197`. The parity contract runs through
32+
# `nprobust` numerical paths so we pin it too. Bump these pins
33+
# (here AND in the parity test's `test_metadata_versions_match`)
34+
# when intentionally re-anchoring.
3335
stopifnot(packageVersion("DIDHAD") == "2.0.0")
3436
stopifnot(packageVersion("YatchewTest") == "1.1.1")
37+
stopifnot(packageVersion("nprobust") == "0.5.0")
3538

3639
# -------------------------------------------------------------------------
3740
# Panel builder: 5-period panel with F=4 (treatment onset at t=4).

diff_diff/had_pretests.py

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -433,8 +433,10 @@ class StuteJointResult:
433433
:func:`joint_pretrends_test` (mean-independence: ``E[Y_t - Y_base | D]
434434
= mu_t``, design matrix ``[1]``) and :func:`joint_homogeneity_test`
435435
(linearity: ``E[Y_t - Y_base | D_t] = beta_{0,t} + beta_{fe,t} * D``,
436-
design matrix ``[1, D]``). Eq 18 linear-trend detrending (paper
437-
Section 5.2 Pierce-Schott application) is a Phase 4 follow-up.
436+
design matrix ``[1, D]``). Both wrappers accept a ``trends_lin:
437+
bool = False`` keyword-only flag (PR #392): when ``True``, applies
438+
paper Eq 17 / Eq 18 linear-trend detrending before the joint CvM
439+
using per-group slope ``Y[g, F-1] - Y[g, F-2]``.
438440
439441
Attributes
440442
----------
@@ -4322,9 +4324,17 @@ def did_had_pretest_workflow(
43224324
users who need Yatchew robustness under multi-period data should
43234325
call :func:`yatchew_hr_test` on each (base, post) pair manually.
43244326
4325-
Eq 18 linear-trend detrending (paper Section 5.2 Pierce-Schott
4326-
application) is a Phase 4 follow-up; the event-study path here
4327-
implements the simpler mean-independence / linearity nulls.
4327+
Eq 17 / Eq 18 linear-trend detrending (paper Section 5.2 Pierce-
4328+
Schott application) is now SHIPPED on the event-study path via
4329+
the ``trends_lin`` keyword-only parameter (PR #392 / Phase 4
4330+
R-parity). When ``trends_lin=True``, this workflow forwards the
4331+
flag to both :func:`joint_pretrends_test` and
4332+
:func:`joint_homogeneity_test`; the consumed placebo at
4333+
``base_period - 1`` is auto-dropped from step 2 and the workflow
4334+
skips step 2 (``pretrends_joint=None``) if no earlier placebo
4335+
survives. Mirrors R ``DIDHAD::did_had(..., trends_lin=TRUE)``.
4336+
Mutually exclusive with ``aggregate="overall"`` (raises
4337+
``NotImplementedError``).
43284338
43294339
Parameters
43304340
----------

tests/test_did_had_parity.py

Lines changed: 42 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -169,22 +169,38 @@ def _zip_r_python(
169169
r_result: Dict[str, Any], py_result: Any, trends_lin: bool
170170
) -> List[Tuple[int, int, str]]:
171171
"""Build (r_row_idx, py_event_idx, r_rowname) tuples zipping R rows
172-
to Python event-time positions for parity assertions."""
172+
to Python event-time positions for parity assertions.
173+
174+
PR #392 R5 P3: also asserts the EXACT mapped event-time set is a
175+
subset of Python's ``event_times`` and that the mapping is total
176+
over R's reported rows (no R row maps to a missing Python
177+
horizon). This catches future horizon-shape regressions where
178+
Python silently drops an event-time the R fixture lists."""
173179
py_event_times = py_result.event_times.tolist()
174180
py_idx_by_event_time = {int(e): i for i, e in enumerate(py_event_times)}
175181
pairs = []
176182
r_event_ids = _as_list(r_result["event_id"])
177183
r_rownames = _as_list(r_result["rownames"])
184+
expected_event_times = []
178185
for i, (r_id, rowname) in enumerate(zip(r_event_ids, r_rownames)):
179186
e = _r_id_to_event_time(int(r_id), trends_lin)
187+
expected_event_times.append(e)
180188
if e not in py_idx_by_event_time:
181-
# Should not happen for valid fixtures; surface explicit
182-
# diagnostic if R reports a row our Python event_times lacks.
183189
raise AssertionError(
184190
f"R row {rowname!r} (ID={r_id}) maps to our e={e}, but "
185191
f"Python event_times = {py_event_times}. Mapping bug?"
186192
)
187193
pairs.append((i, py_idx_by_event_time[e], rowname))
194+
# Exact-shape assertion: every R-mapped event time must be present
195+
# in Python's event_times. Length-equality is too strict (Python
196+
# may emit additional horizons R didn't request, e.g. e=0 anchor),
197+
# but every R row must find a Python counterpart.
198+
missing_in_python = set(expected_event_times) - set(py_event_times)
199+
assert not missing_in_python, (
200+
f"event_times mismatch: R requested {sorted(expected_event_times)} "
201+
f"(mapped from R IDs); Python emitted {sorted(py_event_times)}; "
202+
f"missing in Python: {sorted(missing_in_python)}."
203+
)
188204
return pairs
189205

190206

@@ -355,6 +371,20 @@ def test_yatchew_t_stat_parity(
355371
):
356372
r_combo = fixture["fixtures"][dgp_name]["combos"][combo_name]
357373
r_result = r_combo["result"]
374+
# PR #392 R5 P3: assert R's reported (effects + placebo) row
375+
# count matches the parametrize spec — catches future fixture
376+
# drift where R's effects/placebo args don't actually drive
377+
# the row count we expect.
378+
n_yatchew_rows = len(_as_list(r_result["yatchew_t"]))
379+
# Under trends_lin, R drops one placebo (consumed). Otherwise
380+
# rows = effects + placebo (the auto-truncation cap from R is
381+
# capped at the panel's max via did_het_adoption_main).
382+
expected_rows = effects + placebo - (1 if trends_lin else 0)
383+
assert n_yatchew_rows == expected_rows, (
384+
f"R fixture row count for {combo_name} = {n_yatchew_rows}, "
385+
f"expected effects+placebo{'-1' if trends_lin else ''} = "
386+
f"{expected_rows}; fixture/combo spec drift?"
387+
)
358388
if "yatchew_t" not in r_result:
359389
pytest.fail(
360390
f"{combo_name} expected to have yatchew_t in fixture; "
@@ -446,6 +476,15 @@ def test_metadata_versions_match(self, fixture):
446476
f"{meta['yatchewtest_version']!r}; the parity test pins exactly "
447477
f"1.1.1. Regenerate after bumping the pin."
448478
)
479+
# PR #392 R5 P3: nprobust is on the parity contract path
480+
# (DIDHAD's local-linear bandwidth + bias-correction calls go
481+
# through it), so pin it exactly too. Bump in lockstep with
482+
# the generator's stopifnot guards.
483+
assert meta["nprobust_version"] == "0.5.0", (
484+
f"Fixture was generated against nprobust="
485+
f"{meta['nprobust_version']!r}; the parity test pins exactly "
486+
f"0.5.0. Regenerate after bumping the pin."
487+
)
449488

450489
def test_metadata_n_dgps(self, fixture):
451490
meta = fixture["metadata"]

0 commit comments

Comments
 (0)