Skip to content

Commit 8225ba0

Browse files
committed
PR #457 R4 polish: narrow always-treated carve-out to U-bucket only
R4 verdict was Looks good with 1 P3 informational item: the per-component parity test skipped the ENTIRE always_treated_remapped fixture, leaving the 6 timing-vs-timing rows (Earlier/Later vs Earlier/Later Treated between cohorts 3/4/5) without direct per-component parity assertions. Per memory feedback_test_coverage_gap_treat_as_actionable, this is the "test exists but doesn't directly exercise the surface" pattern and should be actionable. Narrowed the carve-out: instead of skipping the whole fixture, drop only the treated_vs_never keys from both Python and R sides (the actual U-bucket convention divergence), and keep direct atol=1e-6 parity assertions on the 6 timing-vs-timing keys. Also refined _classify_r_type to canonicalize R's "Later vs Always Treated" type string to treated_vs_never (Python folds those rows into the U bucket per paper footnote 11, so they belong to the U comparison set semantically even though R numbers them by the always-treated cohort), keeping the narrow carve-out simple. Tests: 34/34 pass in test_methodology_bacon.py (+6 directly asserted timing-vs-timing comparisons in the remap fixture vs prior coverage).
1 parent 780d502 commit 8225ba0

1 file changed

Lines changed: 40 additions & 27 deletions

File tree

tests/test_methodology_bacon.py

Lines changed: 40 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -372,13 +372,21 @@ def _canonical_control(ctype: str, group):
372372

373373
def _classify_r_type(c: dict, fixture_name: str) -> str:
374374
# R bacondecomp's `type` strings vary across versions
375-
# ("Treated vs Untreated", "Earlier vs Later Treated", ...).
376-
# Fall back to inferring from the control_group: U sentinel
377-
# (0, np.inf, or "never"-containing string) -> treated_vs_never;
378-
# otherwise treated_group < control_group is earlier-vs-later.
375+
# ("Treated vs Untreated", "Earlier vs Later Treated",
376+
# "Later vs Always Treated", ...). Fall back to inferring from
377+
# the control_group: U sentinel (0, np.inf, or "never"-containing
378+
# string) -> treated_vs_never; otherwise treated_group <
379+
# control_group is earlier-vs-later. Note: ``Later vs Always
380+
# Treated`` is canonicalized to ``treated_vs_never`` here because
381+
# Python's paper-footnote-11 convention folds always-treated
382+
# units into the U bucket — semantically these R rows belong
383+
# to the U comparison set even though R numbers them by the
384+
# always-treated cohort (typically first_treat=1).
379385
t = c.get("type") or ""
380386
if "never" in t.lower() or "untreated" in t.lower():
381387
return "treated_vs_never"
388+
if "always" in t.lower():
389+
return "treated_vs_never"
382390
ctrl = c["control_group"]
383391
if isinstance(ctrl, str) and "never" in ctrl.lower():
384392
return "treated_vs_never"
@@ -398,30 +406,17 @@ def _classify_r_type(c: dict, fixture_name: str) -> str:
398406
for fixture_name, fix in golden.items():
399407
if fixture_name == "meta":
400408
continue
401-
# ``always_treated_remapped``: R keeps ``first_treat=1`` as a
402-
# separate cohort and emits ``Later vs Always Treated`` (and
403-
# ``Treated vs Untreated``) comparisons against it. Python's
404-
# paper-footnote-11 convention remaps those units to U,
405-
# folding R's two columns of components into single
406-
# ``treated_vs_never`` cells per treated cohort. The aggregate
407-
# (TWFE coefficient + weights-sum) is invariant to this
408-
# re-bucketing and is locked by ``test_twfe_coef_matches_r``
409-
# and ``test_weights_sum_matches_r`` above, but the
410-
# per-component set differs **structurally** under the two
411-
# conventions. Skip this fixture's per-component assertion
412-
# while keeping the aggregate parity. See REGISTRY note on
413-
# always-treated remap for the convention rationale.
414-
if fixture_name == "always_treated_remapped":
415-
continue
416409
panel = pd.DataFrame(fix["panel"])
417-
results = bacon_decompose(
418-
panel,
419-
outcome="y",
420-
unit="unit",
421-
time="time",
422-
first_treat="first_treat",
423-
weights="exact",
424-
)
410+
with warnings.catch_warnings():
411+
warnings.simplefilter("ignore", category=UserWarning)
412+
results = bacon_decompose(
413+
panel,
414+
outcome="y",
415+
unit="unit",
416+
time="time",
417+
first_treat="first_treat",
418+
weights="exact",
419+
)
425420
py_estimates = {}
426421
py_weights = {}
427422
for c in results.comparisons:
@@ -443,6 +438,24 @@ def _classify_r_type(c: dict, fixture_name: str) -> str:
443438
)
444439
r_estimates[key] = c["estimate"]
445440
r_weights[key] = c["weight"]
441+
# ``always_treated_remapped`` carves out only the U-bucket rows,
442+
# which R and Python decompose under different conventions
443+
# (R: separate ``Later vs Always Treated`` + ``Treated vs
444+
# Untreated``; Python: single ``treated_vs_never`` per cohort
445+
# via paper-footnote-11 remap). The aggregated fold-back is
446+
# asserted in ``test_always_treated_remapped_fold_back_matches_r``.
447+
# The 6 timing-vs-timing rows in that fixture are NOT affected
448+
# by the convention split and must satisfy direct per-component
449+
# parity at atol=1e-6 — narrow the carve-out to U-bucket keys
450+
# only so regressions in timing-vs-timing decomposition are
451+
# caught directly, not just through aggregate parity.
452+
if fixture_name == "always_treated_remapped":
453+
# Drop only treated_vs_never keys from both sides; keep
454+
# earlier_vs_later + later_vs_earlier for direct parity.
455+
py_estimates = {k: v for k, v in py_estimates.items() if k[0] != "treated_vs_never"}
456+
py_weights = {k: v for k, v in py_weights.items() if k[0] != "treated_vs_never"}
457+
r_estimates = {k: v for k, v in r_estimates.items() if k[0] != "treated_vs_never"}
458+
r_weights = {k: v for k, v in r_weights.items() if k[0] != "treated_vs_never"}
446459
# Full-set equality: no Python component missing from R, no R
447460
# component missing from Python. A dropped β̂_{kU} term or an
448461
# extra spurious comparison would fail here.

0 commit comments

Comments
 (0)