Commit 25d5ed4

Merge pull request #457 from igerber/feature/bacon-r-parity-goldens
BaconDecomposition R parity goldens
2 parents 9aedd33 + 86c0389 commit 25d5ed4

8 files changed

Lines changed: 430 additions & 63 deletions

File tree

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
Large diffs are not rendered by default.

METHODOLOGY_REVIEW.md

Lines changed: 27 additions & 24 deletions
Large diffs are not rendered by default.

TODO.md

Lines changed: 0 additions & 1 deletion
@@ -74,7 +74,6 @@ Deferred items from PR reviews that were not addressed before merge.
 
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
-| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
 | dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
 | dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
 | dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |

benchmarks/R/generate_bacon_golden.R

Lines changed: 44 additions & 12 deletions
@@ -7,9 +7,21 @@
 #
 # The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
 # the default ``weights="exact"`` is expected to match the values in this JSON
-# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
-# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
-# path is a library-only optimization and is NOT covered by this parity harness.
+# at atol=1e-6 along a three-tier contract:
+# (1) aggregate TWFE coefficient + weights-sum on all 3 fixtures;
+# (2) direct per-component (treated, control, type) parity on the 2
+# non-remap fixtures AND on the 6 timing-vs-timing rows of
+# `always_treated_remapped`;
+# (3) cohort-level fold-back parity for the U bucket on
+# `always_treated_remapped` — Python's paper-footnote-11 remap folds
+# R's separate `Later vs Always Treated` + `Treated vs Untreated`
+# rows into a single `treated_vs_never` cell per cohort, so the
+# aggregate is invariant per Theorem 1 but the per-component
+# breakdown differs by convention. See REGISTRY notes:
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# The ``weights="approximate"`` path is a library-only optimization and is
+# NOT covered by this parity harness.
 #
 # Three fixtures:
 # 1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
@@ -18,8 +30,8 @@
 # 2. two_groups_no_never_treated — 2 timing groups only; tests the
 # timing-only decomposition where the s_{kU} terms drop.
 # 3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
-# (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
-# U matches R bacondecomp's native behavior.
+# (first_treat = 1). Validates the convention-divergent U-bucket
+# fold-back on Python's warn+remap of always-treated units into U.
 #
 # Run:
 # cd benchmarks/R && Rscript generate_bacon_golden.R
@@ -193,11 +205,21 @@ df2 <- build_panel(
 fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")
 
 cat("Building fixture 3: always_treated_remapped...\n")
-# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
-# in every observable period) + 30 never-treated. R's bacondecomp natively
-# groups the first_treat=1 cohort with U (since they are treated throughout
-# every observable period and never serve as a within-window control), which
-# matches what diff-diff's warn+remap does in Python.
+# 3 timing-cohorts (3, 4, 5) + 5 always-treated units (first_treat = 1, i.e.,
+# treated in every observable period) + 25 never-treated. R's bacondecomp
+# keeps the first_treat=1 cohort as a *separate* timing cohort (not in U) and
+# emits a `Later vs Always Treated` comparison row for each later cohort
+# alongside the standard `Treated vs Untreated` row. Python's paper-footnote-11
+# convention remaps these units into the U bucket and folds R's two columns
+# of components into a single `treated_vs_never` cell per treated cohort.
+# The aggregate (TWFE coefficient + weights-sum) is invariant per Theorem 1,
+# but the per-component breakdown differs by convention — see REGISTRY
+# `**Note (R parity convention divergence on always-treated)**` and
+# `**Deviation (first-period boundary extension on always-treated remap)**`.
+# `tests/test_methodology_bacon.py::TestBaconParityR` carves out the U-bucket
+# rows from direct per-component parity (keeping the 6 timing-vs-timing rows
+# under direct parity) and asserts the U-bucket fold-back separately via
+# `test_always_treated_remapped_fold_back_matches_r` at atol=1e-6.
 df3 <- build_panel(
 n_units_per_cohort = 25L,
 n_periods = 6L,
@@ -220,8 +242,18 @@ out <- list(
 r_version = R.version.string,
 description = paste(
 "Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
-"BaconDecomposition. Parity target: atol=1e-6 on per-component",
-"(treated, control, type) tuples plus the TWFE coefficient."
+"BaconDecomposition. Parity target at atol=1e-6:",
+"(1) aggregate TWFE coefficient + weights-sum across all 3 fixtures;",
+"(2) direct per-component (treated, control, type) parity on the 2",
+"non-remap fixtures AND on the 6 timing-vs-timing rows of",
+"always_treated_remapped;",
+"(3) cohort-level fold-back parity for the U bucket on",
+"always_treated_remapped (Python's paper-footnote-11 remap folds",
+"R's separate Later-vs-Always-Treated + Treated-vs-Untreated rows",
+"into a single treated_vs_never cell per cohort; aggregate is",
+"invariant per Theorem 1, breakdown differs by convention).",
+"See REGISTRY Note (R parity convention divergence on always-treated)",
+"+ Deviation (first-period boundary extension)."
 )
 ),
 uniform_3groups_with_never_treated = fixture_1,
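
The U-bucket fold-back described in the generator comments can be sketched in Python. This is a minimal illustration and not the test's actual code: the row layout (`treated`, `type`, `weight`, `estimate` keys) and the helper name `fold_back_u_bucket` are hypothetical, and the numbers are made up for the example.

```python
from collections import defaultdict
import math

# R comparison types that Python's footnote-11 convention folds into a single
# `treated_vs_never` cell (labels as named in the generator comments).
U_BUCKET_TYPES = {"Later vs Always Treated", "Treated vs Untreated"}


def fold_back_u_bucket(r_rows):
    """Aggregate R's split U-bucket rows per treated cohort.

    `r_rows` is a hypothetical list of dicts with keys
    'treated', 'type', 'weight', 'estimate'.
    Returns {cohort: (combined_weight, weight_averaged_estimate)}.
    """
    weight_sum = defaultdict(float)
    weighted_est = defaultdict(float)
    for row in r_rows:
        if row["type"] in U_BUCKET_TYPES:
            weight_sum[row["treated"]] += row["weight"]
            weighted_est[row["treated"]] += row["weight"] * row["estimate"]
    return {k: (w, weighted_est[k] / w) for k, w in weight_sum.items() if w > 0}


# Made-up rows for one cohort (first_treat = 3), for illustration only.
r_rows = [
    {"treated": 3, "type": "Later vs Always Treated", "weight": 0.10, "estimate": 2.0},
    {"treated": 3, "type": "Treated vs Untreated", "weight": 0.30, "estimate": 4.0},
    {"treated": 3, "type": "Earlier vs Later Treated", "weight": 0.20, "estimate": 1.0},
]
folded = fold_back_u_bucket(r_rows)
weight, estimate = folded[3]

# Combined weight is 0.10 + 0.30; the estimate is the weight-averaged value.
assert math.isclose(weight, 0.40, abs_tol=1e-6)
assert math.isclose(estimate, (0.10 * 2.0 + 0.30 * 4.0) / 0.40, abs_tol=1e-6)
```

Because the folded cell carries the same `weight * estimate` product as the two R rows combined, the weighted sum over all components is unchanged by the re-bucketing, which is consistent with the aggregate invariance the comments attribute to Theorem 1.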

benchmarks/data/r_bacondecomp_golden.json

Lines changed: 211 additions & 0 deletions
Large diffs are not rendered by default.

diff_diff/bacon.py

Lines changed: 19 additions & 4 deletions
@@ -475,7 +475,15 @@ def fit(
 excluding the never-treated sentinels ``0`` and ``np.inf``)
 are automatically remapped to the ``U`` (untreated) bucket
 per Goodman-Bacon (2021) footnote 11, with a
-``UserWarning``. Detection uses ordered-time logic on the
+``UserWarning``. **Library boundary extension:** the paper
+uses the strict inequality ``t_i < 1`` (units treated
+*before* the first observable period); the library uses the
+**inclusive** ``first_treat <= min(time)`` rule, additionally
+folding units treated *at* the first observable period
+(``first_treat == min(time)``) into ``U`` because such units
+have no untreated cell in-panel. See REGISTRY's
+``**Deviation (first-period boundary extension on
+always-treated remap)**`` block for the full contract. Detection uses ordered-time logic on the
 **time axis** so panels whose ``time`` column contains
 negative or zero-crossing labels (e.g. event-time
 ``time ∈ [-2,..,3]``) are handled correctly; the ``0``
@@ -1302,9 +1310,16 @@ def bacon_decompose(
 >>> from diff_diff import bacon_decompose
 >>>
 >>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights
->>> # (weights="exact"); intended to match R bacondecomp::bacon() at
->>> # atol=1e-6 (R parity goldens pending — see TODO.md "R parity
->>> # goldens generation" for the deferred validation step).
+>>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on
+>>> # the aggregate (TWFE coefficient + weights-sum) across all panels,
+>>> # and on the per-component breakdown when there are no
+>>> # always-treated / first-period-treated cohorts (i.e. all
+>>> # non-sentinel first_treat values are strictly greater than
+>>> # min(time)). For panels with always-treated units, the
+>>> # per-component breakdown diverges by convention (Python remaps
+>>> # to U per paper footnote 11; R emits `Later vs Always Treated`);
+>>> # see REGISTRY note on R parity convention divergence. Validated
+>>> # via tests/test_methodology_bacon.py::TestBaconParityR.
 >>> results = bacon_decompose(
 ... data=panel_df,
 ... outcome='earnings',
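
The boundary rule described in the `fit` docstring can be illustrated with a small standalone sketch. The function name `always_treated_flags` and its `inclusive` switch are hypothetical and are not part of the diff-diff API; only the rule itself (`first_treat <= min(time)`, with `0` and `np.inf` reserved as never-treated sentinels) comes from the docstring.

```python
import math


def always_treated_flags(first_treat, time, inclusive=True):
    """Flag cohorts that would be remapped to the U bucket.

    inclusive=True  -> library rule: first_treat <= min(time)
    inclusive=False -> paper footnote 11 strict rule: first_treat < min(time)
    The sentinels 0 and inf mark never-treated units and are never flagged.
    """
    t_min = min(time)
    flags = []
    for ft in first_treat:
        never_treated = ft == 0 or math.isinf(ft)
        at_or_before = ft <= t_min if inclusive else ft < t_min
        flags.append(at_or_before and not never_treated)
    return flags


# Event-time panel with time in [-2, ..., 3], as in the docstring example.
time = list(range(-2, 4))
first_treat = [-3, -2, -1, 0, math.inf]

library_rule = always_treated_flags(first_treat, time, inclusive=True)
paper_rule = always_treated_flags(first_treat, time, inclusive=False)

# The two rules differ only for the cohort treated exactly at min(time) = -2.
assert library_rule == [True, True, False, False, False]
assert paper_rule == [True, False, False, False, False]
```

On this panel the cohort at `first_treat=-1` stays a valid timing group under both rules, and the `0` sentinel is left alone even though the time axis crosses zero.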

docs/methodology/REGISTRY.md

Lines changed: 7 additions & 5 deletions
@@ -2616,7 +2616,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in
 
 *Assumption checks / warnings:*
 - Requires variation in treatment timing (staggered adoption)
-- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; paper footnote 11) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract
+- Always-treated units (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`; per paper footnote 11 with a library-convention extension on the first-period boundary case, see `**Deviation (first-period boundary extension)**` below) are automatically remapped to the `U` (untreated) bucket with a `UserWarning`; see the `**Note (always-treated remap)**` below for the full ordered-time / sentinel contract
 - Unbalanced panels are accepted with a `UserWarning`; the paper's Appendix A proof assumes balanced panels
 - Falls back to timing-only comparisons when no never-treated units are present (no untreated group → `s_{kU}` terms drop, weights rescale to sum to 1; **VWCT and ΔATT can still bias the result** — see paper Eqs. 14-15)
 
@@ -2668,7 +2668,7 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_
 - Always-treated units: see `**Note (always-treated remap)**` below
 
 **Reference implementation(s):**
-- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens pending follow-up R install (see TODO.md).
+- R: `bacondecomp::bacon()` (CRAN). Parity script at `benchmarks/R/generate_bacon_golden.R`; goldens committed at `benchmarks/data/r_bacondecomp_golden.json` (generated against `bacondecomp` 0.1.1 + R 4.5.2). Parity validated at `atol=1e-6` via `tests/test_methodology_bacon.py::TestBaconParityR` (4 tests: TWFE coefficient + weights-sum match across 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket convention divergence on `always_treated_remapped` is pinned by a dedicated fold-back test).
 - Stata: `bacondecomp` (SSC). Authors: Goodman-Bacon, Goldring, Nichols (2019).
 
 **Requirements checklist:**
@@ -2678,11 +2678,13 @@ Where `n_k` is the sample share of timing group `k`, `n_{kℓ} = n_k / (n_k + n_
 - [x] Visualization shows weight vs. estimate by comparison type
 - [x] Always-treated remap to U per Goodman-Bacon (2021) footnote 11 (PR-B audit)
 - [x] Hand-calculable Theorem 1 verification: `tests/test_methodology_bacon.py::TestBaconHandCalculation` (7 tests, atol=1e-10)
-- [ ] R `bacondecomp::bacon()` parity at atol=1e-6 (R generator script committed; JSON goldens pending follow-up R install — `tests/test_methodology_bacon.py::TestBaconParityR` skips when missing)
+- [x] R `bacondecomp::bacon()` parity at atol=1e-6 (3 fixtures; TWFE coefficient + weights-sum match across all 3; per-component parity locked on the 2 non-remap fixtures and on the 6 timing-vs-timing rows of `always_treated_remapped`; the U-bucket fold-back is asserted by a dedicated `test_always_treated_remapped_fold_back_matches_r` — see `**Note (R parity convention divergence)**` below)
 - [x] Survey design support (Phase 3): weighted cell means, weighted within-transform, weighted group shares
-- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default with the **intent** to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated by hand-calculation (atol=1e-10) and TWFE-vs-weighted-sum identity (atol=1e-10) but the direct R bit-by-bit parity at atol=1e-6 is still pending the R `bacondecomp` install — see Test Coverage checklist above. The approximate path is retained for backward compatibility; numerical output may differ from R.
-- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`.
+- **Note (weight modes):** `weights="exact"` (default, paper-faithful Eqs. 7-9 + 10e-g) vs `weights="approximate"` (simplified variance, opt-in for speed-sensitive diagnostic loops). The PR-A paper review (#451) and PR-B audit established `"exact"` as the default to match R `bacondecomp::bacon()` and the paper's Theorem 1 contract; R parity is validated at `atol=1e-6` (see `**Note (R parity convention divergence)**` below for the one structural convention difference). Hand-calculation + TWFE-vs-weighted-sum identity hold at `atol=1e-10`. The approximate path is retained for backward compatibility; numerical output may differ from R.
+- **Note (always-treated remap):** Units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning` — per paper footnote 11 (with a library boundary extension on `first_treat == min(time)`; see `**Deviation (first-period boundary extension)**` below). Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (e.g. event-time `time ∈ [-2,..,3]`) are handled correctly: a cohort at `first_treat=-1` on such a panel is a valid timing group; a cohort at `first_treat=-3` is remapped to U. The user's original `first_treat` column on the input `data` frame is preserved unchanged. The count of remapped units is surfaced via `BaconDecompositionResults.n_always_treated_remapped`. **Sentinel restriction:** `first_treat ∈ {0, np.inf}` is reserved as the never-treated marker and is not configurable today; a real treatment cohort with `first_treat == 0` would be folded into `U` and should be re-labeled to a non-sentinel value before fitting. The `0` reservation applies to `first_treat` only, not to `time`.
 - **Note (Bacon survey diagnostic):** Bacon decomposition with survey weights is diagnostic; exact-sum guarantee holds at machine precision under `weights="exact"` **on balanced panels**. `weights="exact"` requires within-unit-constant survey columns (approximate path accepts time-varying weights).
+- **Note (R parity convention divergence on always-treated):** R `bacondecomp::bacon()` keeps `first_treat=1` (the always-treated cohort) as a separate timing cohort and emits an additional comparison type `Later vs Always Treated` (cohort k vs the always-treated cell) alongside the standard `Treated vs Untreated` row. Python's footnote-11 convention remaps these units to the `U` bucket and folds those R-side rows into a single `treated_vs_never` cell per treated cohort. The aggregate (TWFE coefficient + sum of weights) is invariant to this re-bucketing — Theorem 1's identity holds identically because the U bucket's total weight gets re-allocated across nested 2x2 cells but the total weight on `{cohort_k vs U}` is the same. The per-component breakdown, however, differs structurally between the two conventions. The R parity test (`tests/test_methodology_bacon.py::TestBaconParityR::test_component_estimates_match_r`) asserts per-component parity at `atol=1e-6` on the 2 fixtures without always-treated (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`) AND on the 6 timing-vs-timing rows of `always_treated_remapped` — the carve-out is narrowed to U-bucket rows only (R's `Later vs Always Treated` rows canonicalize to `treated_vs_never` and are dropped alongside the matching Python rows). The R→Python U-bucket fold-back is pinned separately by `test_always_treated_remapped_fold_back_matches_r`, which aggregates R's split `Later vs Always Treated` + `Treated vs Untreated` rows per treated cohort and asserts the combined weight + weight-averaged estimate match Python's single `treated_vs_never` cell at `atol=1e-6`. Aggregate parity (`test_twfe_coef_matches_r`, `test_weights_sum_matches_r`) is locked across all 3 fixtures.
+- **Deviation (first-period boundary extension on always-treated remap):** Paper footnote 11 (Goodman-Bacon 2021) uses the strict inequality `t_i < 1` (units treated *before* the first observable period) for the always-treated bucket. The library applies the **inclusive** `first_treat <= min(time)` rule, which additionally folds units treated *at* the first observable period (`first_treat == min(time)`) into `U`. This is a library boundary convention, not a paper-faithful rule: such units have no untreated cell in the observed panel and cannot contribute to any 2x2 DD as a treated cohort, so folding them into the U bucket mirrors the always-treated handling rather than dropping them silently. R `bacondecomp::bacon()` does not apply this boundary fold-back — it keeps `first_treat == min(time)` cohorts in their own bucket and emits `Later vs Always Treated` comparisons (see the **Note (R parity convention divergence on always-treated)** above for how the parity tests handle the resulting structural breakdown difference; aggregate Theorem 1 identity remains invariant). When no cohort has `first_treat == min(time)` (no first-period-treated cohorts), the library rule reduces to the paper's strict rule and the two conventions coincide.
 - **Deviation (unbalanced-panel library extension):** Unbalanced panels are accepted with a `UserWarning` ("Unbalanced panel detected. Bacon decomposition assumes balanced panels. Results may be inaccurate."). Goodman-Bacon (2021) Appendix A's proof assumes a balanced panel; under unbalance, the Theorem 1 identity holds only approximately. The decomposition still returns finite, well-defined outputs but `weights="exact"` does NOT achieve the machine-precision algebraic identity that the balanced-panel claims above describe.
 
 ---
