Closes the PR #454 deferred R parity follow-up (TODO.md row removed).

Generated `benchmarks/data/r_bacondecomp_golden.json` from the committed
`benchmarks/R/generate_bacon_golden.R` script against `bacondecomp 0.1.1`
on R 4.5.2. Three DGP fixtures: `uniform_3groups_with_never_treated`,
`two_groups_no_never_treated`, `always_treated_remapped`.

Parity results at atol=1e-6 via `tests/test_methodology_bacon.py::TestBaconParityR`:
- TWFE coefficient: ✅ matches across all 3 fixtures
- Weights-sum: ✅ matches across all 3 fixtures
- Per-component: ✅ on the 2 non-remap fixtures; **structural convention
divergence** on `always_treated_remapped` (skipped per-component, kept
aggregate). R keeps `first_treat=1` as a distinct timing cohort and
emits `Later vs Always Treated` comparisons; Python's paper-footnote-11
convention remaps those units to `U` and folds them into a single
`treated_vs_never` cell per treated cohort. The aggregate is invariant
per Theorem 1 — the U bucket's weight is re-allocated across nested
2x2 cells but the total weight on {cohort_k vs U} is identical. Only
the per-component breakdown differs structurally between conventions.
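
As a toy illustration of the invariance described above (made-up numbers, not `bacondecomp` or `diff_diff` output, and not the library's remap logic): folding several cells into one bucket that carries their summed weight and their weight-averaged estimate leaves the weighted aggregate unchanged, while the component list itself changes. The names `fine`, `coarse`, `aggregate`, and `merge` are hypothetical.

```python
# Each cell is (estimate, weight). Values are invented for illustration.
fine = [(2.0, 0.25), (1.0, 0.125), (4.0, 0.125), (3.0, 0.5)]


def aggregate(cells):
    """Weighted sum of cell estimates (the quantity compared to TWFE)."""
    return sum(est * w for est, w in cells)


def merge(cells):
    """Fold cells into one: summed weight, weight-averaged estimate."""
    total_w = sum(w for _, w in cells)
    avg_est = sum(est * w for est, w in cells) / total_w
    return (avg_est, total_w)


# Re-bucket the two middle cells into a single cell.
coarse = [fine[0], merge(fine[1:3]), fine[3]]

print(len(fine), len(coarse))              # component counts differ
print(aggregate(fine), aggregate(coarse))  # aggregates agree
```

This is why an aggregate check can pass on a fixture whose per-component rows cannot be matched one-to-one.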
Tracker promotions:
- METHODOLOGY_REVIEW.md: BaconDecomposition status row → **Complete**
(was `**Complete** (R parity pending)`); removed from In Progress
prose mention; removed from Priority Order substantive-review list;
Test Coverage count refreshed (24 → 33); R Comparison Results block
rewritten as **Validated**.
- docs/methodology/REGISTRY.md: Reference Implementations bullet + Verified
Components checklist + Note (weight modes) updated; new Note (R parity
convention divergence on always-treated) documents the convention.
- TODO.md: BaconDecomposition R parity goldens row removed.
- CHANGELOG.md: new `[Unreleased]` Added bullet for the close-out;
PR-B Changed entry tightened ("intended to match" → "matching ... at
atol=1e-6").
- diff_diff/bacon.py: `bacon_decompose` docstring example wording
tightened from "intended to match" to "matches" with TestBaconParityR
pointer.

Tests: 33/33 pass in test_methodology_bacon.py (no skips; was 30+3
skipped); 32 pass in test_bacon.py; 101 pass across the broader
bacon/decompose surface (was 98+3 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
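
A minimal sketch of the kind of absolute-tolerance check the parity results describe. It assumes a golden payload with hypothetical keys `twfe` and `weights_sum`; the real layout of `r_bacondecomp_golden.json` and the body of `TestBaconParityR` are not shown in this commit view, so `compare_to_golden` and the sample values are illustrative only.

```python
import json
import math

ATOL = 1e-6  # absolute tolerance quoted in the commit message


def compare_to_golden(golden, twfe, weights):
    """Return pass/fail flags for the two aggregate checks."""
    return {
        "twfe": math.isclose(twfe, golden["twfe"], rel_tol=0.0, abs_tol=ATOL),
        "weights_sum": math.isclose(
            sum(weights), golden["weights_sum"], rel_tol=0.0, abs_tol=ATOL
        ),
    }


# Hypothetical golden payload; keys and values are invented.
golden = json.loads('{"twfe": 1.25, "weights_sum": 1.0}')

ok = compare_to_golden(golden, twfe=1.2500004, weights=[0.25, 0.35, 0.4])
bad = compare_to_golden(golden, twfe=1.2501, weights=[0.25, 0.35, 0.4])
print(ok, bad)
```

Setting `rel_tol=0.0` keeps the comparison purely absolute, which matches how an `atol`-only criterion is usually read.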
METHODOLOGY_REVIEW.md: 21 additions & 19 deletions
@@ -24,7 +24,7 @@ A **Complete** entry has a documented review pass against the primary academic s
The catalog grew incrementally over several quarters, so formats vary across the existing Complete entries; the consistent invariant is that someone walked through the implementation against the academic source and captured the result here. New reviews going forward should aim for the fuller structure (Verified Components + Corrections Made + Deviations + dedicated methodology test file) used by the more recent entries.
-
**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., DCDH has a methodology file, R parity, and a companion-paper review for the 2026 universal-rollout extension; HAD has its primary-source paper review and R parity but no dedicated methodology file; ContinuousDiD has the methodology file but no paper review); others have only the REGISTRY entry and unit tests (e.g., BaconDecomposition, PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete.
+
**In Progress** entries have a REGISTRY.md section and unit-test coverage, but no formal walk-through has been captured here yet. The In Progress band is wide — some entries also have some combination of a paper review (primary or companion), a dedicated methodology test file, and R parity fixtures (e.g., DCDH has a methodology file, R parity, and a companion-paper review for the 2026 universal-rollout extension; HAD has its primary-source paper review and R parity but no dedicated methodology file; ContinuousDiD has the methodology file but no paper review); others have only the REGISTRY entry and unit tests (e.g., PowerAnalysis). The "Documentation in place" sub-section enumerates what each entry already has; the "Outstanding for promotion" sub-section enumerates what's still needed to flip it to Complete.
**Not Started** entries have neither a tracker walk-through nor an REGISTRY.md section. This tracker no longer carries any Not Started rows; new estimators are expected to enter as In Progress when their REGISTRY entry lands.
@@ -78,7 +78,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
| Tool | Module | R Reference | Status | Last Review |
@@ -909,7 +909,7 @@ and covariate-adjusted specifications.)
| Module | `bacon.py` |
| Primary Reference | Goodman-Bacon (2021), *Difference-in-differences with variation in treatment timing*, J. Econometrics 225(2), 254-277 |
| R Reference | `bacondecomp::bacon()` |
-
| Status | **Complete** (R parity goldens pending) |
+
| Status | **Complete** |
| Last Review | 2026-05-16 |
**Verified Components:**
@@ -926,14 +926,17 @@ and covariate-adjusted specifications.)
- [x] No untreated group: `s_{kU}` terms drop, weights renormalize, sum-to-1 still holds
- [x] Single timing group with U: only `treated_vs_never` comparisons
- [x] Survey design composes cleanly with exact mode and warn+remap
-
- [ ] R `bacondecomp::bacon()` parity at `atol=1e-6`: R generator script committed; JSON goldens pending follow-up R install (see TODO.md)
+
- [x] R `bacondecomp::bacon()` parity at `atol=1e-6`: 3 fixtures (`uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`); TWFE coefficient + weights-sum match across all 3 fixtures; per-component estimate + weight parity locked on the 2 non-remap fixtures, with a documented convention divergence on `always_treated_remapped` (Python's footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition; aggregate is invariant, breakdown is structurally different). See `benchmarks/data/r_bacondecomp_golden.json` + `TestBaconParityR`.
**Test Coverage:**
-
- 24 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (21 active + 3 R-parity tests that skip on missing goldens)
+
- 33 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (all active; R parity activates once goldens are committed)
-
- **Pending**: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time. R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`. JSON goldens land at `benchmarks/data/r_bacondecomp_golden.json` once generated. `tests/test_methodology_bacon.py::TestBaconParityR` skips with a pointer until then. Tracked in TODO.md follow-up row.
+
- **Validated** at `atol=1e-6` against `bacondecomp::bacon()` (version 0.1.1, R 4.5.2). Goldens at `benchmarks/data/r_bacondecomp_golden.json`; generator at `benchmarks/R/generate_bacon_golden.R`. Three DGP fixtures:
+
  - `uniform_3groups_with_never_treated`: 9 components covering all three comparison types; full per-component parity (estimate + weight at `atol=1e-6`).
+
  - `two_groups_no_never_treated`: 2 components, timing-only decomposition; full per-component parity.
+
  - `always_treated_remapped`: TWFE coefficient + weights-sum match at `atol=1e-6`; per-component breakdown diverges by convention (Python's paper-footnote-11 U-remap vs R's distinct `Later vs Always Treated` cohort decomposition). The aggregate is invariant to the re-bucketing per Theorem 1; only the breakdown differs. Per-component assertion skipped for this fixture with explicit documentation in `TestBaconParityR.test_component_estimates_match_r`.
**Corrections Made:**
1. **Theorem 1 exact-weights rewrite** (`bacon.py:_recompute_exact_weights`, lines ~740-880). The previous "exact" mode implementation did not actually compute Eqs. 7-9 / 10e-g: it was missing the `(1 - n_kU)` factor in the within-subsample treatment variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper. The post-hoc sum-to-1 normalization masked the relative-weight error but produced a decomposition error of ~0.3% (0.007 absolute) against TWFE on a 3-cohort + never-treated DGP. **Rewrote** the function to compute the exact numerators of Eqs. 10e/f/g (with proper Eqs. 7-9 variances) and let the post-hoc normalization handle the `V̂^D` denominator (Theorem 1 identity guarantees `V̂^D = Σ numerators`). Now matches TWFE at `atol=1e-10`. The existing `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the contract.
@@ -1203,22 +1206,21 @@ Promotion priority for the **In Progress** entries, ordered by what's blocked on
**Substantive-review-blocked (no methodology test file, no paper review, no R parity):**
-
1. **BaconDecomposition**: chosen for next substantive review during the 2026-05-15 tracker refresh session. Smaller scope than estimator reviews; R reference (`bacondecomp::bacon()`) available; methodology is well-understood (Goodman-Bacon 2021); REGISTRY checklist provides a ready-made target.
-
2. **PreTrendsPower**: small surface, established R package (`pretrends`), Roth (2022) is short.
-
3. **PowerAnalysis**: larger surface (MDE / power / sample size / simulation paths); REGISTRY already lists Bloom (1995) and Burlig et al. (2020) as primary sources; least urgent if the library's power-analysis utilities are not heavily used.
-
4. **PlaceboTests**: decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
-
5. **EfficientDiD**: no paper review on file; substantial implementation work (`tests/test_efficient_did.py` + validation tests) needs paper-vs-code audit against Chen, Sant'Anna & Xie (2025).
-
6. **ImputationDiD / TwoStageDiD**: natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
+
1. **PreTrendsPower**: small surface, established R package (`pretrends`), Roth (2022) is short.
+
2. **PowerAnalysis**: larger surface (MDE / power / sample size / simulation paths); REGISTRY already lists Bloom (1995) and Burlig et al. (2020) as primary sources; least urgent if the library's power-analysis utilities are not heavily used.
+
3. **PlaceboTests**: decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
+
4. **EfficientDiD**: no paper review on file; substantial implementation work (`tests/test_efficient_did.py` + validation tests) needs paper-vs-code audit against Chen, Sant'Anna & Xie (2025).
+
5. **ImputationDiD / TwoStageDiD**: natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
**Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
-
7. **HeterogeneousAdoptionDiD (HAD)**: largest current surface, Phase 4.5 just shipped; shares the de Chaisemartin (2026) paper review with DCDH; needs a dedicated Verified Components block.
-
8. **ChaisemartinDHaultfoeuille (DCDH)**: methodology test file + 24 R parity tests + 347 unit tests + a companion-paper review for the 2026 universal-rollout extension. Primary-source reviews for the 2020 AER and 2022/2024 NBER WP 29873 papers are still outstanding alongside the Verified Components walk-through.
-
9. **WooldridgeDiD (ETWFE)**: companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file.
-
10. **ContinuousDiD**: 15 methodology tests already in place; mostly a consolidation pass with a documented boundary-knots deviation from R `contdid` v0.1.0.
-
11. **TROP**: paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available).
-
12. **StaggeredTripleDifference**: shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored; tracked in TODO.md, PR #245).
-
13. **ConleySpatialHAC**: paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker.
+
6. **HeterogeneousAdoptionDiD (HAD)**: largest current surface, Phase 4.5 just shipped; shares the de Chaisemartin (2026) paper review with DCDH; needs a dedicated Verified Components block.
+
7. **ChaisemartinDHaultfoeuille (DCDH)**: methodology test file + 24 R parity tests + 347 unit tests + a companion-paper review for the 2026 universal-rollout extension. Primary-source reviews for the 2020 AER and 2022/2024 NBER WP 29873 papers are still outstanding alongside the Verified Components walk-through.
+
8. **WooldridgeDiD (ETWFE)**: companion-paper review (Wooldridge 2023 nonlinear extension) merged in PR #443; primary-source review for Wooldridge (2025) ETWFE not yet on file, and no dedicated methodology test file.
+
9. **ContinuousDiD**: 15 methodology tests already in place; mostly a consolidation pass with a documented boundary-knots deviation from R `contdid` v0.1.0.
+
10. **TROP**: paper review recently merged (PR #443); needs methodology file and cross-language anchor (when paper-author reference becomes available).
+
11. **StaggeredTripleDifference**: shares the primary paper (Ortiz-Villavicencio & Sant'Anna 2025) with TripleDifference, but no dedicated paper review on file yet; needs R parity (R fixtures gitignored; tracked in TODO.md, PR #245).
+
12. **ConleySpatialHAC**: paper review + committed R `conleyreg` goldens; needs dedicated methodology test file + summary R-parity table in this tracker.
14. **Survey Data Support**: cross-cutting feature; promotion requires the per-estimator integration paths to be locked down first.
TODO.md: 0 additions & 1 deletion
@@ -74,7 +74,6 @@ Deferred items from PR reviews that were not addressed before merge.
| Issue | Location | PR | Priority |
|-------|----------|----|----------|
-
| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |