Skip to content

Wave 4 Tier A drain: EfficientDiD anticipation note, generate_ddd_panel_data, TROP data-setup helper#455

Merged
igerber merged 6 commits into
mainfrom
wave-4-tier-a-drain
May 16, 2026
Merged

Wave 4 Tier A drain: EfficientDiD anticipation note, generate_ddd_panel_data, TROP data-setup helper#455
igerber merged 6 commits into
mainfrom
wave-4-tier-a-drain

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 16, 2026

Summary

  • Item 2efficient_did: align REGISTRY note with last_cohort × anticipation trim (efficient_did.py:470 already trims at last_g - anticipation; REGISTRY now describes the anticipation>0 case; docstring cross-references added; regression test in TestLastCohortControl).
  • Item 3prep_dgp: add generate_ddd_panel_data for panel-structured DDD power-analysis DGPs (balanced panel of n_units × n_periods; unit-level time-invariant group/partition; derived post; DDD-CPT preserved by construction). Exported from diff_diff and diff_diff.prep; autofunction stub in docs/api/prep.rst; 14 new tests including a deterministic recovery test (noise_sd=0, ATT recovery to ~1e-15); CHANGELOG entry under [Unreleased] ### Added. Cross-sectional generate_ddd_data unchanged.
  • Item 4trop: extract shared _setup_trop_data(...) helper from TROP.fit() (local path) and _fit_global() — ~85 LoC of near-identical data-setup logic deduped into one helper in trop_local.py. The global-method staggered-adoption check stays in _fit_global as a post-helper validation. Helper returns all four index mappings uniformly (unit_to_idx/period_to_idx/idx_to_unit/idx_to_period); parity regression test verifies round-trip + shape consistency. Behavior-preserving (84 non-slow TROP tests green).
  • TODO.md: 3 Tier-A backlog rows drained; 2 follow-up rows added (TROP bootstrap-loop dedup; TripleDifference power auto-routing).

Methodology references (required if estimator / math changes)

  • Method name(s): EfficientDiD (REGISTRY-only update, no estimator math touched); generate_ddd_panel_data is a new utility DGP, not an estimator; TROP refactor is data-setup extraction, not methodology.
  • Paper / source link(s): No new citations introduced. EfficientDiD's last_g - anticipation trim is the existing code's behavior (PR EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning #230); this PR aligns REGISTRY to that documented behavior. The DDD-CPT identifying assumption for generate_ddd_panel_data follows the standard triple-difference identification (see generate_ddd_data for the cross-sectional analog).
  • Any intentional deviations from the source (and why): None. REGISTRY now matches code; anticipation-contaminated periods are excluded from the pseudo-control's pre-treatment window (methodologically correct).

Validation

  • Tests added/updated:
    • tests/test_efficient_did.py::TestLastCohortControl::test_last_cohort_with_anticipation_trims_at_last_g_minus_anticipation (regression guard for the anticipation trim).
    • tests/test_prep.py::TestGenerateDddPanelData (14 tests including deterministic ATT recovery, finite-sample ATT recovery, validation, balance, time-invariance).
    • tests/test_trop.py::TestTROPModuleSplit::test_setup_trop_data_internal_contract (parity regression for the shared helper's return contract).
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial updates.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • P1: generate_ddd_panel_data() can return datasets missing required DDD cells under valid inputs, so the advertised TripleDifference.fit(..., time="post") path can fail immediately.
  • EfficientDiD change is documentation-only: code already trims last-cohort controls at last_g - anticipation, and the REGISTRY/docstring now match that behavior.
  • TROP refactor looks behavior-preserving; estimator math, identification checks, and inference paths are unchanged.
  • The remaining TROP bootstrap dedup and TripleDifference power auto-routing gaps are explicitly tracked in TODO.md, so they are informational only.
  • Static review only: I could not run Python tests here because the environment is missing numpy.

Methodology

Code Quality
No findings.

Performance
No findings.

Maintainability

  • P3 Impact: the remaining TROP bootstrap-loop dedup is explicitly deferred in TODO.md:L87. Concrete fix: none required for this PR; the follow-up item is properly tracked.

Tech Debt

  • P3 Impact: simulation auto-routing to the new panel DGP is still deferred, so simulate_power/simulate_mde keep using the old fixed 2×2×2 DGP for TripleDifference; this is explicitly tracked in TODO.md:L88. Concrete fix: none required for this PR; the follow-up item is properly tracked.

Security
No findings.

Documentation/Tests
No additional findings beyond the P1 above. The new DDD-panel tests cover only large balanced/happy-path cases and do not exercise the empty-cell edge case described above (tests/test_prep.py:L1139-L1308). Validation note: I could not execute the Python tests in this environment because python is missing numpy.

Path to Approval

  1. Change generate_ddd_panel_data() so it never returns a dataset missing any (group, partition) cell: either deterministically allocate the 2×2 unit cells, or validate/resample and raise a targeted ValueError when the rounded marginals make that impossible.
  2. Add regression tests for an impossible configuration that currently passes validation (n_units=4, group_frac=0.25, partition_frac=0.25) and for the smallest feasible configuration where the generator must still return all four unit cells and TripleDifference.fit(..., time="post") succeeds.
  3. Update the generator’s docstring/error text so the public contract matches the enforced behavior, especially the current n_units >= 4 claim and the direct TripleDifference.fit(..., time="post") compatibility statement.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 03fce79fc5204b273ee731e24a42d0be5bbc1b2c


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior generate_ddd_panel_data() blocker is resolved. The generator now fails fast when rounded cell counts would leave any (group, partition) cell empty and includes regression coverage for both infeasible and smallest-feasible configurations (diff_diff/prep_dgp.py:L1264-L1367, tests/test_prep.py:L1237-L1341).
  • EfficientDiD’s control_group="last_cohort" docs and Methodology Registry now match the existing last_g - anticipation trim, and the PR adds a direct regression test for that behavior (diff_diff/efficient_did.py:L161-L180, diff_diff/efficient_did.py:L463-L480, docs/methodology/REGISTRY.md:L875-L880, docs/methodology/REGISTRY.md:L934-L939, tests/test_efficient_did.py:L1128-L1158).
  • The TROP refactor centralizes shared fit-time setup in _setup_trop_data(), and both local/global paths now consume the same validated context; I did not find a behavior-changing estimator or inference regression in the changed surface (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L472-L486, diff_diff/trop_global.py:L587-L605, tests/test_trop.py:L4149-L4190).
  • Remaining TROP bootstrap dedup and TripleDifference power auto-routing work is explicitly tracked in TODO.md, so those items are informational only (TODO.md:L87-L88).
  • Validation note: this was a static review only. I could not run the targeted tests because python in this environment lacks numpy, and pytest is not installed.

Methodology

No findings. The previous DDD generator issue is fixed, and the new generator now enforces the 8-cell contract that TripleDifference.fit() requires (diff_diff/prep_dgp.py:L1264-L1367, diff_diff/triple_diff.py:L776-L798, tests/test_prep.py:L1237-L1341).

Code Quality

No findings. The TROP extraction is a straightforward consolidation of duplicated setup logic into one helper used by both fit paths (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L472-L486, diff_diff/trop_global.py:L587-L605).

Performance

No findings.

Maintainability

P3 informational — Impact: the PR improves TROP maintainability by deduplicating fit-time setup, but intentionally leaves bootstrap-loop dedup for follow-up. Concrete fix: none required for this PR; the remaining work is already tracked in TODO.md:L87-L87.

Tech Debt

P3 informational — Impact: TripleDifference power simulation is not yet auto-routed to generate_ddd_panel_data() when n_periods > 2, but the limitation is explicitly tracked and disclosed. Concrete fix: none required for this PR; follow-up is tracked in TODO.md:L88-L88.

Security

No findings.

Documentation/Tests

No blocking findings. Public API exposure and docs for generate_ddd_panel_data() were updated consistently across diff_diff/prep.py:L18-L23, diff_diff/__init__.py:L120-L128, diff_diff/__init__.py:L409-L413, and docs/api/prep.rst:L73-L78, and the new tests cover the previously missing empty-cell edge case plus the smallest feasible success case (tests/test_prep.py:L1237-L1341). Validation note: I could not execute the test suite here because python lacks numpy and pytest is unavailable.

igerber and others added 5 commits May 16, 2026 13:02
The `control_group="last_cohort"` path in EfficientDiD.fit (line 470) trims
periods at `last_g - anticipation`, excluding anticipation-contaminated
periods from the pseudo-control's pre-treatment window. REGISTRY.md
previously described only the `anticipation=0` case. Aligns both the
Edge Cases bullet and the algorithm Note to the code's actual behavior,
cross-references the interaction in the `control_group` and `anticipation`
docstring entries, and adds a regression test asserting the trim cuts at
`last_g - anticipation` rather than `last_g`.

Closes Tier-A backlog item 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new public DGP function that generates a balanced panel of n_units
observed over n_periods with two unit-level binary dimensions (`group`,
`partition`, time-invariant) and a derived `post` indicator. The DDD-CPT
identifying assumption holds because `group_partition_interaction` enters
only as a unit-level (time-invariant) effect, leaving the triple
interaction `treatment_effect * group * partition * post` as the sole
source of differential group × partition trend.

The existing cross-sectional `generate_ddd_data` remains unchanged.
Compatible with `TripleDifference.fit(..., time="post")` directly; the
binary 2×2×2 estimator surface is unchanged. Auto-routing of
`power.simulate_power` to the panel DGP for `n_periods > 2` is deferred
to a follow-up (TODO.md row added).

Exports: top-level `diff_diff` and `diff_diff.prep` re-export; new
autofunction stub in docs/api/prep.rst. Tests in
tests/test_prep.py::TestGenerateDddPanelData (14 tests) including a
deterministic recovery test (noise_sd=0, ATT recovery to ~1e-15) and a
finite-sample recovery test.

Closes Tier-A backlog item 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TROP.fit()` (local path) and `_fit_global()` previously duplicated ~85
lines of data-setup logic each: panel pivoting, absorbing-state
validation, treated/control unit identification, first-treatment-period
detection, and pre/post period counting. The duplicated blocks were
near-identical line-by-line, differing only in which index mappings
the caller built (local built all four; global built only the forward
maps).

Extracts `_setup_trop_data(...)` in trop_local.py (alongside the existing
`_validate_and_pivot_treatment` helper). Both callers now invoke it and
unpack only the fields they consume. The helper returns all four index
mappings (`unit_to_idx`, `period_to_idx`, `idx_to_unit`, `idx_to_period`)
uniformly, eliminating a class of subtle drift bug; `_fit_global` gains
two unused locals as a trade-off.

The global-method-specific staggered-adoption check stays in
`_fit_global` as a post-helper validation (it depends on estimator
semantics, not data preparation). Bootstrap-loop dedup (~40 LoC across
`_bootstrap_variance` / `_bootstrap_variance_global`) is deferred to a
follow-up (TODO.md row added).

Adds a parity regression test `TestTROPModuleSplit::test_setup_trop_data_internal_contract`
that round-trip-verifies the index mappings, shape consistency, and
treated/control partition disjointness.

Behavior-preserving: TROP test suite (84 non-slow tests) is the safety
net.

Closes Tier-A backlog item 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes three Methodology/Correctness rows (EfficientDiD anticipation
trim, generate_ddd_panel_data, TROP fit/_fit_global dedup) and the
three corresponding Tier A bullets, addressed in this PR. Adds two
follow-up rows for the deferred scope (TROP bootstrap-loop dedup,
TripleDifference power auto-routing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l_data

The R1 review identified that independent marginal sampling of `group` and
`partition` could leave one of the four (group, partition) cells empty
under valid inputs — e.g., `n_units=4, group_frac=partition_frac=0.25`
rounds to `n_group_1=1` and `n_p1_g1=0`, so the (1, 1) cell collapses
before `TripleDifference.fit`'s 2x2x2 cell-presence check.

Switches to stratified allocation: assign `group` to its requested
fraction, then within each group stratum, draw `partition=1` at
`partition_frac`. Adds a targeted `ValueError` when the rounded cell
counts would leave any (group, partition) cell empty (with the four
cell counts in the error message so users can pick a feasible config).

Adds two regression tests:
- `test_infeasible_cell_counts_raise` exercises both the `n_units=4`
  small-marginal case and an `n_units=10, group_frac=0.1` case.
- `test_smallest_feasible_config_populates_all_cells` verifies the
  smallest feasible config (`n_units=4, fracs=0.5`) yields one unit per
  cell and that `TripleDifference.fit(..., time="post")` succeeds on
  it (the contract the docstring advertises).

Updates the `group_frac` / `partition_frac` docstring entries to describe
the stratified allocation guarantee, and the `[Unreleased]` CHANGELOG
entry to mention the cell-coverage invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the wave-4-tier-a-drain branch from 03fce79 to 4aa3009 Compare May 16, 2026 17:05
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4aa3009308294665d03d16544ce453dccc7bbf2b


Overall Assessment

⚠️ Needs changes

One unmitigated P1 remains in the new generate_ddd_panel_data() guidance: it advertises panel-power usage with direct TripleDifference().fit(..., time="post"), but the DDD estimator is the repeated-cross-section (panel=FALSE) implementation and defaults to row-level inference unless the caller clusters.

Executive Summary

  • The prior blocker on empty DDD cells is resolved: generate_ddd_panel_data() now rejects rounded (group, partition) allocations that would leave any cell empty, and the new tests cover both infeasible and smallest-feasible cases (diff_diff/prep_dgp.py:L1279-L1306, tests/test_prep.py:L1237-L1288).
  • EfficientDiD’s control_group="last_cohort" documentation now matches the existing last_g - anticipation trim, and the PR adds a direct regression guard for that behavior (diff_diff/efficient_did.py:L161-L180, diff_diff/efficient_did.py:L463-L479, docs/methodology/REGISTRY.md:L875-L880, docs/methodology/REGISTRY.md:L934-L939, tests/test_efficient_did.py:L1128-L1159).
  • The TROP _setup_trop_data() extraction looks behavior-preserving for the local/global fit paths; I did not find a new validation or inference regression in that refactor (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L471-L486, diff_diff/trop_global.py:L587-L626, tests/test_trop.py:L4149-L4190).
  • P1 [Newly identified]: generate_ddd_panel_data() is documented as suitable for panel power analysis and shows direct TripleDifference().fit(..., time="post") usage, but TripleDifference explicitly implements the repeated-cross-section panel=FALSE surface and uses row-level IF/df = n_obs - 8 inference unless cluster is provided (diff_diff/prep_dgp.py:L1176-L1179, diff_diff/prep_dgp.py:L1256-L1262, diff_diff/triple_diff.py:L16-L17, diff_diff/triple_diff.py:L697-L702, docs/methodology/REGISTRY.md:L1719-L1745). With the new unit FE / within-unit serial structure, that guidance can silently understate SEs and overstate power.
  • Validation note: this was a static review only. I could not run the targeted tests here because the environment lacks numpy, pandas, and pytest.

Methodology

  • P1 [Newly identified] Impact: the new DGP adds unit fixed effects and within-unit serial structure, then markets itself as a panel-power DGP and demonstrates unclustered direct use with TripleDifference().fit(..., time="post") (diff_diff/prep_dgp.py:L1176-L1179, diff_diff/prep_dgp.py:L1256-L1262). That conflicts with the documented DDD methodology surface, which is the repeated-cross-section panel=FALSE implementation with individual-level default SE and only optional cluster-robust inference (diff_diff/triple_diff.py:L16-L17, docs/methodology/REGISTRY.md:L1719-L1745, diff_diff/triple_diff.py:L697-L702). Concrete fix: either document the supported panel path as TripleDifference(cluster="unit").fit(...) and add a Note:/deviation entry to the methodology docs, or remove the current “panel power analysis / direct fit” framing until cluster-aware support is explicit.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P3 informational — Impact: the TROP fit-time setup dedup is a net maintainability improvement, and the remaining bootstrap-loop dedup is explicitly tracked rather than silently deferred (TODO.md:L87-L87). Concrete fix: none required for this PR.

Tech Debt

  • P3 informational — Impact: simulate_power still routes TripleDifference to the fixed 2×2×2 cross-sectional DGP; the new panel DGP is not auto-selected yet, but that limitation is explicitly tracked (diff_diff/power.py:L649-L685, diff_diff/power.py:L825-L829, TODO.md:L88-L88). Concrete fix: none required for this PR.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new generate_ddd_panel_data() tests lock cell-population and ATT-recovery behavior, but they do not lock the inference contract for the panel-generated path, so the unclustered usage above could regress unnoticed (tests/test_prep.py:L1260-L1341). Concrete fix: add a regression that uses the documented panel path, e.g. TripleDifference(cluster="unit"), on panel-generated data.

Path to Approval

  1. Update the generate_ddd_panel_data() public guidance so the supported example uses TripleDifference(cluster="unit").fit(..., time="post"), or explicitly state that unclustered direct fit is point-estimate-only and not valid for panel inference/power.
  2. Add a Note: to docs/methodology/REGISTRY.md or equivalent methodology-facing docs documenting that this DGP generates repeated unit rows with serial correlation and therefore requires unit clustering when used with TripleDifference.
  3. Add a regression test that exercises the clustered panel path on generate_ddd_panel_data() output.

R3 review flagged that `generate_ddd_panel_data` advertises direct
`TripleDifference().fit(..., time="post")` usage, but `TripleDifference`
is the repeated-cross-section `panel=FALSE` estimator whose default
analytical SE treats each row as iid (df = n_obs - 8). With the new
panel-shaped output (unit FE + within-unit serial correlation),
unclustered SE understates variance and overstates power.

Updates:
- DGP docstring gains a `.. warning::` block stating the panel + cluster
  requirement; the `Examples` block now demonstrates
  `TripleDifference(cluster="unit").fit(...)` as the recommended pattern.
- REGISTRY.md `## TripleDifference` SE section gains a "Note (panel-shaped
  input, `generate_ddd_panel_data`)" paragraph documenting the
  repeated-cross-section semantics and the cluster contract.
- CHANGELOG `[Unreleased]` entry tightened to mention the cluster
  requirement explicitly.
- New test `test_recommended_clustered_panel_path` locks the documented
  pattern: clustered fit succeeds, `n_clusters == n_units`, ATT point
  estimate is invariant to clustering, but SE differs between clustered
  and unclustered fits (within-unit correlation is non-trivial).

Point estimate semantics unchanged. Fix is documentation + invariance
test only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f564ce902ee82456c32d10a2cbf45d05bac4eb8c


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings remain in the changed code. The prior rerun P1 is addressed: generate_ddd_panel_data() now documents the required clustered-by-unit inference path, the TripleDifference REGISTRY entry records that panel-shaped-input contract, and a regression test locks the clustered path (diff_diff/prep_dgp.py:L1181-L1192, docs/methodology/REGISTRY.md:L1750-L1758, tests/test_prep.py:L1290-L1332).

Executive Summary

  • Previous P1 resolved: the new panel DDD DGP now explicitly requires TripleDifference(cluster="unit"), and that contract is mirrored in REGISTRY and tests (diff_diff/prep_dgp.py:L1181-L1192, docs/methodology/REGISTRY.md:L1750-L1758, tests/test_prep.py:L1290-L1332).
  • EfficientDiD’s last_cohort + anticipation documentation now matches the implemented t >= last_g - anticipation trim, with a direct regression guard (diff_diff/efficient_did.py:L163-L180, diff_diff/efficient_did.py:L463-L479, docs/methodology/REGISTRY.md:L875-L880, docs/methodology/REGISTRY.md:L934-L939, tests/test_efficient_did.py:L1128-L1159).
  • The new DDD panel generator now fails fast on empty (group, partition) cell allocations, and the tests cover both infeasible and smallest-feasible cases (diff_diff/prep_dgp.py:L1294-L1321, tests/test_prep.py:L1237-L1288).
  • The TROP _setup_trop_data() extraction looks behavior-preserving: shared preprocessing moved into one helper, while the global-only staggered-adoption validation remains in _fit_global() and the helper contract is regression-tested (diff_diff/trop_local.py:L74-L194, diff_diff/trop.py:L472-L486, diff_diff/trop_global.py:L587-L626, tests/test_trop.py:L4149-L4190).
  • Validation note: this was a static review only; I could not execute the touched tests because pytest is unavailable in this environment.

Methodology

  • No findings. Methods touched here are EfficientDiD documentation/REGISTRY, TripleDifference inference-contract documentation via generate_ddd_panel_data, and a non-methodological TROP refactor. I did not find any undocumented deviation from REGISTRY, missing assumption check, or incorrect variance/SE path in the changed code.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3-informational. Impact: the TROP data-setup dedup is a net maintainability improvement, and the remaining bootstrap-loop duplication is explicitly tracked rather than silently deferred (TODO.md:L87-L87). Concrete fix: none required for this PR; follow the tracked bootstrap-helper extraction separately.

Tech Debt

  • Severity: P3-informational. Impact: simulate_power still does not auto-route TripleDifference to generate_ddd_panel_data when n_periods > 2, but that limitation is explicitly disclosed and tracked (CHANGELOG.md:L11-L11, TODO.md:L88-L88). Concrete fix: none required for this PR; implement the tracked estimator-profile routing follow-up separately.

Security

  • No findings.

Documentation/Tests

  • No findings. Documentation and tests now directly cover the two methodology-sensitive changes in this PR: the clustered DDD panel path and the EfficientDiD last_g - anticipation trim (tests/test_prep.py:L1290-L1332, tests/test_efficient_did.py:L1128-L1159).
  • Validation note: I could not run the touched tests locally because pytest is not installed in this environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
@igerber igerber merged commit 9aedd33 into main May 16, 2026
33 of 34 checks passed
@igerber igerber deleted the wave-4-tier-a-drain branch May 16, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant