Refactor: tech debt paydown - module split, warning fixes, deprecation updates by igerber · Pull Request #95 · igerber/diff-diff

igerber · 2026-01-21T12:45:01Z

Summary

Split prep.py (1,993 lines) into prep.py (1,241 lines) and prep_dgp.py (777 lines) for better maintainability
Updated deprecation warning for bootstrap_weight_type from "v2.0" to "v3.0"
Added np.errstate context managers in staggered_aggregation.py and staggered_bootstrap.py to suppress RuntimeWarnings during weight influence function computation
Fixed 6 Rust Clippy warnings: type complexity annotation in linalg.rs, too_many_arguments in trop.rs, needless range loop in weights.rs
Updated README.md and tutorial 02_staggered_did.ipynb to use bootstrap_weights instead of deprecated bootstrap_weight_type
Updated TODO.md and CLAUDE.md documentation to reflect changes

Methodology references (required if estimator / math changes)

Method name(s): N/A - no methodology changes (only code organization and warnings)
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: No test changes (all 894 Python tests pass, 52 Rust backend tests pass)
Backtest / simulation / notebook evidence (if applicable): Tutorial 02_staggered_did.ipynb updated to use new parameter name
Additional verification: cargo clippy reports 0 warnings

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…n updates - Split prep.py into prep.py (core utilities) and prep_dgp.py (data generation) - Update deprecation warning for bootstrap_weight_type to v3.0 - Add np.errstate wrappers to suppress RuntimeWarnings in staggered modules - Fix Rust Clippy warnings (type complexity, too_many_args, needless range loop) - Update README and tutorial to use bootstrap_weights parameter - Update TODO.md and CLAUDE.md documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-21T12:54:29Z

Overall assessment: ⚠️ Needs changes

Methodology

Severity: P3 | Impact: CallawaySantAnna IF-based SEs and multiplier bootstrap logic appear unchanged and remain aligned with docs/methodology/REGISTRY.md; no estimator/weighting deviations observed. | Concrete fix: None. (diff_diff/staggered_aggregation.py:275, diff_diff/staggered_bootstrap.py:352)

Code Quality

Severity: P2 | Impact: np.errstate suppresses overflow/invalid warnings during IF aggregation and bootstrap perturbations, which can silently propagate non-finite SEs/CIs in edge cases (small samples/extreme weights). | Concrete fix: After each errstate-protected matmul, add explicit np.isfinite checks and warn/raise with context before using results. (diff_diff/staggered_aggregation.py:285, diff_diff/staggered_bootstrap.py:376)

Performance

Severity: P3 | Impact: No meaningful performance regression expected from the added np.errstate contexts. | Concrete fix: None.

Maintainability

Severity: P3 | Impact: No maintainability regressions from the prep module split; re-exports preserve compatibility. | Concrete fix: None. (diff_diff/prep.py:19, diff_diff/prep_dgp.py:1)

Tech Debt

Severity: P3 | Impact: No new tech-debt concerns identified in this diff. | Concrete fix: None.

Security

Severity: P3 | Impact: No security-sensitive changes or secrets observed. | Concrete fix: None.

Documentation/Tests

Severity: P3 | Impact: No tests added or run for the new warning-suppression behavior in CallawaySantAnna inference; potential gap if non-finite outputs occur. | Concrete fix: Add a regression test that forces extreme weights and asserts a warning/error or finite outputs. (diff_diff/staggered_bootstrap.py:376)

Open Questions

Should warning suppression be paired with explicit non-finite checks (or logging) so invalid inference is still observable to users?

Executive Summary

P2: Suppressed RuntimeWarnings can hide non-finite IF/bootstrap results; add explicit finite checks after matmul operations.
CallawaySantAnna estimator/weighting formulas appear unchanged and aligned with docs/methodology/REGISTRY.md.
DGP functions moved to prep_dgp.py with re-exports; no API regression observed.
No tests added or run to cover the new warning-suppression behavior.

…ions Address PR #95 feedback: add explicit np.isfinite checks after matmul operations to detect and warn on non-finite SEs/CIs from edge cases (small samples, extreme weights). Changes: - staggered_aggregation.py: Check wif_contrib after aggregation matmul - staggered_bootstrap.py: Add _check_and_fix_nonfinite helper method - staggered_bootstrap.py: Check results after 4 bootstrap matmul operations - test_staggered.py: Add test_extreme_weights_warning regression test Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T13:03:23Z

/ai-review

github-actions · 2026-01-21T13:07:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 854e2a89d0fcccadb1ae64b540c14c0a481e9205

Overall Assessment: ⚠️ Needs changes

Executive Summary

CallawaySantAnna analytic SE now zeros non‑finite weight influence contributions, which deviates from the Methodology Registry/paper and can understate uncertainty.
Multiplier bootstrap now zeros non‑finite perturbations/aggregations, altering the bootstrap distribution and inference in edge cases.
Added “extreme weights” test doesn’t assert the new warning behavior, so regressions could slip through.
No other issues found in the refactor or documentation updates.

Methodology

P1 | Impact: The analytic influence‑function SE for CallawaySantAnna is no longer faithful to the registry/paper because non‑finite WIF contributions are zeroed, which can bias SEs/p‑values downward instead of signaling invalid inference. | Fix: Align with the Callaway‑Sant’Anna IF formula by propagating NaNs or raising when WIF terms are non‑finite; alternatively, drop invalid groups/units with explicit documentation and counters. | Ref: diff_diff/staggered_aggregation.py:L259-L300
P1 | Impact: The multiplier bootstrap distribution is altered by replacing non‑finite perturbations/aggregations with zeros, which can bias bootstrap SEs/CIs/p‑values and deviates from the Methodology Registry’s multiplier bootstrap description. | Fix: Re‑draw or drop invalid bootstrap draws (and compute stats on the remaining finite draws with warnings), or surface an error when non‑finite values occur. | Ref: diff_diff/staggered_bootstrap.py:L352-L432, diff_diff/staggered_bootstrap.py:L668-L693

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 | Impact: The new “extreme weights” path is not validated for warning emission, so the intended behavior could regress without test failures. | Fix: Use pytest.warns(RuntimeWarning) (or message match) in the test to assert the warning is raised when non‑finite values occur. | Ref: tests/test_staggered.py:L684-L735

Address PR #95 reviewer feedback: - Analytic SE: return NaN instead of zeroing to signal invalid inference - Bootstrap: drop invalid samples and warn, preserving valid distribution - Update test to verify methodology-aligned behavior (finite or NaN, not biased) Per docs/methodology/REGISTRY.md: "Missing group-time cells: ATT(g,t) set to NaN" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T13:26:16Z

/ai-review

github-actions · 2026-01-21T13:35:52Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f3c9f059a1425d9a202907062d41ba65dcfcc104

Methodology

Severity: P1; Impact: CallawaySantAnna analytical SE now returns NaN when wif_contrib is non-finite, changing variance behavior without a corresponding rule in the Methodology Registry or cited references, which risks undocumented inference changes in edge cases; Concrete fix: verify expected handling against Callaway & Sant’Anna (2021) / did::att_gt, then either align the implementation or document the NaN-return policy in docs/methodology/REGISTRY.md and estimator docs. diff_diff/staggered_aggregation.py:L282-L299 docs/methodology/REGISTRY.md
Severity: P1; Impact: Bootstrap inference drops non-finite draws and uses the original n_bootstrap for the minimum p-value, which can understate p-values when draws are filtered and is undocumented relative to the registry’s bootstrap description; Concrete fix: track n_valid after filtering, use 1/(n_valid+1) in _compute_bootstrap_pvalue, guard for n_valid < 2, and document the drop/NaN policy in docs/methodology/REGISTRY.md. diff_diff/staggered_bootstrap.py:L604-L695 docs/methodology/REGISTRY.md

Code Quality

Severity: P3; Impact: _mask_nonfinite_samples is unused, adding dead code and a second implementation of warning logic; Concrete fix: remove it or refactor _compute_effect_bootstrap_stats to reuse it. diff_diff/staggered_bootstrap.py:L697-L723

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P3; Impact: The new test docstring claims behavior is specified in the Methodology Registry, but the registry doesn’t mention NaN SE or bootstrap sample dropping, which misleads reviewers and future contributors; Concrete fix: update docs/methodology/REGISTRY.md to reflect the new edge-case inference policy or revise the docstring to match the documented sources. tests/test_staggered.py:L684-L691 docs/methodology/REGISTRY.md

Open Questions

Should the non-finite inference handling be explicitly aligned to did::att_gt defaults, and if so, what is the desired policy for SE/CI/p-value when draws are dropped?

Overall assessment: ⚠️ Needs changes

Executive Summary

CallawaySantAnna inference now returns NaN for non-finite WIF contributions, but this policy is not documented in the Methodology Registry.
Bootstrap stats drop non-finite draws; p-value flooring still uses original n_bootstrap, which can bias p-values when draws are removed.
Test docstring asserts registry guidance that doesn’t exist yet and should be reconciled with documentation.
No security issues found.
Tests not run in this review.

…licy - Fix bootstrap p-value floor to use valid sample count (not n_bootstrap) - Remove unused _mask_nonfinite_samples method (dead code) - Document non-finite inference handling in Methodology Registry - Fix misleading test docstring that claimed undocumented behavior Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T13:49:20Z

/ai-review

github-actions · 2026-01-21T13:57:25Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ba0eee613d51cd7384c5b06718aa0bbdd1b507fe

Overall assessment: ⚠️ Needs changes

Executive summary:

CallawaySantAnna inference now drops non‑finite bootstrap samples and can return NaN SEs; registry is updated but there’s no explicit validation against did::att_gt/csdid.
WIF warning suppression only wraps the matmul; divide‑by‑zero/invalid ops can still occur earlier in the WIF matrix construction.
Bootstrap p‑value logic is duplicated inline, which risks future drift from _compute_bootstrap_pvalue.
New edge‑case test doesn’t assert the warning/threshold behavior it describes.

Methodology

Severity: P2; Impact: CallawaySantAnna inference behavior changes (drop non‑finite bootstrap draws, NaN SE when <50% valid) alter default variance/SE handling and could diverge from did::att_gt/csdid without explicit validation; Concrete fix: add a short citation/justification in the registry or a regression test comparing edge‑case inference with the reference implementations, and mark as an intentional deviation if results differ. Location: diff_diff/staggered_aggregation.py:L275-L299, diff_diff/staggered_bootstrap.py:L631-L704, docs/methodology/REGISTRY.md:L199-L210.

Code Quality

Severity: P2; Impact: np.errstate only wraps the matmul; divide‑by‑zero/invalid operations can still trigger warnings and create infs during if1_matrix/if2_matrix construction, which defeats the intended suppression and may leak non‑finite values earlier; Concrete fix: guard sum_pg_keepers before division or wrap the entire WIF matrix computation in np.errstate. Location: diff_diff/staggered_aggregation.py:L275-L286.

Performance

No findings.

Maintainability

Severity: P3; Impact: p‑value logic is duplicated inline in _compute_effect_bootstrap_stats, increasing drift risk from _compute_bootstrap_pvalue; Concrete fix: refactor _compute_bootstrap_pvalue to accept an optional n_bootstrap_override (or pass the valid count) and reuse it from _compute_effect_bootstrap_stats. Location: diff_diff/staggered_bootstrap.py:L604-L704.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3; Impact: the new edge‑case test doesn’t assert that warnings are raised or that the <50% validity threshold triggers NaNs, so the intended behavior can regress silently; Concrete fix: assert at least one warning in the bootstrap path and add a small fixture that forces <50% valid samples to verify NaN outputs. Location: tests/test_staggered.py:L684-L749.

… tests - Refactor _compute_bootstrap_pvalue to accept n_valid parameter, eliminating duplicated p-value logic in _compute_effect_bootstrap_stats - Extend np.errstate coverage in staggered_aggregation.py to wrap all WIF division operations (not just matrix multiplication) - Add deviation note to Methodology Registry documenting defensive enhancement over R/Stata reference implementations - Strengthen test assertions: verify warnings are captured and NaN SE is accompanied by validity warnings - Add test_validity_threshold_nan_se for edge case coverage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-21T14:14:01Z

/ai-review

github-actions · 2026-01-21T14:23:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 903edc781ae8d73dde3a1d9a7b41660a7beead31

Methodology

No findings. Affected method: CallawaySantAnna inference (analytic WIF SE + bootstrap). Behavior matches the updated registry note on non-finite handling and the documented defensive deviation from reference implementations (docs/methodology/REGISTRY.md:L200-L211, diff_diff/staggered_aggregation.py:L259-L311, diff_diff/staggered_bootstrap.py:L604-L716).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 Impact: The new "<50% valid bootstrap samples => NaN" path is not deterministically exercised; current tests accept either finite or NaN outcomes, so regressions in the threshold logic could pass unnoticed (tests/test_staggered.py:L684-L813, diff_diff/staggered_bootstrap.py:L649-L705). Fix: add a deterministic unit test that feeds a synthetic boot_dist with controlled np.nan/np.inf to force the threshold path and assert the warning plus NaN outputs.

Open Questions

None.

Overall Assessment

✅ Looks good

Executive Summary

CallawaySantAnna inference updates (WIF SE + bootstrap non-finite handling) align with the methodology registry update; no methodology mismatches found.
No code quality, performance, maintainability, or security issues identified in the diff.
P3 test gap: the <50% valid bootstrap-sample threshold is not deterministically exercised.

Integrate Methodology Registry into development workflow: - Add docs/methodology/REGISTRY.md to documentation section - Add new "Implementing Methodology-Critical Code" checklist - Strengthen "Adding Warning/Error/Fallback Handling" checklist - Update bug fix requirements to include methodology updates These changes ensure PR #95 issues are prevented by requiring: - Consultation of REGISTRY.md before methodology changes - Documentation of deviations from reference implementations - Behavioral tests that assert warnings AND outcomes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit 1eea78b into main Jan 21, 2026
4 checks passed

igerber deleted the refactor/tech-debt-paydown branch January 21, 2026 14:51

igerber mentioned this pull request Jan 21, 2026

Add methodology workflow checklist to CLAUDE.md #96

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: tech debt paydown - module split, warning fixes, deprecation updates#95

Refactor: tech debt paydown - module split, warning fixes, deprecation updates#95
igerber merged 5 commits into
mainfrom
refactor/tech-debt-paydown

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jan 21, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Jan 21, 2026

Uh oh!

github-actions Bot commented Jan 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant