Skip to content

Refactor: tech debt paydown - module split, warning fixes, deprecation updates#95

Merged
igerber merged 5 commits into
mainfrom
refactor/tech-debt-paydown
Jan 21, 2026
Merged

Refactor: tech debt paydown - module split, warning fixes, deprecation updates#95
igerber merged 5 commits into
mainfrom
refactor/tech-debt-paydown

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jan 21, 2026

Summary

  • Split prep.py (1,993 lines) into prep.py (1,241 lines) and prep_dgp.py (777 lines) for better maintainability
  • Updated deprecation warning for bootstrap_weight_type from "v2.0" to "v3.0"
  • Added np.errstate context managers in staggered_aggregation.py and staggered_bootstrap.py to suppress RuntimeWarnings during weight influence function computation
  • Fixed 6 Rust Clippy warnings: type complexity annotation in linalg.rs, too_many_arguments in trop.rs, needless range loop in weights.rs
  • Updated README.md and tutorial 02_staggered_did.ipynb to use bootstrap_weights instead of deprecated bootstrap_weight_type
  • Updated TODO.md and CLAUDE.md documentation to reflect changes

Methodology references (required if estimator / math changes)

  • Method name(s): N/A - no methodology changes (only code organization and warnings)
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: No test changes (all 894 Python tests pass, 52 Rust backend tests pass)
  • Backtest / simulation / notebook evidence (if applicable): Tutorial 02_staggered_did.ipynb updated to use new parameter name
  • Additional verification: cargo clippy reports 0 warnings

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…n updates

- Split prep.py into prep.py (core utilities) and prep_dgp.py (data generation)
- Update deprecation warning for bootstrap_weight_type to v3.0
- Add np.errstate wrappers to suppress RuntimeWarnings in staggered modules
- Fix Rust Clippy warnings (type complexity, too_many_args, needless range loop)
- Update README and tutorial to use bootstrap_weights parameter
- Update TODO.md and CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall assessment: ⚠️ Needs changes

Methodology

  • Severity: P3 | Impact: CallawaySantAnna IF-based SEs and multiplier bootstrap logic appear unchanged and remain aligned with docs/methodology/REGISTRY.md; no estimator/weighting deviations observed. | Concrete fix: None. (diff_diff/staggered_aggregation.py:275, diff_diff/staggered_bootstrap.py:352)

Code Quality

  • Severity: P2 | Impact: np.errstate suppresses overflow/invalid warnings during IF aggregation and bootstrap perturbations, which can silently propagate non-finite SEs/CIs in edge cases (small samples/extreme weights). | Concrete fix: After each errstate-protected matmul, add explicit np.isfinite checks and warn/raise with context before using results. (diff_diff/staggered_aggregation.py:285, diff_diff/staggered_bootstrap.py:376)

Performance

  • Severity: P3 | Impact: No meaningful performance regression expected from the added np.errstate contexts. | Concrete fix: None.

Maintainability

  • Severity: P3 | Impact: No maintainability regressions from the prep module split; re-exports preserve compatibility. | Concrete fix: None. (diff_diff/prep.py:19, diff_diff/prep_dgp.py:1)

Tech Debt

  • Severity: P3 | Impact: No new tech-debt concerns identified in this diff. | Concrete fix: None.

Security

  • Severity: P3 | Impact: No security-sensitive changes or secrets observed. | Concrete fix: None.

Documentation/Tests

  • Severity: P3 | Impact: No tests added or run for the new warning-suppression behavior in CallawaySantAnna inference; potential gap if non-finite outputs occur. | Concrete fix: Add a regression test that forces extreme weights and asserts a warning/error or finite outputs. (diff_diff/staggered_bootstrap.py:376)

Open Questions

  • Should warning suppression be paired with explicit non-finite checks (or logging) so invalid inference is still observable to users?

Executive Summary

  • P2: Suppressed RuntimeWarnings can hide non-finite IF/bootstrap results; add explicit finite checks after matmul operations.
  • CallawaySantAnna estimator/weighting formulas appear unchanged and aligned with docs/methodology/REGISTRY.md.
  • DGP functions moved to prep_dgp.py with re-exports; no API regression observed.
  • No tests added or run to cover the new warning-suppression behavior.

…ions

Address PR #95 feedback: add explicit np.isfinite checks after matmul
operations to detect and warn on non-finite SEs/CIs from edge cases
(small samples, extreme weights).

Changes:
- staggered_aggregation.py: Check wif_contrib after aggregation matmul
- staggered_bootstrap.py: Add _check_and_fix_nonfinite helper method
- staggered_bootstrap.py: Check results after 4 bootstrap matmul operations
- test_staggered.py: Add test_extreme_weights_warning regression test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 854e2a89d0fcccadb1ae64b540c14c0a481e9205


Overall Assessment: ⚠️ Needs changes

Executive Summary

  • CallawaySantAnna analytic SE now zeros non‑finite weight influence contributions, which deviates from the Methodology Registry/paper and can understate uncertainty.
  • Multiplier bootstrap now zeros non‑finite perturbations/aggregations, altering the bootstrap distribution and inference in edge cases.
  • Added “extreme weights” test doesn’t assert the new warning behavior, so regressions could slip through.
  • No other issues found in the refactor or documentation updates.

Methodology

  • P1 | Impact: The analytic influence‑function SE for CallawaySantAnna is no longer faithful to the registry/paper because non‑finite WIF contributions are zeroed, which can bias SEs/p‑values downward instead of signaling invalid inference. | Fix: Align with the Callaway‑Sant’Anna IF formula by propagating NaNs or raising when WIF terms are non‑finite; alternatively, drop invalid groups/units with explicit documentation and counters. | Ref: diff_diff/staggered_aggregation.py:L259-L300
  • P1 | Impact: The multiplier bootstrap distribution is altered by replacing non‑finite perturbations/aggregations with zeros, which can bias bootstrap SEs/CIs/p‑values and deviates from the Methodology Registry’s multiplier bootstrap description. | Fix: Re‑draw or drop invalid bootstrap draws (and compute stats on the remaining finite draws with warnings), or surface an error when non‑finite values occur. | Ref: diff_diff/staggered_bootstrap.py:L352-L432, diff_diff/staggered_bootstrap.py:L668-L693

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 | Impact: The new “extreme weights” path is not validated for warning emission, so the intended behavior could regress without test failures. | Fix: Use pytest.warns(RuntimeWarning) (or message match) in the test to assert the warning is raised when non‑finite values occur. | Ref: tests/test_staggered.py:L684-L735

Address PR #95 reviewer feedback:
- Analytic SE: return NaN instead of zeroing to signal invalid inference
- Bootstrap: drop invalid samples and warn, preserving valid distribution
- Update test to verify methodology-aligned behavior (finite or NaN, not biased)

Per docs/methodology/REGISTRY.md: "Missing group-time cells: ATT(g,t) set to NaN"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f3c9f059a1425d9a202907062d41ba65dcfcc104


Methodology

  • Severity: P1; Impact: CallawaySantAnna analytical SE now returns NaN when wif_contrib is non-finite, changing variance behavior without a corresponding rule in the Methodology Registry or cited references, which risks undocumented inference changes in edge cases; Concrete fix: verify expected handling against Callaway & Sant’Anna (2021) / did::att_gt, then either align the implementation or document the NaN-return policy in docs/methodology/REGISTRY.md and estimator docs. diff_diff/staggered_aggregation.py:L282-L299 docs/methodology/REGISTRY.md
  • Severity: P1; Impact: Bootstrap inference drops non-finite draws and uses the original n_bootstrap for the minimum p-value, which can understate p-values when draws are filtered and is undocumented relative to the registry’s bootstrap description; Concrete fix: track n_valid after filtering, use 1/(n_valid+1) in _compute_bootstrap_pvalue, guard for n_valid < 2, and document the drop/NaN policy in docs/methodology/REGISTRY.md. diff_diff/staggered_bootstrap.py:L604-L695 docs/methodology/REGISTRY.md

Code Quality

  • Severity: P3; Impact: _mask_nonfinite_samples is unused, adding dead code and a second implementation of warning logic; Concrete fix: remove it or refactor _compute_effect_bootstrap_stats to reuse it. diff_diff/staggered_bootstrap.py:L697-L723

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3; Impact: The new test docstring claims behavior is specified in the Methodology Registry, but the registry doesn’t mention NaN SE or bootstrap sample dropping, which misleads reviewers and future contributors; Concrete fix: update docs/methodology/REGISTRY.md to reflect the new edge-case inference policy or revise the docstring to match the documented sources. tests/test_staggered.py:L684-L691 docs/methodology/REGISTRY.md

Open Questions

  • Should the non-finite inference handling be explicitly aligned to did::att_gt defaults, and if so, what is the desired policy for SE/CI/p-value when draws are dropped?

Overall assessment: ⚠️ Needs changes

Executive Summary

  • CallawaySantAnna inference now returns NaN for non-finite WIF contributions, but this policy is not documented in the Methodology Registry.
  • Bootstrap stats drop non-finite draws; p-value flooring still uses original n_bootstrap, which can bias p-values when draws are removed.
  • Test docstring asserts registry guidance that doesn’t exist yet and should be reconciled with documentation.
  • No security issues found.
  • Tests not run in this review.

…licy

- Fix bootstrap p-value floor to use valid sample count (not n_bootstrap)
- Remove unused _mask_nonfinite_samples method (dead code)
- Document non-finite inference handling in Methodology Registry
- Fix misleading test docstring that claimed undocumented behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ba0eee613d51cd7384c5b06718aa0bbdd1b507fe


Overall assessment: ⚠️ Needs changes

Executive summary:

  • CallawaySantAnna inference now drops non‑finite bootstrap samples and can return NaN SEs; registry is updated but there’s no explicit validation against did::att_gt/csdid.
  • WIF warning suppression only wraps the matmul; divide‑by‑zero/invalid ops can still occur earlier in the WIF matrix construction.
  • Bootstrap p‑value logic is duplicated inline, which risks future drift from _compute_bootstrap_pvalue.
  • New edge‑case test doesn’t assert the warning/threshold behavior it describes.

Methodology

  • Severity: P2; Impact: CallawaySantAnna inference behavior changes (drop non‑finite bootstrap draws, NaN SE when <50% valid) alter default variance/SE handling and could diverge from did::att_gt/csdid without explicit validation; Concrete fix: add a short citation/justification in the registry or a regression test comparing edge‑case inference with the reference implementations, and mark as an intentional deviation if results differ. Location: diff_diff/staggered_aggregation.py:L275-L299, diff_diff/staggered_bootstrap.py:L631-L704, docs/methodology/REGISTRY.md:L199-L210.

Code Quality

  • Severity: P2; Impact: np.errstate only wraps the matmul; divide‑by‑zero/invalid operations can still trigger warnings and create infs during if1_matrix/if2_matrix construction, which defeats the intended suppression and may leak non‑finite values earlier; Concrete fix: guard sum_pg_keepers before division or wrap the entire WIF matrix computation in np.errstate. Location: diff_diff/staggered_aggregation.py:L275-L286.

Performance

  • No findings.

Maintainability

  • Severity: P3; Impact: p‑value logic is duplicated inline in _compute_effect_bootstrap_stats, increasing drift risk from _compute_bootstrap_pvalue; Concrete fix: refactor _compute_bootstrap_pvalue to accept an optional n_bootstrap_override (or pass the valid count) and reuse it from _compute_effect_bootstrap_stats. Location: diff_diff/staggered_bootstrap.py:L604-L704.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3; Impact: the new edge‑case test doesn’t assert that warnings are raised or that the <50% validity threshold triggers NaNs, so the intended behavior can regress silently; Concrete fix: assert at least one warning in the bootstrap path and add a small fixture that forces <50% valid samples to verify NaN outputs. Location: tests/test_staggered.py:L684-L749.

… tests

- Refactor _compute_bootstrap_pvalue to accept n_valid parameter,
  eliminating duplicated p-value logic in _compute_effect_bootstrap_stats
- Extend np.errstate coverage in staggered_aggregation.py to wrap all
  WIF division operations (not just matrix multiplication)
- Add deviation note to Methodology Registry documenting defensive
  enhancement over R/Stata reference implementations
- Strengthen test assertions: verify warnings are captured and NaN SE
  is accompanied by validity warnings
- Add test_validity_threshold_nan_se for edge case coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Jan 21, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 903edc781ae8d73dde3a1d9a7b41660a7beead31


Methodology

  • No findings. Affected method: CallawaySantAnna inference (analytic WIF SE + bootstrap). Behavior matches the updated registry note on non-finite handling and the documented defensive deviation from reference implementations (docs/methodology/REGISTRY.md:L200-L211, diff_diff/staggered_aggregation.py:L259-L311, diff_diff/staggered_bootstrap.py:L604-L716).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Impact: The new "<50% valid bootstrap samples => NaN" path is not deterministically exercised; current tests accept either finite or NaN outcomes, so regressions in the threshold logic could pass unnoticed (tests/test_staggered.py:L684-L813, diff_diff/staggered_bootstrap.py:L649-L705). Fix: add a deterministic unit test that feeds a synthetic boot_dist with controlled np.nan/np.inf to force the threshold path and assert the warning plus NaN outputs.

Open Questions

  • None.

Overall Assessment

  • ✅ Looks good

Executive Summary

  • CallawaySantAnna inference updates (WIF SE + bootstrap non-finite handling) align with the methodology registry update; no methodology mismatches found.
  • No code quality, performance, maintainability, or security issues identified in the diff.
  • P3 test gap: the <50% valid bootstrap-sample threshold is not deterministically exercised.

@igerber igerber merged commit 1eea78b into main Jan 21, 2026
4 checks passed
@igerber igerber deleted the refactor/tech-debt-paydown branch January 21, 2026 14:51
igerber added a commit that referenced this pull request Jan 21, 2026
Integrate Methodology Registry into development workflow:
- Add docs/methodology/REGISTRY.md to documentation section
- Add new "Implementing Methodology-Critical Code" checklist
- Strengthen "Adding Warning/Error/Fallback Handling" checklist
- Update bug fix requirements to include methodology updates

These changes ensure PR #95 issues are prevented by requiring:
- Consultation of REGISTRY.md before methodology changes
- Documentation of deviations from reference implementations
- Behavioral tests that assert warnings AND outcomes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant