diff --git a/.claude/commands/pre-merge-check.md b/.claude/commands/pre-merge-check.md new file mode 100644 index 00000000..8fc94f8c --- /dev/null +++ b/.claude/commands/pre-merge-check.md @@ -0,0 +1,174 @@ +--- +description: Run pre-merge checks before submitting a PR +argument-hint: "" +--- + +# Pre-Merge Check + +Run automated checks and display the pre-merge checklist before submitting a PR. + +## Instructions + +### 1. Identify Changed Files + +Get the list of files that will be included in the PR: + +```bash +# Get all changed files (tracked modifications + staged + untracked) +git diff --name-only HEAD +git diff --cached --name-only +git ls-files --others --exclude-standard +``` + +Categorize files into: +- **Methodology files**: `diff_diff/*.py` (excluding `__init__.py`) +- **Test files**: `tests/*.py` +- **Documentation files**: `*.md`, `*.rst`, `docs/**` + +### 2. Run Automated Pattern Checks + +#### 2.1 NaN Handling Pattern Check (for methodology files) + +If any methodology files changed, run these pattern checks: + +```bash +# Check for bad t-stat pattern (should use np.nan, not 0.0 or 0) +grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py || true + +# Check for potential division issues without NaN handling +grep -E -n "/[[:space:]]*se\>" diff_diff/*.py | grep -v "np.nan" | grep -v "#" || true +``` + +**Report findings**: If matches found, warn about potential NaN handling issues. + +#### 2.2 Test Existence Check + +For each changed methodology file, check if corresponding test file was also changed: + +| Methodology File | Expected Test File | +|------------------|-------------------| +| `diff_diff/staggered.py` | `tests/test_staggered.py` | +| `diff_diff/estimators.py` | `tests/test_estimators.py` | +| `diff_diff/twfe.py` | `tests/test_estimators.py` | +| `diff_diff/synthetic_did.py` | `tests/test_estimators.py` | +| `diff_diff/sun_abraham.py` | `tests/test_sun_abraham.py` | +| `diff_diff/triple_diff.py` | `tests/test_triple_diff.py` | +| `diff_diff/trop.py` | `tests/test_trop.py` | +| `diff_diff/bacon.py` | `tests/test_bacon.py` | +| `diff_diff/linalg.py` | `tests/test_linalg.py` | +| `diff_diff/utils.py` | `tests/test_utils.py` | +| `diff_diff/diagnostics.py` | `tests/test_diagnostics.py` | +| `diff_diff/prep.py` | `tests/test_prep.py` | +| `diff_diff/visualization.py` | `tests/test_visualization.py` | +| `diff_diff/honest_did.py` | `tests/test_honest_did.py` | +| `diff_diff/power.py` | `tests/test_power.py` | +| `diff_diff/pretrends.py` | `tests/test_pretrends.py` | + +**Report**: List any methodology files without corresponding test changes. + +#### 2.3 Docstring Check (heuristic) + +For changed `.py` files, run a quick check for functions without docstrings: +```bash +# Find public function definitions (heuristic check) +grep -n "^def [^_]" | head -10 +grep -n "^ def [^_]" | head -10 +``` + +Note: This is a heuristic. Verify docstrings exist for new public functions. + +### 3. Display Context-Specific Checklist + +Based on what changed, display the appropriate checklist items: + +#### Always Show (Core Checklist) +``` +## Pre-Merge Checklist + +Based on your changes to: + +### Behavioral Completeness +- [ ] Happy path tested +- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions) +- [ ] Error/warning paths tested with behavioral assertions +``` + +#### If Methodology Files Changed +``` +### Inference Field Consistency +- [ ] If SE can be 0/undefined, ALL inference fields (t-stat, p-value, CI) return NaN +- [ ] Aggregation methods propagate NaN correctly +- [ ] Bootstrap methods handle NaN in base estimates + +### Control Group Logic (if adding new modes/code paths) +- [ ] Control group composition verified for new code paths +- [ ] "Not-yet-treated" excludes the treatment cohort itself +- [ ] Parameter interactions tested with all aggregation methods +``` + +#### If Documentation Files Changed +``` +### Documentation Sync +- [ ] Docstrings updated for changed function signatures +- [ ] README updated if user-facing behavior changes +- [ ] REGISTRY.md updated if methodology edge cases change +``` + +#### If This Appears to Be a Bug Fix +``` +### Pattern Consistency (Bug Fix) +- [ ] Grepped for similar patterns across codebase before fixing +- [ ] Fixed ALL occurrences, not just the one that was reported +- [ ] Verified fix doesn't break other code paths +``` + +### 4. Ask About Running Tests + +Use AskUserQuestion to offer test options: + +``` +Would you like to run tests now? + +Options: +1. Yes - run full test suite (pytest) +2. Yes - run only tests for changed files +3. No - skip tests for now +``` + +If user chooses option 1: +```bash +pytest +``` + +If user chooses option 2, run targeted tests based on changed files: +```bash +pytest tests/test_staggered.py tests/test_estimators.py # (example) +``` + +### 5. Report Summary + +``` +## Pre-Merge Check Complete + +### Automated Checks +- Pattern checks: [PASS/WARN - N potential issues found] +- Test coverage: [PASS/WARN - N methodology files without test changes] + +### Manual Checklist +Review the checklist items above before running /submit-pr + +### Findings to Address + + +### Next Steps +- Address any warnings above +- Complete manual checklist items +- When ready: /submit-pr "Your PR title" +``` + +## Notes + +- This skill is read-only - it only analyzes and reports, doesn't modify files +- Run this BEFORE `/submit-pr` to catch issues early +- Pattern checks are heuristics - review flagged items manually to confirm +- If pattern checks find issues, fix them before submitting diff --git a/.github/codex/prompts/pr_review.md b/.github/codex/prompts/pr_review.md index a7c0945f..f98ad8e2 100644 --- a/.github/codex/prompts/pr_review.md +++ b/.github/codex/prompts/pr_review.md @@ -8,12 +8,44 @@ TOP PRIORITY: Methodology adherence to source material. 3) Flag any mismatch, missing assumption check, incorrect variance/SE, or undocumented deviation as P0/P1. SECONDARY PRIORITIES (in order): -2) Code quality -3) Performance -4) Maintainability -5) Minimization of tech debt -6) Security (including accidental secrets) -7) Documentation + tests +2) Edge case coverage (see checklist below) +3) Code quality +4) Performance +5) Maintainability +6) Minimization of tech debt +7) Security (including accidental secrets) +8) Documentation + tests + +## Edge Case Review (learned from PR #97 analysis) + +When reviewing new features or code paths, specifically check: + +1. **Empty Result Sets**: + - Does the code handle when filters produce no matching data? + - Example: `base_period="varying"` with no valid pre-treatment periods + - Flag as P1 if new code paths lack empty-data handling + +2. **NaN/Inf Propagation**: + - If SE can be 0 or undefined, are ALL inference fields (t-stat, p-value, CI) set to NaN? + - Search for patterns: `if se > 0 else 0.0` → should be `else np.nan` + - Check ALL occurrences of this pattern in affected files + - Flag as P0 if statistical output could be misleading (e.g., t_stat=0.0 instead of NaN) + +3. **Parameter Interactions**: + - Does new parameter interact correctly with all aggregation methods? + - Does new parameter interact correctly with bootstrap/inference? + - Example: `anticipation` parameter must affect group aggregation filtering + - Flag as P1 if new parameter isn't tested with all existing code paths + +4. **Control/Comparison Group Logic**: + - For new code paths, is the control group defined correctly? + - Example: "not-yet-treated" should exclude the treatment cohort itself + - Flag as P0 if control group composition could bias estimates + +5. **Pattern Consistency**: + - If the PR fixes a pattern bug, verify ALL occurrences were fixed + - Command to check: `grep -n "pattern" diff_diff/*.py` + - Flag as P1 if only partial fixes were made Rules: - Review ONLY the changes introduced by this PR (diff) and the minimum surrounding context needed. diff --git a/CLAUDE.md b/CLAUDE.md index e4f47a00..5ad843c7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -514,3 +514,120 @@ When adding code that emits warnings or handles errors: 4. **Protect arithmetic operations**: - [ ] Wrap ALL related operations in `np.errstate()`, not just the final one - [ ] Include division, matrix multiplication, and any operation that can overflow/underflow + +### Reviewing New Features or Code Paths + +When reviewing PRs that add new features, modes, or code paths (learned from PR #97 analysis): + +1. **Edge Case Coverage**: + - [ ] Empty result sets (no matching data for a filter condition) + - [ ] NaN/Inf propagation through ALL inference fields (SE, t-stat, p-value, CI) + - [ ] Parameter interactions (e.g., new param × existing aggregation methods) + - [ ] Control/comparison group composition for all code paths + +2. **Documentation Completeness**: + - [ ] All new parameters have docstrings with type, default, and description + - [ ] Methodology docs match implementation behavior (equations, edge cases) + - [ ] Edge cases documented in `docs/methodology/REGISTRY.md` + +3. **Logic Audit for New Code Paths**: + - [ ] When adding new modes (like `base_period="varying"`), trace ALL downstream effects + - [ ] Check aggregation methods handle the new mode correctly + - [ ] Check bootstrap/inference methods handle the new mode correctly + - [ ] Explicitly test control group composition in new code paths + +4. **Pattern Consistency**: + - [ ] Search for similar patterns in codebase (e.g., `t_stat = x / se if se > 0 else ...`) + - [ ] Ensure new code follows established patterns or updates ALL instances + - [ ] If fixing a pattern, grep for ALL occurrences first: + ```bash + grep -n "if.*se.*> 0.*else" diff_diff/*.py + ``` + +### Fixing Bugs Across Multiple Locations + +When a bug fix involves a pattern that appears in multiple places (learned from PR #97 analysis): + +1. **Find All Instances First**: + - [ ] Use grep/search to find ALL occurrences of the pattern before fixing + - [ ] Document the locations found (file:line) + - [ ] Example: `t_stat = effect / se if se > 0 else 0.0` appeared in 7 locations + +2. **Fix Comprehensively in One Round**: + - [ ] Fix ALL instances in the same PR/commit + - [ ] Create a test that covers all locations + - [ ] Don't fix incrementally across multiple review rounds + +3. **Regression Test the Fix**: + - [ ] Verify fix doesn't break other code paths + - [ ] For early-return fixes: ensure downstream code still runs when needed + - [ ] Example: Bootstrap early return must still compute per-effect SEs + +4. **Common Patterns to Watch For**: + - `if se > 0 else 0.0` → should be `else np.nan` for undefined statistics + - `if len(data) > 0 else return` → check what downstream code expects + - `mask = (condition)` → verify mask logic for all parameter combinations + +### Pre-Merge Review Checklist + +Final checklist before approving a PR: + +1. **Behavioral Completeness**: + - [ ] Happy path tested + - [ ] Edge cases tested (empty data, NaN inputs, boundary conditions) + - [ ] Error/warning paths tested with behavioral assertions + +2. **Inference Field Consistency**: + - [ ] If one inference field (SE, t-stat, p-value) can be NaN, all related fields handle it + - [ ] Aggregation methods propagate NaN correctly + - [ ] Bootstrap methods handle NaN in base estimates + +3. **Documentation Sync**: + - [ ] Docstrings updated for all changed signatures + - [ ] README updated if user-facing behavior changes + - [ ] REGISTRY.md updated if methodology edge cases change + +## Task Implementation Workflow + +When implementing features or fixes, follow this workflow to ensure quality and catch issues early: + +### Phase 1: Planning +- Use `EnterPlanMode` for non-trivial tasks +- Consult `docs/methodology/REGISTRY.md` for methodology-critical code +- Identify all files and code paths that will be affected +- For bug fixes: grep for the pattern first to find ALL occurrences + +### Phase 2: Implementation +- Follow the relevant development checklists above +- Write tests alongside implementation (not after) +- For bug fixes: fix ALL occurrences in the same commit +- For new parameters: ensure all aggregation/bootstrap paths handle them + +### Phase 3: Pre-Merge Review +**Run `/pre-merge-check` before submitting**. This skill will: +1. Run automated pattern checks on changed files (NaN handling, etc.) +2. Check for missing test coverage +3. Display context-specific checklist items based on what changed +4. Optionally run the test suite + +Address any warnings before proceeding. + +### Phase 4: Submit +- Use `/submit-pr` to create the PR +- Automated AI review will run additional methodology and edge case checks +- The PR template will prompt for methodology references if applicable + +### Quick Reference: Common Patterns to Check + +Before submitting methodology changes, verify these patterns: + +```bash +# Find potential NaN handling issues (should use np.nan, not 0.0) +grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py + +# Find all t_stat calculations to ensure consistency +grep -n "t_stat.*=" diff_diff/*.py + +# Find all inference field assignments +grep -n "\(se\|t_stat\|p_value\|ci_lower\|ci_upper\).*=" diff_diff/*.py | head -30 +```