Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 174 additions & 0 deletions .claude/commands/pre-merge-check.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
---
description: Run pre-merge checks before submitting a PR
argument-hint: ""
---

# Pre-Merge Check

Run automated checks and display the pre-merge checklist before submitting a PR.

## Instructions

### 1. Identify Changed Files

Get the list of files that will be included in the PR:

```bash
# Get all changed files (tracked modifications + staged + untracked)
git diff --name-only HEAD
git diff --cached --name-only
git ls-files --others --exclude-standard
```

Categorize files into:
- **Methodology files**: `diff_diff/*.py` (excluding `__init__.py`)
- **Test files**: `tests/*.py`
- **Documentation files**: `*.md`, `*.rst`, `docs/**`

### 2. Run Automated Pattern Checks

#### 2.1 NaN Handling Pattern Check (for methodology files)

If any methodology files changed, run these pattern checks:

```bash
# Check for bad t-stat pattern (should use np.nan, not 0.0 or 0)
grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py || true

# Check for potential division issues without NaN handling
grep -E -n "/[[:space:]]*se\>" diff_diff/*.py | grep -v "np.nan" | grep -v "#" || true
```

**Report findings**: If matches found, warn about potential NaN handling issues.

#### 2.2 Test Existence Check

For each changed methodology file, check if corresponding test file was also changed:

| Methodology File | Expected Test File |
|------------------|-------------------|
| `diff_diff/staggered.py` | `tests/test_staggered.py` |
| `diff_diff/estimators.py` | `tests/test_estimators.py` |
| `diff_diff/twfe.py` | `tests/test_estimators.py` |
| `diff_diff/synthetic_did.py` | `tests/test_estimators.py` |
| `diff_diff/sun_abraham.py` | `tests/test_sun_abraham.py` |
| `diff_diff/triple_diff.py` | `tests/test_triple_diff.py` |
| `diff_diff/trop.py` | `tests/test_trop.py` |
| `diff_diff/bacon.py` | `tests/test_bacon.py` |
| `diff_diff/linalg.py` | `tests/test_linalg.py` |
| `diff_diff/utils.py` | `tests/test_utils.py` |
| `diff_diff/diagnostics.py` | `tests/test_diagnostics.py` |
| `diff_diff/prep.py` | `tests/test_prep.py` |
| `diff_diff/visualization.py` | `tests/test_visualization.py` |
| `diff_diff/honest_did.py` | `tests/test_honest_did.py` |
| `diff_diff/power.py` | `tests/test_power.py` |
| `diff_diff/pretrends.py` | `tests/test_pretrends.py` |

**Report**: List any methodology files without corresponding test changes.

#### 2.3 Docstring Check (heuristic)

For changed `.py` files, run a quick check for functions without docstrings:
```bash
# Find public function definitions (heuristic check)
grep -n "^def [^_]" <changed-py-files> | head -10
grep -n "^ def [^_]" <changed-py-files> | head -10
```

Note: This is a heuristic. Verify docstrings exist for new public functions.

### 3. Display Context-Specific Checklist

Based on what changed, display the appropriate checklist items:

#### Always Show (Core Checklist)
```
## Pre-Merge Checklist

Based on your changes to: <list of changed files>

### Behavioral Completeness
- [ ] Happy path tested
- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
- [ ] Error/warning paths tested with behavioral assertions
```

#### If Methodology Files Changed
```
### Inference Field Consistency
- [ ] If SE can be 0/undefined, ALL inference fields (t-stat, p-value, CI) return NaN
- [ ] Aggregation methods propagate NaN correctly
- [ ] Bootstrap methods handle NaN in base estimates

### Control Group Logic (if adding new modes/code paths)
- [ ] Control group composition verified for new code paths
- [ ] "Not-yet-treated" excludes the treatment cohort itself
- [ ] Parameter interactions tested with all aggregation methods
```

#### If Documentation Files Changed
```
### Documentation Sync
- [ ] Docstrings updated for changed function signatures
- [ ] README updated if user-facing behavior changes
- [ ] REGISTRY.md updated if methodology edge cases change
```

#### If This Appears to Be a Bug Fix
```
### Pattern Consistency (Bug Fix)
- [ ] Grepped for similar patterns across codebase before fixing
- [ ] Fixed ALL occurrences, not just the one that was reported
- [ ] Verified fix doesn't break other code paths
```

### 4. Ask About Running Tests

Use AskUserQuestion to offer test options:

```
Would you like to run tests now?

Options:
1. Yes - run full test suite (pytest)
2. Yes - run only tests for changed files
3. No - skip tests for now
```

If user chooses option 1:
```bash
pytest
```

If user chooses option 2, run targeted tests based on changed files:
```bash
pytest tests/test_staggered.py tests/test_estimators.py # (example)
```

### 5. Report Summary

```
## Pre-Merge Check Complete

### Automated Checks
- Pattern checks: [PASS/WARN - N potential issues found]
- Test coverage: [PASS/WARN - N methodology files without test changes]

### Manual Checklist
Review the checklist items above before running /submit-pr

### Findings to Address
<list any warnings from pattern checks>

### Next Steps
- Address any warnings above
- Complete manual checklist items
- When ready: /submit-pr "Your PR title"
```

## Notes

- This skill is read-only - it only analyzes and reports, doesn't modify files
- Run this BEFORE `/submit-pr` to catch issues early
- Pattern checks are heuristics - review flagged items manually to confirm
- If pattern checks find issues, fix them before submitting
44 changes: 38 additions & 6 deletions .github/codex/prompts/pr_review.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,44 @@ TOP PRIORITY: Methodology adherence to source material.
3) Flag any mismatch, missing assumption check, incorrect variance/SE, or undocumented deviation as P0/P1.

SECONDARY PRIORITIES (in order):
2) Code quality
3) Performance
4) Maintainability
5) Minimization of tech debt
6) Security (including accidental secrets)
7) Documentation + tests
2) Edge case coverage (see checklist below)
3) Code quality
4) Performance
5) Maintainability
6) Minimization of tech debt
7) Security (including accidental secrets)
8) Documentation + tests

## Edge Case Review (learned from PR #97 analysis)

When reviewing new features or code paths, specifically check:

1. **Empty Result Sets**:
- Does the code handle when filters produce no matching data?
- Example: `base_period="varying"` with no valid pre-treatment periods
- Flag as P1 if new code paths lack empty-data handling

2. **NaN/Inf Propagation**:
- If SE can be 0 or undefined, are ALL inference fields (t-stat, p-value, CI) set to NaN?
- Search for patterns: `if se > 0 else 0.0` → should be `else np.nan`
- Check ALL occurrences of this pattern in affected files
- Flag as P0 if statistical output could be misleading (e.g., t_stat=0.0 instead of NaN)

3. **Parameter Interactions**:
- Does new parameter interact correctly with all aggregation methods?
- Does new parameter interact correctly with bootstrap/inference?
- Example: `anticipation` parameter must affect group aggregation filtering
- Flag as P1 if new parameter isn't tested with all existing code paths

4. **Control/Comparison Group Logic**:
- For new code paths, is the control group defined correctly?
- Example: "not-yet-treated" should exclude the treatment cohort itself
- Flag as P0 if control group composition could bias estimates

5. **Pattern Consistency**:
- If the PR fixes a pattern bug, verify ALL occurrences were fixed
- Command to check: `grep -n "pattern" diff_diff/*.py`
- Flag as P1 if only partial fixes were made

Rules:
- Review ONLY the changes introduced by this PR (diff) and the minimum surrounding context needed.
Expand Down
117 changes: 117 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -514,3 +514,120 @@ When adding code that emits warnings or handles errors:
4. **Protect arithmetic operations**:
- [ ] Wrap ALL related operations in `np.errstate()`, not just the final one
- [ ] Include division, matrix multiplication, and any operation that can overflow/underflow

### Reviewing New Features or Code Paths

When reviewing PRs that add new features, modes, or code paths (learned from PR #97 analysis):

1. **Edge Case Coverage**:
- [ ] Empty result sets (no matching data for a filter condition)
- [ ] NaN/Inf propagation through ALL inference fields (SE, t-stat, p-value, CI)
- [ ] Parameter interactions (e.g., new param × existing aggregation methods)
- [ ] Control/comparison group composition for all code paths

2. **Documentation Completeness**:
- [ ] All new parameters have docstrings with type, default, and description
- [ ] Methodology docs match implementation behavior (equations, edge cases)
- [ ] Edge cases documented in `docs/methodology/REGISTRY.md`

3. **Logic Audit for New Code Paths**:
- [ ] When adding new modes (like `base_period="varying"`), trace ALL downstream effects
- [ ] Check aggregation methods handle the new mode correctly
- [ ] Check bootstrap/inference methods handle the new mode correctly
- [ ] Explicitly test control group composition in new code paths

4. **Pattern Consistency**:
- [ ] Search for similar patterns in codebase (e.g., `t_stat = x / se if se > 0 else ...`)
- [ ] Ensure new code follows established patterns or updates ALL instances
- [ ] If fixing a pattern, grep for ALL occurrences first:
```bash
grep -n "if.*se.*> 0.*else" diff_diff/*.py
```

### Fixing Bugs Across Multiple Locations

When a bug fix involves a pattern that appears in multiple places (learned from PR #97 analysis):

1. **Find All Instances First**:
- [ ] Use grep/search to find ALL occurrences of the pattern before fixing
- [ ] Document the locations found (file:line)
- [ ] Example: `t_stat = effect / se if se > 0 else 0.0` appeared in 7 locations

2. **Fix Comprehensively in One Round**:
- [ ] Fix ALL instances in the same PR/commit
- [ ] Create a test that covers all locations
- [ ] Don't fix incrementally across multiple review rounds

3. **Regression Test the Fix**:
- [ ] Verify fix doesn't break other code paths
- [ ] For early-return fixes: ensure downstream code still runs when needed
- [ ] Example: Bootstrap early return must still compute per-effect SEs

4. **Common Patterns to Watch For**:
- `if se > 0 else 0.0` → should be `else np.nan` for undefined statistics
- `if len(data) > 0 else return` → check what downstream code expects
- `mask = (condition)` → verify mask logic for all parameter combinations

### Pre-Merge Review Checklist

Final checklist before approving a PR:

1. **Behavioral Completeness**:
- [ ] Happy path tested
- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
- [ ] Error/warning paths tested with behavioral assertions

2. **Inference Field Consistency**:
- [ ] If one inference field (SE, t-stat, p-value) can be NaN, all related fields handle it
- [ ] Aggregation methods propagate NaN correctly
- [ ] Bootstrap methods handle NaN in base estimates

3. **Documentation Sync**:
- [ ] Docstrings updated for all changed signatures
- [ ] README updated if user-facing behavior changes
- [ ] REGISTRY.md updated if methodology edge cases change

## Task Implementation Workflow

When implementing features or fixes, follow this workflow to ensure quality and catch issues early:

### Phase 1: Planning
- Use `EnterPlanMode` for non-trivial tasks
- Consult `docs/methodology/REGISTRY.md` for methodology-critical code
- Identify all files and code paths that will be affected
- For bug fixes: grep for the pattern first to find ALL occurrences

### Phase 2: Implementation
- Follow the relevant development checklists above
- Write tests alongside implementation (not after)
- For bug fixes: fix ALL occurrences in the same commit
- For new parameters: ensure all aggregation/bootstrap paths handle them

### Phase 3: Pre-Merge Review
**Run `/pre-merge-check` before submitting**. This skill will:
1. Run automated pattern checks on changed files (NaN handling, etc.)
2. Check for missing test coverage
3. Display context-specific checklist items based on what changed
4. Optionally run the test suite

Address any warnings before proceeding.

### Phase 4: Submit
- Use `/submit-pr` to create the PR
- Automated AI review will run additional methodology and edge case checks
- The PR template will prompt for methodology references if applicable

### Quick Reference: Common Patterns to Check

Before submitting methodology changes, verify these patterns:

```bash
# Find potential NaN handling issues (should use np.nan, not 0.0)
grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py

# Find all t_stat calculations to ensure consistency
grep -n "t_stat.*=" diff_diff/*.py

# Find all inference field assignments
grep -n "\(se\|t_stat\|p_value\|ci_lower\|ci_upper\).*=" diff_diff/*.py | head -30
```