Skip to content

Commit ced4095

Browse files
authored
Merge pull request #160 from igerber/update-todos
Update TODO.md and ROADMAP.md for accuracy post-v2.4.0
2 parents 890f414 + 00c4616 commit ced4095

2 files changed

Lines changed: 71 additions & 41 deletions

File tree

ROADMAP.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
88

99
## Current Status
1010

11-
diff-diff v2.3.0 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:
11+
diff-diff v2.4.0 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:
1212

1313
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022)
1414
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance

TODO.md

Lines changed: 70 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -12,8 +12,8 @@ Current limitations that may affect users:
1212

1313
| Issue | Location | Priority | Notes |
1414
|-------|----------|----------|-------|
15-
| MultiPeriodDiD wild bootstrap not supported | `estimators.py:1068-1074` | Low | Edge case |
16-
| `predict()` raises NotImplementedError | `estimators.py:532-554` | Low | Rarely needed |
15+
| MultiPeriodDiD wild bootstrap not supported | `estimators.py:779-785` | Low | Edge case |
16+
| `predict()` raises NotImplementedError | `estimators.py:568-587` | Low | Rarely needed |
1717

1818
## Code Quality
1919

@@ -23,61 +23,55 @@ Target: < 1000 lines per module for maintainability.
2323

2424
| File | Lines | Action |
2525
|------|-------|--------|
26-
| ~~`staggered.py`~~ | ~~2301~~ 1066 | ✅ Split into staggered.py, staggered_bootstrap.py, staggered_aggregation.py, staggered_results.py |
26+
| ~~`staggered.py`~~ | ~~2301~~ 1120 | ✅ Split into staggered.py, staggered_bootstrap.py, staggered_aggregation.py, staggered_results.py |
2727
| ~~`prep.py`~~ | ~~1993~~ 1241 | ✅ Split: DGP functions moved to `prep_dgp.py` (777 lines) |
28-
| `trop.py` | 1703 | Monitor size |
29-
| `visualization.py` | 1627 | Acceptable but growing |
30-
| `honest_did.py` | 1493 | Acceptable |
31-
| `utils.py` | 1481 | Acceptable |
28+
| `trop.py` | 2904 | **Needs split** -- nearly 3x the 1000-line target |
29+
| `imputation.py` | 2480 | **Needs split** -- results, bootstrap, aggregation like staggered |
30+
| `two_stage.py` | 2209 | **Needs split** -- same pattern as imputation |
31+
| `utils.py` | 1879 | **Needs split** -- legacy placebo functions could move to diagnostics |
32+
| `visualization.py` | 1678 | Monitor -- growing but cohesive |
33+
| `linalg.py` | 1537 | Monitor -- unified backend, splitting would hurt cohesion |
34+
| `honest_did.py` | 1511 | Acceptable |
3235
| `power.py` | 1350 | Acceptable |
33-
| `triple_diff.py` | 1291 | Acceptable |
34-
| `sun_abraham.py` | 1176 | Acceptable |
35-
| `pretrends.py` | 1160 | Acceptable |
36-
| `bacon.py` | 1027 | OK |
36+
| `triple_diff.py` | 1322 | Acceptable |
37+
| `sun_abraham.py` | 1227 | Acceptable |
38+
| `estimators.py` | 1161 | Acceptable |
39+
| `pretrends.py` | 1104 | Acceptable |
3740

38-
### NaN Handling for Undefined t-statistics
41+
### ~~NaN Handling for Undefined t-statistics~~ -- DONE
3942

40-
Several estimators return `0.0` for t-statistic when SE is 0 or undefined. This is incorrect—a t-stat of 0 implies a null effect, whereas `np.nan` correctly indicates undefined inference.
43+
All 7 t_stat locations fixed (diagnostics.py, sun_abraham.py, triple_diff.py) -- all now use `np.nan` or `np.isfinite()` guards. Fixed in PR #118 and follow-up PRs.
4144

42-
**Pattern to fix**: `t_stat = effect / se if se > 0 else 0.0``t_stat = effect / se if se > 0 else np.nan`
43-
44-
| Location | Line | Current Code |
45-
|----------|------|--------------|
46-
| `diagnostics.py` | 665 | `t_stat = original_att / se if se > 0 else 0.0` |
47-
| `diagnostics.py` | 786 | `t_stat = mean_effect / se if se > 0 else 0.0` |
48-
| `sun_abraham.py` | 603 | `overall_t = overall_att / overall_se if overall_se > 0 else 0.0` |
49-
| `sun_abraham.py` | 626 | `overall_t = overall_att / overall_se if overall_se > 0 else 0.0` |
50-
| `sun_abraham.py` | 643 | `eff_val / se_val if se_val > 0 else 0.0` |
51-
| `sun_abraham.py` | 881 | `t_stat = agg_effect / agg_se if agg_se > 0 else 0.0` |
52-
| `triple_diff.py` | 601 | `t_stat = att / se if se > 0 else 0.0` |
53-
54-
**Priority**: Medium - affects inference reporting in edge cases.
55-
56-
**Note**: CallawaySantAnna was fixed in PR #97 to use `np.nan`. These other estimators should follow the same pattern.
45+
**Remaining nuance**: `diagnostics.py:785` still has `se = ... else 0.0` for the SE variable itself (not t_stat). The downstream t_stat line correctly returns `np.nan`, so inference is safe, but the SE value of 0.0 is technically incorrect for an undefined SE.
5746

5847
### Migrate Existing Inference Call Sites to `safe_inference()`
5948

60-
`safe_inference()` was added to `diff_diff/utils.py` to compute t_stat, p_value, and CI together with a NaN gate at the top. It is now the prescribed pattern for all new code (see CLAUDE.md design pattern #7). However, ~20 existing inline inference computations across 12 files have **not** been migrated yet.
49+
`safe_inference()` was added to `diff_diff/utils.py` to compute t_stat, p_value, and CI together with a NaN gate at the top. It is now the prescribed pattern for all new code (see CLAUDE.md design pattern #7). However, ~26 existing inline inference computations across 12 files have **not** been migrated yet.
6150

6251
**Files with inline inference to migrate:**
6352

6453
| File | Approx. Locations |
6554
|------|-------------------|
66-
| `estimators.py` | 1 |
67-
| `sun_abraham.py` | 4 |
68-
| `staggered.py` | 4 |
69-
| `staggered_aggregation.py` | 2 |
70-
| `triple_diff.py` | 1 |
71-
| `imputation.py` | 2 |
72-
| `diagnostics.py` | 2 |
73-
| `synthetic_did.py` | 1 |
74-
| `trop.py` | 2 |
55+
| `estimators.py` | 2 (lines 1038, 1089) |
56+
| `sun_abraham.py` | 4 (lines 621, 644, 661, 905) |
57+
| `staggered.py` | 6 (lines 696, 725, 763, 777, 792, 806) |
58+
| `staggered_aggregation.py` | 2 (lines 407, 479) |
59+
| `triple_diff.py` | 1 (line 601) |
60+
| `imputation.py` | 2 (lines 1806, 1927) |
61+
| `two_stage.py` | 2 (lines 1325, 1431) |
62+
| `diagnostics.py` | 2 (lines 665, 786) |
63+
| `synthetic_did.py` | 1 (line 426) |
64+
| `trop.py` | 2 (lines 1474, 2054) |
65+
| `utils.py` | 1 (line 641) |
66+
| `linalg.py` | 1 (line 1310) |
7567

7668
**How to find them:**
7769
```bash
78-
grep -n "t_stat[[:space:]]*=[[:space:]]*[^#]*/ *se" diff_diff/*.py | grep -v "safe_inference"
70+
grep -En "(t_stat|overall_t)\s*=\s*[^#]*/|\[.t_stat.\]\s*=\s*[^#]*/" diff_diff/*.py | grep -v "def safe_inference"
7971
```
8072

73+
**Note**: This command has one false positive (`utils.py:178`, inside the `safe_inference()` body) and misses multi-line expressions (e.g., `sun_abraham.py:660-661`). The table above is the authoritative list.
74+
8175
**Migration pattern:**
8276
```python
8377
# Before (inline, error-prone)
@@ -94,6 +88,42 @@ t_stat, p_value, ci = safe_inference(effect, se, alpha=alpha, df=df)
9488

9589
---
9690

91+
### Tech Debt from Code Reviews
92+
93+
Deferred items from PR reviews that were not addressed before merge.
94+
95+
#### Methodology/Correctness
96+
97+
| Issue | Location | PR | Priority |
98+
|-------|----------|----|----------|
99+
| TwoStageDiD & ImputationDiD bootstrap hardcodes Rademacher only; no `bootstrap_weights` parameter unlike CallawaySantAnna | `two_stage.py:1860`, `imputation.py:2363` | #156, #141 | Medium |
100+
| TwoStageDiD GMM score logic duplicated between analytic/bootstrap with inconsistent NaN/overflow handling | `two_stage.py:1454-1784` | #156 | Medium |
101+
| ImputationDiD weight construction duplicated between aggregation and bootstrap (drift risk) -- has explicit code comment acknowledging duplication | `imputation.py:1777-1786`, `imputation.py:2216-2221` | #141 | Medium |
102+
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py:1564` | #141 | Medium |
103+
104+
#### Performance
105+
106+
| Issue | Location | PR | Priority |
107+
|-------|----------|----|----------|
108+
| TwoStageDiD per-column `.toarray()` in loop for cluster scores | `two_stage.py:1766-1767` | #156 | Medium |
109+
| ImputationDiD event-study SEs recompute full conservative variance per horizon (should cache A0/A1 factorization) | `imputation.py:1772-1804` | #141 | Low |
110+
| Legacy `compute_placebo_effects` uses deprecated projected-gradient weights (marked deprecated, users directed to `SyntheticDiD`) | `utils.py:1689-1691` | #145 | Low |
111+
| Rust faer SVD ndarray-to-faer conversion overhead (minimal vs SVD cost) | `rust/src/linalg.rs:67` | #115 | Low |
112+
113+
#### Testing/Docs
114+
115+
| Issue | Location | PR | Priority |
116+
|-------|----------|----|----------|
117+
| Tutorial notebooks not executed in CI | `docs/tutorials/*.ipynb` | #159 | Low |
118+
| TwoStageDiD `test_nan_propagation` is a no-op (only `pass`) | `tests/test_two_stage.py:643-652` | #156 | Low |
119+
| ImputationDiD bootstrap + covariate path untested | `tests/test_imputation.py` | #141 | Low |
120+
| TROP `n_bootstrap >= 2` validation missing (can yield 0/NaN SE silently) | `trop.py:462` | #124 | Low |
121+
| SunAbraham deprecated `min_pre_periods`/`min_post_periods` still in `fit()` docstring | `sun_abraham.py:458-487` | #153 | Low |
122+
| R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
123+
| Rust TROP bootstrap SE returns 0.0 instead of NaN for <2 samples | `rust/src/trop.rs:1038-1054` | #115 | Low |
124+
125+
---
126+
97127
### Standard Error Consistency
98128

99129
Different estimators compute SEs differently. Consider unified interface.
@@ -187,7 +217,7 @@ Replace `@` operator with `np.dot()` at affected call sites. `np.dot()` bypasses
187217
- `linalg.py`: 332, 690, 704, 829, 1463
188218
- `staggered.py`: 78, 85, 102, 860, 1024-1025
189219
- `triple_diff.py`: 301, 307, 323
190-
- `utils.py`: 481
220+
- `utils.py`: 515
191221
- `imputation.py`: 1253, 1301, 1602, 1662
192222
- `trop.py`: 1098
193223

0 commit comments

Comments
 (0)