### ~~NaN Handling for Undefined t-statistics~~ -- DONE

Several estimators returned `0.0` for the t-statistic when the SE was 0 or undefined. This was incorrect: a t-stat of 0 implies a null effect, whereas `np.nan` correctly indicates undefined inference.

All 7 `t_stat` locations (`diagnostics.py`, `sun_abraham.py`, `triple_diff.py`) are fixed and now use `np.nan` or `np.isfinite()` guards. Fixed in PR #118 and follow-up PRs.

**Pattern applied**: `t_stat = effect / se if se > 0 else 0.0` → `t_stat = effect / se if se > 0 else np.nan`

| Location | Line | Original Code |
|----------|------|---------------|
| `diagnostics.py` | 665 | `t_stat = original_att / se if se > 0 else 0.0` |
| `diagnostics.py` | 786 | `t_stat = mean_effect / se if se > 0 else 0.0` |
| `triple_diff.py` | 601 | `t_stat = att / se if se > 0 else 0.0` |
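
Here is a minimal sketch of the guarded pattern. It uses the standard library's `math.nan`, which is the same IEEE-754 NaN float as `np.nan`. `t_stat_or_nan` is a hypothetical helper, not a function in the codebase. The `isfinite()` check matters because an infinite SE would make `effect / se` evaluate to `0.0`, reintroducing the misleading null t-statistic.

```python
import math


def t_stat_or_nan(effect: float, se: float) -> float:
    """Hypothetical helper: t-statistic, or NaN when inference is undefined."""
    # A zero, negative, NaN, or infinite SE is unusable. The isfinite() check
    # matters because effect / inf would silently evaluate to 0.0.
    if not math.isfinite(se) or se <= 0:
        return math.nan
    return effect / se
```

With this guard, `t_stat_or_nan(1.5, 0.0)` and `t_stat_or_nan(1.5, float("inf"))` both return NaN, while `t_stat_or_nan(2.0, 0.5)` returns `4.0`.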

**Priority** (before the fix): Medium, since it affected inference reporting in edge cases.

**Note**: CallawaySantAnna was fixed in PR #97 to use `np.nan`; the other estimators now follow the same pattern.

**Remaining nuance**: `diagnostics.py:785` still sets `se = ... else 0.0` for the SE variable itself (not `t_stat`). The downstream `t_stat` line correctly returns `np.nan`, so inference is safe, but an SE of `0.0` is technically incorrect when the SE is undefined.
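
One way to close this gap is sketched below. It assumes that the SE comes from a variance estimate and that a non-finite or negative variance means the SE is undefined. `se_or_nan` is a hypothetical helper. The condition actually used at `diagnostics.py:785` is elided above, so whether a zero variance should also map to NaN is left open. Because the existing `t_stat` guard already maps a non-positive or NaN SE to `np.nan`, reporting the SE itself as NaN changes only the displayed SE, not the inference.

```python
import math


def se_or_nan(variance: float) -> float:
    """Hypothetical helper: standard error from a variance estimate.

    Assumption: a non-finite or negative variance means the SE cannot be
    computed, so it is reported as NaN instead of 0.0.
    """
    if not math.isfinite(variance) or variance < 0:
        return math.nan
    return math.sqrt(variance)
```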

### Migrate Existing Inference Call Sites to `safe_inference()`

`safe_inference()` was added to `diff_diff/utils.py` to compute t_stat, p_value, and CI together with a NaN gate at the top. It is now the prescribed pattern for all new code (see CLAUDE.md design pattern #7). However, ~26 existing inline inference computations across 12 files have **not** been migrated yet.

**Note**: This command has one false positive (`utils.py:178`, inside the `safe_inference()` body) and misses multi-line expressions (e.g., `sun_abraham.py:660-661`). The table above is the authoritative list.
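
The sketch below illustrates what computing t_stat, p_value, and CI "together with a NaN gate at the top" means. It is not the actual `safe_inference()` in `diff_diff/utils.py`. `safe_inference_sketch` is hypothetical, and it uses a standard normal reference distribution from `statistics.NormalDist` in place of whatever `df`-dependent distribution the real function uses.

```python
import math
from statistics import NormalDist


def safe_inference_sketch(effect, se, alpha=0.05):
    """Hypothetical NaN-gated inference returning (t_stat, p_value, (ci_lower, ci_upper)).

    Assumption: standard normal reference distribution; the real
    safe_inference() also accepts df, which this sketch ignores.
    """
    # NaN gate at the top: any undefined input makes all three outputs NaN together.
    if not (math.isfinite(effect) and math.isfinite(se)) or se <= 0:
        return math.nan, math.nan, (math.nan, math.nan)

    std_normal = NormalDist()  # mean 0, sigma 1
    t_stat = effect / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))  # two-sided p-value
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return t_stat, p_value, (effect - crit * se, effect + crit * se)
```

For example, `safe_inference_sketch(2.0, 1.0)` gives a t-statistic of 2.0, a p-value of about 0.0455, and a CI of about (0.04, 3.96). `safe_inference_sketch(2.0, 0.0)` returns NaN for all three. Gating every output on a single check prevents a call site from reporting a NaN t-statistic alongside a finite p-value or CI.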

**Migration pattern:**

```python
# Before (inline, error-prone)
# ...
# After
t_stat, p_value, ci = safe_inference(effect, se, alpha=alpha, df=df)
```

---

### Tech Debt from Code Reviews

Deferred items from PR reviews that were not addressed before merge.

#### Methodology/Correctness

| Issue | Location | PR | Priority |
|-------|----------|----|----------|
| TwoStageDiD & ImputationDiD bootstrap hardcodes Rademacher weights; no `bootstrap_weights` parameter, unlike CallawaySantAnna | `two_stage.py:1860`, `imputation.py:2363` | #156, #141 | Medium |
| TwoStageDiD GMM score logic is duplicated between the analytic and bootstrap paths, with inconsistent NaN/overflow handling | `two_stage.py:1454-1784` | #156 | Medium |
| ImputationDiD weight construction is duplicated between aggregation and bootstrap (drift risk); an explicit code comment acknowledges the duplication | `imputation.py:1777-1786`, `imputation.py:2216-2221` | #141 | Medium |
| ImputationDiD dense `(A0'A0).toarray()` needs O((U+T+K)^2) memory; OOM risk on large panels | `imputation.py:1564` | #141 | Medium |
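
For the first row, selectable bootstrap weights could look like the sketch below. The function name and the `kind` strings are hypothetical and are not the values CallawaySantAnna's `bootstrap_weights` accepts. The three schemes themselves (Rademacher, Mammen's two-point distribution, and Webb's six-point distribution) are standard multiplier-bootstrap choices, each with mean 0 and variance 1.

```python
import math
import random


def draw_multiplier_weights(n, kind="rademacher", rng=None):
    """Hypothetical draw of n multiplier-bootstrap weights (mean 0, variance 1)."""
    rng = rng if rng is not None else random.Random()
    if kind == "rademacher":
        # +1 or -1 with equal probability (the only scheme currently supported)
        return [rng.choice((-1.0, 1.0)) for _ in range(n)]
    if kind == "mammen":
        # -(sqrt5 - 1)/2 with probability (sqrt5 + 1)/(2 sqrt5), otherwise (sqrt5 + 1)/2
        s5 = math.sqrt(5.0)
        low, high = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
        p_low = (s5 + 1.0) / (2.0 * s5)
        return [low if rng.random() < p_low else high for _ in range(n)]
    if kind == "webb":
        # +/-sqrt(1/2), +/-1, +/-sqrt(3/2), each with probability 1/6
        support = (-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
                   math.sqrt(0.5), 1.0, math.sqrt(1.5))
        return [rng.choice(support) for _ in range(n)]
    raise ValueError(f"unknown bootstrap weight type: {kind!r}")
```

If TwoStageDiD and ImputationDiD shared one helper like this, the weight types would be validated in a single place and could not drift apart.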

#### Performance

| Issue | Location | PR | Priority |
|-------|----------|----|----------|
| TwoStageDiD calls `.toarray()` per column inside a loop when computing cluster scores | `two_stage.py:1766-1767` | #156 | Medium |
| ImputationDiD event-study SEs recompute the full conservative variance for each horizon; the A0/A1 factorization should be cached | `imputation.py:1772-1804` | #141 | Low |
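
The two ImputationDiD items (the dense `(A0'A0).toarray()` and the per-horizon recomputation) share a possible remedy, sketched below with SciPy: factorize the sparse normal matrix once and reuse the factorization for every horizon's right-hand side. This assumes each horizon's variance computation reduces to solves against the same matrix (here `A0'A0`), which is not confirmed above. `make_cached_solver` and the toy design matrix are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized


def make_cached_solver(A0):
    """Factorize the sparse normal matrix A0'A0 once; return a reusable solve(b) function."""
    gram = (A0.T @ A0).tocsc()  # stays sparse: no dense (A0'A0).toarray()
    return factorized(gram)     # sparse LU factorization, computed a single time


# Hypothetical design matrix: 50 rows, 10 columns, full column rank by construction
rng = np.random.default_rng(0)
dense_part = rng.standard_normal((40, 10))
dense_part[rng.random((40, 10)) > 0.2] = 0.0  # keep roughly 20% of the entries
A0 = sp.vstack([sp.identity(10, format="csr"), sp.csr_matrix(dense_part)], format="csr")

solve = make_cached_solver(A0)

# One right-hand side per event-study horizon; every solve reuses the same factorization.
rhs_by_horizon = {h: rng.standard_normal(10) for h in range(3)}
solutions = {h: solve(b) for h, b in rhs_by_horizon.items()}
```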