Commit a3a7909

igerber and claude committed
Add overlap warning for inverse propensity clipping, align module docstring
P1 fix: estimate_inverse_propensity_sieve() now warns when s_hat values are clipped to [1, n], matching the ratio path's overlap diagnostics. Documented in REGISTRY.md.

P3 fix: module-level docstring in efficient_did.py now qualifies the covariate path, consistent with the class docstring and REGISTRY.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 56fa2e7 commit a3a7909

3 files changed

Lines changed: 19 additions & 7 deletions

diff_diff/efficient_did.py

Lines changed: 6 additions & 6 deletions
@@ -1,13 +1,13 @@
 """
 Efficient Difference-in-Differences estimator.
 
-Implements the semiparametrically efficient ATT estimator from
-Chen, Sant'Anna & Xie (2025).
+Implements the ATT estimator from Chen, Sant'Anna & Xie (2025).
+Without covariates, achieves the semiparametric efficiency bound via
+closed-form within-group covariances. With covariates, uses a doubly
+robust path with OLS outcome regression, sieve propensity ratios, and
+kernel-smoothed conditional Omega*(X) (see class docstring for caveats).
 
-The estimator achieves the efficiency bound by optimally weighting
-across pre-treatment periods and comparison groups via the inverse of
-the within-group covariance matrix Omega*. Under the stronger PT-All
-assumption the model is overidentified and EDiD exploits this for
+Under PT-All the model is overidentified and EDiD exploits this for
 tighter inference; under PT-Post it reduces to the standard
 single-baseline estimator (Callaway-Sant'Anna).
 """

diff_diff/efficient_did_covariates.py

Lines changed: 12 additions & 0 deletions
@@ -365,6 +365,18 @@ def estimate_inverse_propensity_sieve(
             stacklevel=2,
         )
 
+    # Overlap diagnostics: warn if s_hat values require clipping
+    n_clipped = int(np.sum((best_s < 1.0) | (best_s > float(n_units))))
+    if n_clipped > 0:
+        pct = 100.0 * n_clipped / n_units
+        warnings.warn(
+            f"Inverse propensity estimates for {n_clipped} of {n_units} units "
+            f"({pct:.1f}%) were outside [1, {n_units}] and will be clipped. "
+            f"This may indicate overlap assumption violations.",
+            UserWarning,
+            stacklevel=2,
+        )
+
     # s = 1/p must be >= 1 (since p <= 1) and bounded above
     best_s = np.clip(best_s, 1.0, float(n_units))
     return best_s
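
The new diagnostic can be exercised in isolation. The sketch below is not the library's function: `clip_inverse_propensity` is a hypothetical standalone helper that copies only the warn-then-clip logic from the hunk above, and the input values are made up for illustration. It shows one way a caller or test might capture the UserWarning with the standard `warnings.catch_warnings` context manager.

```python
import warnings

import numpy as np


def clip_inverse_propensity(s_hat, n_units):
    """Hypothetical helper: clip s_hat = 1/p to [1, n_units], warning if any value is out of range."""
    s_hat = np.asarray(s_hat, dtype=float)
    upper = float(n_units)
    # Count estimates that fall outside the admissible range before clipping.
    n_clipped = int(np.sum((s_hat < 1.0) | (s_hat > upper)))
    if n_clipped > 0:
        pct = 100.0 * n_clipped / n_units
        warnings.warn(
            f"Inverse propensity estimates for {n_clipped} of {n_units} units "
            f"({pct:.1f}%) were outside [1, {n_units}] and will be clipped. "
            "This may indicate overlap assumption violations.",
            UserWarning,
            stacklevel=2,
        )
    return np.clip(s_hat, 1.0, upper)


# Made-up raw estimates for 4 units: 0.5 and 9.0 fall outside [1, 4].
raw = np.array([0.5, 1.2, 3.0, 9.0])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    clipped = clip_inverse_propensity(raw, n_units=4)
```

Counting before clipping matters: once `np.clip` has run, the out-of-range values are no longer distinguishable from estimates that legitimately sit on the boundary.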

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
@@ -672,7 +672,7 @@ where `q_{g,e} = pi_g / sum_{g' in G_{trt,e}} pi_{g'}`.
 - [x] Overlap diagnostics for propensity score ratios
 - **Note:** Sieve ratio estimation uses polynomial basis functions (total degree up to K) with AIC/BIC model selection. The paper describes sieve estimators generally without specifying a particular basis family; polynomial sieves are a standard choice (Section 4, Eq 4.2). Negative sieve ratio predictions are clipped to a small positive value since the population ratio p_g(X)/p_{g'}(X) is non-negative.
 - **Note:** Kernel-smoothed conditional covariance Omega*(X) uses Gaussian kernel with Silverman's rule-of-thumb bandwidth by default. The paper specifies kernel smoothing (step 5, Section 4) without mandating a particular kernel or bandwidth selection method.
-- **Note:** Conditional covariance Omega*(X) scales each term by per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X) (algorithm step 4), matching Eq 3.12. The inverse propensity estimation uses the same polynomial sieve convex minimization as the ratio estimator.
+- **Note:** Conditional covariance Omega*(X) scales each term by per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X) (algorithm step 4), matching Eq 3.12. The inverse propensity estimation uses the same polynomial sieve convex minimization as the ratio estimator. Estimated s_hat values are clipped to [1, n] with a UserWarning when clipping binds, mirroring the ratio path's overlap diagnostics.
 - **Note:** Outcome regressions m_hat_{g',t,tpre}(X) use linear OLS working models. The paper's Section 4 describes flexible nonparametric nuisance estimation (sieve regression, kernel smoothing, or ML methods). The DR property ensures consistency if either the OLS outcome model or the sieve propensity ratio is correctly specified, but the linear OLS specification does not generically guarantee attainment of the semiparametric efficiency bound unless the conditional mean is linear in the covariates.
 
 ---
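
The REGISTRY note above mentions polynomial basis functions of total degree up to K. The commit does not show how the library builds that basis, so the following is only a sketch under assumptions: `polynomial_basis` is a hypothetical name, and it simply enumerates every monomial of the covariate columns whose total degree is between 0 and K.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_basis(X, K):
    """Hypothetical helper: all monomials of the columns of X with total degree 0..K."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    columns = [np.ones(n)]  # degree-0 term (intercept)
    for degree in range(1, K + 1):
        # Each multiset of column indices of size `degree` is one monomial,
        # e.g. (0, 1) -> x0 * x1 and (0, 0) -> x0 ** 2.
        for idx in combinations_with_replacement(range(d), degree):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


# Two units, two covariates, total degree up to 2:
# columns are 1, x0, x1, x0^2, x0*x1, x1^2.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
B = polynomial_basis(X, K=2)
```

The number of columns grows quickly with K and the covariate dimension, which is one reason a data-driven choice of K (the note mentions AIC/BIC) is paired with this kind of basis.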