Add overlap warning for inverse propensity clipping, align module docstring

igerber · claude · igerber · commit a3a7909307cc · 2026-03-21T19:19:17.000-04:00
P1 fix: estimate_inverse_propensity_sieve() now warns when s_hat values
are clipped to [1, n], matching the ratio path's overlap diagnostics.
Documented in REGISTRY.md.
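
A minimal sketch of the P1 behavior, for readers who want to see the warning fire. It assumes a hypothetical standalone helper, clip_with_overlap_warning, which is not part of diff_diff and only mirrors the diagnostic added in this commit. The toy input values are illustrative.

```python
import warnings

import numpy as np


def clip_with_overlap_warning(s_hat, n_units):
    """Hypothetical helper mirroring the diagnostic added in this commit."""
    s_hat = np.asarray(s_hat, dtype=float)
    upper = float(n_units)
    # Count estimates outside the admissible range [1, n_units].
    n_clipped = int(np.sum((s_hat < 1.0) | (s_hat > upper)))
    if n_clipped > 0:
        pct = 100.0 * n_clipped / n_units
        warnings.warn(
            f"Inverse propensity estimates for {n_clipped} of {n_units} units "
            f"({pct:.1f}%) were outside [1, {n_units}] and will be clipped. "
            "This may indicate overlap assumption violations.",
            UserWarning,
            stacklevel=2,
        )
    # s = 1/p must be >= 1 (since p <= 1) and is bounded above by n_units.
    return np.clip(s_hat, 1.0, upper)


# Toy input (illustrative): 0.4 is below 1 and 9.0 is above n_units = 4.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clipped = clip_with_overlap_warning([0.4, 1.5, 3.0, 9.0], n_units=4)

print(clipped)
print(len(caught), caught[0].category.__name__)
```

Callers who treat overlap problems as fatal could promote the warning to an error with warnings.simplefilter("error", UserWarning) around the call.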

P3 fix: module-level docstring in efficient_did.py now qualifies the
covariate path, consistent with the class docstring and REGISTRY.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/efficient_did.py b/diff_diff/efficient_did.py
@@ -1,13 +1,13 @@
 """
 Efficient Difference-in-Differences estimator.
 
-Implements the semiparametrically efficient ATT estimator from
-Chen, Sant'Anna & Xie (2025).
+Implements the ATT estimator from Chen, Sant'Anna & Xie (2025).
+Without covariates, achieves the semiparametric efficiency bound via
+closed-form within-group covariances.  With covariates, uses a doubly
+robust path with OLS outcome regression, sieve propensity ratios, and
+kernel-smoothed conditional Omega*(X) (see class docstring for caveats).
 
-The estimator achieves the efficiency bound by optimally weighting
-across pre-treatment periods and comparison groups via the inverse of
-the within-group covariance matrix Omega*.  Under the stronger PT-All
-assumption the model is overidentified and EDiD exploits this for
+Under PT-All the model is overidentified and EDiD exploits this for
 tighter inference; under PT-Post it reduces to the standard
 single-baseline estimator (Callaway-Sant'Anna).
 """
diff --git a/diff_diff/efficient_did_covariates.py b/diff_diff/efficient_did_covariates.py
@@ -365,6 +365,18 @@ def estimate_inverse_propensity_sieve(
             stacklevel=2,
         )
 
+    # Overlap diagnostics: warn if s_hat values require clipping
+    n_clipped = int(np.sum((best_s < 1.0) | (best_s > float(n_units))))
+    if n_clipped > 0:
+        pct = 100.0 * n_clipped / n_units
+        warnings.warn(
+            f"Inverse propensity estimates for {n_clipped} of {n_units} units "
+            f"({pct:.1f}%) were outside [1, {n_units}] and will be clipped. "
+            "This may indicate overlap assumption violations.",
+            UserWarning,
+            stacklevel=2,
+        )
+
     # s = 1/p must be >= 1 (since p <= 1) and bounded above
     best_s = np.clip(best_s, 1.0, float(n_units))
     return best_s
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -672,7 +672,7 @@ where `q_{g,e} = pi_g / sum_{g' in G_{trt,e}} pi_{g'}`.
 - [x] Overlap diagnostics for propensity score ratios
 - **Note:** Sieve ratio estimation uses polynomial basis functions (total degree up to K) with AIC/BIC model selection. The paper describes sieve estimators generally without specifying a particular basis family; polynomial sieves are a standard choice (Section 4, Eq 4.2). Negative sieve ratio predictions are clipped to a small positive value since the population ratio p_g(X)/p_{g'}(X) is non-negative.
 - **Note:** Kernel-smoothed conditional covariance Omega*(X) uses Gaussian kernel with Silverman's rule-of-thumb bandwidth by default. The paper specifies kernel smoothing (step 5, Section 4) without mandating a particular kernel or bandwidth selection method.
-- **Note:** Conditional covariance Omega*(X) scales each term by per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X) (algorithm step 4), matching Eq 3.12. The inverse propensity estimation uses the same polynomial sieve convex minimization as the ratio estimator.
+- **Note:** Conditional covariance Omega*(X) scales each term by per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X) (algorithm step 4), matching Eq 3.12. The inverse propensity estimation uses the same polynomial sieve convex minimization as the ratio estimator. Estimated s_hat values are clipped to [1, n] with a UserWarning when clipping binds, mirroring the ratio path's overlap diagnostics.
 - **Note:** Outcome regressions m_hat_{g',t,tpre}(X) use linear OLS working models. The paper's Section 4 describes flexible nonparametric nuisance estimation (sieve regression, kernel smoothing, or ML methods). The DR property ensures consistency if either the OLS outcome model or the sieve propensity ratio is correctly specified, but the linear OLS specification does not generically guarantee attainment of the semiparametric efficiency bound unless the conditional mean is linear in the covariates.
 
 ---