Skip to content

Commit 16dc2f9

Browse files
igerberclaude
andcommitted
Implement sieve inverse propensity for conditional Omega* (Eq 3.12 step 4)
Replace unconditional cohort fractions pi_g in Omega*(X) with per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X), matching Eq 3.12 and the paper's algorithm step 4. The inverse propensity sieve uses the same polynomial basis and AIC/BIC selection as the ratio estimator. FOC: (Psi_{g'}' Psi_{g'}) beta = Psi_all.sum(axis=0) — same closed-form linear system structure. This eliminates the last documented methodology deviation from the paper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 7608264 commit 16dc2f9

3 files changed

Lines changed: 159 additions & 50 deletions

File tree

diff_diff/efficient_did.py

Lines changed: 21 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@
2727
compute_generated_outcomes_cov,
2828
compute_omega_star_conditional,
2929
compute_per_unit_weights,
30+
estimate_inverse_propensity_sieve,
3031
estimate_outcome_regression,
3132
estimate_propensity_ratio_sieve,
3233
)
@@ -53,10 +54,9 @@ class EfficientDiD(EfficientDiDBootstrapMixin):
5354
Without covariates, uses a closed-form estimator based on within-group
5455
sample means and covariances. With covariates, uses the doubly robust
5556
path: sieve-based propensity score ratios (Eq 4.1-4.2) with AIC/BIC
56-
selection, OLS outcome regression, and kernel-smoothed conditional
57-
Omega*(X) for per-unit efficient weights. The conditional Omega*
58-
currently uses unconditional cohort fractions rather than per-unit
59-
conditional propensities (see REGISTRY.md deviation note).
57+
selection, OLS outcome regression, sieve-estimated inverse propensities
58+
(algorithm step 4), and kernel-smoothed conditional Omega*(X) with
59+
per-unit efficient weights (Eq 3.12).
6060
6161
Parameters
6262
----------
@@ -356,6 +356,7 @@ def fit(
356356
covariate_matrix: Optional[np.ndarray] = None
357357
m_hat_cache: Dict[Tuple, np.ndarray] = {}
358358
r_hat_cache: Dict[Tuple[float, float], np.ndarray] = {}
359+
s_hat_cache: Dict[float, np.ndarray] = {} # inverse propensities per group
359360

360361
if use_covariates:
361362
assert covariates is not None # for type narrowing
@@ -523,7 +524,21 @@ def fit(
523524

524525
y_hat = np.mean(gen_out, axis=0) # shape (H,)
525526

526-
# Conditional Omega*(X): (n_units, H, H)
527+
# Inverse propensity estimation (algorithm step 4)
528+
# s_hat_{g'}(X) = 1/p_{g'}(X) for Eq 3.12 scaling
529+
for group_id in {g, np.inf} | {gp for gp, _ in pairs}:
530+
if group_id not in s_hat_cache:
531+
group_mask_s = (
532+
never_treated_mask if np.isinf(group_id) else cohort_masks[group_id]
533+
)
534+
s_hat_cache[group_id] = estimate_inverse_propensity_sieve(
535+
covariate_matrix,
536+
group_mask_s,
537+
k_max=self.sieve_k_max,
538+
criterion=self.sieve_criterion,
539+
)
540+
541+
# Conditional Omega*(X) with per-unit propensities (Eq 3.12)
527542
omega_cond = compute_omega_star_conditional(
528543
target_g=g,
529544
target_t=t,
@@ -535,6 +550,7 @@ def fit(
535550
period_1_col=effective_p1_col,
536551
cohort_fractions=cohort_fractions,
537552
covariate_matrix=covariate_matrix,
553+
s_hat_cache=s_hat_cache,
538554
bandwidth=self.kernel_bandwidth,
539555
)
540556

diff_diff/efficient_did_covariates.py

Lines changed: 137 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -254,6 +254,97 @@ def estimate_propensity_ratio_sieve(
254254
return best_ratio
255255

256256

257+
# ---------------------------------------------------------------------------
258+
# Sieve-based inverse propensity estimation (Algorithm step 4)
259+
# ---------------------------------------------------------------------------
260+
261+
262+
def estimate_inverse_propensity_sieve(
263+
covariate_matrix: np.ndarray,
264+
group_mask: np.ndarray,
265+
k_max: Optional[int] = None,
266+
criterion: str = "bic",
267+
) -> np.ndarray:
268+
r"""Estimate s_{g'}(X) = 1/p_{g'}(X) via sieve convex minimization.
269+
270+
Solves for each sieve degree K:
271+
272+
.. math::
273+
\hat\beta_K = \arg\min_\beta \frac{1}{n}
274+
\sum_i \bigl[ G_{g',i} (\psi^K(X_i)'\beta)^2
275+
- 2 (\psi^K(X_i)'\beta) \bigr]
276+
277+
FOC: ``(Psi_{g'}' Psi_{g'}) beta = Psi_all.sum(axis=0)``
278+
279+
This is the same structure as the ratio estimator but with all
280+
units on the RHS (not just group g), following the paper's
281+
algorithm step 4.
282+
283+
Parameters
284+
----------
285+
covariate_matrix : ndarray, shape (n_units, n_covariates)
286+
group_mask : ndarray of bool, shape (n_units,)
287+
Mask for the group whose inverse propensity to estimate.
288+
k_max : int or None
289+
Maximum polynomial degree. None = auto.
290+
criterion : str
291+
``"aic"`` or ``"bic"``.
292+
293+
Returns
294+
-------
295+
s_hat : ndarray, shape (n_units,)
296+
Estimated ``1/p_{g'}(X_i)`` for every unit. Clipped to [1, n].
297+
"""
298+
n_units = len(covariate_matrix)
299+
n_group = int(np.sum(group_mask))
300+
d = covariate_matrix.shape[1]
301+
302+
if n_group == 0:
303+
return np.ones(n_units)
304+
305+
if k_max is None:
306+
k_max = min(int(n_group**0.2), 5)
307+
k_max = max(k_max, 1)
308+
309+
c_n = 2.0 if criterion == "aic" else np.log(max(n_units, 2))
310+
311+
best_ic = np.inf
312+
best_s = np.full(n_units, n_units / n_group) # fallback: unconditional
313+
314+
for K in range(1, k_max + 1):
315+
n_basis = comb(K + d, d)
316+
if n_basis >= n_group:
317+
break
318+
319+
basis_all = _polynomial_sieve_basis(covariate_matrix, K)
320+
Psi_gp = basis_all[group_mask]
321+
322+
A = Psi_gp.T @ Psi_gp
323+
# RHS: sum of basis over ALL units (not just one group)
324+
b = basis_all.sum(axis=0)
325+
326+
try:
327+
beta = np.linalg.solve(A, b)
328+
except np.linalg.LinAlgError:
329+
continue
330+
if not np.all(np.isfinite(beta)):
331+
continue
332+
333+
s_hat = basis_all @ beta
334+
335+
# IC: loss = -(1/n) * b'beta (same derivation as ratio estimator)
336+
loss = -float(b @ beta) / n_units
337+
ic_val = 2.0 * loss + c_n * n_basis / n_units
338+
339+
if ic_val < best_ic:
340+
best_ic = ic_val
341+
best_s = s_hat.copy()
342+
343+
# s = 1/p must be >= 1 (since p <= 1) and bounded above
344+
best_s = np.clip(best_s, 1.0, float(n_units))
345+
return best_s
346+
347+
257348
# ---------------------------------------------------------------------------
258349
# Doubly robust generated outcomes (Eq 4.4)
259350
# ---------------------------------------------------------------------------
@@ -429,14 +520,17 @@ def compute_omega_star_conditional(
429520
period_1_col: int,
430521
cohort_fractions: Dict[float, float],
431522
covariate_matrix: np.ndarray,
523+
s_hat_cache: Dict[float, np.ndarray],
432524
bandwidth: Optional[float] = None,
433525
never_treated_val: float = np.inf,
434526
) -> np.ndarray:
435527
r"""Kernel-smoothed conditional Omega\*(X_i) for each unit (Eq 3.12).
436528
437529
Estimates the five-term conditional covariance matrix using
438530
Nadaraya-Watson kernel regression with Gaussian kernel and
439-
local (kernel-weighted) means.
531+
local (kernel-weighted) means. Scales each term by per-unit
532+
conditional inverse propensities ``s_hat_g(X_i) = 1/p_g(X_i)``
533+
(algorithm step 4), matching the paper's Eq 3.12.
440534
441535
Parameters
442536
----------
@@ -447,6 +541,9 @@ def compute_omega_star_conditional(
447541
cohort_masks, never_treated_mask, period_to_col, period_1_col,
448542
cohort_fractions : pre-computed data structures
449543
covariate_matrix : ndarray, shape (n_units, n_covariates)
544+
s_hat_cache : dict
545+
Inverse propensity estimates ``{group: s_hat(X_i)}`` where each
546+
value is shape ``(n_units,)``. Keyed by group identifier.
450547
bandwidth : float or None
451548
Kernel bandwidth. None = Silverman's rule.
452549
never_treated_val : float
@@ -468,48 +565,47 @@ def compute_omega_star_conditional(
468565
y1_col = period_1_col
469566

470567
g_mask = cohort_masks[target_g]
471-
pi_g = cohort_fractions[target_g]
472568

473569
Y_inf = outcome_wide[never_treated_mask]
474570
X_inf = covariate_matrix[never_treated_mask]
475-
pi_inf = cohort_fractions.get(never_treated_val, 0.0)
571+
572+
# Per-unit inverse propensities from sieve estimation (Eq 3.12)
573+
s_g = s_hat_cache.get(target_g, np.full(n_units, 1.0 / max(cohort_fractions[target_g], 1e-10)))
574+
s_inf = s_hat_cache.get(
575+
never_treated_val,
576+
np.full(n_units, 1.0 / max(cohort_fractions.get(never_treated_val, 1e-10), 1e-10)),
577+
)
476578

477579
# Scalability warning
478580
if n_units > 5000:
479581
warnings.warn(
480582
f"Conditional Omega* estimation with n={n_units} is expensive "
481-
f"(O(n^2 * H^2)). Consider using fewer units or unconditional Omega*.",
583+
f"(O(n^2 * H^2)). Consider using fewer units.",
482584
UserWarning,
483585
stacklevel=2,
484586
)
485587

486-
# Pre-compute kernel weight matrices per group (reused across pairs)
487-
# W_g[i, j] = normalized K_h(X_g[j], X_all[i])
588+
# Pre-compute kernel weight matrices per group
488589
Y_g = outcome_wide[g_mask]
489590
X_g = covariate_matrix[g_mask]
490591
Yg_t_minus_1 = Y_g[:, t_col] - Y_g[:, y1_col]
491592

492593
W_g = _kernel_weights_matrix(covariate_matrix, X_g, bandwidth)
493594
W_inf = _kernel_weights_matrix(covariate_matrix, X_inf, bandwidth)
494595

495-
# Pre-compute per-pair differenced arrays for never-treated
496596
inf_t_minus_tpre = {}
497597
for _, tpre in valid_pairs:
498598
tpre_col = period_to_col[tpre]
499599
if tpre_col not in inf_t_minus_tpre:
500600
inf_t_minus_tpre[tpre_col] = Y_inf[:, t_col] - Y_inf[:, tpre_col]
501601

502-
# Pre-compute kernel weights for comparison cohorts
503602
W_gp_cache: Dict[float, np.ndarray] = {}
504603
gp_outcomes_cache: Dict[float, np.ndarray] = {}
505604

506605
omega = np.zeros((n_units, H, H))
507606

508-
# Term 1: (1/pi_g) * Cov(Y_t-Y_1, Y_t-Y_1 | G=g, X) — same for all (j,k)
509-
if pi_g > 0:
510-
term1 = (1.0 / pi_g) * _kernel_weighted_cov(Yg_t_minus_1, Yg_t_minus_1, W_g)
511-
else:
512-
term1 = np.zeros(n_units)
607+
# Term 1: s_g(X) * Cov(Y_t-Y_1, Y_t-Y_1 | G=g, X) — same for all (j,k)
608+
term1 = s_g * _kernel_weighted_cov(Yg_t_minus_1, Yg_t_minus_1, W_g)
513609

514610
for j in range(H):
515611
gp_j, tpre_j = valid_pairs[j]
@@ -521,45 +617,42 @@ def compute_omega_star_conditional(
521617

522618
val = term1.copy()
523619

524-
# Term 2: (1/pi_inf) * Cov(Y_t-Y_{tpre_j}, Y_t-Y_{tpre_k} | G=inf, X)
525-
if pi_inf > 0:
526-
val += (1.0 / pi_inf) * _kernel_weighted_cov(
527-
inf_t_minus_tpre[tpre_j_col],
528-
inf_t_minus_tpre[tpre_k_col],
529-
W_inf,
530-
)
620+
# Term 2: s_inf(X) * Cov(Y_t-Y_{tpre_j}, Y_t-Y_{tpre_k} | G=inf, X)
621+
val += s_inf * _kernel_weighted_cov(
622+
inf_t_minus_tpre[tpre_j_col],
623+
inf_t_minus_tpre[tpre_k_col],
624+
W_inf,
625+
)
531626

532-
# Term 3: -1{g==g'_j}/pi_g * Cov(Y_t-Y_1, Y_{tpre_j}-Y_1 | G=g, X)
533-
if gp_j == target_g and pi_g > 0:
627+
# Term 3: -1{g==g'_j} * s_g(X) * Cov(Y_t-Y_1, Y_{tpre_j}-Y_1 | G=g, X)
628+
if gp_j == target_g:
534629
g_tpre_j = Y_g[:, tpre_j_col] - Y_g[:, y1_col]
535-
val -= (1.0 / pi_g) * _kernel_weighted_cov(Yg_t_minus_1, g_tpre_j, W_g)
630+
val -= s_g * _kernel_weighted_cov(Yg_t_minus_1, g_tpre_j, W_g)
536631

537-
# Term 4: -1{g==g'_k}/pi_g * Cov(Y_t-Y_1, Y_{tpre_k}-Y_1 | G=g, X)
538-
if gp_k == target_g and pi_g > 0:
632+
# Term 4: -1{g==g'_k} * s_g(X) * Cov(Y_t-Y_1, Y_{tpre_k}-Y_1 | G=g, X)
633+
if gp_k == target_g:
539634
g_tpre_k = Y_g[:, tpre_k_col] - Y_g[:, y1_col]
540-
val -= (1.0 / pi_g) * _kernel_weighted_cov(Yg_t_minus_1, g_tpre_k, W_g)
635+
val -= s_g * _kernel_weighted_cov(Yg_t_minus_1, g_tpre_k, W_g)
541636

542-
# Term 5: 1{g'_j==g'_k}/pi_{g'_j} * Cov(Y_{tpre_j}-Y_1, Y_{tpre_k}-Y_1 | G=g'_j, X)
637+
# Term 5: 1{g'_j==g'_k} * s_{g'_j}(X) * Cov(...)
543638
if gp_j == gp_k:
544639
if np.isinf(gp_j):
545-
if pi_inf > 0:
546-
inf_tpre_j = Y_inf[:, tpre_j_col] - Y_inf[:, y1_col]
547-
inf_tpre_k = Y_inf[:, tpre_k_col] - Y_inf[:, y1_col]
548-
val += (1.0 / pi_inf) * _kernel_weighted_cov(inf_tpre_j, inf_tpre_k, W_inf)
640+
inf_tpre_j = Y_inf[:, tpre_j_col] - Y_inf[:, y1_col]
641+
inf_tpre_k = Y_inf[:, tpre_k_col] - Y_inf[:, y1_col]
642+
val += s_inf * _kernel_weighted_cov(inf_tpre_j, inf_tpre_k, W_inf)
549643
else:
550-
pi_gp_j = cohort_fractions.get(gp_j, 0.0)
551-
if pi_gp_j > 0:
552-
if gp_j not in W_gp_cache:
553-
X_gp = covariate_matrix[cohort_masks[gp_j]]
554-
W_gp_cache[gp_j] = _kernel_weights_matrix(
555-
covariate_matrix, X_gp, bandwidth
556-
)
557-
gp_outcomes_cache[gp_j] = outcome_wide[cohort_masks[gp_j]]
558-
W_gp = W_gp_cache[gp_j]
559-
Y_gp = gp_outcomes_cache[gp_j]
560-
gp_tpre_j = Y_gp[:, tpre_j_col] - Y_gp[:, y1_col]
561-
gp_tpre_k = Y_gp[:, tpre_k_col] - Y_gp[:, y1_col]
562-
val += (1.0 / pi_gp_j) * _kernel_weighted_cov(gp_tpre_j, gp_tpre_k, W_gp)
644+
s_gp_j = s_hat_cache.get(
645+
gp_j, np.full(n_units, 1.0 / max(cohort_fractions.get(gp_j, 1e-10), 1e-10))
646+
)
647+
if gp_j not in W_gp_cache:
648+
X_gp = covariate_matrix[cohort_masks[gp_j]]
649+
W_gp_cache[gp_j] = _kernel_weights_matrix(covariate_matrix, X_gp, bandwidth)
650+
gp_outcomes_cache[gp_j] = outcome_wide[cohort_masks[gp_j]]
651+
W_gp = W_gp_cache[gp_j]
652+
Y_gp = gp_outcomes_cache[gp_j]
653+
gp_tpre_j = Y_gp[:, tpre_j_col] - Y_gp[:, y1_col]
654+
gp_tpre_k = Y_gp[:, tpre_k_col] - Y_gp[:, y1_col]
655+
val += s_gp_j * _kernel_weighted_cov(gp_tpre_j, gp_tpre_k, W_gp)
563656

564657
omega[:, j, k] = val
565658
if j != k:

docs/methodology/REGISTRY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -671,7 +671,7 @@ where `q_{g,e} = pi_g / sum_{g' in G_{trt,e}} pi_{g'}`.
671671
- [x] Overlap diagnostics for propensity score ratios
672672
- **Note:** Sieve ratio estimation uses polynomial basis functions (total degree up to K) with AIC/BIC model selection. The paper describes sieve estimators generally without specifying a particular basis family; polynomial sieves are a standard choice (Section 4, Eq 4.2). Negative sieve ratio predictions are clipped to a small positive value since the population ratio p_g(X)/p_{g'}(X) is non-negative.
673673
- **Note:** Kernel-smoothed conditional covariance Omega*(X) uses Gaussian kernel with Silverman's rule-of-thumb bandwidth by default. The paper specifies kernel smoothing (step 5, Section 4) without mandating a particular kernel or bandwidth selection method.
674-
- **Note (deviation from source):** The conditional covariance Omega*(X) scales each term by unconditional cohort fractions pi_g rather than conditional generalized propensities p_g(X) as in Eq 3.12. Implementing the full conditional propensity scaling requires per-unit group probability estimation (algorithm step 4: s_hat_{g'}(X) = 1/p_{g'}(X) via convex minimization), which is deferred. The unconditional-pi approximation is consistent under double robustness but does not achieve the full conditional efficiency bound of Eq 3.12.
674+
- **Note:** Conditional covariance Omega*(X) scales each term by per-unit sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X) (algorithm step 4), matching Eq 3.12. The inverse propensity estimation uses the same polynomial sieve convex minimization as the ratio estimator.
675675

676676
---
677677

0 commit comments

Comments
 (0)