Skip to content

Commit e97ca9d

Browse files
igerberclaude
andcommitted
Update docs and docstrings for PR #165 review feedback
Address 5 AI review items: update Methodology Registry for bootstrap_weights and SunAbraham param removal, fix bootstrap result docstrings, add memory guidance for dense .toarray(), and document n_bootstrap minimum. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2f1d5df commit e97ca9d

6 files changed

Lines changed: 10 additions & 7 deletions

File tree

diff_diff/imputation_results.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,7 @@ class ImputationBootstrapResults:
3333
n_bootstrap : int
3434
Number of bootstrap iterations.
3535
weight_type : str
36-
Type of bootstrap weights (currently "rademacher" only).
36+
Type of bootstrap weights: "rademacher", "mammen", or "webb".
3737
alpha : float
3838
Significance level used for confidence intervals.
3939
overall_att_se : float

diff_diff/trop.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -93,7 +93,7 @@ class TROP:
9393
alpha : float, default=0.05
9494
Significance level for confidence intervals.
9595
n_bootstrap : int, default=200
96-
Number of bootstrap replications for variance estimation.
96+
Number of bootstrap replications for variance estimation. Must be >= 2.
9797
seed : int, optional
9898
Random seed for reproducibility.
9999

diff_diff/two_stage.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1225,6 +1225,8 @@ def _compute_gmm_variance(
12251225
# Convert sparse to dense once for efficient cluster aggregation.
12261226
# Total memory touched is identical to per-column .getcol().toarray();
12271227
# only peak allocation differs (full matrix vs one column at a time).
1228+
# For panels with >100K FE columns, consider reverting to per-column
1229+
# .getcol() to limit peak memory.
12281230
weighted_X10_dense = weighted_X10.toarray()
12291231
c_by_cluster = np.zeros((G, p))
12301232
for j_col in range(p):

diff_diff/two_stage_bootstrap.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -106,7 +106,8 @@ def _compute_cluster_S_scores(
106106
unique_clusters, cluster_indices = np.unique(cluster_ids, return_inverse=True)
107107
G = len(unique_clusters)
108108

109-
# Convert sparse to dense once (see _compute_gmm_variance for memory note)
109+
# Convert sparse to dense once (see _compute_gmm_variance for memory note).
110+
# For panels with >100K FE columns, consider per-column .getcol() instead.
110111
weighted_X10_dense = weighted_X10.toarray()
111112
c_by_cluster = np.zeros((G, p))
112113
for j_col in range(p):

diff_diff/two_stage_results.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ class TwoStageBootstrapResults:
3434
n_bootstrap : int
3535
Number of bootstrap iterations.
3636
weight_type : str
37-
Type of bootstrap weights (currently "rademacher" only).
37+
Type of bootstrap weights: "rademacher", "mammen", or "webb".
3838
alpha : float
3939
Significance level used for confidence intervals.
4040
overall_att_se : float

docs/methodology/REGISTRY.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -427,7 +427,7 @@ where weights ŵ_{g,e} = n_{g,e} / Σ_g n_{g,e} (sample share of cohort g at eve
427427
- Extrapolation beyond observed event-times: not estimated
428428
- Event-time range: no artificial cap (estimates all available relative times, matching R's `fixest::sunab()`)
429429
- No post-treatment effects: returns `(NaN, NaN)` for overall ATT/SE; all inference fields (t_stat, p_value, conf_int) propagate NaN via `np.isfinite()` guards
430-
- `min_pre_periods`/`min_post_periods` parameters: deprecated (accepted but ignored, emit `FutureWarning`)
430+
- `min_pre_periods`/`min_post_periods` parameters: removed (previously deprecated with `FutureWarning`; callers passing these will now get `TypeError`)
431431
- Variance fallback: when full weight vector cannot be constructed for overall ATT SE, uses simplified variance (ignores covariances between periods) with `UserWarning`
432432
- Rank-deficient design matrix (covariate collinearity):
433433
- Detection: Pivoted QR decomposition with tolerance `1e-07` (R's `qr()` default)
@@ -545,7 +545,7 @@ Y_it = alpha_i + beta_t [+ X'_it * delta] + W'_it * gamma + epsilon_it
545545
- **treatment_effects DataFrame weights:** `weight` column uses `1/n_valid` for finite tau_hat and 0 for NaN tau_hat, consistent with the ATT estimand.
546546
- **Rank-deficient covariates in variance:** Covariates with NaN coefficients (dropped for rank deficiency in Step 1) are excluded from the variance design matrices `A_0`/`A_1`. Only covariates with finite coefficients participate in the `v_it` projection.
547547
- **Sparse variance solver:** `_compute_v_untreated_with_covariates` uses `scipy.sparse.linalg.spsolve` to solve `(A_0'A_0) z = A_1'w` without densifying the normal equations matrix. Falls back to dense `lstsq` if the sparse solver fails.
548-
- **Bootstrap inference:** Uses multiplier bootstrap on the Theorem 3 influence function: `psi_i = sum_t v_it * epsilon_tilde_it`. Cluster-level psi sums are pre-computed for each aggregation target (overall, per-horizon, per-group), then perturbed with Rademacher weights. This is a library extension (not in the paper) consistent with CallawaySantAnna/SunAbraham bootstrap patterns.
548+
- **Bootstrap inference:** Uses multiplier bootstrap on the Theorem 3 influence function: `psi_i = sum_t v_it * epsilon_tilde_it`. Cluster-level psi sums are pre-computed for each aggregation target (overall, per-horizon, per-group), then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). This is a library extension (not in the paper) consistent with CallawaySantAnna/SunAbraham bootstrap patterns.
549549
- **Auxiliary residuals (Equation 8):** Uses v_it-weighted tau_tilde_g formula: `tau_tilde_g = sum(v_it * tau_hat_it) / sum(v_it)` within each partition group. Zero-weight groups (common in event-study SE computation) fall back to unweighted mean.
550550

551551
**Reference implementation(s):**
@@ -610,7 +610,7 @@ where `psi_i` is the stacked influence function for unit i across all its observ
610610

611611
*Bootstrap:*
612612

613-
Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with Rademacher weights. The R `did2s` package defaults to block bootstrap (resampling clusters with replacement). Both approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.
613+
Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). The R `did2s` package defaults to block bootstrap (resampling clusters with replacement). Both approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.
614614

615615
*Edge cases:*
616616
- **Always-treated units:** Units treated in all observed periods have no untreated observations for Stage 1 FE estimation. These are excluded with a warning listing the affected unit IDs. Their treated observations do NOT contribute to Stage 2.

0 commit comments

Comments
 (0)