You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+6-1Lines changed: 6 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -19,7 +19,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
19
19
- SyntheticDiD `variance_method="bootstrap"` now computes p-values from the analytical normal-theory formula using the bootstrap SE (matching R's `synthdid::vcov()` convention), rather than an empirical null-distribution formula that is not valid for bootstrap draws. `is_significant` and `significance_stars` are derived from `p_value` and will also change for bootstrap fits. Placebo and jackknife are unchanged. Point estimates are unaffected.
20
20
- SyntheticDiD bootstrap SE formula applies the `sqrt((r-1)/r)` correction matching R's synthdid and the placebo SE formula.
21
21
- SyntheticDiD bootstrap now retries degenerate resamples (all-control or all-treated, or non-finite `τ_b`) until exactly `n_bootstrap` valid replicates are accumulated, matching R's `synthdid::bootstrap_sample` and Arkhangelsky et al. (2021) Algorithm 2. Previously the Python path counted attempts (with degenerate draws silently dropped), producing fewer valid replicates than requested. A bounded-attempt guard (`20 × n_bootstrap`) prevents pathological-input hangs.
22
-
- **TROP global bootstrap SE backend parity under fixed seed** — Rust and Python backends now produce bit-identical bootstrap SE under the same `seed`. Previously Rust's `bootstrap_trop_variance_global` seeded `rand_xoshiro::Xoshiro256PlusPlus` per replicate while Python's fallback consumed `numpy.random.default_rng` (PCG64), producing ~28% SE divergence on tiny panels under `seed=42`. Fixed by extracting a shared `stratified_bootstrap_indices` helper in `diff_diff/bootstrap_utils.py` that pre-generates per-replicate stratified sample indices via numpy on the Python side; both backends consume the same integer arrays through the PyO3 surface. Sampling law (stratified: controls then treated, with replacement) is unchanged. Closes silent-failures audit finding #23 (bootstrap half; grid-search half closed in PR #348). Local-method TROP also adopts the Python-canonical index contract for the RNG layer, but separately-discovered backend divergences (Rust normalizes weight-matrix outer product, Python `_compute_observation_weights` reads stale `_precomputed` cache) prevent local-method bit-identity SE; tracked as a follow-up in `TODO.md`.
22
+
-**TROP global bootstrap SE backend parity under fixed seed** — Rust and Python backends now produce bit-identical bootstrap SE under the same `seed`. Previously Rust's `bootstrap_trop_variance_global` seeded `rand_xoshiro::Xoshiro256PlusPlus` per replicate while Python's fallback consumed `numpy.random.default_rng` (PCG64), producing ~28% SE divergence on tiny panels under `seed=42`. Fixed by extracting a shared `stratified_bootstrap_indices` helper in `diff_diff/bootstrap_utils.py` that pre-generates per-replicate stratified sample indices via numpy on the Python side; both backends consume the same integer arrays through the PyO3 surface. Sampling law (stratified: controls then treated, with replacement) is unchanged. Closes the bootstrap-RNG half of silent-failures audit finding #23 (grid-search half closed in PR #348; local-method methodology half closed by the two Fixed entries below). Local-method TROP also adopts the Python-canonical index contract for the RNG layer here.
23
+
- **TROP local-method Rust weight-matrix no longer normalized** — `rust/src/trop.rs::compute_weight_matrix` no longer divides time-weights or unit-weights by their respective sums before the outer product. The paper's Equation 2/3 (Athey, Imbens, Qu, Viviano 2025) and REGISTRY.md Requirements checklist (line 2037: `[x] Unit weights: exp(-λ_unit × distance) (unnormalized, matching Eq. 2)`) both specify raw-exponential weights; Python's `_compute_observation_weights` was already REGISTRY-compliant. **User-visible effect**: Rust local-method ATT values may shift for any fit with `lambda_nn < infinity` — normalizing the weight-matrix inflated the effective nuclear-norm penalty relative to the data-fit term, changing the regularization trade-off. For `lambda_nn = infinity` (factor model disabled) outputs are unchanged because uniform weight scaling leaves the minimum-norm WLS argmin invariant. Rust LOOCV-selected lambdas may also shift on this boundary; both backends now converge on the same REGISTRY-compliant selection.
24
+
-**TROP local-method Python `_compute_observation_weights` now uses the function-argument `Y, D` and treats all non-target units as donors** — two coupled changes that bring Python structurally in line with Rust and the paper's Eq. 2/3:
25
+
1. Removed the `if self._precomputed is not None:` branch that silently substituted `self._precomputed["Y"]` / `["D"]` / `["time_dist_matrix"]` (original-panel cache populated during main fit) for the function-argument `Y, D`. Under bootstrap, `_fit_with_fixed_lambda` computes fresh `Y, D` from the resampled `boot_data` and passes them in; the helper was discarding those and recomputing unit distances from the original panel, so Python's local bootstrap resampled units but reused stale unit-distance weights. Rust's bootstrap was already correct (always consumed `y_boot, d_boot`).
26
+
2. Removed the `valid_control_at_t = D[t, :] == 0` target-period donor gate that zeroed `ω_j` for any unit `j` treated at the target period (other than the target unit itself). Per REGISTRY Eq. 2/3 and Rust's `compute_weight_matrix`, `ω_j = exp(-λ_unit × dist(j, i))` for all `j ≠ i`; treated-cell exclusion happens via the `(1 − W_{js})` factor applied inside `_estimate_model`. Same-cohort donors now contribute via their pre-treatment rows. Empirically the main-fit ATT is unchanged on tested fixtures because same-cohort pre-treatment observations are exactly absorbed by their own unit fixed effect `alpha_j` without propagating into `mu`, `beta`, or other units' parameters — so this change is structural alignment rather than a numerical shift in output. Users on same-cohort panels with very few controls may still see tiny differences in edge cases; the new `test_local_method_same_cohort_donor_parity` regression guards the aligned behavior.
27
+
Together with the normalization fix above, TROP local-method backend parity on the main-fit ATT is regime-dependent: `atol=rtol=1e-14` for `lambda_nn=inf` (no nuclear-norm regularization, uniform weight scaling leaves the WLS argmin invariant) and `atol=1e-10` for finite `lambda_nn` (FISTA inner loop + BLAS reduction ordering introduce sub-1e-10 roundoff across Rust `faer` vs numpy paths). Bootstrap SE parity is asserted at `atol=1e-5` to accommodate ~1e-7 roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq` that accumulates across per-replicate fits; sub-1e-14 bootstrap parity is tracked as a follow-up in `TODO.md` under "unify Rust local-method solver path". Closes silent-failures audit finding #23 (local-method half; the RNG half closed in PR #354 and the grid-search half in PR #348).
23
28
24
29
### Changed
25
30
-**`did_had_pretest_workflow(aggregate="event_study")` verdict no longer emits the "paper step 2 deferred to Phase 3 follow-up" caveat** — the joint pre-trends Stute test closes that gap. The two-period `aggregate="overall"` path retains the existing caveat since the joint variant does not apply to single-pre-period panels. Downstream code that greps verdict strings for the Phase 3 caveat will see it suppressed on the event-study path.
Copy file name to clipboardExpand all lines: TODO.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -83,8 +83,8 @@ Deferred items from PR reviews that were not addressed before merge.
83
83
| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. |`linalg.py::_compute_cr2_bm`| Phase 1a | Medium |
84
84
| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. |`benchmarks/R/generate_clubsandwich_golden.R`| Phase 1a | Medium |
85
85
|`honest_did.py:1907``np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. |`honest_did.py`|#334| Low |
86
-
| TROP local-method Rust vs Python bootstrap SE divergence beyond RNG: with stratified indices now Python-canonical (closes RNG axis-H), local-method SE still diverges by 10-20% on tiny panels. Two downstream causes: (a) Rust `compute_weight_matrix` (`rust/src/trop.rs:573-606`) normalizes time_weights and unit_weights to sum to 1 before the outer product; Python `_compute_observation_weights` (`diff_diff/trop_local.py:489`) does not normalize. (b) Python `_compute_observation_weights` reads `self._precomputed["Y"]`, `["D"]`, `["time_dist_matrix"]` (lines 423-458) — original-panel cache — rather than the bootstrap-sample data passed via the function arguments, so unit-distance computation in the Python fallback uses stale data. `@pytest.mark.xfail(strict=True)` in `tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local` baselines the gap across `seeds=[0, 42, 12345]`. Affects local-method TROP backend parity broadly, not just bootstrap. | `trop_local.py`, `rust/src/trop.rs` | follow-up | Medium |
87
-
| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #NNN): pre-generate weights in Python via numpy and pass them to Rust through PyO3. |`rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py`| follow-up | Medium |
86
+
| Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. |`rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd`| follow-up | Low |
87
+
| Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. |`rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py`| follow-up | Medium |
88
88
|`bias_corrected_local_linear`: extend golden parity to `kernel="triangular"` and `kernel="uniform"` (currently epa-only; all three kernels share `kernel_W` and the `lprobust` math, so parity is expected but not separately asserted). |`benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py`| Phase 1c | Low |
89
89
|`bias_corrected_local_linear`: expose `vce in {"hc0", "hc1", "hc2", "hc3"}` on the public wrapper once R parity goldens exist (currently raises `NotImplementedError`). The port-level `lprobust` and `lprobust_res` already support all four; expanding the public surface requires a golden generator for each hc mode and a decision on hc2/hc3 q-fit leverage (R reuses p-fit `hii` for q-fit residuals; whether to match that or stage-match deserves a derivation before the wrapper advertises CCT-2014 conformance). |`diff_diff/local_linear.py::bias_corrected_local_linear`, `benchmarks/R/generate_nprobust_lprobust_golden.R`, `tests/test_bias_corrected_lprobust.py`| Phase 1c | Medium |
90
90
|`bias_corrected_local_linear`: support `weights=` once survey-design adaptation lands. nprobust's `lprobust` has no weight argument so there is no parity anchor; derivation needed. |`diff_diff/local_linear.py`, `diff_diff/_nprobust_port.py::lprobust`| Phase 1c | Medium |
0 commit comments