Commit 6b7ba9f

igerber and claude committed
Precompute stratum-PSU scaffolding in aggregate_survey
The per-cell Taylor-series variance inside aggregate_survey previously rebuilt stratum-PSU scaffolding (np.unique, per-stratum pandas groupby, stratum FPC lookup) on every output cell. At BRFSS scale (50 states x 10 years = 500 cells, 20 strata, 1M microdata rows) this was ~10K pandas groupby constructions, each summing a mostly-zero psi vector and paying full pandas setup cost; this accounted for the entire chain's runtime.

This PR adds a frozen _PsuScaffolding dataclass plus private _precompute_psu_scaffolding(resolved) and _compute_if_variance_fast(psi, scf) helpers in diff_diff/survey.py. aggregate_survey builds scaffolding once per design and threads it through _cell_mean_variance via a new optional kwarg; the fast path replaces the per-stratum groupby loop with two vectorized np.bincount passes (psi → PSU sums, PSU sums → per-stratum first and second moments) plus a closed-form meat = sum_h adjustment_h * centered_ss_h.

Scope is deliberately localized: _compute_stratified_psu_meat and compute_survey_if_variance are unchanged, so every other TSL caller (DiD, TWFE, CS, SunAbraham, dCDH, etc.) is unaffected. Replicate-weight designs continue to route through compute_replicate_if_variance unchanged.

Measured impact (benchmarks/speed_review/run_all.py, 1M rows BRFSS):
- Large: 24.4s → 1.33s (Python), 24.9s → 1.32s (Rust) [18.4-19.0x]
- Medium: 6.1s → 0.49s [12.5-13.2x]
- Small: 1.6s → 0.17s [7.6-10x]

No regression in any other scenario (all within run-to-run noise).

Numerical equivalence: new TestAggregateSurveyScaffolding asserts assert_allclose(atol=1e-14, rtol=1e-14) between fast and legacy paths across seven design cases (stratified+PSU+FPC, stratified no FPC, PSU-only, weights-only, and all three lonely_psu modes: remove / certainty / adjust), plus structural tests on the scaffolding itself. On the actual BRFSS-large 1M-row panel, y_mean is bit-identical and y_se / y_precision drift at ~1 ULP (max relative diff 4.6e-16).

Existing coverage unchanged: all 43 TestAggregateSurvey tests green on the fast path (new default); all 129 test_survey.py tests green.

Documentation:
- docs/performance-plan.md: finding #1 rewritten ("practitioner-fast at every scale"), BRFSS bullet updated, hotspots row #1 marked LANDED, memory finding updated, priority table item #1 marked LANDED, new "Optimization landed" subsection, bottom line updated ("no practitioner-perceptible bottleneck remains"). Auto-tables regenerated via gen_findings_tables.py.
- CHANGELOG.md: new Performance entry under [Unreleased].

No user-facing API change. Methodology docs (REGISTRY.md, survey-theory.md) are deliberately not touched: this is a pure internal performance optimization with numerics preserved to sub-ULP tolerance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
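The two-pass `np.bincount` idea described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the code in `diff_diff/survey.py`: the names `PsuScaffolding`, `precompute_scaffolding`, `if_variance_fast` and `if_variance_groupby` are hypothetical, strata and PSU labels are assumed to be non-negative integers, `fpc_ratio` is assumed to be the per-stratum sampling fraction, the adjustment is assumed to be `(1 - f_h) * n_h / (n_h - 1)`, and singleton strata are simply dropped.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class PsuScaffolding:
    """Design-level quantities that do not depend on the output cell (hypothetical)."""
    psu_codes: np.ndarray    # per-row PSU code, unique across strata
    psu_stratum: np.ndarray  # per-PSU stratum code
    n_psu: np.ndarray        # per-stratum PSU count n_h
    adjustment: np.ndarray   # per-stratum (1 - f_h) * n_h / (n_h - 1), 0 for singletons


def precompute_scaffolding(strata, psu, fpc_ratio):
    """Build the scaffolding once per design."""
    width = int(psu.max()) + 1
    key = strata.astype(np.int64) * width + psu  # PSU labels may repeat across strata
    uniq_key, psu_codes = np.unique(key, return_inverse=True)
    psu_stratum = uniq_key // width
    n_psu = np.bincount(psu_stratum, minlength=fpc_ratio.size)
    with np.errstate(divide="ignore", invalid="ignore"):
        adjustment = np.where(n_psu > 1, (1.0 - fpc_ratio) * n_psu / (n_psu - 1), 0.0)
    return PsuScaffolding(psu_codes, psu_stratum, n_psu, adjustment)


def if_variance_fast(psi, scf):
    """Pass 1: psi -> PSU sums. Pass 2: PSU sums -> per-stratum moments."""
    n_strata = scf.n_psu.size
    psu_sums = np.bincount(scf.psu_codes, weights=psi, minlength=scf.psu_stratum.size)
    s1 = np.bincount(scf.psu_stratum, weights=psu_sums, minlength=n_strata)
    s2 = np.bincount(scf.psu_stratum, weights=psu_sums**2, minlength=n_strata)
    centered_ss = s2 - s1**2 / np.maximum(scf.n_psu, 1)
    return float(np.sum(scf.adjustment * centered_ss))


def if_variance_groupby(psi, strata, psu, fpc_ratio):
    """Reference path: one pandas groupby per stratum, rebuilt on every call."""
    df = pd.DataFrame({"h": strata, "psu": psu, "psi": psi})
    meat = 0.0
    for h, g in df.groupby("h"):
        z = g.groupby("psu")["psi"].sum().to_numpy()
        if z.size < 2:
            continue  # singleton stratum contributes nothing in this sketch
        meat += (1.0 - fpc_ratio[h]) * z.size / (z.size - 1) * np.sum((z - z.mean()) ** 2)
    return float(meat)


rng = np.random.default_rng(0)
n = 10_000
strata = rng.integers(0, 5, n)
psu = rng.integers(0, 8, n)
fpc_ratio = np.full(5, 0.1)
psi = rng.normal(size=n)
psi[rng.random(n) > 0.1] = 0.0  # mostly-zero psi, as for a single output cell

scf = precompute_scaffolding(strata, psu, fpc_ratio)  # once per design
fast = if_variance_fast(psi, scf)                     # once per cell
legacy = if_variance_groupby(psi, strata, psu, fpc_ratio)
print(fast, legacy)
```

In this sketch only `if_variance_fast` runs per cell; everything that depends solely on the design is computed once in `precompute_scaffolding`, which is the cost the commit message says was previously repeated for every cell.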
1 parent e834a48 commit 6b7ba9f

32 files changed

Lines changed: 906 additions & 347 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Performance
+- **`aggregate_survey` stratum-PSU scaffolding precompute** — the per-cell Taylor-series variance inside `aggregate_survey` no longer rebuilds stratum-PSU scaffolding on every cell. A frozen `_PsuScaffolding` (strata codes, global PSU codes unique across strata, per-stratum counts and FPC ratios, singleton mask, static legitimate-zero counts and variance-computable flag) is precomputed once per design at the top of `aggregate_survey` and threaded through `_cell_mean_variance` to a new `_compute_if_variance_fast` path that replaces the per-stratum pandas groupby with two vectorized `np.bincount` passes. BRFSS-shaped 50-state × 10-year × 1M-row microdata → state-year panel drops from ~24s to sub-2s under both backends (the path is pure Python, so Python and Rust track each other). Numerical output is preserved to sub-ULP tolerance; seven-case equivalence tests (`TestAggregateSurveyScaffolding`) assert `assert_allclose(atol=1e-14, rtol=1e-14)` between fast and legacy paths across stratified+PSU+FPC, stratified no FPC, PSU-only, weights-only, and all three `lonely_psu` modes (remove / certainty / adjust). Replicate-weight designs continue to route through `compute_replicate_if_variance` unchanged. `_compute_stratified_psu_meat` is untouched — all other TSL callers (DiD / TWFE / CS / etc.) are unaffected.
+
 ### Changed
 - Add Zenodo DOI badge to README; upgrade the BibTeX citation block with the concept DOI (`10.5281/zenodo.19646175`) and list author as Isaac Gerber (matching `CITATION.cff`). Add `doi:` and `identifiers:` entries (concept + versioned) to `CITATION.cff`. DOI was minted by Zenodo when v3.1.3 was released.
 - **`ChaisemartinDHaultfoeuille` heterogeneity + within-group-varying PSU/strata now supported under Binder TSL** - `fit(heterogeneity=..., survey_design=...)` no longer raises `NotImplementedError` when the resolved design's PSU or strata vary across the cells of a group. On the **Binder TSL** branch (`compute_survey_if_variance`), the heterogeneity WLS coefficient IF is expanded to observation level via the cell-period allocator `ψ_i = ψ_g * (w_i / W_{g, out_idx})` on the post-period cell — the DID_l post-period single-cell convention shipped in v3.1.x. Under PSU=group the PSU-level Binder TSL variance is byte-identical to the previous release (PSU-level aggregate telescopes to `ψ_g`); under within-group-varying PSU, mass lands in the post-period PSU of the transition. The **Rao-Wu replicate-weight** branch (`compute_replicate_if_variance`) retains the legacy group-level allocator `ψ_i = ψ_g * (w_i / W_g)`: replicate variance computes `θ_r = sum_i ratio_ir * ψ_i` at observation level and is therefore not PSU-telescoping, so the cell-period allocator would silently change the replicate SE whenever a replicate column's ratios vary within group (e.g., per-row replicate matrices). Replicate + heterogeneity fits therefore produce byte-identical SE to the previous release, and the newly-unblocked `heterogeneity=` + within-group-varying PSU combination is unreachable under replicate designs by construction (`SurveyDesign` rejects `replicate_weights` combined with explicit `strata/psu/fpc`).
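The two allocators named in the `ChaisemartinDHaultfoeuille` entry can be contrasted on a toy panel. The sketch below is illustrative only and is not the library's implementation: the data are made up, period 1 is assumed to be the post-period cell for both groups, and all variable names are hypothetical. It shows that both allocators sum back to `ψ_g` within a group, while their PSU-level sums differ once PSU varies within a group.

```python
import numpy as np

# Toy panel: 2 groups observed in 2 periods; period 1 is the assumed post-period cell.
group = np.array([0, 0, 0, 0, 1, 1, 1])
period = np.array([0, 0, 1, 1, 0, 1, 1])
w = np.array([1.0, 2.0, 1.5, 0.5, 1.0, 3.0, 1.0])
psi_g = np.array([0.8, -0.3])  # made-up group-level influence values
post = period == 1

# Group-level allocator: psi_i = psi_g * w_i / W_g over every row of the group.
W_g = np.bincount(group, weights=w)
psi_group = psi_g[group] * w / W_g[group]

# Cell-period allocator: psi_i = psi_g * w_i / W_{g,post}, post-period rows only.
W_post = np.bincount(group, weights=w * post)
psi_cell = np.where(post, psi_g[group] * w / W_post[group], 0.0)

# With PSU = group, both allocators aggregate back to psi_g.
sum_group = np.bincount(group, weights=psi_group)
sum_cell = np.bincount(group, weights=psi_cell)

# With PSU varying within group (one PSU per group-period here), they differ:
# the cell-period allocator puts all mass in the post-period PSU.
psu = np.array([0, 0, 1, 1, 2, 3, 3])
psu_group = np.bincount(psu, weights=psi_group)
psu_cell = np.bincount(psu, weights=psi_cell)

print("group sums:", sum_group, sum_cell)
print("PSU sums (group-level allocator):", psu_group)
print("PSU sums (cell-period allocator):", psu_cell)
```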

benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 1.0910496250000001,
+  "total_seconds": 0.8670909579999999,
   "memory": {
     "available": true,
-    "start_mb": 188.45,
-    "peak_mb": 327.44,
-    "growth_mb": 138.98,
+    "start_mb": 200.7,
+    "peak_mb": 340.16,
+    "growth_mb": 139.45,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.009826500000000182,
+      "seconds": 0.01288558399999995,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.030280333999999964,
+      "seconds": 0.03156662499999996,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.6243122919999999,
+      "seconds": 0.39469687499999995,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.24174716599999968,
+      "seconds": 0.22814783400000005,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.025623749999999834,
+      "seconds": 0.04083812500000006,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.01191299999999984,
+      "seconds": 0.014936375000000002,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.147335875,
+      "seconds": 0.14401216700000008,
       "ok": true,
       "error": null
     }
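One way to read a baseline diff like the one above is to compute the after/before ratio per phase. The sketch below copies the per-phase `seconds` values from the `brand_awareness_survey_large_python.json` diff; the dictionary layout and variable names are only for illustration and are not part of `run_all.py`. Each value is a single run, so ratios on phases that take only a few milliseconds mostly reflect timer noise.

```python
# Per-phase seconds copied from the brand_awareness_survey_large_python.json diff.
before = {
    "1_naive_fit_no_survey_design": 0.009826500000000182,
    "2_tsl_strata_psu_fpc": 0.030280333999999964,
    "3_replicate_weights_jk1": 0.6243122919999999,
    "4_multi_outcome_loop_3_metrics": 0.24174716599999968,
    "5_check_parallel_trends": 0.025623749999999834,
    "6_placebo_refit_pre_period": 0.01191299999999984,
    "7_event_study_plus_honest_did": 0.147335875,
}
after = {
    "1_naive_fit_no_survey_design": 0.01288558399999995,
    "2_tsl_strata_psu_fpc": 0.03156662499999996,
    "3_replicate_weights_jk1": 0.39469687499999995,
    "4_multi_outcome_loop_3_metrics": 0.22814783400000005,
    "5_check_parallel_trends": 0.04083812500000006,
    "6_placebo_refit_pre_period": 0.014936375000000002,
    "7_event_study_plus_honest_did": 0.14401216700000008,
}

# Ratio below 1 means the phase got faster; above 1 means slower.
ratios = {name: after[name] / before[name] for name in before}

for name in sorted(ratios):
    delta_ms = (after[name] - before[name]) * 1e3
    print(f"{name:32s} {before[name] * 1e3:8.1f} ms -> {after[name] * 1e3:8.1f} ms "
          f"(x{ratios[name]:.2f}, {delta_ms:+.1f} ms)")

slowest_relative = max(ratios, key=ratios.get)
fastest_relative = min(ratios, key=ratios.get)
```

Printing the absolute difference next to the ratio helps keep small phases in perspective: a large ratio on a phase of a few tens of milliseconds is a shift of only a few milliseconds.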

benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_large",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 1.0000031249999999,
+  "total_seconds": 0.9299781670000002,
   "memory": {
     "available": true,
-    "start_mb": 194.03,
-    "peak_mb": 336.08,
-    "growth_mb": 142.05,
+    "start_mb": 190.2,
+    "peak_mb": 347.92,
+    "growth_mb": 157.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.013511041000000112,
+      "seconds": 0.01335629100000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03037650000000003,
+      "seconds": 0.0316900830000002,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.5431151669999998,
+      "seconds": 0.46433058400000005,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.21752962499999962,
+      "seconds": 0.23703795799999994,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.04399687500000038,
+      "seconds": 0.030673249999999985,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.016433082999999904,
+      "seconds": 0.011707583000000188,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.13501837500000002,
+      "seconds": 0.14117254200000007,
       "ok": true,
       "error": null
     }

benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.563283334,
+  "total_seconds": 0.529578166,
   "memory": {
     "available": true,
-    "start_mb": 133.69,
-    "peak_mb": 187.7,
-    "growth_mb": 54.02,
+    "start_mb": 137.67,
+    "peak_mb": 182.88,
+    "growth_mb": 45.2,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.010921792000000097,
+      "seconds": 0.01053379199999993,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03732066599999995,
+      "seconds": 0.032504792000000005,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.20805304199999997,
+      "seconds": 0.16178545899999996,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.12622899999999992,
+      "seconds": 0.1744099589999999,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01834783299999998,
+      "seconds": 0.02328412499999999,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.054030583000000076,
+      "seconds": 0.06313762499999998,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.10836029199999997,
+      "seconds": 0.06389345899999999,
       "ok": true,
       "error": null
     }

benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_medium",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.5500554579999999,
+  "total_seconds": 0.50248775,
   "memory": {
     "available": true,
-    "start_mb": 135.36,
-    "peak_mb": 184.86,
-    "growth_mb": 49.5,
+    "start_mb": 133.94,
+    "peak_mb": 189.34,
+    "growth_mb": 55.41,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.011186999999999947,
+      "seconds": 0.010962209,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.03363270800000007,
+      "seconds": 0.03478112499999997,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.18678066699999996,
+      "seconds": 0.13834324999999992,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.16038787500000007,
+      "seconds": 0.1290292500000001,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.022171542000000155,
+      "seconds": 0.02951112499999997,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0532650830000001,
+      "seconds": 0.06002304200000008,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08262075000000002,
+      "seconds": 0.09981400000000007,
       "ok": true,
       "error": null
     }

benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "python",
   "has_rust_backend": false,
-  "total_seconds": 0.19338629200000002,
+  "total_seconds": 0.22668149999999998,
   "memory": {
     "available": true,
-    "start_mb": 115.48,
-    "peak_mb": 127.31,
-    "growth_mb": 11.83,
+    "start_mb": 115.44,
+    "peak_mb": 130.16,
+    "growth_mb": 14.72,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0014470410000000378,
+      "seconds": 0.00165958300000002,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.0072707499999999925,
+      "seconds": 0.006191999999999975,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.023173292000000068,
+      "seconds": 0.02364570900000007,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.03375529200000005,
+      "seconds": 0.07623400000000002,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.01041325000000004,
+      "seconds": 0.009393082999999969,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.027520249999999913,
+      "seconds": 0.02586829199999996,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08979433299999995,
+      "seconds": 0.08367512499999996,
       "ok": true,
       "error": null
     }

benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json

Lines changed: 11 additions & 11 deletions
@@ -2,47 +2,47 @@
   "scenario": "brand_awareness_survey_small",
   "backend": "rust",
   "has_rust_backend": true,
-  "total_seconds": 0.19669587500000008,
+  "total_seconds": 0.198891041,
   "memory": {
     "available": true,
-    "start_mb": 114.78,
-    "peak_mb": 127.91,
-    "growth_mb": 13.12,
+    "start_mb": 115.05,
+    "peak_mb": 127.78,
+    "growth_mb": 12.73,
     "sampler_interval_s": 0.01
   },
   "phases": {
     "1_naive_fit_no_survey_design": {
-      "seconds": 0.0016678749999999853,
+      "seconds": 0.0019442080000000583,
       "ok": true,
       "error": null
     },
     "2_tsl_strata_psu_fpc": {
-      "seconds": 0.005756874999999995,
+      "seconds": 0.006045499999999926,
       "ok": true,
       "error": null
     },
     "3_replicate_weights_jk1": {
-      "seconds": 0.012066042000000055,
+      "seconds": 0.02063908400000003,
       "ok": true,
       "error": null
     },
     "4_multi_outcome_loop_3_metrics": {
-      "seconds": 0.05887395800000006,
+      "seconds": 0.05060483399999993,
       "ok": true,
       "error": null
     },
     "5_check_parallel_trends": {
-      "seconds": 0.008938375000000054,
+      "seconds": 0.009498208000000008,
       "ok": true,
       "error": null
     },
     "6_placebo_refit_pre_period": {
-      "seconds": 0.0274049999999999,
+      "seconds": 0.025947834000000003,
       "ok": true,
       "error": null
     },
     "7_event_study_plus_honest_did": {
-      "seconds": 0.08197737500000002,
+      "seconds": 0.08419849999999995,
       "ok": true,
       "error": null
     }
