Skip to content

Commit 8d5eae8

Browse files
igerberclaude
andcommitted
Address CI review; add memory tracking and BRFSS allocator attribution
Addresses the four CI review findings: - BRR -> JK1 rename. generate_survey_did_data(include_replicate_weights= True) emits JK1 delete-one-PSU weights per prep.py:1248; Scenario 2 was labeling them as BRR, which uses a different variance formula. Fixed script, phase label, scenario doc data-shape text, and example code snippet. - Exit-code propagation. run_scenario now records a module-level failure flag; an atexit handler os._exit(1)s if any phase recorded ok=False. run_all.py's subprocess return-code check now reliably surfaces phase failures. Verified with a forced-failure harness test. - Path references. bench_shared.py and run_all.py docstrings plus performance-plan.md prose normalized to benchmarks/speed_review/baselines/. - Contributor README. "Commit HTMLs" instruction removed; flame HTMLs are gitignored and regenerated per run. Adds memory measurement: - psutil background RSS sampler (10ms) in run_scenario writes a memory field to every scenario JSON: start, peak, growth-during-run. Zero timing impact (background thread, single-syscall samples). - mem_profile_brfss.py - standalone tracemalloc allocator attribution for the BRFSS-1M scenario. Separate from the timing harness so its 2-5x overhead does not contaminate wall-clock baselines. Memory findings extend the optimization priority list without changing the #1 recommendation. Headline insight: BRFSS aggregate_survey at 1M rows grows only 23 MB of working memory (vs 46 MB input), and tracemalloc's net-retained allocation is 0.6 MB. The 24-second cost is pure CPU - confirms the precompute-scaffolding fix is low-risk and fits in any deployment target including 512 MB Lambda. Secondary finding: staggered CS chain allocates 252-322 MB at 1,500 units (peak RSS 486-589 MB). Fine for workstations, tight for Lambda- tier deployments. Flagged as a lower-priority follow-up. Still measurement only. No changes under diff_diff/ or rust/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 33d55fb commit 8d5eae8

35 files changed

Lines changed: 809 additions & 252 deletions

benchmarks/speed_review/README.md

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -19,22 +19,31 @@ at data shapes anchored to applied-econ conventions.
1919
```
2020
benchmarks/speed_review/
2121
├── README.md # this file
22-
├── bench_shared.py # timing + pyinstrument harness
22+
├── bench_shared.py # timing + pyinstrument + RSS harness
2323
├── run_all.py # orchestrator (both backends)
2424
├── bench_campaign_staggered.py # Scenario 1: CS + 8-step chain
2525
├── bench_brand_awareness_survey.py # Scenario 2: DiD + SurveyDesign
2626
├── bench_brfss_panel.py # Scenario 3: aggregate_survey -> CS
2727
├── bench_geo_few_markets.py # Scenario 4: SDiD + jackknife
2828
├── bench_reversible_dcdh.py # Scenario 5: dCDH L_max + TSL
2929
├── bench_dose_response.py # Scenario 6: ContinuousDiD splines
30+
├── mem_profile_brfss.py # tracemalloc allocator attribution
31+
│ # for BRFSS-1M (standalone)
3032
├── bench_callaway.py # pre-existing CS scaling sweep
3133
├── baseline_results.json # pre-existing CS baseline
3234
└── baselines/ # this effort's output
33-
├── <scenario>_<backend>.json # phase-level wall-clock (committed)
35+
├── <scenario>_<backend>.json # phase-level wall-clock + peak RSS
36+
├── mem_profile_brfss_large_<backend>.txt # tracemalloc top-N sites
3437
└── profiles/ # flame HTMLs (gitignored)
3538
└── <scenario>_<backend>.html # pyinstrument flame output
3639
```
3740

41+
Each JSON baseline records both timing (per-phase wall-clock) and memory
42+
(start/peak/growth from a psutil background sampler at 10 ms). The
43+
`mem_profile_brfss.py` script does a separate tracemalloc pass on the
44+
BRFSS-1M scenario - this is kept out of the main timing harness because
45+
tracemalloc has 2-5x overhead and would contaminate wall-clock baselines.
46+
3847
**Note on profile HTMLs.** pyinstrument flames are ~500KB-1.2MB each and are
3948
regenerated on every run; they live under `baselines/profiles/` which is
4049
gitignored. The key hotspots identified from them are already captured in
@@ -77,6 +86,7 @@ the findings doc is the decision output.
7786
2. Add `bench_<name>.py` following the existing scripts: build data, define
7887
`phases` as a list of `(label, callable)` tuples, call `run_scenario`.
7988
3. Register it in `run_all.py`'s `SCRIPTS` dict.
80-
4. Run under both backends, commit the refreshed `baselines/*.json` and the
81-
corresponding `baselines/profiles/*.html`.
89+
4. Run under both backends and commit the refreshed `baselines/*.json`.
90+
The `baselines/profiles/*.html` flame HTMLs are gitignored and
91+
regenerated per run - do not commit them.
8292
5. Add a per-scenario finding paragraph to `docs/performance-plan.md`.

benchmarks/speed_review/baselines/brand_awareness_survey_large_python.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_large",
33
"backend": "python",
44
"has_rust_backend": false,
5-
"total_seconds": 0.7940070000000001,
5+
"total_seconds": 0.9061127080000002,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 189.33,
9+
"peak_mb": 335.59,
10+
"growth_mb": 146.27,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.013499665999999966,
15+
"seconds": 0.013970540999999947,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.03187458300000001,
20+
"seconds": 0.03233287500000004,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.3442796670000001,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.44611704099999994,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.19682533299999982,
30+
"seconds": 0.21938504199999986,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.030179500000000026,
35+
"seconds": 0.04051395800000002,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.043751333999999975,
40+
"seconds": 0.016386375000000175,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.13358487500000016,
45+
"seconds": 0.13737600000000016,
3946
"ok": true,
4047
"error": null
4148
}

benchmarks/speed_review/baselines/brand_awareness_survey_large_rust.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_large",
33
"backend": "rust",
44
"has_rust_backend": true,
5-
"total_seconds": 0.828119375,
5+
"total_seconds": 0.8791467499999999,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 187.97,
9+
"peak_mb": 315.2,
10+
"growth_mb": 127.23,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.014049749999999861,
15+
"seconds": 0.013859375000000007,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.029422499999999907,
20+
"seconds": 0.02124400000000004,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.36754912500000003,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.42970375000000005,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.16490987499999998,
30+
"seconds": 0.2112943330000001,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.03375229199999996,
35+
"seconds": 0.038379208000000276,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.06475750000000025,
40+
"seconds": 0.025571082999999994,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.15367104200000004,
45+
"seconds": 0.1390828329999998,
3946
"ok": true,
4047
"error": null
4148
}

benchmarks/speed_review/baselines/brand_awareness_survey_medium_python.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_medium",
33
"backend": "python",
44
"has_rust_backend": false,
5-
"total_seconds": 0.48956791599999994,
5+
"total_seconds": 0.560188,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 131.69,
9+
"peak_mb": 187.8,
10+
"growth_mb": 56.11,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.01289191699999992,
15+
"seconds": 0.011834042000000045,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.035409875000000035,
20+
"seconds": 0.03354012500000003,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.12633833299999997,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.21381758399999995,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.17774295900000003,
30+
"seconds": 0.13717983300000003,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.018629792000000034,
35+
"seconds": 0.018324165999999975,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.0519646250000001,
40+
"seconds": 0.058137000000000105,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.06657341699999986,
45+
"seconds": 0.08734375000000005,
3946
"ok": true,
4047
"error": null
4148
}

benchmarks/speed_review/baselines/brand_awareness_survey_medium_rust.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_medium",
33
"backend": "rust",
44
"has_rust_backend": true,
5-
"total_seconds": 0.535454792,
5+
"total_seconds": 0.5398647089999999,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 133.16,
9+
"peak_mb": 185.38,
10+
"growth_mb": 52.22,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.011897708999999979,
15+
"seconds": 0.011500667000000075,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.03526237499999996,
20+
"seconds": 0.03384820799999999,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.185435083,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.191542875,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.14044966699999994,
30+
"seconds": 0.105974083,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.019051875000000162,
35+
"seconds": 0.02876208299999994,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.05337804200000007,
40+
"seconds": 0.06280441700000017,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.08997387500000009,
45+
"seconds": 0.10540583399999992,
3946
"ok": true,
4047
"error": null
4148
}

benchmarks/speed_review/baselines/brand_awareness_survey_small_python.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_small",
33
"backend": "python",
44
"has_rust_backend": false,
5-
"total_seconds": 0.15087129199999993,
5+
"total_seconds": 0.15974079200000002,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 115.44,
9+
"peak_mb": 125.66,
10+
"growth_mb": 10.22,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.0017902499999999932,
15+
"seconds": 0.0016714159999999811,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.00610949999999999,
20+
"seconds": 0.0061952499999999855,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.02120725000000001,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.018200666000000032,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.011621500000000062,
30+
"seconds": 0.02470079199999997,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.001833375000000026,
35+
"seconds": 0.008862999999999954,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.027076792000000016,
40+
"seconds": 0.024017708000000026,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.081212583,
45+
"seconds": 0.07607645800000007,
3946
"ok": true,
4047
"error": null
4148
}

benchmarks/speed_review/baselines/brand_awareness_survey_small_rust.json

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,40 +2,47 @@
22
"scenario": "brand_awareness_survey_small",
33
"backend": "rust",
44
"has_rust_backend": true,
5-
"total_seconds": 0.200881125,
5+
"total_seconds": 0.19896133300000007,
6+
"memory": {
7+
"available": true,
8+
"start_mb": 116.34,
9+
"peak_mb": 129.73,
10+
"growth_mb": 13.39,
11+
"sampler_interval_s": 0.01
12+
},
613
"phases": {
714
"1_naive_fit_no_survey_design": {
8-
"seconds": 0.0018462080000000158,
15+
"seconds": 0.0019397500000000178,
916
"ok": true,
1017
"error": null
1118
},
1219
"2_tsl_strata_psu_fpc": {
13-
"seconds": 0.005704333000000061,
20+
"seconds": 0.005711999999999939,
1421
"ok": true,
1522
"error": null
1623
},
17-
"3_replicate_weights_brr": {
18-
"seconds": 0.015561500000000006,
24+
"3_replicate_weights_jk1": {
25+
"seconds": 0.011531958999999925,
1926
"ok": true,
2027
"error": null
2128
},
2229
"4_multi_outcome_loop_3_metrics": {
23-
"seconds": 0.05937758399999993,
30+
"seconds": 0.06204845800000003,
2431
"ok": true,
2532
"error": null
2633
},
2734
"5_check_parallel_trends": {
28-
"seconds": 0.00939004099999996,
35+
"seconds": 0.00982324999999995,
2936
"ok": true,
3037
"error": null
3138
},
3239
"6_placebo_refit_pre_period": {
33-
"seconds": 0.025794415999999987,
40+
"seconds": 0.024675957999999998,
3441
"ok": true,
3542
"error": null
3643
},
3744
"7_event_study_plus_honest_did": {
38-
"seconds": 0.08319054199999998,
45+
"seconds": 0.08321629100000005,
3946
"ok": true,
4047
"error": null
4148
}

0 commit comments

Comments
 (0)