Problem
se_simple() and se_sum() in R/calculate_cvs.R are the most-called functions in the MOE pipeline (invoked hundreds of times inside dplyr::across() in calculate_moes()), but both have unnecessary overhead:
se_simple() (line 8)
Uses purrr::map_dbl(moe, ~ .x / 1.645) for what is natively vectorized as moe / 1.645. The map_dbl wrapper adds per-element function-call overhead for no benefit.
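A minimal sketch of the equivalence, using base R's vapply() as a stand-in for purrr::map_dbl() so the example has no dependencies. The function names se_simple_mapped and se_simple_vectorized are hypothetical, used here only to show the two variants side by side:

```r
# Current pattern, approximated with vapply() in place of purrr::map_dbl():
# one function call per element of `moe`.
se_simple_mapped <- function(moe) {
  vapply(moe, function(x) x / 1.645, numeric(1))
}

# Proposed body: plain vectorized division.
se_simple_vectorized <- function(moe) {
  moe / 1.645
}

# Both variants agree, including on zero and NA inputs.
moe <- c(12, 0, 35.5, NA, 140)
stopifnot(isTRUE(all.equal(se_simple_mapped(moe), se_simple_vectorized(moe))))
```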
se_sum() (lines 16-61)
For each call, this function:
- Converts list inputs to data frames
- Pivots twice (pivot_longer + pivot_wider)
- Groups and splits by observation
- Maps 4 times (pull, se_simple, square, sum)
- Collects with map_dbl(sqrt)
The core operation is sqrt(sum(se^2)) row-wise, with a Census Bureau special case: when multiple zero-estimate observations are summed, keep only the largest MOE among them.
A vectorized, matrix-based approach should be significantly faster because it replaces the per-call pivoting, grouping, and mapping with a few whole-matrix operations. It needs careful testing against the current implementation for the zero-estimate edge case.
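One possible shape for the matrix version, as a rough sketch rather than a drop-in replacement. It assumes the estimates and MOEs can be arranged as two same-shaped matrices (one row per observation, one column per summed variable) with no missing values. The name se_sum_matrix and its est/moe arguments are hypothetical; the real se_sum() takes list inputs, so a thin conversion layer would still be needed:

```r
# Hypothetical matrix-based sketch of the row-wise sqrt(sum(se^2)) with the
# zero-estimate rule. Assumes `est` and `moe` have identical dimensions and
# contain no NA values.
se_sum_matrix <- function(est, moe) {
  est <- as.matrix(est)
  moe <- as.matrix(moe)
  stopifnot(identical(dim(est), dim(moe)))

  se2 <- (moe / 1.645)^2   # squared standard errors
  is_zero <- est == 0      # flags zero-estimate cells

  # Sum of squared SEs over the nonzero-estimate cells in each row.
  nonzero_part <- rowSums(se2 * (!is_zero))

  # Among the zero-estimate cells in each row, keep only the largest squared
  # SE (0 when the row has no zero estimates). pmax() across the columns
  # gives a row-wise maximum without looping over rows.
  zero_part <- do.call(pmax, as.data.frame(se2 * is_zero))

  sqrt(nonzero_part + unname(zero_part))
}

est <- rbind(c(10, 0, 0), c(5, 7, 0), c(0, 0, 0))
moe <- rbind(c(1.645, 3.29, 1.645), c(1.645, 1.645, 1.645), c(1.645, 3.29, 4.935))
se_sum_matrix(est, moe)
```

In the first row, the two zero-estimate cells have standard errors 2 and 1, so only the 2 is kept alongside the nonzero cell's 1. Whether this matches the current se_sum() on every input (for example, how NAs are treated) is exactly what the testing step would need to confirm.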
Suggested approach
- Replace se_simple() body with moe / 1.645
- Rewrite se_sum() using matrix operations
- Benchmark before/after on tract-level data for a large state
- Verify the zero-estimate edge case produces identical results
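The benchmark and verification steps could share a small harness along these lines. This is a sketch only: compare_implementations is a hypothetical helper, and the two functions passed to it below are simplified stand-ins (a plain row-wise root-sum-of-squares without the zero-estimate rule), not the actual old and new se_sum(). The real check would pass the existing and rewritten functions, on tract-level data that includes rows with multiple zero estimates:

```r
# Hypothetical helper: runs both implementations on the same inputs, checks
# that the results agree, and records elapsed time for each.
compare_implementations <- function(old_fn, new_fn, ...) {
  old_seconds <- system.time(old_result <- old_fn(...))[["elapsed"]]
  new_seconds <- system.time(new_result <- new_fn(...))[["elapsed"]]
  list(
    equal = isTRUE(all.equal(old_result, new_result)),
    old_seconds = old_seconds,
    new_seconds = new_seconds
  )
}

# Synthetic MOE matrix standing in for real data: 20,000 rows, 5 variables.
set.seed(1)
moe <- matrix(runif(1e5, min = 0, max = 500), ncol = 5)

result <- compare_implementations(
  # Stand-in for a row-by-row implementation.
  old_fn = function(m) apply(m, 1, function(r) sqrt(sum((r / 1.645)^2))),
  # Stand-in for a vectorized implementation.
  new_fn = function(m) sqrt(rowSums((m / 1.645)^2)),
  moe
)
stopifnot(result$equal)
```

all.equal() tolerates tiny floating-point differences. If bit-for-bit agreement is required, identical() could be swapped in, though differences in summation order may then produce spurious failures.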
Context
Identified during code review. For tract-level data across multiple states, calculate_moes() is a significant portion of total compile_acs_data() runtime.