Description
Updated description
A long time coming, this issue proposes that we implement these changes simultaneously:
- Remove the `alloc_jemalloc` crate
- Default allocations for all crate types to `std::alloc::System`. While this is currently the default for cdylib/staticlib, it's not the default for executables
- Add the `jemallocator` crate to rustc, but only rustc
- Long-term, deprecate and remove the `alloc_system` crate
For the longest time we have defaulted to jemalloc as the allocator for Rust programs. This has been in place since pre-1.0, and the vision was that we'd give programs a by-default faster allocator than what's on the system. Over time, this has not fared well:
- Jemalloc has been disabled on a wide variety of architectures for various reasons; the system allocator seems more reliable.
- Jemalloc, for whatever reason as we ship it, is incompatible with valgrind.
- Jemalloc bloats the size of executables by default.
- Not all Rust programs are bottlenecked on allocations, and those which are can use `#[global_allocator]` to opt in to a jemalloc-based global allocator (through the `jemallocator` crate or any other allocator crate).
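As a minimal sketch of the opt-in mechanism mentioned in the last bullet, the block below installs a global allocator with `#[global_allocator]` using only the standard library. `CountingSystem`, `ALLOC_CALLS`, and `alloc_calls` are hypothetical names made up for this illustration; the wrapper just forwards to `std::alloc::System` and counts calls so the effect is observable. A program that wants jemalloc would put an allocator type from an allocator crate in the same position instead (for the `jemallocator` crate, that type is `jemallocator::Jemalloc`).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper for illustration only: forwards every request to the
// system allocator and counts how many allocations were made.
struct CountingSystem;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingSystem {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is the opt-in: every heap allocation in the program
// (Box, Vec, String, ...) now goes through `GLOBAL`.
#[global_allocator]
static GLOBAL: CountingSystem = CountingSystem;

// Number of allocation requests observed so far.
fn alloc_calls() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}

fn main() {
    let before = alloc_calls();
    let v = vec![1u8; 4096];
    // black_box keeps the optimizer from eliding the allocation.
    std::hint::black_box(&v);
    println!("allocations observed: {}", alloc_calls() - before);
}
```

Only one `#[global_allocator]` may exist in the whole crate graph, which is why the choice belongs to the final program rather than to libraries.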
The compiler, however, still receives a good deal of benefit from using jemalloc (measured in #55202 (comment)). In case that link is broken: removing jemalloc is basically a blanket across-the-board 8-10% regression in compile time for many benchmarks (and apparently the max RSS also regressed on many benchmarks!). For this reason, we don't want to remove jemalloc from rustc itself.
The rest of this issue is now going to be technical details about how we can probably get rid of `alloc_jemalloc` while preserving jemalloc in rustc itself. The tier 1 platforms that use `alloc_jemalloc`, which this issue will be focused on, are:
- x86_64-unknown-linux-gnu
- i686-unknown-linux-gnu
- x86_64-apple-darwin
- i686-apple-darwin
Jemalloc is notably disabled on all Windows platforms (I believe due to our inability to ever get it building over there). Furthermore, jemalloc is enabled on some other Linux platforms, but I think it ended up basically being disabled on all but the above. This, I believe, narrows the targets we need to design for, as we basically need to keep the above working.
Note that we also have two modes of using jemalloc. In one mode we could actually use jemalloc-specific API functions, like `alloc_jemalloc` does today. In the other we could use the standard API it has, plus its support for hooking into the standard allocator on these two platforms (Linux and OSX). The tradeoff between these two strategies has not been measured at this time (AFAIK). Note that in any case we want to route LLVM's allocations to jemalloc, so we want to be sure to hook into the default allocator somehow.
I believe that this default allocator hooking on Linux works by basically relying on its own symbol `malloc` overriding that in `libc`, routing all memory allocation to jemalloc. I'm personally quite fuzzy on the details for OSX, but I think it has something to do with "zone allocators" and not much to do with symbol names. I think this means we can build jemalloc without symbol prefixes on Linux, and with symbol prefixes on OSX, and we should be able to, using that build, override the default allocator in both situations.
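To make the "standard API" mode a bit more concrete, here is a rough sketch (an illustration, not what any PR actually does) of a Rust global allocator that only calls the C symbols `malloc`, `free`, and `posix_memalign`. It assumes a Unix-like target. `ViaMalloc`, `MIN_ALIGN`, and `CacheLine` are hypothetical names, and the value 8 for `MIN_ALIGN` is a conservative assumption about what `malloc` guarantees. Whichever library ends up providing the `malloc` symbol at link time (libc, or an unprefixed jemalloc build) services these calls, which is the property the symbol-override approach relies on.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::os::raw::{c_int, c_void};

// Standard C allocation API. The linker decides which library provides
// these symbols; the Rust code does not name jemalloc anywhere.
extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
    fn posix_memalign(out: *mut *mut c_void, align: usize, size: usize) -> c_int;
}

// Assumed minimum alignment of plain `malloc` results (conservative).
const MIN_ALIGN: usize = 8;

// Hypothetical allocator type for illustration.
struct ViaMalloc;

unsafe impl GlobalAlloc for ViaMalloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.align() <= MIN_ALIGN && layout.align() <= layout.size() {
            unsafe { malloc(layout.size()) as *mut u8 }
        } else {
            // Over-aligned request: posix_memalign needs a power-of-two
            // alignment that is a multiple of the pointer size.
            let align = layout.align().max(std::mem::size_of::<usize>());
            let mut out: *mut c_void = std::ptr::null_mut();
            if unsafe { posix_memalign(&mut out, align, layout.size()) } != 0 {
                return std::ptr::null_mut();
            }
            out as *mut u8
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        // Memory from both malloc and posix_memalign is released with free.
        unsafe { free(ptr as *mut c_void) }
    }
}

#[global_allocator]
static GLOBAL: ViaMalloc = ViaMalloc;

// Hypothetical over-aligned type used to exercise the posix_memalign path.
#[repr(align(64))]
struct CacheLine([u8; 64]);

fn main() {
    let small = Box::new(7u32);
    let big = Box::new(CacheLine([0u8; 64]));
    let addr = &*big as *const CacheLine as usize;
    println!("small = {}, 64-byte aligned = {}", *small, addr % 64 == 0);
}
```

The jemalloc-specific mode would differ in calling jemalloc's own entry points rather than these generic symbols; this sketch deliberately avoids them.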
I would propose, first, a "hopefully easy" route to solve this:
- Let's link the compiler to the "system allocator". Let's then, on the four platforms above, link to `jemalloc_sys`, pulling in all of jemalloc itself. This should, with the right build configuration, mean that we're now using jemalloc everywhere in the compiler (just as we're rerouting LLVM, we're rerouting the compiler).
I'm testing out the performance of this in #55217 and will report back with results. Results are that this is almost universally positive! @alexcrichton will make a PR.
Failing this, @alexcrichton has ideas for a more invasive solution to use jemalloc-specific API calls in rustc itself, but hopefully that won't be necessary...
Original Description
@alexcrichton and I have increasingly come to think that Rust should not maintain jemalloc bindings in tree and link it by default. The primary reasons being:
- Being opinionated about the default allocator is against Rust's general philosophy of getting as close to the underlying system as possible. We've removed almost all runtime baggage from Rust except jemalloc.
- Due to breakage we've had to disable jemalloc support on some Windows configurations, changing our default allocation characteristics there, and offering different implicit "service levels" on different tier 1 platforms.
- Keeping jemalloc working imposes increased maintenance burden. We support a lot of platforms and jemalloc upgrades sometimes do not work across all of them.
- The build system is complicated by supporting jemalloc on some platforms but not all.
For the sake of consistency and maintenance we'd prefer to just always use the system allocator, and make jemalloc an easy option to enable via the global allocator and a jemalloc crate on crates.io.
Activity
brson commented on Oct 4, 2016
Depends on having stable global allocators.
Since this will result in immediate performance regressions on platforms using jemalloc today, we'll need to be sensitive about how the transition is done and make sure it's clear how to regain that allocator performance. It might be a good idea to simultaneously publish other allocator crates to demonstrate the value of choice and for benchmark comparisons.
alexcrichton commented on Oct 4, 2016
An alternative I've heard @sfackler advocate from time to time is:
That would allow us to optionally include jemalloc, but if you want the system allocator for heap profiling, valgrind, or other use cases you can choose so.
sfackler commented on Oct 5, 2016
I would specifically like to jettison jemalloc entirely and use the system allocator. It breaks way too often, it dropped valgrind support, it adds a couple hundred kb to binaries, etc.
alexcrichton commented on Oct 6, 2016
Some historical speed bumps we've had with jemalloc:
I'll try to keep this updated as we run into more issues.
kornelski commented on Jan 2, 2017
jemalloc also makes Rust look bad to newcomers, because it makes "Hello World" executables much larger (I know it's not a fair way to judge a language, but people do, and I can't stop myself from caring about size of redistributable executables, too)
raphlinus commented on Jan 2, 2017
Another observation - jemalloc seems to add a significant amount of overhead to thread creation, on both Linux and macOS. This hasn't been a major issue for me as we plan to use the system allocator on Fuchsia, but probably something worth looking into.
fweimer commented on Jan 3, 2017
On the glibc side, we would be interested in workloads where jemalloc shows significant benefits. (@djdelorie is working on improving glibc malloc performance.)
move the alloc_jemalloc allocator out of tree, default to alloc_system