-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Closed
Labels
A-layoutArea: Memory layout of typesArea: Memory layout of typesC-bugCategory: This is a bug.Category: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.
Description
Sometimes boxing a generated Future causes the value to be (unnecessarily?) copied. This can be worked around by wrapping the call to Box::new
.
Code:
use std::future::ready;
use std::mem;
struct Foo {}
impl Drop for Foo {
fn drop(&mut self) {
println!("foo");
}
}
async fn func1() {
let _f = Foo {};
let mut buf = [0u8; 16_384];
buf[0] = 42;
ready(()).await;
println!("{}", buf[0]);
}
fn _box_new<T>(x: T) -> Box<T> {
Box::new(x)
}
fn take<T>(v: T) {
println!("{}", mem::size_of_val(&v));
}
fn main() {
take(Box::new(func1()));
//take(_box_new(func1()));
}
Asm contains a ~16k memcpy:
$ rustc -O --edition 2021 --emit asm boxcopybug.rs && grep -B 3 memcpy boxcopybug.s
leaq 32(%rsp), %rsi
movl $16386, %edx
movq %rax, %rdi
callq *memcpy@GOTPCREL(%rip)
In main, if you comment out the first line and uncomment the second, the memcpy goes away, even though the code is nearly identical. I'd expect both ways to compile down the same.
Box::pin
is similarly affected.
The behavior is present in the latest stable and nightly.
rustc --version --verbose
:
rustc 1.63.0 (4b91a6ea7 2022-08-08)
binary: rustc
commit-hash: 4b91a6ea7258a947e59c6522cd5898e7c0a6a88f
commit-date: 2022-08-08
host: x86_64-unknown-linux-gnu
release: 1.63.0
LLVM version: 14.0.5
Metadata
Metadata
Assignees
Labels
A-layoutArea: Memory layout of typesArea: Memory layout of typesC-bugCategory: This is a bug.Category: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.
Type
Projects
Milestone
Relationships
Development
Select code repository
Activity
guswynn commentedon Aug 24, 2022
moves are actually implemented as
memcpy
's under the hood! Boxing the future requires you move the value from the stack into the allocation on the heap. In practice, I haven't ever seen a move cause a meaningful performance problem, unless the type is huge (futures CAN be huge). There were some RFC's at one point to add the ability to guarantee a value is constructed in place on the heap, but the problem-space is complex! I wonder if--release
optimizes anything awayest31 commentedon Aug 24, 2022
Thanks for this, I've been looking for small examples where
Box::new
is not optimized out by LLVM's optimizer. I knew they existed but I didn't find a concrete example. You can box directly via two nightly features,box x
or#[rustc_box] Box:new(x)
, this should be memcpy-free. But I don't know how long any of the two will still exist.est31 commentedon Aug 24, 2022
For more, see #49733
jkarneges commentedon Aug 24, 2022
Yeah it seems like a Rust best practice is to always work with smallish structs and basically assume everything might get copied when moved, and hope that such copying won't be meaningful. Unfortunately, generated futures often make working with big structs unavoidable.
Interestingly, I discovered this issue while attempting to avoid moving futures. Most runtime spawn functions (for example, tokio::spawn) are generic and take a non-boxed value (i.e. they have a signature like
spawn<T: Future>(fut: T)
). Such functions then box the future internally. I wondered if maybe moving a whole future by value into the spawn function could potentially be slow compared to having the caller box the future first and then changing the spawn function to take a box, so I tested that. My expectation was that this would probably have no impact because the optimizer was already avoiding copies, but maybe I'd get lucky my change might turn out to be faster. I did not at all expect that this would be slower, and so I went on a hunt to find out why.It is really counterintuitive because you'd think immediately boxing a future would be a good best practice, i.e. it would be the best chance of avoiding copies, when actually immediate boxing risks adding a copy. Passing the future by value to a function that then boxes the value is how you save the copy.
veber-alex commentedon Aug 24, 2022
On nightly with
-Copt-level=3
instead of-O
(which isopt-level=2
) there is no memcpy.Though it's still strange that putting the
Box::new
in a wrapper functions actually changes anything.jkarneges commentedon Aug 24, 2022
The memcpy is still present for me with
opt-level=3
.EDIT: sorry, that was on stable. On nightly I can confirm that
opt-level=3
does optimize away the memcpy. So maybe this issue is fixed.est31 commentedon Aug 24, 2022
Generally, optimizers employ a large amount of heuristics which are unreliable. It's not uncommon that something like this causes changes in their output.
If I had to guess, I would point at the LLVM 15 upgrade (#99464). LLVM keeps changing their optimizer all the time so it's possible that it now detects this case, whether unintentionally or not.
eholk commentedon Aug 24, 2022
@rustbot label +A-async-await
tmandry commentedon Aug 29, 2022
opt-level=3 can be quite aggressive, and I think we should be able to optimize this away at opt-level=2 at least.
eholk commentedon Aug 29, 2022
Do we know that opt-level=2 doesn't catch this on nightly still? If there was an LLVM upgrade, maybe that caught this case as well.
eholk commentedon Aug 29, 2022
We talked about this in the wg-async triage meeting. It sounds like this is a more general bug that async code happens to be more likely to expose, so we'll remove the async label.
@rustbot label -A-async-await
jkarneges commentedon Aug 29, 2022
opt-level=2 on nightly does not catch it
jkarneges commentedon Sep 22, 2022
I've confirmed this issue is fixed in 1.65.0-beta.1 (2a65764 2022-09-19).
jkarneges commentedon Nov 3, 2022
And... confirmed the fix made it to stable! (1.65.0)