Skip to content

Box::new copies generated Future unless call is wrapped #100932

@jkarneges

Description

@jkarneges
Contributor

Sometimes boxing a generated Future causes the value to be (unnecessarily?) copied. This can be worked around by wrapping the call to Box::new.

Code:

use std::future::ready;
use std::mem;

struct Foo {}

impl Drop for Foo {
    fn drop(&mut self) {
        println!("foo");
    }
}

async fn func1() {
    let _f = Foo {};
    let mut buf = [0u8; 16_384];
    buf[0] = 42;
    ready(()).await;
    println!("{}", buf[0]);
}

fn _box_new<T>(x: T) -> Box<T> {
    Box::new(x)
}

fn take<T>(v: T) {
    println!("{}", mem::size_of_val(&v));
}

fn main() {
    take(Box::new(func1()));
    //take(_box_new(func1()));
}

Asm contains a ~16k memcpy:

$ rustc -O --edition 2021 --emit asm boxcopybug.rs && grep -B 3 memcpy boxcopybug.s
        leaq    32(%rsp), %rsi
        movl    $16386, %edx
        movq    %rax, %rdi
        callq   *memcpy@GOTPCREL(%rip)

In main, if you comment out the first line and uncomment the second, the memcpy goes away, even though the code is nearly identical. I'd expect both ways to compile down the same.

Box::pin is similarly affected.

The behavior is present in the latest stable and nightly.

rustc --version --verbose:

rustc 1.63.0 (4b91a6ea7 2022-08-08)
binary: rustc
commit-hash: 4b91a6ea7258a947e59c6522cd5898e7c0a6a88f
commit-date: 2022-08-08
host: x86_64-unknown-linux-gnu
release: 1.63.0
LLVM version: 14.0.5

Activity

guswynn

guswynn commented on Aug 24, 2022

@guswynn
Contributor

moves are actually implemented as memcpy's under the hood! Boxing the future requires you move the value from the stack into the allocation on the heap. In practice, I haven't ever seen a move cause a meaningful performance problem, unless the type is huge (futures CAN be huge). There were some RFC's at one point to add the ability to guarantee a value is constructed in place on the heap, but the problem-space is complex! I wonder if --release optimizes anything away

est31

est31 commented on Aug 24, 2022

@est31
Member

Thanks for this, I've been looking for small examples where Box::new is not optimized out by LLVM's optimizer. I knew they existed but I didn't find a concrete example. You can box directly via two nightly features, box x or #[rustc_box] Box:new(x), this should be memcpy-free. But I don't know how long any of the two will still exist.

est31

est31 commented on Aug 24, 2022

@est31
Member

For more, see #49733

jkarneges

jkarneges commented on Aug 24, 2022

@jkarneges
ContributorAuthor

I haven't ever seen a move cause a meaningful performance problem, unless the type is huge (futures CAN be huge)

Yeah it seems like a Rust best practice is to always work with smallish structs and basically assume everything might get copied when moved, and hope that such copying won't be meaningful. Unfortunately, generated futures often make working with big structs unavoidable.

Interestingly, I discovered this issue while attempting to avoid moving futures. Most runtime spawn functions (for example, tokio::spawn) are generic and take a non-boxed value (i.e. they have a signature like spawn<T: Future>(fut: T)). Such functions then box the future internally. I wondered if maybe moving a whole future by value into the spawn function could potentially be slow compared to having the caller box the future first and then changing the spawn function to take a box, so I tested that. My expectation was that this would probably have no impact because the optimizer was already avoiding copies, but maybe I'd get lucky my change might turn out to be faster. I did not at all expect that this would be slower, and so I went on a hunt to find out why.

It is really counterintuitive because you'd think immediately boxing a future would be a good best practice, i.e. it would be the best chance of avoiding copies, when actually immediate boxing risks adding a copy. Passing the future by value to a function that then boxes the value is how you save the copy.

veber-alex

veber-alex commented on Aug 24, 2022

@veber-alex
Contributor

On nightly with -Copt-level=3 instead of -O (which is opt-level=2) there is no memcpy.
Though it's still strange that putting the Box::new in a wrapper functions actually changes anything.

jkarneges

jkarneges commented on Aug 24, 2022

@jkarneges
ContributorAuthor

The memcpy is still present for me with opt-level=3.

EDIT: sorry, that was on stable. On nightly I can confirm that opt-level=3 does optimize away the memcpy. So maybe this issue is fixed.

est31

est31 commented on Aug 24, 2022

@est31
Member

it's still strange that putting the Box::new in a wrapper functions actually changes anything.

Generally, optimizers employ a large amount of heuristics which are unreliable. It's not uncommon that something like this causes changes in their output.

On nightly I can confirm that opt-level=3 does optimize away the memcpy.

If I had to guess, I would point at the LLVM 15 upgrade (#99464). LLVM keeps changing their optimizer all the time so it's possible that it now detects this case, whether unintentionally or not.

eholk

eholk commented on Aug 24, 2022

@eholk
Contributor

@rustbot label +A-async-await

added
I-slowIssue: Problems and improvements with respect to performance of generated code.
A-layoutArea: Memory layout of types
on Aug 25, 2022
tmandry

tmandry commented on Aug 29, 2022

@tmandry
Member

opt-level=3 can be quite aggressive, and I think we should be able to optimize this away at opt-level=2 at least.

eholk

eholk commented on Aug 29, 2022

@eholk
Contributor

Do we know that opt-level=2 doesn't catch this on nightly still? If there was an LLVM upgrade, maybe that caught this case as well.

eholk

eholk commented on Aug 29, 2022

@eholk
Contributor

We talked about this in the wg-async triage meeting. It sounds like this is a more general bug that async code happens to be more likely to expose, so we'll remove the async label.

@rustbot label -A-async-await

jkarneges

jkarneges commented on Aug 29, 2022

@jkarneges
ContributorAuthor

opt-level=2 on nightly does not catch it

jkarneges

jkarneges commented on Sep 22, 2022

@jkarneges
ContributorAuthor

On nightly I can confirm that opt-level=3 does optimize away the memcpy. So maybe this issue is fixed.

I've confirmed this issue is fixed in 1.65.0-beta.1 (2a65764 2022-09-19).

jkarneges

jkarneges commented on Nov 3, 2022

@jkarneges
ContributorAuthor

And... confirmed the fix made it to stable! (1.65.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-layoutArea: Memory layout of typesC-bugCategory: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @eholk@jkarneges@tmandry@guswynn@est31

        Issue actions

          Box::new copies generated Future unless call is wrapped · Issue #100932 · rust-lang/rust