Unnecessary memcpy caused by ordering of unwrap #56172

Closed
@jrmuizel (Contributor)

In the following code f and g have these lines swapped:

    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();

Unwrapping later causes f to have an additional memcpy.

#[inline(never)]
pub fn f(clip: Option<&bool>) {
    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();
    do_item(&DI {
            item,
    });
}

#[inline(never)]
pub fn g(clip: Option<&bool>) {
    clip.unwrap();
    let item = SpecificDisplayItem::PopStackingContext;
    do_item(&DI {
            item,
    });
}

pub enum SpecificDisplayItem {
    PopStackingContext,
    Other([f64; 22]),
}

struct DI {
    item: SpecificDisplayItem,
}


fn do_item(di: &DI) { unsafe { ext(di) } }
extern {
    fn ext(di: &DI);
}

Compiles to:

example::f:
        sub     rsp, 360
        test    rdi, rdi
        je      .LBB0_1
        mov     qword ptr [rsp], 0
        lea     rdi, [rsp + 8]
        lea     rsi, [rsp + 184]
        mov     edx, 176
        call    memcpy@PLT
        mov     rdi, rsp
        call    ext@PLT
        add     rsp, 360
        ret
.LBB0_1:
        lea     rdi, [rip + .Lbyte_str.2]
        call    core::panicking::panic@PLT
        ud2

example::g:
        sub     rsp, 184
        test    rdi, rdi
        je      .LBB1_1
        mov     qword ptr [rsp], 0
        mov     rdi, rsp
        call    ext@PLT
        add     rsp, 184
        ret
.LBB1_1:
        lea     rdi, [rip + .Lbyte_str.2]
        call    core::panicking::panic@PLT
        ud2

Ideally, f and g should compile to the same thing.

Activity

added a commit that references this issue on Nov 22, 2018

Auto merge of #3341 - jrmuizel:empty-item, r=gw3583

b4f7e43

lqd (Member) commented on Nov 23, 2018

cc @rust-lang/wg-codegen

eddyb (Member) commented on Nov 23, 2018

IIRC @pcwalton found that LLVM only elides copies in the same basic block (and unwrap can panic, so it terminates a basic block, splitting the code before and after into separate basic blocks).
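
To make the control-flow point concrete, here is a minimal sketch of what `clip.unwrap()` amounts to when written out by hand. The name `unwrap_expanded` is hypothetical and the panic message is only illustrative; the real `Option::unwrap` lives in `core`. The `None` arm is the conditional branch that ends the basic block, so in `f` the initialization of `item` and its later use end up on opposite sides of it.

```rust
// Hand-written equivalent of `Option::unwrap`, for illustration only.
// `unwrap_expanded` is a hypothetical name, not a standard library function.
fn unwrap_expanded(clip: Option<&bool>) -> &bool {
    match clip {
        // Fall-through path: execution continues in a new basic block.
        Some(b) => b,
        // Panic path: this branch is what splits the caller's code in two.
        None => panic!("unwrap on a `None` value"),
    }
}

fn main() {
    let flag = true;
    let got = unwrap_expanded(Some(&flag));
    println!("{}", got);
}
```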

nox (Contributor) commented on Nov 23, 2018

Yes, cf. @jrmuizel's PR on Webrender:

LLVM's ability to eliminate memcpy's across basic blocks is bad.

Labels added on Dec 1, 2018:
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)

nikic (Contributor) commented on Dec 1, 2018

There were some attempts to make memcpyopt work across BBs, but they were reverted due to regressions, see https://bugs.llvm.org/show_bug.cgi?id=35519.

Labels added on Jun 10, 2020:
A-mir-opt (Area: MIR optimizations)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
A-mir-opt-nrvo (Fixed by the Named Return Value Opt. (NRVO))

dotdash (Contributor) commented on Jun 26, 2020

It might be worth noting that the memcpy here is especially pointless because it copies uninitialized memory. When the memcpy optimizations fold a memset into a memcpy, they check whether the memcpy copies more memory than was memset; if so, and the remainder is uninitialized, the memcpy for the uninitialized part is dropped.

In this case, SROA splits the alloca for the item into the discriminant part and 176 uninitialized bytes. But since it runs without memory dependency analysis, it has no easy way to drop the memcpy for the uninitialized part. And before the MemCpy pass can kill the memcpy, the inliner comes along and breaks the code into multiple basic blocks. So for this constellation, it would help to run the MemCpy pass before the inliner, but I have no real idea what tradeoff that makes.
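
The 176-byte figure can be checked with a small sketch. It redeclares the enum from the issue and measures the `Other` payload, which is 22 `f64` values of 8 bytes each. The helper names `payload_size` and `enum_size` are hypothetical. The overall enum size is a layout detail of the default representation that Rust does not guarantee, so the sketch prints it instead of asserting a value; the 184 bytes of stack reserved in the listing for `g` suggest an 8-byte discriminant slot followed by the payload.

```rust
use std::mem::size_of;

// Same shape as the enum in the issue.
#[allow(dead_code)]
pub enum SpecificDisplayItem {
    PopStackingContext,
    Other([f64; 22]),
}

// Size in bytes of the payload carried by `Other` (hypothetical helper).
fn payload_size() -> usize {
    size_of::<[f64; 22]>()
}

// Size in bytes of the whole enum: payload plus discriminant and padding
// (hypothetical helper; the exact value is not guaranteed by the language).
fn enum_size() -> usize {
    size_of::<SpecificDisplayItem>()
}

fn main() {
    println!("payload: {} bytes", payload_size());
    println!("enum:    {} bytes", enum_size());
}
```

Only the discriminant is written when `PopStackingContext` is constructed, so the payload bytes of that value are never initialized, which is why copying them is wasted work.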

Another option would be to add an optimization that drops memcpys from uninitialized sources to a pass that does cross-bb memory dependence analysis anyway, but a quick look didn't reveal any obvious place to do that. Given the rather common use of enums like this, where only some variants have a payload, that might be worth a try though.

oli-obk (Contributor) commented on Jun 26, 2020

This may get resolved with #72632 by doing the necessary work on MIR.

dotdash (Contributor) commented on Jun 26, 2020

The approach from #72632 breaks if you assign the same source to multiple destinations, because there's no simple chain that can be reduced to a single destination. I think you need to do copy-propagation (replacing the destination with the source, instead of the other way around) to handle that.

The following doesn't get properly optimized by #72632, but is handled by the memcpy pass being run before the inliner. That approach of course also doesn't handle all the cases, thus the proposal for the optimization to catch copies from uninitialized memory.

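// Note: using `item` twice assumes `SpecificDisplayItem` is `Copy`
// (the derive is not shown in the snippets above).
#[inline(never)]
pub fn f(clip: Option<&bool>) {
    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();
    do_item(&DI {
            item,
    });
    do_item(&DI {
            item,
    });
}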

In fact #72632 even stops the patched (MemCpyOpt before Inliner) LLVM from optimizing this version, because SROA can no longer split the alloca and so there's no memcpy that copies only uninitialized memory. For the modified f function #72632 still produces better code than nightly, but if you apply the same change to g, then nightly produces a properly optimized version, while dest-prop causes the lifetimes of the two DI instances to overlap, forcing double stack usage and a memcpy.

Edit: I'm not trying to criticize the work that went into #72632. I didn't notice the comment on the PR (and didn't realize it had progressed so far) when I started to look into this. After being made aware of it, I wanted to see that it works for myself and got confused by the way that optimization pass works because I (for some reason) always assumed it would do copy propagation and was just named weirdly, so I tried to figure out why it does things the way it does, and noticed that it breaks this modified example.

jrmuizel (Contributor, Author) commented on Mar 12, 2021

Fixed by #82806
