
Redundant Copies with #[repr(align)] Enum References #140182

@WindFrank

Description


When creating a reference to a `#[repr(align)]` type wrapped in an enum, LLVM generates suboptimal assembly: the value is copied to the stack and back even though the reference is only passed to an empty function. This occurs even at `opt-level=3`.
I tried this code (compiled with `-C opt-level=3`):
https://godbolt.org/z/P8E4hsdbn

#![crate_type = "lib"]
#[repr(align(64))]
pub struct Align64(i32);

pub enum Enum64 {
    A(Align64),
    B(i32),
}

/// Processes data and returns an Enum64 variant
/// Logs intermediate state for debugging purposes
#[no_mangle]
pub fn process_data(a: Align64) -> Enum64 {
    let result = Enum64::A(a);
    
    // Common debugging pattern - logging intermediate values
    log_intermediate(&result);
    result
}

#[inline(never)]
fn log_intermediate(_e: &Enum64) {
    // The empty function still forces the reference to be created
}

I expected to see this happen:

process_data:
        mov     rax, rdi
        movaps  xmm0, xmmword ptr [rsi]
        movaps  xmm1, xmmword ptr [rsi + 16]
        movaps  xmm2, xmmword ptr [rsi + 32]
        movaps  xmm3, xmmword ptr [rsi + 48]
        movaps  xmmword ptr [rdi + 112], xmm3
        movaps  xmmword ptr [rdi + 96], xmm2
        movaps  xmmword ptr [rdi + 80], xmm1
        movaps  xmmword ptr [rdi + 64], xmm0
        mov     dword ptr [rdi], 0
        ret

Instead, this happened:

process_data:
        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmm1, xmmword ptr [rsi + 16]
        movups  xmm2, xmmword ptr [rsi + 32]
        movups  xmm3, xmmword ptr [rsi + 48]
        movups  xmmword ptr [rsp - 16], xmm3
        movups  xmmword ptr [rsp - 32], xmm2
        movups  xmmword ptr [rsp - 48], xmm1
        movups  xmmword ptr [rsp - 64], xmm0
        mov     dword ptr [rdi], 0
        movups  xmm0, xmmword ptr [rsp - 124]
        movups  xmm1, xmmword ptr [rsp - 108]
        movups  xmm2, xmmword ptr [rsp - 92]
        movups  xmm3, xmmword ptr [rsp - 76]
        movups  xmmword ptr [rdi + 4], xmm0
        movups  xmmword ptr [rdi + 20], xmm1
        movups  xmmword ptr [rdi + 36], xmm2
        movups  xmmword ptr [rdi + 52], xmm3
        movups  xmm0, xmmword ptr [rsp - 60]
        movups  xmmword ptr [rdi + 68], xmm0
        movups  xmm0, xmmword ptr [rsp - 44]
        movups  xmmword ptr [rdi + 84], xmm0
        movups  xmm0, xmmword ptr [rsp - 28]
        movups  xmmword ptr [rdi + 100], xmm0
        movups  xmm0, xmmword ptr [rsp - 16]
        movups  xmmword ptr [rdi + 112], xmm0
        ret

Performance Impact
1. Instruction count: 24 memory-move instructions (`movups`) vs 8 (`movaps`) in the expected output (3x increase)
2. Memory operations:
   - 2x bandwidth usage (roughly 128B vs 64B of payload transferred)
   - Unnecessary stack spills: the value is written to `[rsp - ...]` and immediately re-read
3. Instruction selection:
   - Uses `movups` (unaligned) instead of `movaps` (aligned)
   - Missed opportunity for aligned vector ops despite the 64-byte alignment guarantee

Real-World Relevance
This pattern occurs in:
1. Debug logging (even when logs are disabled)
2. Generic code passing references
3. Derive macros (e.g., `#[derive(Debug)]`)
4. Error handling paths

Could you please review the situation? Thank you!

Meta

rustc 1.85.0-nightly (d117b7f21 2024-12-31)
binary: rustc
commit-hash: d117b7f211835282b3b177dc64245fff0327c04c
commit-date: 2024-12-31
host: x86_64-unknown-linux-gnu
release: 1.85.0-nightly
LLVM version: 19.1.6

Labels

A-repr (Area: the `#[repr(stuff)]` attribute)
C-optimization (Category: an issue highlighting optimization opportunities or PRs implementing such)
I-heavy (Issue: problems and improvements with respect to binary size of generated code)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue)
