Skip to content

Unnecessary instructions generated #75978

Closed
@bugadani

Description

@bugadani
Contributor

I'm trying to figure out why certain iterator-based algorithms get const-optimized, while others don't and I stumbled upon this case.

example (updated 2021.03.05)

Compiling fn1 results in the expected machine code, but fn2 has extra stack reserve/free instructions that are not necessary, since the function's contents are optimized away completely. The stack size equals the size of the array.

Activity

tesuji

tesuji commented on Aug 28, 2020

@tesuji
Contributor

Two observations:

nikic

nikic commented on Sep 20, 2020

@nikic
Contributor

This would very likely be fixed by https://reviews.llvm.org/D87972.

bugadani

bugadani commented on Mar 5, 2021

@bugadani
ContributorAuthor

Fixed by #81451

mati865

mati865 commented on Mar 5, 2021

@mati865
Member

There should be codegen test for this issue.

bugadani

bugadani commented on Mar 5, 2021

@bugadani
ContributorAuthor

Yeah, right. I can add those for the issues I closed.

bugadani

bugadani commented on Mar 5, 2021

@bugadani
ContributorAuthor

Actually, I'm reopening this one as the issue stands on ARM: https://rust.godbolt.org/z/6fEdoq

nikic

nikic commented on Mar 5, 2021

@nikic
Contributor

Looks like a phase ordering problem. The remaining IR would be eliminated by GVN + InstCombine. Currently it only gets eliminated by DAGCombine, thus you still see the prologue/epilogue on ARM.

self-assigned this
on Mar 5, 2021
nikic

nikic commented on Mar 7, 2021

@nikic
Contributor

The problem here is that this loop only gets unrolled during full loop unrolling, while we need it to be unrolled during simple unrolling to still have a chance to optimize. For the IR at that point, SCEV cannot determine the loop trip count:

; Preheader:
start:
  %s = alloca [7 x i32], align 4
  %0 = bitcast [7 x i32]* %s to i8*
  call void @llvm.lifetime.start.p0i8(i64 28, i8* nonnull %0)
  %1 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 0
  store i32 1, i32* %1, align 4
  %2 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 1
  store i32 2, i32* %2, align 4 
  %3 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 2
  store i32 3, i32* %3, align 4
  %4 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 3
  store i32 4, i32* %4, align 4
  %5 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 4
  store i32 5, i32* %5, align 4
  %6 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 5
  store i32 6, i32* %6, align 4
  %7 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 6
  store i32 7, i32* %7, align 4
  %8 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 0
  %9 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 7
  %10 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 1
  br label %bb5

; Loop:
bb5:                                              ; preds = %start, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" 
  %11 = phi i32* [ %10, %start ], [ %14, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %sum.020 = phi i32 [ 0, %start ], [ %13, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %iter.sroa.0.019 = phi i32* [ %8, %start ], [ %iter.sroa.0.218, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %_12.i = icmp eq i32* %11, %9
  br i1 %_12.i, label %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit", label %bb3.i

bb3.i:                                            ; preds = %bb5
  %12 = getelementptr inbounds i32, i32* %iter.sroa.0.019, i64 2
  %.val.i = load i32, i32* %11, align 4, !alias.scope !2
  br label %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit"

"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit": ; preds = %bb5, %bb3.i
  %iter.sroa.0.218 = phi i32* [ %12, %bb3.i ], [ %11, %bb5 ]
  %.0.i = phi i32 [ %.val.i, %bb3.i ], [ 1, %bb5 ]
  %13 = add i32 %.0.i, %sum.020
  %_12.i7 = icmp eq i32* %iter.sroa.0.218, %9
  %14 = getelementptr inbounds i32, i32* %iter.sroa.0.218, i64 1
  br i1 %_12.i7, label %bb4, label %bb5

; Exit blocks
bb4:                                              ; preds = %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit"
  %.lcssa = phi i32 [ %13, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ] 
  call void @llvm.lifetime.end.p0i8(i64 28, i8* nonnull %0)
  ret i32 %.lcssa
added
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
I-slowIssue: Problems and improvements with respect to performance of generated code.
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on Apr 3, 2023
nikic

nikic commented on Apr 3, 2023

@nikic
Contributor

Fixed by the LLVM 16 upgrade.

added a commit that references this issue on Apr 3, 2023
73f40d4
added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Apr 5, 2023
added a commit that references this issue on Apr 11, 2023

Rollup merge of rust-lang#109895 - nikic:llvm-16-tests, r=cuviper

efb96af
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-mir-optArea: MIR optimizationsE-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    Participants

    @nikic@bugadani@mati865@tesuji@camelid

    Issue actions

      Unnecessary instructions generated · Issue #75978 · rust-lang/rust