[TII] Do not fold undef copies #147392

Open
wants to merge 5 commits into main

Conversation

jrbyrnes
Contributor

@jrbyrnes jrbyrnes commented Jul 7, 2025

RegAllocBase::cleanupFailedVReg hacks up the liveness state in order to produce valid IR. In the process, we may end up producing undef copies.

If the destination of one of these copies is a spill candidate, we will attempt to fold the source register when issuing the spill. The undef flag on the source is not propagated to storeRegToStackSlot, so we end up dropping the undef, issuing a real spill, and producing an illegal liveness state.

This patch checks for undef copies and, if one is found, inserts a KILL instead of a spill.
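For illustration, a hand-written MIR sketch of the failure mode (opcode and register names are borrowed from the test added below; this is not verbatim compiler output):

  %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3          ; produced by cleanupFailedVReg
  ; Folding the spill of %30 into its copy source drops the undef flag:
  SI_SPILL_AV128_SAVE $vgpr0_vgpr1_vgpr2_vgpr3, %stack.2, $sgpr32, 0, implicit $exec
  ; ^ illegal: $vgpr0_vgpr1_vgpr2_vgpr3 has no live definition here, so the
  ; machine verifier reports a use of an undefined physical register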

Change-Id: I134bc03e7eeb7f5324c1ce7005b96b28d1b01643
@llvmbot
Member

llvmbot commented Jul 7, 2025

@llvm/pr-subscribers-llvm-regalloc

Author: Jeffrey Byrnes (jrbyrnes)

Changes

Since this would result in spilling an undef register, the spill code is unnecessary.


Full diff: https://github.com/llvm/llvm-project/pull/147392.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/InlineSpiller.cpp (+3)
  • (added) llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir (+80)
diff --git a/llvm/lib/CodeGen/InlineSpiller.cpp b/llvm/lib/CodeGen/InlineSpiller.cpp
index ee08dc42dce57..6b8b8731e6b56 100644
--- a/llvm/lib/CodeGen/InlineSpiller.cpp
+++ b/llvm/lib/CodeGen/InlineSpiller.cpp
@@ -946,6 +946,9 @@ foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
     if (MO.isUse() && !MO.readsReg() && !MO.isTied())
       continue;
 
+    if (MI->isCopy() && MI->getOperand(1).isUndef())
+      continue;
+
     if (MO.isImplicit()) {
       ImpReg = MO.getReg();
       continue;
diff --git a/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir b/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir
new file mode 100644
index 0000000000000..440753fc88277
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir
@@ -0,0 +1,80 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs --start-before=greedy,2 --stop-after=greedy,2 %s -o - | FileCheck %s
+
+# Make sure there's no machine verifier error
+
+# If RA is unable to find a register to allocate, then cleanupFailedVReg will do ad-hoc rewriting and will insert undefs to make the LiveRanges workable.
+# %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3 is an example of such a rewrite / undef. If we want to spill %30, we should not be inserting
+# actual spill code, as the source operand is undef.
+# Check that there are no verifier issues with the LiveRange of $vgpr0_vgpr1_vgpr2_vgpr3 / that we do not insert spill code for %30.
+
+
+--- |
+  define void @foo() #0 {
+    ret void
+  }
+
+  attributes #0 = { "amdgpu-waves-per-eu"="8,8" }
+
+...
+
+---
+name:            foo
+tracksRegLiveness: true
+stack:
+  - { id: 0, type: spill-slot, size: 32, alignment: 4 }
+machineFunctionInfo:
+  maxKernArgAlign: 4
+  isEntryFunction: true
+  waveLimiter:     true
+  scratchRSrcReg:  '$sgpr96_sgpr97_sgpr98_sgpr99'
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+  hasSpilledVGPRs: true
+  argumentInfo:
+    privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
+    dispatchPtr:     { reg: '$sgpr4_sgpr5' }
+    kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
+    workGroupIDX:    { reg: '$sgpr8' }
+    privateSegmentWaveByteOffset: { reg: '$sgpr9' }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: foo
+    ; CHECK: INLINEASM &"; def $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 10 /* regdef */, def %10, 10 /* regdef */, def %1, 10 /* regdef */, def %2, 10 /* regdef */, def $vgpr0_vgpr1_vgpr2_vgpr3, 10 /* regdef */, implicit-def $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
+    ; CHECK-NEXT: SI_SPILL_AV128_SAVE [[COPY]], %stack.2, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.2, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_AV160_SAVE %2, %stack.1, $sgpr32, 0, implicit $exec :: (store (s160) into %stack.1, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_AV256_SAVE %1, %stack.3, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.3, align 4, addrspace 5)
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vreg_512 = COPY %10
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[COPY1]], %stack.0, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.0, align 4, addrspace 5)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vreg_512 = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[COPY2]], %stack.6, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.6, align 4, addrspace 5)
+    ; CHECK-NEXT: INLINEASM &"; clobber", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def early-clobber $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
+    ; CHECK-NEXT: [[SI_SPILL_V512_RESTORE:%[0-9]+]]:vreg_512 = SI_SPILL_V512_RESTORE %stack.6, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.6, align 4, addrspace 5)
+    ; CHECK-NEXT: $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16 = COPY [[SI_SPILL_V512_RESTORE]]
+    ; CHECK-NEXT: [[SI_SPILL_V512_RESTORE1:%[0-9]+]]:vreg_512 = SI_SPILL_V512_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.0, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[SI_SPILL_V512_RESTORE1]], %stack.4, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.4, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_AV256_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.3, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_V256_SAVE [[SI_SPILL_AV256_RESTORE]], %stack.5, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.5, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV160_RESTORE:%[0-9]+]]:vreg_160 = SI_SPILL_AV160_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s160) from %stack.1, align 4, addrspace 5)
+    ; CHECK-NEXT: $vgpr0_vgpr1_vgpr2_vgpr3 = SI_SPILL_AV128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV512_RESTORE:%[0-9]+]]:av_512 = SI_SPILL_AV512_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.4, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_V256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.5, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.5, align 4, addrspace 5)
+    ; CHECK-NEXT: INLINEASM &"; use $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 9 /* reguse */, [[SI_SPILL_AV512_RESTORE]], 9 /* reguse */, [[SI_SPILL_V256_RESTORE]], 9 /* reguse */, [[SI_SPILL_AV160_RESTORE]], 9 /* reguse */, undef $vgpr0_vgpr1_vgpr2_vgpr3, 9 /* reguse */, $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16
+    ; CHECK-NEXT: SI_RETURN
+    INLINEASM &"; def $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 10, def %22:vreg_512, 10, def %25:vreg_256, 10, def %28:vreg_160, 10, def $vgpr0_vgpr1_vgpr2_vgpr3, 10, implicit-def $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
+    %27:av_160 = COPY %28:vreg_160
+    %24:av_256 = COPY %25:vreg_256
+    SI_SPILL_V512_SAVE %22:vreg_512, %stack.0, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.0, align 4, addrspace 5)
+    %18:vreg_512 = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    INLINEASM &"; clobber", 1 /* sideeffect attdialect */, 10, implicit-def early-clobber $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
+    $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16 = COPY %18:vreg_512
+    %23:vreg_512 = SI_SPILL_V512_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.0, align 4, addrspace 5)
+    %26:vreg_256 = COPY %24:av_256
+    %29:vreg_160 = COPY %27:av_160
+    $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %30:av_128
+    INLINEASM &"; use $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 9, %23:vreg_512, 9, %26:vreg_256, 9, %29:vreg_160, 9, undef $vgpr0_vgpr1_vgpr2_vgpr3, 9, $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16
+    SI_RETURN
+
+...

@llvmbot
Member

llvmbot commented Jul 7, 2025

@llvm/pr-subscribers-backend-amdgpu

(Same changes summary and full diff as the comment above.)
Change-Id: Icfd4d568221b9d2425ebd7568c06b48c6a8ecdc9
@jrbyrnes jrbyrnes changed the title from [InlineSpiller] Do not fold undef copies to [RegAllocBase] Produce IMPLICIT_DEF instead of COPY undef during cleanupFailedVReg on Jul 9, 2025
@arsenm
Contributor

arsenm commented Jul 10, 2025

We shouldn't describe this as "optimizing" the undef copies. I also do not understand why COPY would be special: if we were to handle copy-like target instructions, this could fail the same way.
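For context, target-specific copy-like instructions are surfaced through TargetInstrInfo::isCopyInstr(), which the generic MI->isCopy() check does not cover. A minimal sketch of what a more general check could look like (illustrative only, not part of this patch):

  // Sketch: isCopyInstr() also matches copy-like target instructions
  // (e.g. target move opcodes), not just the generic COPY.
  if (std::optional<DestSourcePair> DestSrc = TII.isCopyInstr(*MI))
    if (DestSrc->Source->isUndef())
      continue; // same undef-source hazard as a plain COPY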

@arsenm
Contributor

arsenm commented Jul 10, 2025

I think this is a more correct fix:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 8b82deb2a9d8..a36c3e03e889 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -476,6 +476,9 @@ static const TargetRegisterClass *canFoldCopy(const MachineInstr &MI,
   if (FoldOp.getSubReg() || LiveOp.getSubReg())
     return nullptr;
 
+  if (LiveOp.isUndef())
+    return nullptr;
+
   Register FoldReg = FoldOp.getReg();
   Register LiveReg = LiveOp.getReg();
 

@arsenm
Contributor

arsenm commented Jul 10, 2025

The problem isn't the undef copy; it's the fact that storeRegToStackSlot doesn't propagate the undef flag from the source when folding the copy. Disabling the optimization works, but we could alternatively pass the undef flag through and let the target produce the undef spill. Or we could not emit the spill in the first place (that is complicated, since all this code expects a single instruction to be produced; I guess we could instead insert an undef KILL in place of the store).

@arsenm
Contributor

arsenm commented Jul 10, 2025

Option 2:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 8b82deb2a9d8..8c23dff565ab 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -773,10 +773,14 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
   const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
   MachineBasicBlock::iterator Pos = MI;
 
-  if (Flags == MachineMemOperand::MOStore)
-    storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,
-                        Register());
-  else
+  if (Flags == MachineMemOperand::MOStore) {
+    if (MO.isUndef()) {
+      BuildMI(*MBB, Pos, MI.getDebugLoc(), get(TargetOpcode::KILL)).add(MO);
+    } else {
+      storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,
+                          Register());
+    }
+  } else
     loadRegFromStackSlot(*MBB, Pos, MO.getReg(), FI, RC, TRI, Register());
   return &*--Pos;
 }

jrbyrnes added 2 commits July 10, 2025 09:31
Change-Id: I0d76cc98d2368f4f929bee032c22714560f273cc
@jrbyrnes
Contributor Author

Thanks for the suggested fixes

The problem isn't the undef copy; it's the fact that storeRegToStackSlot doesn't propagate the undef flag from the source when folding the copy.

I think ideally storeRegToStackSlot would have the required information to produce a KILL vs. a spill, in order to reduce the contract between the client and the implementation. Maybe we could work towards that. However, without that information, I guess we should put this logic in canFoldCopy, since it seems designed for our purpose (it is only used to filter out copies which should be fed to loadRegFromStackSlot / storeRegToStackSlot). If we want to use canFoldCopy in other places where allowing undefs is appropriate, we can adjust at that time.
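As a purely hypothetical sketch of that direction (this is not the existing API; the extra parameter is invented for illustration), the undef flag could be threaded through so the callee decides between a real store and a KILL:

  // Hypothetical: the current storeRegToStackSlot() takes no undef flag.
  virtual void storeRegToStackSlot(MachineBasicBlock &MBB,
                                   MachineBasicBlock::iterator MI,
                                   Register SrcReg, bool IsKill, bool IsUndef,
                                   int FrameIndex, const TargetRegisterClass *RC,
                                   const TargetRegisterInfo *TRI,
                                   Register VReg) const;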

@jrbyrnes jrbyrnes changed the title from [RegAllocBase] Produce IMPLICIT_DEF instead of COPY undef during cleanupFailedVReg to [TII] Do not fold undef copies on Jul 10, 2025
@arsenm
Contributor

arsenm commented Jul 10, 2025

Maybe we could work towards that. However, without that information, I guess we should put this logic in canFoldCopy, since it seems designed for our purpose (it is only used to filter out copies which should be fed to loadRegFromStackSlot / storeRegToStackSlot). If we want to use canFoldCopy in other places where allowing undefs is appropriate, we can adjust at that time.

I think I prefer option 2; it's the more direct fix. It's probably better to just deal with this once here: pushing the responsibility into targets is just going to result in the same bug in N backends with poor test coverage.

Change-Id: I36ddacfdb0806e988e0d8bc0c9ad018dc7e57e6f