[AArch64][SME] Fix restoring callee-saves from FP with hazard padding #143371
Conversation
Currently, when hazard padding is enabled, a (fixed-size) hazard slot is placed in the CS area, just after the frame record. The size of this slot is part of the "CalleeSaveBaseToFrameRecordOffset". The SVE epilogue emission code assumed this offset was always zero and incorrectly set the stack pointer, resulting in all SVE registers being reloaded from incorrect offsets.

```
| prev_lr                           |
| prev_fp                           |
| (a.k.a. "frame record")           |
|-----------------------------------| <- fp (= x29)
| <hazard padding>                  |
|-----------------------------------| <- callee-saved base
|                                   |
| callee-saved fp/simd/SVE regs     |
|                                   |
|-----------------------------------| <- SVE callee-save base
```

i.e. in the above diagram, the code assumed `fp == callee-saved base`.
@llvm/pr-subscribers-backend-aarch64 Author: Benjamin Maxwell (MacDue)

Patch is 25.98 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/143371.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 22683237fa0a8..9d1cb09a1a568 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -2539,13 +2539,15 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// restore the stack pointer from the frame pointer prior to SVE CSR
// restoration.
if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
- if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
+ if (int64_t SVECalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
// Set SP to start of SVE callee-save area from which they can
// be reloaded. The code below will deallocate the stack space
// space by moving FP -> SP.
- emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
- StackOffset::getScalable(-CalleeSavedSize), TII,
- MachineInstr::FrameDestroy);
+ emitFrameOffset(
+ MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
+ StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(),
+ -SVECalleeSavedSize),
+ TII, MachineInstr::FrameDestroy);
}
} else {
if (AFI->getSVECalleeSavedStackSize()) {
diff --git a/llvm/test/CodeGen/AArch64/stack-hazard.ll b/llvm/test/CodeGen/AArch64/stack-hazard.ll
index e169b199733bd..0b92aea51cf62 100644
--- a/llvm/test/CodeGen/AArch64/stack-hazard.ll
+++ b/llvm/test/CodeGen/AArch64/stack-hazard.ll
@@ -3143,3 +3143,428 @@ entry:
call void @bar(ptr noundef nonnull %b)
ret i32 0
}
+
+
+define i32 @svecc_call_dynamic_alloca(<4 x i16> %P0, i32 %P1, i32 %P2, <vscale x 16 x i8> %P3, i16 %P4) "aarch64_pstate_sm_compatible" {
+; CHECK0-LABEL: svecc_call_dynamic_alloca:
+; CHECK0: // %bb.0: // %entry
+; CHECK0-NEXT: stp x29, x30, [sp, #-64]! // 16-byte Folded Spill
+; CHECK0-NEXT: .cfi_def_cfa_offset 64
+; CHECK0-NEXT: cntd x9
+; CHECK0-NEXT: stp x27, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK0-NEXT: stp x9, x28, [sp, #16] // 16-byte Folded Spill
+; CHECK0-NEXT: stp x20, x19, [sp, #48] // 16-byte Folded Spill
+; CHECK0-NEXT: mov x29, sp
+; CHECK0-NEXT: .cfi_def_cfa w29, 64
+; CHECK0-NEXT: .cfi_offset w19, -8
+; CHECK0-NEXT: .cfi_offset w20, -16
+; CHECK0-NEXT: .cfi_offset w26, -24
+; CHECK0-NEXT: .cfi_offset w27, -32
+; CHECK0-NEXT: .cfi_offset w28, -40
+; CHECK0-NEXT: .cfi_offset w30, -56
+; CHECK0-NEXT: .cfi_offset w29, -64
+; CHECK0-NEXT: addvl sp, sp, #-18
+; CHECK0-NEXT: str p15, [sp, #4, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p14, [sp, #5, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p13, [sp, #6, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p12, [sp, #7, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p11, [sp, #8, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p10, [sp, #9, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p9, [sp, #10, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p8, [sp, #11, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p7, [sp, #12, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p6, [sp, #13, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p5, [sp, #14, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str p4, [sp, #15, mul vl] // 2-byte Folded Spill
+; CHECK0-NEXT: str z23, [sp, #2, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z22, [sp, #3, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z21, [sp, #4, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z20, [sp, #5, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z19, [sp, #6, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z18, [sp, #7, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z17, [sp, #8, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z16, [sp, #9, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z15, [sp, #10, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z14, [sp, #11, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z13, [sp, #12, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z12, [sp, #13, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z11, [sp, #14, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z10, [sp, #15, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z9, [sp, #16, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: str z8, [sp, #17, mul vl] // 16-byte Folded Spill
+; CHECK0-NEXT: .cfi_escape 0x10, 0x48, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x78, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d8 @ cfa - 64 - 8 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x49, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x70, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d9 @ cfa - 64 - 16 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4a, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x68, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d10 @ cfa - 64 - 24 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4b, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x60, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d11 @ cfa - 64 - 32 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4c, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x58, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d12 @ cfa - 64 - 40 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4d, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x50, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d13 @ cfa - 64 - 48 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4e, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x48, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d14 @ cfa - 64 - 56 * VG
+; CHECK0-NEXT: .cfi_escape 0x10, 0x4f, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x40, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d15 @ cfa - 64 - 64 * VG
+; CHECK0-NEXT: mov w9, w0
+; CHECK0-NEXT: mov x8, sp
+; CHECK0-NEXT: mov w2, w1
+; CHECK0-NEXT: add x9, x9, #15
+; CHECK0-NEXT: mov x19, sp
+; CHECK0-NEXT: and x9, x9, #0x1fffffff0
+; CHECK0-NEXT: sub x8, x8, x9
+; CHECK0-NEXT: mov sp, x8
+; CHECK0-NEXT: //APP
+; CHECK0-NEXT: //NO_APP
+; CHECK0-NEXT: bl __arm_sme_state
+; CHECK0-NEXT: and x20, x0, #0x1
+; CHECK0-NEXT: .cfi_offset vg, -48
+; CHECK0-NEXT: tbz w20, #0, .LBB35_2
+; CHECK0-NEXT: // %bb.1: // %entry
+; CHECK0-NEXT: smstop sm
+; CHECK0-NEXT: .LBB35_2: // %entry
+; CHECK0-NEXT: mov x0, x8
+; CHECK0-NEXT: mov w1, #45 // =0x2d
+; CHECK0-NEXT: bl memset
+; CHECK0-NEXT: tbz w20, #0, .LBB35_4
+; CHECK0-NEXT: // %bb.3: // %entry
+; CHECK0-NEXT: smstart sm
+; CHECK0-NEXT: .LBB35_4: // %entry
+; CHECK0-NEXT: mov w0, #22647 // =0x5877
+; CHECK0-NEXT: movk w0, #59491, lsl #16
+; CHECK0-NEXT: .cfi_restore vg
+; CHECK0-NEXT: addvl sp, x29, #-18
+; CHECK0-NEXT: ldr z23, [sp, #2, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z22, [sp, #3, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z21, [sp, #4, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z20, [sp, #5, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z19, [sp, #6, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z18, [sp, #7, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z17, [sp, #8, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z16, [sp, #9, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z15, [sp, #10, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z14, [sp, #11, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z13, [sp, #12, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z12, [sp, #13, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z11, [sp, #14, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z10, [sp, #15, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z9, [sp, #16, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr z8, [sp, #17, mul vl] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr p15, [sp, #4, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p14, [sp, #5, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p13, [sp, #6, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p12, [sp, #7, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p11, [sp, #8, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p10, [sp, #9, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p9, [sp, #10, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p8, [sp, #11, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p7, [sp, #12, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p6, [sp, #13, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p5, [sp, #14, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: ldr p4, [sp, #15, mul vl] // 2-byte Folded Reload
+; CHECK0-NEXT: .cfi_restore z8
+; CHECK0-NEXT: .cfi_restore z9
+; CHECK0-NEXT: .cfi_restore z10
+; CHECK0-NEXT: .cfi_restore z11
+; CHECK0-NEXT: .cfi_restore z12
+; CHECK0-NEXT: .cfi_restore z13
+; CHECK0-NEXT: .cfi_restore z14
+; CHECK0-NEXT: .cfi_restore z15
+; CHECK0-NEXT: mov sp, x29
+; CHECK0-NEXT: .cfi_def_cfa wsp, 64
+; CHECK0-NEXT: ldp x20, x19, [sp, #48] // 16-byte Folded Reload
+; CHECK0-NEXT: ldr x28, [sp, #24] // 8-byte Folded Reload
+; CHECK0-NEXT: ldp x27, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK0-NEXT: ldp x29, x30, [sp], #64 // 16-byte Folded Reload
+; CHECK0-NEXT: .cfi_def_cfa_offset 0
+; CHECK0-NEXT: .cfi_restore w19
+; CHECK0-NEXT: .cfi_restore w20
+; CHECK0-NEXT: .cfi_restore w26
+; CHECK0-NEXT: .cfi_restore w27
+; CHECK0-NEXT: .cfi_restore w28
+; CHECK0-NEXT: .cfi_restore w30
+; CHECK0-NEXT: .cfi_restore w29
+; CHECK0-NEXT: ret
+;
+; CHECK64-LABEL: svecc_call_dynamic_alloca:
+; CHECK64: // %bb.0: // %entry
+; CHECK64-NEXT: sub sp, sp, #128
+; CHECK64-NEXT: .cfi_def_cfa_offset 128
+; CHECK64-NEXT: cntd x9
+; CHECK64-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill
+; CHECK64-NEXT: stp x9, x28, [sp, #80] // 16-byte Folded Spill
+; CHECK64-NEXT: stp x27, x26, [sp, #96] // 16-byte Folded Spill
+; CHECK64-NEXT: stp x20, x19, [sp, #112] // 16-byte Folded Spill
+; CHECK64-NEXT: add x29, sp, #64
+; CHECK64-NEXT: .cfi_def_cfa w29, 64
+; CHECK64-NEXT: .cfi_offset w19, -8
+; CHECK64-NEXT: .cfi_offset w20, -16
+; CHECK64-NEXT: .cfi_offset w26, -24
+; CHECK64-NEXT: .cfi_offset w27, -32
+; CHECK64-NEXT: .cfi_offset w28, -40
+; CHECK64-NEXT: .cfi_offset w30, -56
+; CHECK64-NEXT: .cfi_offset w29, -64
+; CHECK64-NEXT: addvl sp, sp, #-18
+; CHECK64-NEXT: str p15, [sp, #4, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p14, [sp, #5, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p13, [sp, #6, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p12, [sp, #7, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p11, [sp, #8, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p10, [sp, #9, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p9, [sp, #10, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p8, [sp, #11, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p7, [sp, #12, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p6, [sp, #13, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p5, [sp, #14, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str p4, [sp, #15, mul vl] // 2-byte Folded Spill
+; CHECK64-NEXT: str z23, [sp, #2, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z22, [sp, #3, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z21, [sp, #4, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z20, [sp, #5, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z19, [sp, #6, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z18, [sp, #7, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z17, [sp, #8, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z16, [sp, #9, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z15, [sp, #10, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z14, [sp, #11, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z13, [sp, #12, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z12, [sp, #13, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z11, [sp, #14, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z10, [sp, #15, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z9, [sp, #16, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: str z8, [sp, #17, mul vl] // 16-byte Folded Spill
+; CHECK64-NEXT: .cfi_escape 0x10, 0x48, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x78, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d8 @ cfa - 128 - 8 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x49, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x70, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d9 @ cfa - 128 - 16 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4a, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x68, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d10 @ cfa - 128 - 24 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4b, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x60, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d11 @ cfa - 128 - 32 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4c, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x58, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d12 @ cfa - 128 - 40 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4d, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x50, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d13 @ cfa - 128 - 48 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4e, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x48, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d14 @ cfa - 128 - 56 * VG
+; CHECK64-NEXT: .cfi_escape 0x10, 0x4f, 0x0b, 0x11, 0x80, 0x7f, 0x22, 0x11, 0x40, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d15 @ cfa - 128 - 64 * VG
+; CHECK64-NEXT: sub sp, sp, #64
+; CHECK64-NEXT: mov w9, w0
+; CHECK64-NEXT: mov x8, sp
+; CHECK64-NEXT: mov w2, w1
+; CHECK64-NEXT: add x9, x9, #15
+; CHECK64-NEXT: mov x19, sp
+; CHECK64-NEXT: and x9, x9, #0x1fffffff0
+; CHECK64-NEXT: sub x8, x8, x9
+; CHECK64-NEXT: mov sp, x8
+; CHECK64-NEXT: //APP
+; CHECK64-NEXT: //NO_APP
+; CHECK64-NEXT: bl __arm_sme_state
+; CHECK64-NEXT: and x20, x0, #0x1
+; CHECK64-NEXT: .cfi_offset vg, -48
+; CHECK64-NEXT: tbz w20, #0, .LBB35_2
+; CHECK64-NEXT: // %bb.1: // %entry
+; CHECK64-NEXT: smstop sm
+; CHECK64-NEXT: .LBB35_2: // %entry
+; CHECK64-NEXT: mov x0, x8
+; CHECK64-NEXT: mov w1, #45 // =0x2d
+; CHECK64-NEXT: bl memset
+; CHECK64-NEXT: tbz w20, #0, .LBB35_4
+; CHECK64-NEXT: // %bb.3: // %entry
+; CHECK64-NEXT: smstart sm
+; CHECK64-NEXT: .LBB35_4: // %entry
+; CHECK64-NEXT: mov w0, #22647 // =0x5877
+; CHECK64-NEXT: movk w0, #59491, lsl #16
+; CHECK64-NEXT: .cfi_restore vg
+; CHECK64-NEXT: sub sp, x29, #64
+; CHECK64-NEXT: addvl sp, sp, #-18
+; CHECK64-NEXT: ldr z23, [sp, #2, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z22, [sp, #3, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z21, [sp, #4, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z20, [sp, #5, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z19, [sp, #6, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z18, [sp, #7, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z17, [sp, #8, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z16, [sp, #9, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z15, [sp, #10, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z14, [sp, #11, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z13, [sp, #12, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z12, [sp, #13, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z11, [sp, #14, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z10, [sp, #15, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z9, [sp, #16, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr z8, [sp, #17, mul vl] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr p15, [sp, #4, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p14, [sp, #5, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p13, [sp, #6, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p12, [sp, #7, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p11, [sp, #8, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p10, [sp, #9, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p9, [sp, #10, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p8, [sp, #11, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p7, [sp, #12, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p6, [sp, #13, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p5, [sp, #14, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: ldr p4, [sp, #15, mul vl] // 2-byte Folded Reload
+; CHECK64-NEXT: .cfi_restore z8
+; CHECK64-NEXT: .cfi_restore z9
+; CHECK64-NEXT: .cfi_restore z10
+; CHECK64-NEXT: .cfi_restore z11
+; CHECK64-NEXT: .cfi_restore z12
+; CHECK64-NEXT: .cfi_restore z13
+; CHECK64-NEXT: .cfi_restore z14
+; CHECK64-NEXT: .cfi_restore z15
+; CHECK64-NEXT: sub sp, x29, #64
+; CHECK64-NEXT: .cfi_def_cfa wsp, 128
+; CHECK64-NEXT: ldp x20, x19, [sp, #112] // 16-byte Folded Reload
+; CHECK64-NEXT: ldr x28, [sp, #88] // 8-byte Folded Reload
+; CHECK64-NEXT: ldp x27, x26, [sp, #96] // 16-byte Folded Reload
+; CHECK64-NEXT: ldp x29, x30, [sp, #64] // 16-byte Folded Reload
+; CHECK64-NEXT: add sp, sp, #128
+; CHECK64-NEXT: .cfi_def_cfa_offset 0
+; CHECK64-NEXT: .cfi_restore w19
+; CHECK64-NEXT: .cfi_restore w20
+; CHECK64-NEXT: .cfi_restore w26
+; CHECK64-NEXT: .cfi_restore w27
+; CHECK64-NEXT: .cfi_restore w28
+; CHECK64-NEXT: .cfi_restore w30
+; CHECK64-NEXT: .cfi_restore w29
+; CHECK64-NEXT: ret
+;
+; CHECK1024-LABEL: svecc_call_dynamic_alloca:
+; CHECK1024: // %bb.0: // %entry
+; CHECK1024-NEXT: sub sp, sp, #1088
+; CHECK1024-NEXT: .cfi_def_cfa_offset 1088
+; CHECK1024-NEXT: cntd x9
+; CHECK1024-NEXT: str x29, [sp, #1024] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x30, [sp, #1032] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x9, [sp, #1040] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x28, [sp, #1048] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x27, [sp, #1056] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x26, [sp, #1064] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x20, [sp, #1072] // 8-byte Folded Spill
+; CHECK1024-NEXT: str x19, [sp, #1080] // 8-byte Folded Spill
+; CHECK1024-NEXT: add x29, sp, #1024
+; CHECK1024-NEXT: .cfi_def_cfa w29, 64
+; CHECK1024-NEXT: .cfi_offset w19, -8
+; CHECK1024-NEXT: .cfi_offset w20, -16
+; CHECK1024-NEXT: .cfi_offset w26, -24
+; CHECK1024-NEXT: .cfi_offset w27, -32
+; CHECK1024-NEXT: .cfi_offset w28, -40
+; CHECK1024-NEXT: .cfi_offset w30, -56
+; CHECK1024-NEXT: .cfi_offset w29, -64
+; CHECK1024-NEXT: addvl sp, sp, #-18
+; CHECK1024-NEXT: str p15, [sp, #4, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p14, [sp, #5, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p13, [sp, #6, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p12, [sp, #7, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p11, [sp, #8, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p10, [sp, #9, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p9, [sp, #10, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p8, [sp, #11, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p7, [sp, #12, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p6, [sp, #13, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p5, [sp, #14, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str p4, [sp, #15, mul vl] // 2-byte Folded Spill
+; CHECK1024-NEXT: str z23, [sp, #2, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z22, [sp, #3, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z21, [sp, #4, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z20, [sp, #5, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z19, [sp, #6, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z18, [sp, #7, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z17, [sp, #8, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z16, [sp, #9, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z15, [sp, #10, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z14, [sp, #11, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z13, [sp, #12, mul vl] // 16-byte Folded Spill
+; CHECK1024-NEXT: str z12, [sp, #13, mul vl] // 16-byte Folded Spill
+; ...
[truncated]
; CHECK64-NEXT: movk w0, #59491, lsl #16
; CHECK64-NEXT: .cfi_restore vg
; CHECK64-NEXT: sub sp, x29, #64
; CHECK64-NEXT: addvl sp, sp, #-18
Modifying SP this way introduces a small window for memory corruption: if you take an interrupt between the sub and the addvl, the callee-saves can get overwritten. I think you need a temporary register here.
Ah. I've reworked this so the base pointer will always be available in this case. This means if there's no realignment, we can use the base pointer to get to the start of the SVE callee saves, and if there is, we can use the base pointer as a temporary (as at this point it should not be used later, since the next step is restoring callee-saves).
I guess using the base register as a temporary is okay?
if (ST.getStreamingHazardSize() &&
    !AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody()) {
  return true;
}

if (hasStackRealignment(MF))
  return MFI.hasVarSizedObjects() || MF.hasEHFunclets();
The early return here makes this pretty hard to follow...
We currently have the following cases:
- Moving SP + realignment means we can't use sp or fp to refer to locals
- Moving SP + scalable bits on stack means we can't refer to fixed-size locals cheaply
- Moving SP + large locals means we can't refer to locals cheaply
- Moving SP + hazard padding means we can't refer to locals cheaply, including the emergency spill slot.
This patch adds:
- Stack realignment plus hazard padding means we don't know how to emit the epilogue.
I think I'd prefer to separate that out explicitly, and add a comment explaining it, since it's very different from the reasons for other stuff: the rest are all about referring to locals.
I agree this was an unpleasant hack... I've reverted this and instead now find any restored GPR in the epilogue; SVE functions are marked as having a "big stack", so they always save at least one GPR we can use.
; CHECK0-NEXT: .cfi_escape 0x10, 0x4c, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x58, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d12 @ cfa - 64 - 40 * VG
; CHECK0-NEXT: .cfi_escape 0x10, 0x4d, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x50, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d13 @ cfa - 64 - 48 * VG
; CHECK0-NEXT: .cfi_escape 0x10, 0x4e, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x48, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d14 @ cfa - 64 - 56 * VG
; CHECK0-NEXT: .cfi_escape 0x10, 0x4f, 0x0a, 0x11, 0x40, 0x22, 0x11, 0x40, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d15 @ cfa - 64 - 64 * VG
Not directly connected to this patch, but the DWARF unwind seems to be missing some registers.