-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] uses @llvm.stackrestore() to set the stack pointer #123389
Comments
@llvm/issue-subscribers-backend-powerpc Author: Mike (tankf33der)
hi,
I have a question or new bug. I have LLVM-IR code which uses @llvm.stackrestore() to set the stack pointer. Is this a known problem? If so, is there a solution? p.s. PicoLisp crashes on LLVM14-LLVM19 on ppc64le. |
Hello. In the PowerPC ABI for C code, the stack pointer %r1 must be a multiple at 16, and the first words are reserved. The pointer at 0(%r1) should point to the caller's frame. If you are resizing your frame as if by C's alloca(3), then I would expect LLVM to write the pointer at 0(%r1). If you are moving %r1 to an existing frame, then I would expect LLVM to refrain from overwriting the pointer at 0(%r1). A little-endian ppc64le uses the 64-Bit ELF V2 ABI Specification. Its example, in section 2.3.8 Dynamic Stack Space Allocation, resizes a frame using stdu to both set %r1 and write the pointer at the new 0(%r1). |
We are not using the C ABI (raw LLVM-IR only), but the problem is in how the backend transforms LLVM-IR to PPC64 assembly. Only Tier1 platform works consistently everywhere - as soon as you take a step to the side, various backend errors and bugs start occurring. |
Let me show an example for the issue from production code. The following The assembly code produced by the LLVM backend then are correct for These second of these two lines "std 4, 0(1)" writes to the stack memory PicoLisp Source:
LLVM-IR:
x86-64:
Arm64:
Ppc64:
I believe this is a bug in the LLVM backend for Ppc64. |
The PPC backend ensures that the function conforms to the ABI. The ABI requires the back chain to be kept at |
The ABI is not relevant here, as it is for languages like C. We have no "back chain". We are switching between coroutines, not C-like stack frames. See my examples above. Non-PPC architectures compile stackrestore() as a single instruction like "mov sp, x0", and do not mess with the memory the stack pointer points to. As I understand it, the backend for PPC does an illegal mixing of assumptions between low level (LLVM-IR) and higher levels (C). The minimal LLVM IR code above is taken directly from the PicoLisp distribution code. Unfortunately I have no access to a running PPC system. If the above code snippet is not helpful enough, perhaps @tankf33der wrap it into a function and test it? |
I understand that you are working in some system that does not need to conform to the ABI. However, what annotation is there to tell the back end that the code generated for this particular function does not need to conform to the ABI? If the back end just sees a function (particularly if the function has external linkage), it must emit code that conforms to the ABI. This is why I asked for a compilable piece of IR so we can see what attribute or similar annotation needs to be added to it in order to tell the back end that it does not need to handle the stack pointer in any special way. FWIW, for code generation (i.e. if you don't need to execute the code on real HW), you shouldn't need a PPC machine. If it is helpful, you can just paste your code in godbolt.org and select clang for PowerPC as the compiler. |
clang will not help. There is no C involved. The above Lisp code sets the stack pointer to a newly allocated coroutine stack segment. The PicoLisp compiler cannot make any assumption about where is written what by which ABI into that segment. It just wants to set the stack pointer to a fresh memory. It crashes if that memory is unexpectedly overwritten, because the backend believes it is C. @tankf33der , should we give up on PPC? |
Perhaps the backend confuses stackrestore() with longjmp() ? |
This Ppc64 code is wrong, even after removing the
This is a wrong call to a C function Ppc64 ELFv2 2.2.2 The Stack Frame would require for
An optimized memset might skip saving CR or LR, but probably saves TOC. If r20 - 256 > 24, then this illegal memset would clobber TOC and soon crash the program. I argue that LLVM IR runs on the C stack, so it must obey the C ABI for stack frames. The IR is free to ignore C types (like Would it work to subtract 32 from the stack pointer, and round it down to a multiple 16, before passing it to llvm.stackrestore? I will argue that llvm.stackrestore is correct to set 0(r1). The doc for llvm.stackrestore refers to llvm.stacksave, which says that stackrestore "pops any alloca blocks" since stacksave. This suggests that llvm.stackrestore is for shrinking the current stack frame (not for switching to a different stack frame). I wrote some LLVM IR to pop an alloca block, target triple = "powerpc64-unknown-openbsd"
declare void @take(ptr)
define void @give(i64 %s, i64 %t) {
entry:
%sp = call ptr @llvm.stacksave.p0()
%a = alloca i8, i64 %s
call void @take(ptr %a)
call void @llvm.stackrestore.p0(ptr %sp)
%b = alloca i8, i64 %t
call void @take(ptr %b)
ret void
} Then I ran mflr %r0
stdu %r1, -64(%r1)
std %r0, 80(%r1) Allocate a stack frame of 64 bytes, set the Back Chain at old -64(r1) or new 0(r1), then save the return address (via r0) in the caller's LR Save Doubleword at old 16(r1) or new 80(r1). addi %r3, %r3, 15
rldicr %r3, %r3, 0, 59
neg 3, 3 Set r3 = -((s + 15) & ~15), rounding up s to a multiple of 16, then negating it. mr %r31, %r1
addi %r4, %r31, 64
mr %r29, %r1
stdux %r4, %r1, %r3 Enlarge our stack frame by at least s bytes, and also move the Back Chain from the old 0(r1) to the new 0(r1). It would be correct to load r4 = 0(r1), but we are enlarging from 64 bytes, so r31 = r1, r4 = r31 + 64 gets the same value. The studx adds r3 to r1 (which moves r1 down by the rounded s) and also sets the new Back Chain = r4. Do addi %r3, %r1, 32
bl take
nop Call take(r3 = r1 + 32). We can't call take(r1), because we can't clobber the 32 reserved bytes. ld %r3, 0(%r1)
mr %r1, %r29
std %r3, 0(%r1) Do |
Thank you very much for the explanations and examples! I understand this is all about differing assumptions, and I'm afraid there is no easy solution.
Yes, that's the problem. The LLVM backend makes certain assumptions about different levels which I believe should not be mixed up. The original meaning of "LLVM" is "low level virtual machine", not "C machine".
'memset' being a C function is an assumption. The LLVM-IR code just says
'memset' is an intrinsic. It could well be a set of machine instructions.
I agree as far as function entry and exit code is concerned. It is typically generated by "define .. { .. }" and may set/clean up stack frames, initialize registers and so on. But I would not agree for a low level primitive like setting a pointer.
Theoretically yes. But PicoLisp is distributed as a system-agnostic IR file, to allow the initial build after installation. We would need to distribute a special version for Ppc64.
Right. I understand this as a typical use case, but not as the only one. So I assume that stackrestore() simply sets a pointer, while the Ppc64 backend assumes a whole stackframe machinery (which I would in turn assume for higher level stuff like longjmp()).
But how should the "low level" machine then do it? Or, in gereral, set the stack pointer to a computed location? There is no option for 'llvm-as' or 'opt' to fix this behavior, right? Like, for example, using a dedicated (asynchronous) frame pointer? |
I assume that any operation in LLVM IR might call a C function. This is target triple = "powerpc-unknown-openbsd"
define i64 @invert(i64 %i) {
entry:
%j = sdiv i64 1, %i
ret i64 %j
} The command
I am finding that the intrinsics read_register and write_register can access PowerPC's stack pointer in r1. The doc for read_register and write_register was outdated and incomplete, so I needed to glance at some test cases in llvm-project/llvm/test/CodeGen/PowerPC. I turned on my old PowerBook G4, so this code is for PPC32, not 64. I computed a PPC32 stack pointer in C code and used it in LLVM IR. chstack.ll target triple = "powerpc-unknown-openbsd"
declare i32 @llvm.read_register.i32(metadata)
declare void @llvm.write_register.i32(metadata, i32)
!0 = !{!"r1"}
define void @chstack(i32 %new_sp, ptr %fun) {
entry:
%old_sp = call i32 @llvm.read_register.i32(metadata !0)
call void @llvm.write_register.i32(metadata !0, i32 %new_sp)
call void %fun()
call void @llvm.write_register.i32(metadata !0, i32 %old_sp)
ret void
} main.c #include <sys/mman.h>
#include <stdio.h>
void chstack(void *, void (*)(void));
void hello(void) { puts("Hello from other stack."); }
int main(void) {
size_t size = 256 * 1024;
/* In OpenBSD, stack must be MAP_STACK. */
char *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
if (stack == MAP_FAILED) { perror("mmap"); return 1; }
/* In PPC32, stack grows down, and frame needs 16 bytes. */
char *sp = stack + size - 16;
chstack(sp, hello);
} It seems to work,
If it didn't work, I might have tried adding the attribute There isn't a general way to set the stack pointer for any target. The 1st problem is that I don't know whether the stack grows up or down. (It will probably grow down unless someone teaches LLVM to target HPPA.) The 2nd problem is that I don't know how many bytes to reserve for function calls. If the IR is call void @llvm.write_register.i64(metadata !0, i64 %new_sp)
call void something(ptr %a, ptr %b, ptr %c, ptr %d, ptr %e, ptr %f,
ptr %g, ptr %h, ptr %i, ptr %j, ptr %k, ptr %l) then I need enough stack to pass 12 arguments to something. If it's PPC64, then I might need 128 bytes. (It would be 32 reserved bytes, plus 96 bytes for the Parameter Save Area.) If it's some other target, I don't know. |
On Tue, Jan 28, 2025 at 07:28:07PM -0800, George Koehler wrote:
> But how should the "low level" machine then do (switching to a different
> stack frame)? Or, in gereral, set the stack pointer to a computed
> location?
I am finding that the intrinsics _read_register_ and _write_register_
can access PowerPC's stack pointer in r1.
...
I turned on my old PowerBook G4, so this code is for PPC32, not 64. I
computed a PPC32 stack pointer in C code and used it in LLVM IR.
Thanks! That's good to know. However, as I said, the PicoLisp distro
comes with pre-gererated, target-independent *.ll files, so accessing
CPU registers is no option. Then we could as well do target-dependent
stack calculations.
There isn't a general way to set the stack pointer for any target.
True, but PicoLisp runs fine on x86-64, Arm64 and RISC-V.
The 1st problem is that I don't know whether the stack grows up or
down.
PicoLisp explicitly states these requirements:
- 8 bit per byte
- 64 bit per word
- Pointer size is 64 bit
- Stack grows downwards
- sizeof(float) = 4 bytes
- sizeof(double) = 8 bytes
but an upwards growing stack could actually be handled, as the
calculations are done at runtime.
The 2nd problem is that I don't know how many bytes to reserve for
function calls.
This is a problem of the PPC ABI, no? Other ABIs have the callee set up
the stack frame, with only non-register arguments being pushed by the
caller. No need to reserve anything unknown.
So if PPC has not an option to generate code using a frame pointer, I'm
afraid we must give up PicoLisp on Ppc64.
|
The reasons why I switched the PicoLisp implementation language from C to assembly were stack manipulations and carry flag access. Then the reason to switch from assembly to LLVM-IR was machine independency. LLVM-IR as a generic assembler. If the Ppc64 ABI does not allow stack manipulations and machine independency, we cannot use it for PicoLisp. |
I meant unknown to me. I need to reserve 128 bytes on PPC64, and to reserve something unknown on other targets, because I don't know. You might know! For "arguments being pushed by the caller", it must do 2 things:
I assume that stackrestore or write_register moves the stack pointer. Then, the pointer given to stackrestore or write_register must already have enough room for the arguments for any function calls (including any hidden calls to memset or __divdi3).
PowerPC has an option to use a frame pointer, the attribute I ask LLVM how many bytes to reserve. My IR is, declare void @take(i64, i64, i64, i64, i64, i64,
i64, i64, i64, i64, i64, i64)
define void @give() {
entry:
call void @take(i64 501, i64 502, i64 503, i64 504, i64 505, i64 506,
i64 507, i64 508, i64 509, i64 510, i64 511, i64 512)
ret void
} I start with powerpc64-openbsd (by running li 3, 512
std 3, 120(1) powerpc64 needs 128 bytes. This is 120 + 8 because the 512 is at stack offset 120 and takes 8 bytes. amd64-openbsd, pushq $512 # imm = 0x200
...
pushq $507 # imm = 0x1FB amd64 needs 48 bytes. I count 6 aarch64-openbsd for arm64, mov w8, #512 // =0x200
stp x8, x30, [sp, #24] // 8-byte Folded Spill Guess arm64 needs 32 bytes. I don't know what stp does. I guess that it puts the 512 at stack offset 24, so 24 + 8 = 32. riscv64-openbsd, li t0, 512
sd t0, 24(sp) Guess riscv needs 32 bytes. mips64-openbsd, daddiu $4, $zero, 512
sd $4, 24($sp) Guess mips64 needs 32 bytes. sparc64-openbsd, mov 512, %i0
stx %i0, [%sp+2263] Guess sparc64 needs 224 bytes. If the stack bias is 2047, then the stack might begin at %sp+2047, so 2263 - 2047 + 8 = 224. My guess might be wrong, because I don't know about register windows. PowerPC is not the only target to reserve stack space for arguments. The function in this example might need the stack pointer to reserve 32 bytes on arm64 or 48 bytes on amd64. In general, if I set the stack pointer and call arbitrary functions, then my stack must reserve a machine-dependent number of bytes. (Or reserve extra; 256 bytes might work with most functions on most targets? If my function arguments fit in 6 registers, then 0 bytes might be enough on arm64 and amd64?)
Stacks are not machine-independent. I guess that amd64, arm64, mips64 and riscv64 have similar stacks. If you do something with a stack on 1 of amd64, arm64, mips64, or riscv64, then it might work on the other 3. I suspect that these 4 targets began with GCC as their 1st C compiler, and GCC designed similar stacks. powerpc64 and sparc64 have different histories,
AIX and Solaris had their own C compilers, not GCC. The different C compilers might have preferred different stack designs. Now the Ppc64 stacks and Sparc64 stacks don't look like the rest. Yes, the Ppc64 ABI allows stack manipulations. I believe that In general, the stack pointer in LLVM IR is the target's stack pointer, so it has different rules on each target. It has an LR Save Doubleword on Ppc64, a stack bias of 2047 on Sparc64, and so on. A Ppc64 or Sparc64 programmer, who sets the stack pointer in LLVM IR, would not assume that the same IR works on other targets. |
For "arguments being pushed by the caller", it must do 2 things:
1.move the stack pointer to make room for the arguments,
2.store the arguments on the stack.
Yes. Or - as I would simply say - push each argument, one by one.
Perfect!
I assume that stackrestore or write_register moves the stack pointer.
Yes.
Then, the pointer given to stackrestore or write_register must already
have enough room for the arguments for any function calls (including any
hidden calls to memset or __divdi3).
Yes.
BUT: There must be enough room BELOW the pointer, not above.
foo(a, b) pushes a and b (or other hidden stuff if needed), calls foo,
and restores the stack pointer to what it was before (it pops a and b).
The problem on Ppc64 is that it writes something ABOVE the stack
pointer. Such stuff is normally managed in a stack frame set up upon
function entry.
PowerPC has an option to use a frame pointer, the attribute
"frame-pointer"="all",
Great! Then we should try that.
but it would not help here. PPC uses the stack pointer, not the frame
pointer, to pass arguments to functions. The stack pointer still needs
to reserve space for arguments.
This is not a problem (see above, or how x86 or Arm do it).
I ask LLVM how many bytes to reserve. My IR is,
...
PowerPC is not the only target to reserve stack space for arguments.
All this is irrelevant. A function call may reserve as much space as it
wants. Just please allocate it BELOW the SP!
|
For the records: "frame-pointer"="all" did not help. It still compiles "mr 1, 3" and then "std 5, 0(1)". It overwrites the local stack area, instead of using a function-level stack frame as other CPUs do. |
Not always? There must be room ABOVE the stack pointer in this x86-64 example, declare void @llvm.write_register.i64(metadata, i64)
declare void @take(i64, i64, i64, i64, i64, i64,
i64, i64, i64, i64, i64, i64)
!0 = !{!"rsp"}
define void @give(i64 %p) {
entry:
%q = add i64 %p, 99
call void @llvm.write_register.i64(metadata !0, i64 %q)
call void @take(i64 undef, i64 undef, i64 undef, i64 undef,
i64 undef, i64 undef, i64 undef, i64 undef,
i64 undef, i64 undef, i64 undef, i64 512)
/* Missing code to restore rsp. */
ret void
} So leaq 99(%rdi), %rsp
movq $512, 40(%rsp) # imm = 0x200
callq take@PLT The leaq sets %rsp to %p + 99. Then the movq stores 512 in 8 bytes at %rsp + 40, which is ABOVE the stack pointer. Therefore, I needed to reserve 8 bytes at (%p + 99) + 40 to pass this argument. LLVM didn't move %rsp down from %p + 99 to make room for the argument at 40(%rsp); should it do so? |
On Thu, Jan 30, 2025 at 12:21:35PM -0800, George Koehler wrote:
> BUT: There must be enough room BELOW the pointer, not above.
...
The _leaq_ sets %rsp to %p + 99. Then the _movq_ stores 512 in 8 bytes
at %rsp + 40, which is ABOVE the stack pointer.
What does it prove? It is the *purpose* of that function. Of course
there is always code which accesses areas higher up in the stack.
This is not what we are talking about. We have the problem that setting
the stack pointer immediately writes to that location.
I give up. PicoLisp will not be supported on Ppc64.
It runs fine on x86-64, Arm64, Sparc and RISC-V. Enough :)
|
Thanks for your time! Let's close the issue. |
hi,
I have a question or new bug.
I have LLVM-IR code which uses @llvm.stackrestore() to set the stack pointer.
On PowerPC the LLVM backend generates instructions like "mr 1, 3" (which is correct), but then also instructions like "std 4, 0(1)" which (incorrectly in my view) write into the stack space above the newly set stack pointer, causing a crash.
Is this a known problem? If so, is there a solution?
p.s. PicoLisp crashes on LLVM14-LLVM19.1.7. on ppc64le.
The text was updated successfully, but these errors were encountered: