Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PowerPC] uses @llvm.stackrestore() to set the stack pointer #123389

Open
tankf33der opened this issue Jan 17, 2025 · 20 comments
Open

[PowerPC] uses @llvm.stackrestore() to set the stack pointer #123389

tankf33der opened this issue Jan 17, 2025 · 20 comments
Labels
backend:PowerPC question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@tankf33der
Copy link

tankf33der commented Jan 17, 2025

hi,

I have a question or new bug.

I have LLVM-IR code which uses @llvm.stackrestore() to set the stack pointer.
On PowerPC the LLVM backend generates instructions like "mr 1, 3" (which is correct), but then also instructions like "std 4, 0(1)" which (incorrectly in my view) write into the stack space above the newly set stack pointer, causing a crash.

Is this a known problem? If so, is there a solution?

p.s. PicoLisp crashes on LLVM14-LLVM19.1.7. on ppc64le.

@EugeneZelenko EugeneZelenko added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! backend:PowerPC and removed new issue labels Jan 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/issue-subscribers-backend-powerpc

Author: Mike (tankf33der)

hi,

I have a question or new bug.

I have LLVM-IR code which uses @llvm.stackrestore() to set the stack pointer.
On PowerPC the LLVM backend generates instructions like "mr 1, 3" (which is correct), but then also instructions like "std 4, 0(1)" which (incorrectly in my view) write into the stack space above the newly set stack pointer, causing a crash.

Is this a known problem? If so, is there a solution?

p.s. PicoLisp crashes on LLVM14-LLVM19 on ppc64le.

@kernigh
Copy link
Contributor

kernigh commented Jan 17, 2025

Hello. In the PowerPC ABI for C code, the stack pointer %r1 must be a multiple at 16, and the first words are reserved. The pointer at 0(%r1) should point to the caller's frame. If you are resizing your frame as if by C's alloca(3), then I would expect LLVM to write the pointer at 0(%r1). If you are moving %r1 to an existing frame, then I would expect LLVM to refrain from overwriting the pointer at 0(%r1).

A little-endian ppc64le uses the 64-Bit ELF V2 ABI Specification. Its example, in section 2.3.8 Dynamic Stack Space Allocation, resizes a frame using stdu to both set %r1 and write the pointer at the new 0(%r1).

@tankf33der
Copy link
Author

We are not using the C ABI (raw LLVM-IR only), but the problem is in how the backend transforms LLVM-IR to PPC64 assembly.

Only Tier1 platform works consistently everywhere - as soon as you take a step to the side, various backend errors and bugs start occurring.

@gerdolf
Copy link

gerdolf commented Jan 19, 2025

Let me show an example for the issue from production code. The following
three lines from the PicoLisp Source are compiled to LLVM-IR by the
PicoLisp build process.

The assembly code produced by the LLVM backend then are correct for
x86-64 and Arm64, but contains for Ppc64 two additional lines commented
with "???" below.

These second of these two lines "std 4, 0(1)" writes to the stack memory
after the stack pointer was set, causing a crash.

PicoLisp Source:

   (memset
      (Main: lim (stack (ofs Stk (- Siz))))
      7 (- Siz 256) T )

LLVM-IR:

   ; # (Main: lim (stack (ofs Stk (- Siz))))
     %78 = getelementptr i8, i8* %63, i32 40
     %79 = bitcast i8* %78 to i8**
     %80 = sub i64 0, %76
     %81 = getelementptr i8, i8* %77, i64 %80
     call void @llvm.stackrestore(i8* %81)
     store i8* %81, i8** %79
   ; # (- Siz 256)
     %82 = sub i64 %76, 256
   ; # (memset (Main: lim (stack (ofs Stk (- Siz)))) 7 (- Siz 256) T)
     call void @llvm.memset.p0i8.i64(i8* align 8 %81, i8 7, i64 %82, i1 0)

x86-64:

   subq  %rdx, %rdi           # %81 = getelementptr ..
   movq  %rdi, %rsp           # stackrestore(%81)
   movq  %rdi, 40(%r14)       # store i8* %81, ..
   addq  $-256, %rdx          # %82 = sub ..
   movl  $7, %esi             # 7
   callq memset@PLT

Arm64:

   sub   x0, x20, x8          # %81 = getelementptr ..
   stp   x9, x0, [x21, #32]   # store i8* %81, ..
   mov   sp, x0               # stackrestore(%81)
   sub   x2, x8, #256         # %82 = sub ..
   mov   w1, #7               # 7
   bl memset

Ppc64:

   sub 3, 21, 20              # %81 = getelementptr ..
   ld 4, 0(1)                 # ???
   mr 1, 3                    # stackrestore(%81)
   std 4, 0(1)                # ???
   addi 5, 20, -256           # %82 = sub ..
   li 4, 7                    # 7
   std 3, 40(27)              # store i8* %81, ..
   bl memset

I believe this is a bug in the LLVM backend for Ppc64.

@nemanjai
Copy link
Member

The PPC backend ensures that the function conforms to the ABI. The ABI requires the back chain to be kept at 0(1) (i.e. at the first slot in the stack frame). When you change the stack pointer, the back end presumably saves it in order to maintain the back chain. Why is this a problem? I would expect that if you save the stack pointer and then restore it, it will just write the original value to the back chain.
Would you be able to provide a complete (minimal) LLVM IR function that the reader can compile to see the problem?

@gerdolf
Copy link

gerdolf commented Jan 26, 2025

The ABI is not relevant here, as it is for languages like C. We have no "back chain". We are switching between coroutines, not C-like stack frames.

See my examples above. Non-PPC architectures compile stackrestore() as a single instruction like "mov sp, x0", and do not mess with the memory the stack pointer points to.

As I understand it, the backend for PPC does an illegal mixing of assumptions between low level (LLVM-IR) and higher levels (C).

The minimal LLVM IR code above is taken directly from the PicoLisp distribution code. Unfortunately I have no access to a running PPC system. If the above code snippet is not helpful enough, perhaps @tankf33der wrap it into a function and test it?

@nemanjai
Copy link
Member

I understand that you are working in some system that does not need to conform to the ABI. However, what annotation is there to tell the back end that the code generated for this particular function does not need to conform to the ABI? If the back end just sees a function (particularly if the function has external linkage), it must emit code that conforms to the ABI. This is why I asked for a compilable piece of IR so we can see what attribute or similar annotation needs to be added to it in order to tell the back end that it does not need to handle the stack pointer in any special way.
Let's wait until the OP provides a piece of code that we can compile and we'll see.

FWIW, for code generation (i.e. if you don't need to execute the code on real HW), you shouldn't need a PPC machine. If it is helpful, you can just paste your code in godbolt.org and select clang for PowerPC as the compiler.

@gerdolf
Copy link

gerdolf commented Jan 26, 2025

clang will not help. There is no C involved.

The above Lisp code sets the stack pointer to a newly allocated coroutine stack segment. The PicoLisp compiler cannot make any assumption about where is written what by which ABI into that segment. It just wants to set the stack pointer to a fresh memory. It crashes if that memory is unexpectedly overwritten, because the backend believes it is C.

@tankf33der , should we give up on PPC?

@gerdolf
Copy link

gerdolf commented Jan 26, 2025

Perhaps the backend confuses stackrestore() with longjmp() ?

@kernigh
Copy link
Contributor

kernigh commented Jan 28, 2025

This Ppc64 code is wrong, even after removing the ??? lines.

   sub 3, 21, 20              # %81 = getelementptr ..
   ld 4, 0(1)                 # ???
   mr 1, 3                    # stackrestore(%81)
   std 4, 0(1)                # ???
   addi 5, 20, -256           # %82 = sub ..
   li 4, 7                    # 7
   std 3, 40(27)              # store i8* %81, ..
   bl memset

This is a wrong call to a C function memset(r3 = r1, r4 = 7, r5 = r20 - 256). At this call, the stack pointer must obey the C ABI, which (for Ppc64 ELFv2) reserves the 32 bytes from 0(r1) to 31(r1). If r20 - 256 > 0, then memset would clobber the reserved bytes and probably crash the program.

Ppc64 ELFv2 2.2.2 The Stack Frame would require for bl memset that,

  • r1 must be a multiple of 16. ("The stack is quadword aligned.")
  • 0(r1) is the Back Chain, a pointer to the caller's stack frame.
  • 8(r1) is the CR Save Word, where memset may save the condition register.
  • 16(r1) is the LR Save Doubleword, where memset may save its return address.
  • 24(r1) is the TOC Pointer Doubleword, where memset's plt stub might save the TOC pointer.

An optimized memset might skip saving CR or LR, but probably saves TOC. If r20 - 256 > 24, then this illegal memset would clobber TOC and soon crash the program.

I argue that LLVM IR runs on the C stack, so it must obey the C ABI for stack frames. The IR is free to ignore C types (like int, double complex, struct or whatever), but it must respect the C ABI for the stack pointer.

Would it work to subtract 32 from the stack pointer, and round it down to a multiple 16, before passing it to llvm.stackrestore?


I will argue that llvm.stackrestore is correct to set 0(r1).

The doc for llvm.stackrestore refers to llvm.stacksave, which says that stackrestore "pops any alloca blocks" since stacksave. This suggests that llvm.stackrestore is for shrinking the current stack frame (not for switching to a different stack frame).

I wrote some LLVM IR to pop an alloca block,

target triple = "powerpc64-unknown-openbsd"
declare void @take(ptr)
define void @give(i64 %s, i64 %t) {
entry:
  %sp = call ptr @llvm.stacksave.p0()
  %a = alloca i8, i64 %s
  call void @take(ptr %a)
  call void @llvm.stackrestore.p0(ptr %sp)
  %b = alloca i8, i64 %t
  call void @take(ptr %b)
  ret void
}

Then I ran llc exam.ll to get ppc64 code in exam.s. I quote part of this code. I add some register names, changing mflr 0 to mflr %r0 and so on.

	mflr %r0
	stdu %r1, -64(%r1)
	std %r0, 80(%r1)

Allocate a stack frame of 64 bytes, set the Back Chain at old -64(r1) or new 0(r1), then save the return address (via r0) in the caller's LR Save Doubleword at old 16(r1) or new 80(r1).

	addi %r3, %r3, 15
	rldicr %r3, %r3, 0, 59
	neg 3, 3

Set r3 = -((s + 15) & ~15), rounding up s to a multiple of 16, then negating it.

	mr	%r31, %r1
	addi %r4, %r31, 64
	mr	%r29, %r1
	stdux %r4, %r1, %r3

Enlarge our stack frame by at least s bytes, and also move the Back Chain from the old 0(r1) to the new 0(r1). It would be correct to load r4 = 0(r1), but we are enlarging from 64 bytes, so r31 = r1, r4 = r31 + 64 gets the same value. The studx adds r3 to r1 (which moves r1 down by the rounded s) and also sets the new Back Chain = r4.

Do stacksave by setting r29 = r1 before moving r1. (This is a missed optimization: we don't need r29 = r1 because we have r31 = r1.)

	addi %r3, %r1, 32
	bl take
        nop

Call take(r3 = r1 + 32). We can't call take(r1), because we can't clobber the 32 reserved bytes.

	ld %r3, 0(%r1)
	mr	%r1, %r29
	std %r3, 0(%r1)

Do stackrestore, shrinking our stack frame to free the s bytes. This restores r1 = r29 and also moves the Back Chain from old 0(r1) to new 0(r1). This looks correct to me. LLVM IR functions run on the C stack and have a Back Chain, like C functions do.

@gerdolf
Copy link

gerdolf commented Jan 28, 2025

Thank you very much for the explanations and examples!

I understand this is all about differing assumptions, and I'm afraid there is no easy solution.

This Ppc64 code is wrong, even after removing the ??? lines.

Yes, that's the problem.

The LLVM backend makes certain assumptions about different levels which I believe should not be mixed up. The original meaning of "LLVM" is "low level virtual machine", not "C machine".

This is a wrong call to a C function memset(r3 = r1, r4 = 7, r5 = r20 -
256). At this call, the stack pointer must obey the C ABI,

'memset' being a C function is an assumption. The LLVM-IR code just says

 call void @llvm.memset.p0i8.i64(i8* align 8 %81, i8 7, i64 %82, i1 0)

'memset' is an intrinsic. It could well be a set of machine instructions.

I argue that LLVM IR runs on the C stack, so it must obey the C ABI for
stack frames.

I agree as far as function entry and exit code is concerned. It is typically generated by "define .. { .. }" and may set/clean up stack frames, initialize registers and so on. But I would not agree for a low level primitive like setting a pointer.

Would it work to subtract 32 from the stack pointer, and round it down
to a multiple 16, before passing it to llvm.stackrestore?

Theoretically yes. But PicoLisp is distributed as a system-agnostic IR file, to allow the initial build after installation. We would need to distribute a special version for Ppc64.

The doc for llvm.stackrestore refers to llvm.stacksave, which says that
stackrestore "pops any alloca blocks" since stacksave. This suggests
that llvm.stackrestore is for shrinking the current stack frame

Right. I understand this as a typical use case, but not as the only one.

So I assume that stackrestore() simply sets a pointer, while the Ppc64 backend assumes a whole stackframe machinery (which I would in turn assume for higher level stuff like longjmp()).

(not for switching to a different stack frame).

But how should the "low level" machine then do it? Or, in gereral, set the stack pointer to a computed location?

There is no option for 'llvm-as' or 'opt' to fix this behavior, right? Like, for example, using a dedicated (asynchronous) frame pointer?

@kernigh
Copy link
Contributor

kernigh commented Jan 29, 2025

'memset' being a C function is an assumption.... It could well be a set of machine instructions.

I assume that any operation in LLVM IR might call a C function. This is sdiv i64 for 32-bit PowerPC,

target triple = "powerpc-unknown-openbsd"
define i64 @invert(i64 %i) {
entry:
  %j = sdiv i64 1, %i
  ret i64 %j
}

The command llc invert.ll writes an invert.s containing bl __divdi3. This is a C function call __divdi3(r3:r4 = 1, r5:r6 = i). I can sdiv i64 on Ppc32 only if my stack pointer allows C function calls.

But how should the "low level" machine then do (switching to a different stack frame)? Or, in gereral, set the stack pointer to a computed location?

I am finding that the intrinsics read_register and write_register can access PowerPC's stack pointer in r1. The doc for read_register and write_register was outdated and incomplete, so I needed to glance at some test cases in llvm-project/llvm/test/CodeGen/PowerPC.

I turned on my old PowerBook G4, so this code is for PPC32, not 64. I computed a PPC32 stack pointer in C code and used it in LLVM IR.

chstack.ll

target triple = "powerpc-unknown-openbsd"
declare i32 @llvm.read_register.i32(metadata)
declare void @llvm.write_register.i32(metadata, i32)
!0 = !{!"r1"}
define void @chstack(i32 %new_sp, ptr %fun) {
entry:
  %old_sp = call i32 @llvm.read_register.i32(metadata !0)
  call void @llvm.write_register.i32(metadata !0, i32 %new_sp)
  call void %fun()
  call void @llvm.write_register.i32(metadata !0, i32 %old_sp)
  ret void
}

main.c

#include <sys/mman.h>
#include <stdio.h>
void chstack(void *, void (*)(void));
void hello(void) { puts("Hello from other stack."); }
int main(void) {
  size_t size = 256 * 1024;
  /* In OpenBSD, stack must be MAP_STACK. */
  char *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
  if (stack == MAP_FAILED) { perror("mmap"); return 1; }
  /* In PPC32, stack grows down, and frame needs 16 bytes. */
  char *sp = stack + size - 16;
  chstack(sp, hello);
}

It seems to work,

$ llc-17 chstack.ll                                                   
$ clang-17 -O2 -o main main.c chstack.s 
$ ./main
Hello from other stack.

If it didn't work, I might have tried adding the attribute "frame-pointer"="all" to my function chstack.

There isn't a general way to set the stack pointer for any target. The 1st problem is that I don't know whether the stack grows up or down. (It will probably grow down unless someone teaches LLVM to target HPPA.) The 2nd problem is that I don't know how many bytes to reserve for function calls. If the IR is

  call void @llvm.write_register.i64(metadata !0, i64 %new_sp)
  call void something(ptr %a, ptr %b, ptr %c, ptr %d, ptr %e, ptr %f,
                      ptr %g, ptr %h, ptr %i, ptr %j, ptr %k, ptr %l)

then I need enough stack to pass 12 arguments to something. If it's PPC64, then I might need 128 bytes. (It would be 32 reserved bytes, plus 96 bytes for the Parameter Save Area.) If it's some other target, I don't know.

@gerdolf
Copy link

gerdolf commented Jan 29, 2025 via email

@gerdolf
Copy link

gerdolf commented Jan 29, 2025

The reasons why I switched the PicoLisp implementation language from C to assembly were stack manipulations and carry flag access.

Then the reason to switch from assembly to LLVM-IR was machine independency. LLVM-IR as a generic assembler. If the Ppc64 ABI does not allow stack manipulations and machine independency, we cannot use it for PicoLisp.

@kernigh
Copy link
Contributor

kernigh commented Jan 30, 2025

This is a problem of the PPC ABI, no? Other ABIs have the callee set up the stack frame, with only non-register arguments being pushed by the caller. No need to reserve anything unknown.

I meant unknown to me. I need to reserve 128 bytes on PPC64, and to reserve something unknown on other targets, because I don't know. You might know!

For "arguments being pushed by the caller", it must do 2 things:

  1. move the stack pointer to make room for the arguments,
  2. store the arguments on the stack.

I assume that stackrestore or write_register moves the stack pointer. Then, the pointer given to stackrestore or write_register must already have enough room for the arguments for any function calls (including any hidden calls to memset or __divdi3).

So if PPC has not an option to generate code using a frame pointer, I'm
afraid we must give up PicoLisp on Ppc64.

PowerPC has an option to use a frame pointer, the attribute "frame-pointer"="all", but it would not help here. PPC uses the stack pointer, not the frame pointer, to pass arguments to functions. The stack pointer still needs to reserve space for arguments.


I ask LLVM how many bytes to reserve. My IR is,

declare void @take(i64, i64, i64, i64, i64, i64,
                   i64, i64, i64, i64, i64, i64)
define void @give() {
entry:
  call void @take(i64 501, i64 502, i64 503, i64 504, i64 505, i64 506,
                  i64 507, i64 508, i64 509, i64 510, i64 511, i64 512)
  ret void
}

I start with powerpc64-openbsd (by running llc --mtriple=powerpc64-openbsd give.ll) and see in give.s,

	li 3, 512
	std 3, 120(1)

powerpc64 needs 128 bytes. This is 120 + 8 because the 512 is at stack offset 120 and takes 8 bytes.

amd64-openbsd,

	pushq	$512                            # imm = 0x200
	...
	pushq	$507                            # imm = 0x1FB

amd64 needs 48 bytes. I count 6 pushqs, so 6 × 8 = 48.

aarch64-openbsd for arm64,

	mov	w8, #512                        // =0x200
	stp	x8, x30, [sp, #24]              // 8-byte Folded Spill

Guess arm64 needs 32 bytes. I don't know what stp does. I guess that it puts the 512 at stack offset 24, so 24 + 8 = 32.

riscv64-openbsd,

	li	t0, 512
	sd	t0, 24(sp)

Guess riscv needs 32 bytes.

mips64-openbsd,

	daddiu	$4, $zero, 512
	sd	$4, 24($sp)

Guess mips64 needs 32 bytes.

sparc64-openbsd,

	mov	512, %i0
	stx %i0, [%sp+2263]

Guess sparc64 needs 224 bytes. If the stack bias is 2047, then the stack might begin at %sp+2047, so 2263 - 2047 + 8 = 224. My guess might be wrong, because I don't know about register windows.

PowerPC is not the only target to reserve stack space for arguments. The function in this example might need the stack pointer to reserve 32 bytes on arm64 or 48 bytes on amd64. In general, if I set the stack pointer and call arbitrary functions, then my stack must reserve a machine-dependent number of bytes. (Or reserve extra; 256 bytes might work with most functions on most targets? If my function arguments fit in 6 registers, then 0 bytes might be enough on arm64 and amd64?)

If the Ppc64 ABI does not allow stack manipulations and machine independency, we cannot use it for PicoLisp.

Stacks are not machine-independent. I guess that amd64, arm64, mips64 and riscv64 have similar stacks. If you do something with a stack on 1 of amd64, arm64, mips64, or riscv64, then it might work on the other 3. I suspect that these 4 targets began with GCC as their 1st C compiler, and GCC designed similar stacks.

powerpc64 and sparc64 have different histories,

  • PowerPC began with IBM AIX.
  • Sparc began with Solaris.

AIX and Solaris had their own C compilers, not GCC. The different C compilers might have preferred different stack designs. Now the Ppc64 stacks and Sparc64 stacks don't look like the rest.

Yes, the Ppc64 ABI allows stack manipulations. I believe that @llvm.stackrestore can shrink a Ppc64 stack frame and @llvm.write_register.i64 can set Ppc64's stack pointer, but the stack must look like a Ppc64 stack, and reserve space for the LR Save Doubleword and such.

In general, the stack pointer in LLVM IR is the target's stack pointer, so it has different rules on each target. It has an LR Save Doubleword on Ppc64, a stack bias of 2047 on Sparc64, and so on. A Ppc64 or Sparc64 programmer, who sets the stack pointer in LLVM IR, would not assume that the same IR works on other targets.

@gerdolf
Copy link

gerdolf commented Jan 30, 2025 via email

@gerdolf
Copy link

gerdolf commented Jan 30, 2025

For the records:

"frame-pointer"="all" did not help.

It still compiles "mr 1, 3" and then "std 5, 0(1)".

It overwrites the local stack area, instead of using a function-level stack frame as other CPUs do.

@kernigh
Copy link
Contributor

kernigh commented Jan 30, 2025

BUT: There must be enough room BELOW the pointer, not above.

Not always? There must be room ABOVE the stack pointer in this x86-64 example,

declare void @llvm.write_register.i64(metadata, i64)
declare void @take(i64, i64, i64, i64, i64, i64,
                   i64, i64, i64, i64, i64, i64)
!0 = !{!"rsp"}
define void @give(i64 %p) {
entry:
  %q = add i64 %p, 99
  call void @llvm.write_register.i64(metadata !0, i64 %q)
  call void @take(i64 undef, i64 undef, i64 undef, i64 undef,
                  i64 undef, i64 undef, i64 undef, i64 undef,
                  i64 undef, i64 undef, i64 undef, i64 512)
  /* Missing code to restore rsp. */
  ret void
}

So llc --mtriple=amd64-openbsd give.ll emits,

	leaq	99(%rdi), %rsp
	movq	$512, 40(%rsp)                  # imm = 0x200
	callq	take@PLT

The leaq sets %rsp to %p + 99. Then the movq stores 512 in 8 bytes at %rsp + 40, which is ABOVE the stack pointer. Therefore, I needed to reserve 8 bytes at (%p + 99) + 40 to pass this argument. LLVM didn't move %rsp down from %p + 99 to make room for the argument at 40(%rsp); should it do so?

@gerdolf
Copy link

gerdolf commented Jan 30, 2025 via email

@gerdolf
Copy link

gerdolf commented Jan 30, 2025

Thanks for your time!

Let's close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:PowerPC question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

6 participants