malsyned
Contributor

@malsyned malsyned commented Sep 5, 2025

When a non-privileged task voluntarily relinquishes the CPU, ulContext contains
a pointer into the system call stack, rather than the task stack. While that
task is paused, pulTaskStack (pulTaskStackPointer on some ports) will contain a
backup of the task's stack pointer.

However, when that task returns to a RUNNING state, pulTaskStack gets zeroed and
the only way to get a stack top that bears any resemblance to reality is to read
directly from the stack pointer register.

Note that this can still have surprising results if the program is halted in an
ISR or system call on architectures which use dedicated exception stacks. The
second commit in this PR adds a Cortex-A/R-specific workaround for this. It's a
separate commit in case it proves to be controversial.

During ISRs and other exception handlers, $sp uses one of the banked r13
registers (r13_irq, r13_fiq, etc.). Use $r13_usr if available instead.
@PhilippHaefele
Collaborator

Thanks for starting to implement this 👍. Please let me check some things next week. We should still do some additional checks as discussed in the issue (interrupt).

Also I currently do not see a need to only do it for the MPU enabled version. @haneefdm What do you think?

@haneefdm
Contributor

haneefdm commented Sep 6, 2025

Also I currently do not see a need to only do it for the MPU enabled version. @haneefdm What do you think?

If at all possible, we should do it for all cases. I just haven't finished thinking it through.

But it does bring up another topic. Is this for FreeRTOS only? Ideally the non-MPU case applies to all and we can implement it uniformly. If we only have time to do it for FreeRTOS, fine. We should put it in a TODO list.

It's a separate commit in case it proves to be controversial.

@malsyned, so this means I should remember not to do a "Squash and merge" when accepting this PR? That is the default.

@malsyned
Contributor Author

malsyned commented Sep 6, 2025

I'm late to thinking about this; it's only been on my mind since the MPU-enabled port I'm working with behaves so radically differently w.r.t. the stack. Here's where my head's at on this, though:

On architectures that don't use a separate interrupt stack, like Cortex-M, I think there's no harm in using $sp. You get correct Stack Top and Stack Used info for the running thread, even within ISRs, with no downsides.

However, on architectures with a separate interrupt stack and banked registers, like ARM Cortex-A/R, using $sp instead of pxTopOfStack can degrade the user experience during ISRs relative to the status quo. During an ISR, instead of seeing a slightly out-of-date Stack Top retrieved from pxTopOfStack, users would see a Stack Top that points into the interrupt stack and has no relationship to pxStack or pxEndOfStack. In my recent testing I've seen this happen and lead to huge values for Stack Used.

The MPU adds an additional wrinkle that makes $sp more appealing, mainly because the status quo user experience is worse to begin with. I'll go into the gory details below, but the upshot is that for running tasks, the stack pointer from the last context switch is often not pointing into the task stack anyway, leading to huge or even negative Stack Used.
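
To make the failure mode concrete, here is a minimal TypeScript sketch of how Stack Used goes wrong when the stack top belongs to a different stack. The helper names and all addresses are invented for illustration and are not taken from rtos-views. It assumes a descending stack whose lowest address is pxStack and whose highest is pxEndOfStack.

```typescript
// Hypothetical helper names; addresses are invented for illustration.
// Assumes a descending stack: pxStack is the lowest address, pxEndOfStack the highest.
interface StackBounds {
    pxStack: number;
    pxEndOfStack: number;
}

// Bytes in use, which is only meaningful if stackTop points into this task's stack.
function stackUsed(bounds: StackBounds, stackTop: number): number {
    return bounds.pxEndOfStack - stackTop;
}

// True only when stackTop lies inside the task's own stack region.
function stackTopIsPlausible(bounds: StackBounds, stackTop: number): boolean {
    return stackTop >= bounds.pxStack && stackTop <= bounds.pxEndOfStack;
}

const taskBounds: StackBounds = { pxStack: 0x20001000, pxEndOfStack: 0x20001400 };

const inTaskStack = 0x20001300;    // pointer into the task stack
const inLowerStack = 0x20000100;   // pointer into some other stack at a lower address
const inHigherStack = 0x20002000;  // pointer into some other stack at a higher address

console.log(stackUsed(taskBounds, inTaskStack));    // a sensible value
console.log(stackUsed(taskBounds, inLowerStack));   // larger than the whole task stack
console.log(stackUsed(taskBounds, inHigherStack));  // negative
```

A bounds check like `stackTopIsPlausible` is one possible way for a viewer to detect that a stack top came from the wrong stack instead of displaying a misleading number.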

MPU, not using $sp (status quo):
Running after yield: ❌
Running after preemption: ✅
In ISR: Maybe ❌, maybe ✅, depending on the same conditions

MPU, using $sp:
Running after yield: ✅
Running after preemption: ✅
In ISR: ❌

For myself, I'd rather have behavior that is consistent, and that always works when not halted in an ISR, than behavior that may or may not be correct depending on recent past events. That's why I created this PR that only uses $sp for MPU ports.

On architectures with a dedicated ISR stack, I think any work-around would be necessarily platform-specific. My PR's second commit adds the platform-specific workaround for Cortex-A/R, which is to try $r13_usr before trying $sp. The third commit adds a Cortex-M workaround of trying $psp. In practice, the only FreeRTOS-MPU ports are for ARM, so until that changes this is a complete fix ✅✅✅. If this PR were expanded to include non-MPU ports, though, then any non-ARM, dedicated-irq-stack architecture will have incorrect Stack Top values for a running task that is halted in an ISR.


The promised MPU gory details:

In addition to possibly having separate hardware support for the ISR stack, FreeRTOS-MPU uses a separate per-task stack for system calls. Whenever a non-privileged task voluntarily relinquishes the CPU, the saved stack pointer (stored in ulContext, not in pxTopOfStack) points into the system call stack, not the task stack. While the task is paused, the task stack pointer is backed up elsewhere and rtos-views uses the backup, but that backup gets zeroed out when the task gets put back on the CPU, leaving ulContext holding a pointer into the system call stack and $sp as the only place where a pointer into the task stack can be found.
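
The selection logic described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `MpuTaskSnapshot` shape and `pickStackTop` are hypothetical, and the field names only mirror the FreeRTOS-MPU fields mentioned in this thread.

```typescript
// Hypothetical shape: field names mirror the FreeRTOS-MPU fields discussed above,
// but this is not the actual rtos-views data model.
interface MpuTaskSnapshot {
    isRunning: boolean;
    ulContextSp: number;   // stack pointer saved in ulContext (may point into the system call stack)
    pulTaskStack: number;  // backup of the task stack pointer; 0 once the task is running again
}

function pickStackTop(snapshot: MpuTaskSnapshot, liveSp: number | undefined): number {
    // While a backup exists, it is the real task stack pointer.
    if (snapshot.pulTaskStack !== 0) {
        return snapshot.pulTaskStack;
    }
    // For the running task, only the live register still points into the task stack.
    if (snapshot.isRunning && liveSp !== undefined) {
        return liveSp;
    }
    // Otherwise fall back to the stack pointer saved at the last context switch.
    return snapshot.ulContextSp;
}
```

The ordering reflects the thread's reasoning: the backup wins while it exists, the live register wins for the running task, and the saved context is only a last resort.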

@malsyned
Contributor Author

malsyned commented Sep 6, 2025

If at all possible, we should do it for all cases. I just haven't finished thinking it through.

My personal opinion is that it's better to have up-to-date Stack Top & Stack Used in regular code at the cost of surprising results on some architectures while in an ISR. However, I can easily see how reasonable people could disagree about this, especially since users may experience it as a regression. I created this more limited MPU-only PR in the hopes that you'll find it easier to get comfortable with merging, since the case for it being an improvement or even just a bug fix is more clear-cut.

But it does bring up another topic. Is this for FreeRTOS only? Ideally the non-MPU case applies to all and we can implement it uniformly. If we only have time to do it for FreeRTOS, fine. We should put it in a TODO list.

For this PR, I'm thinking of the use of $sp as a workaround for FreeRTOS-MPU's weird stack handling. If usage of $sp expands out beyond FreeRTOS-MPU, then yes I think it probably makes plenty of sense to use it for every RTOS.

It's a separate commit in case it proves to be controversial.

@malsyned, so this means I should remember not to do a "Squash and merge" when accepting this PR? That is the default.

If you all decide you don't want the second commit with the Cortex-A/R-specific fix for Stack Top reporting during ISRs, just let me know and I'll reset my PR branch to be just the first commit. Then you can squash and merge as normal.

@haneefdm
Contributor

haneefdm commented Sep 6, 2025

@malsyned Keep your PR the way it is. It serves as a reminder for me not to squash the commits.

Let's give @PhilippHaefele a bit more time to put in his comments/concerns before I merge.

Let's also document that the stack info may not be correct when stopped inside an ISR.

@PhilippHaefele
Collaborator

On architectures that don't use a separate interrupt stack, like Cortex-M, I think there's no harm in using $sp. You get correct Stack Top and Stack Used info for the running thread, even within ISRs, with no downsides.

I'm a little bit confused about this. I took a quick look, and at least most of the Cortex-M ports of FreeRTOS and uC/OS-II use the PSP for tasks, and to my knowledge interrupts always use the MSP.

And that's where we need the special handling. Something like this should be the way to go (at least for Cortex-M):

Check if we're in an interrupt (e.g. via the CONTROL register). If so, we either use the PSP or the last saved stack pointer (I don't think anyone will break in the SVC handler and expect our values to be precise, and the old task shouldn't be running anymore). If not, we can use the SP, which in this case has the same value as the PSP.

After writing this, maybe we should just consider using the PSP for the currently running task when available... 🤔

@PhilippHaefele
Collaborator

PhilippHaefele commented Sep 6, 2025

But it does bring up another topic. Is this for FreeRTOS only? Ideally the non-MPU case applies to all and we can implement it uniformly. If we only have time to do it for FreeRTOS, fine. We should put it in a TODO list.

Since this handling is mainly architecture-dependent, we can easily adapt the other RTOS implementations too. There is a small chance that some ports do not use the PSP on Cortex-M, but I do not see many reasons not to do this.
We should still check at least some of the most common ports (Cortex-M/A/R) of the RTOS implementations before introducing the change to them.

Regarding the hint, I also have something in mind: maybe print the architecture next to the detected RTOS (since the last MR we have a function to detect at least some of them) and add a hint somewhere (e.g. below the table) when we do not properly support or know the architecture.

@malsyned
Contributor Author

malsyned commented Sep 7, 2025

On architectures that don't use a separate interrupt stack, like Cortex-M, I think there's no harm in using $sp. You get correct Stack Top and Stack Used info for the running thread, even within ISRs, with no downsides.

I'm a little bit confused about this. I took a quick look, and at least most of the Cortex-M ports of FreeRTOS and uC/OS-II use the PSP for tasks, and to my knowledge interrupts always use the MSP.

That's my mistake. I haven't worked with Cortex-M in a while, and I got it confused with some other architectures in my head. I'll go back and strike out the parts of my posts that relate to that mistake.

After writing this, maybe we should just consider using the PSP for the currently running task when available... 🤔

If I'm understanding this right, then I think always using PSP is the right move. The XRTOS tab is talking about the task's stack position and usage, not the stack as seen by ISR and other non-task frames. The regular debugger backtrace can tell you everything you need to know about the ISR's stack while it's executing. So (unless I'm still mistaken about PSP and MSP), PSP will always unambiguously give the currently running task's stack pointer regardless of whether an ISR is executing, and can be preferred to SP whenever it exists. Is my reasoning sound there?

One thing we haven't talked about, and I don't know how to deal with, is multi-core. If there is more than one task in the RUNNING state, will rtos-views see the right value of SP / PSP / R13_USR for each of them by virtue of passing the correct frameId, or is there something more that has to be done?

During ISRs and other exception handlers, $sp is linked to $msp. Use the
task stack pointer $psp if available instead.
@malsyned
Contributor Author

malsyned commented Sep 7, 2025

Just pushed a new commit that adds Cortex-M $psp support. It could use some more testing, but I can confirm it still works on my Cortex-R board.

Note that none of this is using the architecture detection yet. The only real value to doing so would be the possibility of a slight speed-up under some circumstances by avoiding trying $psp and $r13_usr before $sp on every refresh.

Another thing to note is that while the target description from JLinkGDBServer calls the Cortex-A/R register $r13_usr, it looks like OpenOCD calls it $sp_usr. So this isn't just a matter of MCU architecture, but also of debugging tools. I think trial-and-error on the register name is probably a more flexible approach than architecture detection. It's all a bit messy, though, huh?
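
For readers following along, the trial-and-error idea can be sketched like this. It is a simplified illustration and not the code in the PR: `RegisterEvaluator` and `readTaskStackPointer` are hypothetical names, the evaluator is modeled as a synchronous callback (a real debug adapter request would be asynchronous), and the candidate order is just one plausible choice based on the register names mentioned in this thread.

```typescript
// Candidate register expressions, most specific first. The names come from the
// discussion above; which ones exist depends on the target and the gdb-server.
const STACK_POINTER_CANDIDATES: string[] = ['$r13_usr', '$sp_usr', '$psp', '$sp'];

// Hypothetical evaluator: returns the register value, or undefined if the
// debugger does not know the register.
type RegisterEvaluator = (expression: string) => number | undefined;

// Return the value of the first candidate register the debugger can evaluate.
function readTaskStackPointer(evaluate: RegisterEvaluator): number | undefined {
    for (const name of STACK_POINTER_CANDIDATES) {
        const value = evaluate(name);
        if (value !== undefined) {
            return value;
        }
    }
    return undefined;
}

// Example: a fake Cortex-M style target that only exposes $psp and $sp.
const fakeRegisters = new Map<string, number>([
    ['$psp', 0x20001300],
    ['$sp', 0x2000ff00],
]);
const fakeCortexM: RegisterEvaluator = (expression) => fakeRegisters.get(expression);

console.log(readTaskStackPointer(fakeCortexM)?.toString(16));
```

Because unknown registers simply fall through to the next candidate, the same loop works regardless of which gdb-server naming is in use, at the cost of a few failed evaluations per refresh.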

@PhilippHaefele
Collaborator

Thanks for adapting your comments and code.

I will try to test the code with ARMv7-M & ARMv8.1-M, so we're more confident about the new approach with PSP (e.g. as ARMv8-M/ARMv8.1-M introduce additional secure contexts with additional stack pointers).
I might still consider using the architecture detection to give the user a hint that something might be inaccurate...

I'm also not sure about multi-core; there was a discussion/MR regarding this for FreeRTOS in the past.
I'm also not sure what happens right now when we have two debug sessions at the same time. I have a picture in my head of two thread tables one after another, but I'm absolutely not sure whether this really happens...

@haneefdm
Contributor

Very good question about multi-core. But this is a problem even within a gdb-server. I have no idea what OpenOCD/JLinkServer are actually telling GDB.

Also when you are multi-core, then we may have multiple RUNNING threads. Now, if we evaluate the '$reg' in the context of the right frame-id, we should get the right 'reg'. I believe we are not using the frame-id because we assume everything is global. If so, this is a flaw in itself -- but a don't care for single cores.
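
As a rough illustration of evaluating a register in the context of a specific frame: the Debug Adapter Protocol `evaluate` request accepts an optional `frameId` alongside the `expression`. The sketch below only builds the request arguments; `buildRegisterRequests` is a hypothetical helper and not existing rtos-views code, and whether the adapter and gdb-server really return per-core register values for each frame is exactly the open question here.

```typescript
// Subset of the Debug Adapter Protocol "evaluate" request arguments.
interface EvaluateArguments {
    expression: string;
    frameId?: number;
    context?: string;
}

// Hypothetical helper: build one request per RUNNING thread, each scoped to
// that thread's top stack frame instead of relying on a global context.
function buildRegisterRequests(register: string, topFrameIds: number[]): EvaluateArguments[] {
    return topFrameIds.map((frameId) => ({
        expression: register,
        frameId,
        context: 'hover',
    }));
}

// Example: two RUNNING threads whose top frames have ids 1000 and 2000.
console.log(buildRegisterRequests('$sp', [1000, 2000]));
```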

@haneefdm haneefdm mentioned this pull request Sep 28, 2025
@PhilippHaefele
Collaborator

I was able to do some testing this week. It does in general work on ARMv7-M (M7) & ARMv8.1-M (M85). As I'm not yet familiar with the Secure state of the M85, I can't guarantee that it still works properly when both Non-secure and Secure code are running. PSP works fine for both of them with Cortex Debug + an adapted version of the uC/OS-II support code (it was quite easy to adapt).

From my side we can merge this, and I will try to get support into the other RTOS implementations I maintain soon.
