ievenbach
Contributor

There is a race condition in the TX path of PTP event packets. When a packet is transmitted, DMA is kicked off and then the SKB is freed. For a PTP packet, however, the SKB is queued for timestamping and is freed in the interrupt handler.

If the PTP IRQ arrives before the call to sparx5_consume_skb(), the handler frees the packet, and sparx5_consume_skb() then causes an Oops when it tests whether the SKB is a PTP packet, because it touches memory that has already been freed.

This change uses SKB reference counting to ensure the SKB is freed regardless of the order of events, so no crash occurs.
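
To illustrate the idea, here is a minimal user-space sketch of the ownership model, not the driver code. Everything named `fake_*` is hypothetical and only stands in for the kernel's `struct sk_buff`, `skb_get()` and `consume_skb()`. It assumes the timestamp queue takes its own reference, so that the TX completion path and the PTP interrupt each drop exactly one reference and whichever runs last performs the free.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct sk_buff; not the kernel type. */
struct fake_skb {
	int users;   /* models skb->users */
	bool is_ptp; /* models the "is this a PTP frame?" test */
};

int fake_free_count; /* number of times a fake_skb was actually freed */

/* Models skb_get(): take an extra reference for a second owner. */
struct fake_skb *fake_skb_get(struct fake_skb *skb)
{
	skb->users++;
	return skb;
}

/* Models consume_skb(): drop one reference, free on the last one. */
void fake_skb_put(struct fake_skb *skb)
{
	if (--skb->users == 0) {
		fake_free_count++;
		free(skb);
	}
}

/* TX completion path: inspects the frame, then drops its reference. */
void fake_xmit_done(struct fake_skb *skb)
{
	bool ptp = skb->is_ptp; /* safe: this path still holds a reference */

	(void)ptp;
	fake_skb_put(skb);
}

/* Timestamp interrupt path: drops the reference held by the queue. */
void fake_ptp_irq(struct fake_skb *skb)
{
	fake_skb_put(skb);
}

/* Returns true if the frame is freed exactly once, after the second drop. */
bool fake_tx_ptp_frame(bool irq_first)
{
	struct fake_skb *skb = malloc(sizeof(*skb));
	int frees_before = fake_free_count;

	if (!skb)
		return false;
	skb->users = 1;
	skb->is_ptp = true;
	fake_skb_get(skb); /* reference owned by the timestamp queue */

	if (irq_first)
		fake_ptp_irq(skb);
	else
		fake_xmit_done(skb);
	if (fake_free_count != frees_before)
		return false; /* freed too early */

	if (irq_first)
		fake_xmit_done(skb);
	else
		fake_ptp_irq(skb);
	return fake_free_count == frees_before + 1;
}
```

Calling `fake_tx_ptp_frame(true)` and `fake_tx_ptp_frame(false)` exercises both orderings. In this model the frame is still valid when the second path reads `is_ptp`, which is the property the patch is after.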

@ievenbach
Contributor Author

@HoratiuVultur This one is actually a hard-to-reproduce bug. I was seeing it with gPTP and a preemptible kernel every once in a blue moon.

@HoratiuVultur
Collaborator

Hi @ievenbach ,

Thanks for the patch. I think I understand the problem and how you are trying to fix it. From what I see, this would break the lan969x implementation, because in that driver the skb is freed only in the PTP interrupt handler. With this change, which increases the ref count of the skb, the skb would never be freed. (I have not tried it yet, I have only looked at the code, so I might be wrong.)
I was wondering, can you move the 'sparx5_consume_skb' call before the call to 'sparx5_fdma_reload'? That way the frame is freed before the HW starts transmitting it, and the change will not affect lan969x. When the fdma is not enabled, we will then need a memory copy into a buffer before freeing the skb.

Thanks,
/Horatiu

@ievenbach
Contributor Author

ievenbach commented Jul 23, 2025

@HoratiuVultur I see the napi_consume_skb() call at

    if (!db->ptp)
        napi_consume_skb(db->skb, weight);

My guess is that we'd have to remove the if (!db->ptp) check there.
The same goes for the _pci version.

I think ref-counting is a cleaner solution than trying to special-case PTP packets everywhere (and it reduces the amount of code!)

I don't have lan969x hw to test though. I could update my patch to include those cleanups, of course.

@ievenbach
Contributor Author

Eh. I just went ahead and added those changes :)

@ievenbach ievenbach force-pushed the vendor/microchip/bsp-6.12-2025/upstream/ptp-tx-oops branch from 7c079e6 to 895c0f6 Compare July 24, 2025 23:07
@ievenbach
Contributor Author

And forgot to push them. LOL

@HoratiuVultur
Collaborator

Hi @ievenbach,

Sorry for the late reply, and thanks for the new version. I will have a close look at it later this week and will also try your changes on lan969x.
I just had a quick look, and I was wondering: if a frame that needs to be timestamped fails to be transmitted (for example, the HW queue is busy or the HW has no more space for the frame), wouldn't that be a problem? We increase the reference count multiple times but free the frame only once.

Thanks,
/Horatiu

@ievenbach
Contributor Author

ievenbach commented Jul 25, 2025

There is a special case there that cleans an skb out of the timestamp waiting queue if it has been waiting for too long. We actually have this problem in our setup currently: some PTP packets fail to be transmitted. But I didn't observe any memory leaks.

@HoratiuVultur
Collaborator

Hi @ievenbach

That is true, but that is the case where the HW says it can transmit the frame, then something goes totally wrong and the frame is not transmitted or timestamped. That is not a problem, because the frame gets freed.
I was referring to the case where the function sparx5_fdma_xmit returns busy. In that case the timestamp is released in the function sparx5_ptp_txtstamp_release, but the ref count of the skb is not decreased. The network stack will then try to send the frame again and we increase its ref count again, so the frame will never be released.
I understand that you have this problem and I want to get this fix in; I just don't want to trade one problem for another.

Thanks,
/Horatiu

@ievenbach
Contributor Author

I will look at this on Monday. I have a feeling this might be a problem in the current code as well.

@HoratiuVultur
Collaborator

Hi @ievenbach ,

Did you have time to look at this?

Thanks,
/Horatiu

@ievenbach
Contributor Author

Oops. Forgot to write up my findings. I don't have the files and lines handy, but my reading of the code is that the current version also has the problem you described, for all non-PTP packets.
I can try to work on the fix.

@HoratiuVultur
Collaborator

Hi @ievenbach,

Hm.. I tried to look at the code but I can't find the problem with non-PTP frames. When you have the time, please let me know.

Thanks,
/Horatiu

@ievenbach
Contributor Author

Suppose, as you say, sparx5_fdma_xmit returns an error before starting DMA, e.g. because skb_put_padto(), fdma_db_is_done() or fdma_db_get() fails:

    if (skb_put_padto(skb, ETH_ZLEN))
        return NETDEV_TX_OK;
    if (!fdma_db_is_done(fdma_db_get(fdma, fdma->dcb_index, 0)))
        return NETDEV_TX_BUSY;

If the skb isn't freed in that case, it's not going to be freed for non-PTP packets in the current implementation either (PTP event packets will be freed by the timeout handler).

@HoratiuVultur
Collaborator

Hi @ievenbach,

With the current code, it is not a problem if 'sparx5_fdma_xmit' returns NETDEV_TX_BUSY, because the frame will then be transmitted again. And if the frame was supposed to be timestamped, the timestamp is released when 'sparx5_fdma_xmit' returns busy.
With the current code, it is a problem if 'sparx5_fdma_xmit' returns NETDEV_TX_OK because of 'skb_put_padto'. If the frame is not supposed to be timestamped, the statistics counters will be wrong and the frame will not be released. If the frame was supposed to be timestamped, the statistics issue is still there and, in addition, the timestamp id is not released.

With your changes (if I read them correctly, so please correct me if I am wrong), if 'sparx5_fdma_xmit' returns NETDEV_TX_OK because of 'skb_put_padto', it behaves the same as before. That is fine, because we are not trying to fix that issue in this PR.
With your changes (again, if I read them correctly), if 'sparx5_fdma_xmit' returns NETDEV_TX_BUSY, the frame will be transmitted again. But if the frame is supposed to be timestamped, you increase its ref count in 'sparx5_ptp_txtstamp_request', and when the timestamp is released the ref count is not decreased. I think this is the problem.

Thanks,
/Horatiu

@ievenbach
Contributor Author

Should a retry of the transfer cause an additional request to timestamp the skb?

@HoratiuVultur
Collaborator

I will need to double check the code but I would say so.

@ievenbach
Contributor Author

Actually, now that I think about it, if that is the case, the skb will end up on the queue twice and will be freed by the timeout thread twice, thus bringing the ref count to zero, so it's still OK.

@HoratiuVultur
Collaborator

But if we return NETDEV_TX_BUSY, we remove the skb from the queue and release the id.
With your changes we still do that, but we only increase the reference without decreasing it. I think what is missing is decreasing the ref in the function sparx5_ptp_txtstamp_release.
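
The accounting in this argument can be checked with a small user-space model. This is only a sketch under stated assumptions: the `fake_*` names are hypothetical, `fake_txtstamp_request()` stands in for the reference taken in sparx5_ptp_txtstamp_request, and `fake_txtstamp_release()` stands in for the back-out done when sparx5_fdma_xmit returns NETDEV_TX_BUSY. The real functions take different arguments and do more work.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the ref count is modelled. */
struct fake_skb {
	int users;
};

/* Models the timestamp request taking a reference for the waiting queue. */
void fake_txtstamp_request(struct fake_skb *skb)
{
	skb->users++;
}

/* Models the back-out on NETDEV_TX_BUSY; drop_ref selects the fixed variant. */
void fake_txtstamp_release(struct fake_skb *skb, bool drop_ref)
{
	if (drop_ref)
		skb->users--;
}

/*
 * Simulates busy_attempts failed transmissions followed by one success.
 * Returns the references left over once the TX path and the timestamp
 * interrupt have each dropped theirs; 0 means the frame would be freed.
 */
int fake_leftover_refs(int busy_attempts, bool release_drops_ref)
{
	struct fake_skb skb = { .users = 1 };
	int i;

	for (i = 0; i < busy_attempts; i++) {
		fake_txtstamp_request(&skb);
		/* xmit reported busy: back out, the stack retries later */
		fake_txtstamp_release(&skb, release_drops_ref);
	}

	fake_txtstamp_request(&skb); /* the attempt that succeeds */
	skb.users--;                 /* TX completion drops its reference */
	skb.users--;                 /* timestamp interrupt drops the queue's */
	return skb.users;
}
```

In this model every busy retry leaks one reference unless the release path drops it, so `fake_leftover_refs(2, false)` leaves two references behind while `fake_leftover_refs(2, true)` leaves none.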

@ievenbach ievenbach force-pushed the vendor/microchip/bsp-6.12-2025/upstream/ptp-tx-oops branch from 895c0f6 to 1b6fd11 Compare August 11, 2025 23:14
@ievenbach
Contributor Author

Ah! I missed that goto!
OK. Updated the ..._release() function to do an skb_free().

That should do the trick.

@HoratiuVultur
Collaborator

Hi @ievenbach,

Thanks for updating this. I had a look and it seems OK. I will give it a try later this week and let you know.

Thanks,
/Horatiu

@HoratiuVultur
Collaborator

Hi @ievenbach,

I have tried this on lan969x and it seems to be working fine. I have picked this commit and added it to our internal kernel tree. It will be part of the 2025.09 release.

Thanks,
/Horatiu

@HoratiuVultur
Collaborator

HoratiuVultur commented Aug 14, 2025

Hi @ievenbach,

Unfortunately, I need to reopen this because I found a case on lan969x where it stops working.
If you start 2 ptp instances on the same interface, lan969x crashes because the skb reference count is not 1 when pskb_expand_head is called.
Here is the stack trace:
[ 30.403520] ------------[ cut here ]------------
[ 30.408038] kernel BUG at net/core/skbuff.c:2266!
[ 30.412727] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 30.419496] Modules linked in:
[ 30.422537] CPU: 0 UID: 0 PID: 168 Comm: ptp4l Not tainted 6.12.34-00260-ga477c1243163-dirty #1543
[ 30.431474] Hardware name: lan969x ev23x71a (pcb8398) (DT)
[ 30.436943] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 30.443886] pc : pskb_expand_head+0x2b0/0x3a4
[ 30.448226] lr : lan969x_fdma_xmit+0x15c/0x364
[ 30.452653] sp : ffff80008192b8b0
[ 30.455951] x29: ffff80008192b8b0 x28: 0000000000000044 x27: ffff800080b7a0e0
[ 30.463069] x26: ffff80008192b9c4 x25: 0000000000000000 x24: 0000000000000140
[ 30.470186] x23: ffff000063b60080 x22: ffff800081153ad0 x21: ffff00006342e100
[ 30.477304] x20: 0000000000000000 x19: ffff00006342e100 x18: 0000000000000006
[ 30.484421] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008192b3a0
[ 30.491539] x14: 0000000000000000 x13: 3120302030207469 x12: 6d785f616d64665f
[ 30.498657] x11: ffff80008109e700 x10: ffff80008109e700 x9 : 0000000000000177
[ 30.505774] x8 : ffff8000810f6700 x7 : 0000000000017fe8 x6 : 00000000fffff000
[ 30.512892] x5 : 80000000fffff000 x4 : ffff000096164200 x3 : 0000000000000820
[ 30.520009] x2 : 0000000000000140 x1 : 0000000000000002 x0 : ffff00006342e100
[ 30.527128] Call trace:
[ 30.529558] pskb_expand_head+0x2b0/0x3a4
[ 30.533550] lan969x_fdma_xmit+0x15c/0x364
[ 30.537630] sparx5_port_xmit_impl+0xc8/0x288
[ 30.541970] dev_hard_start_xmit+0x98/0x118
[ 30.546136] sch_direct_xmit+0x88/0x370
[ 30.549955] __dev_queue_xmit+0x8b0/0xdf8
[ 30.553948] packet_xmit+0xc0/0x13c
[ 30.557420] packet_sendmsg+0x800/0x1360
[ 30.561326] __sys_sendto+0x114/0x170
[ 30.564972] __arm64_sys_sendto+0x28/0x38
[ 30.568965] invoke_syscall.constprop.0+0x50/0xe4
[ 30.573652] do_el0_svc+0x40/0xc8
[ 30.576950] el0_svc+0x38/0x158
[ 30.580075] el0t_64_sync_handler+0x120/0x12c
[ 30.584415] el0t_64_sync+0x190/0x194
[ 30.588066] Code: 52800021 97fffee7 17ffffb9 d4210000 (d4210000)
[ 30.594139] ---[ end trace 0000000000000000 ]---
[ 30.598740] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 30.606119] Kernel Offset: disabled
[ 30.609587] CPU features: 0x00,00000000,00200000,0200420b
[ 30.614969] Memory Limit: none
[ 30.618012] Rebooting in 1 seconds..
We didn't have this issue before because we didn't use skb_get when doing timestamping.

@HoratiuVultur HoratiuVultur reopened this Aug 14, 2025
@ievenbach
Contributor Author

Hmm. Why is it caused by two instances on the same interface?

@HoratiuVultur
Collaborator

Hi @ievenbach,

Apparently, when two instances are running on the same interface, the function 'skb_header_cloned' returns true. Because of that, 'lan969x_fdma_xmit' calls 'pskb_expand_head', which must not be called when the skb ref count is different from 1.
I need to understand that part of the code better and see how it can be modified.

Thanks,
/Horatiu

@ievenbach
Contributor Author

Reproduced this locally on my Sparx5 HW. Will get back to you with a fix.

For PTP event packets, sparx5_ptp_txtstamp_request increases reference
count, which makes skb_put_padto BUG() - reference count should be
exactly one for that one.
@ievenbach
Contributor Author

ievenbach commented Aug 26, 2025

I reproduced this on sparx5. There the code path was ->fdma_xmit -> skb_put_padto -> pskb_expand_head.
This code path also exists in the lanXXX versions. I fixed it by moving the skb_put_padto call out of fdma_xmit, to before the PTP TX timestamp request increases the ref count.

However, I see other code paths in lanXXX that lead to another call to pskb_expand_head. I am not sure whether these paths are taken in the PTP case. If you could test it on actual HW, that'd be great.

In any case, with this patch applied, I can't repro it on sparx5 any more.
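
The ordering issue can be sketched in user space as follows. This is a simplified model, not the driver: the `fake_*` names and `FAKE_ETH_ZLEN` are hypothetical, and `fake_put_padto()` returns -1 at the point where the kernel's pskb_expand_head would hit its BUG_ON() for a shared skb (ref count other than 1). It assumes a short frame with no tailroom, so padding has to reallocate.

```c
#include <stdbool.h>

#define FAKE_ETH_ZLEN 60 /* minimum Ethernet frame length without FCS */

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	int users;    /* models skb->users */
	int len;      /* current frame length */
	int tailroom; /* free bytes after the data */
};

/*
 * Models skb_put_padto(): pad the frame up to min_len. Returns 0 on
 * success, or -1 where the kernel would BUG because the buffer must
 * be reallocated while the skb is shared.
 */
int fake_put_padto(struct fake_skb *skb, int min_len)
{
	int pad = min_len - skb->len;

	if (pad <= 0)
		return 0;
	if (pad > skb->tailroom) {
		if (skb->users != 1)
			return -1; /* models BUG_ON(skb_shared(skb)) */
		skb->tailroom = pad; /* pretend the buffer was expanded */
	}
	skb->len += pad;
	skb->tailroom -= pad;
	return 0;
}

/* Sends a short PTP frame; pad_before_ref selects the reordered variant. */
int fake_xmit_short_ptp_frame(bool pad_before_ref)
{
	struct fake_skb skb = { .users = 1, .len = 44, .tailroom = 0 };
	int err;

	if (pad_before_ref) {
		err = fake_put_padto(&skb, FAKE_ETH_ZLEN);
		if (err)
			return err;
		skb.users++; /* timestamp request takes its reference */
		return 0;
	}

	skb.users++; /* reference taken first, as before the fix */
	return fake_put_padto(&skb, FAKE_ETH_ZLEN);
}
```

With the reference taken first, the model fails on the pad step; padding first succeeds. Any other path that can reach pskb_expand_head after the reference is taken would need the same treatment, which is the open question for the lanXXX paths mentioned above.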
