[sparx5] Use SKB reference counting for PTP tx path #16

ievenbach · 2025-07-22T20:45:12Z

There is a race condition in the TX path for PTP event packets. When a packet is transmitted, DMA is kicked off and then the SKB is freed. A PTP packet, however, is also queued for timestamping, and that SKB is freed in the interrupt handler.

If the PTP IRQ arrives before the call to sparx5_consume_skb(), the handler frees the packet, and sparx5_consume_skb() then causes an Oops when it tests whether the SKB is a PTP packet.

This change uses SKB reference counting to ensure the SKB is freed regardless of the order of events, so that no crash occurs.
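
The idea can be illustrated with a small userspace model. This is only a sketch, not the driver code: `struct pkt`, `pkt_get()`, `pkt_put()` and `simulate_tx()` are hypothetical stand-ins for the skb, `skb_get()`, the skb release calls and the two competing paths. It shows that once the timestamp queue holds its own reference, the buffer is released exactly once whichever path finishes first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an skb (not the kernel structure). */
struct pkt {
    int users;      /* models the skb reference count */
    int free_count; /* how many times the buffer was released */
};

static void pkt_init(struct pkt *p)
{
    p->users = 1;
    p->free_count = 0;
}

/* Models taking an extra reference, as skb_get() does. */
static void pkt_get(struct pkt *p)
{
    p->users++;
}

/* Models dropping one reference; the buffer is released on the last drop. */
static void pkt_put(struct pkt *p)
{
    assert(p->users > 0);
    if (--p->users == 0)
        p->free_count++;
}

/*
 * One transmit of a PTP event packet. The timestamp queue takes its own
 * reference, then the TX path and the PTP interrupt handler each drop one
 * reference, in either order. Returns how many times the buffer was freed.
 */
static int simulate_tx(bool irq_first)
{
    struct pkt p;

    pkt_init(&p);
    pkt_get(&p); /* reference held by the timestamp queue */

    if (irq_first) {
        pkt_put(&p); /* PTP interrupt handler runs first */
        /* The TX path can still safely inspect the packet here. */
        assert(p.users == 1 && p.free_count == 0);
        pkt_put(&p); /* TX path */
    } else {
        pkt_put(&p); /* TX path runs first */
        pkt_put(&p); /* PTP interrupt handler */
    }
    return p.free_count;
}
```

Calling `simulate_tx(true)` models the ordering that used to crash: the interrupt handler drops its reference first, but the packet stays valid until the TX path drops the last one.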

ievenbach · 2025-07-22T20:46:42Z

@HoratiuVultur This one is actually a hard-to-reproduce bug. I was seeing it with gPTP and a preemptible kernel every once in a blue moon.

HoratiuVultur · 2025-07-23T07:28:24Z

Hi @ievenbach ,

Thanks for the patch. I think I understand the problem and how you are trying to fix it. From what I see, this would break the lan969x implementation, because in that driver the skb is freed only in the PTP interrupt handler. With this change, which increases the ref count of the skb, the skb would never be freed. (I have not tried it yet, I have only looked at the code, so I might be wrong.)
I was wondering, can you move the 'sparx5_consume_skb' call before the call to 'sparx5_fdma_reload'? The frame would then be freed before the HW starts transmitting it, and the change would not affect the lan969x. When the fdma is not enabled, we would need to copy the frame into a buffer and then free the skb.

Thanks,
/Horatiu

ievenbach · 2025-07-23T18:27:36Z

@HoratiuVultur I see a napi_consume_skb() call at

linux/drivers/net/ethernet/microchip/sparx5/lan969x/lan969x_fdma.c

Lines 80 to 81 in e0e964b

    if (!db->ptp)
    	napi_consume_skb(db->skb, weight);

My guess is that we'd have to remove the if (!db->ptp) check there.
Same with the _pci version.

I think the ref-counting way is a cleaner solution than trying to special-case PTP packets everywhere (and it reduces the amount of code!)

I don't have lan969x hw to test though. I could update my patch to include those cleanups, of course.

ievenbach · 2025-07-23T19:33:06Z

Eh. I just went ahead and added those changes :)

ievenbach · 2025-07-24T23:08:15Z

And forgot to push them. LOL

HoratiuVultur · 2025-07-25T06:29:50Z

Hi @ievenbach,

Sorry for the late reply, and thanks for the new version. I will have a close look at it later this week and I will also try your changes on lan969x.
I just had a quick look and I was wondering: if a frame that needs to be timestamped fails to be transmitted (for example because the HW queue is busy or the HW has no more space for the frame), wouldn't that be a problem? We would increase the reference count multiple times and then free the frame only once.

Thanks,
/Horatiu

ievenbach · 2025-07-25T13:12:25Z

There is a special case there that cleans an skb out of the timestamp waiting queue if it has been waiting for too long. We actually have this problem in our setup currently: some PTP packets fail to be transmitted. But I didn't observe any memory leaks.

HoratiuVultur · 2025-07-26T13:47:54Z

Hi @ievenbach

That is true, but that is the case where the HW says that it can transmit the frame but something goes totally wrong and the frame is not transmitted or timestamped. That is not a problem, because the frame gets freed.
I was referring to the case where the function sparx5_fdma_xmit returns busy. In that case the timestamp is released in the function sparx5_ptp_txtstamp_release, but the ref count of the skb is not decreased. The network stack will then try to send the frame again and we increase its ref count again, so the frame will never be released.
I understand that you have this problem and I want to get this fix in, I just don't want to trade one problem for another.

Thanks,
/Horatiu

ievenbach · 2025-07-26T14:02:42Z

I will look at this on Monday. I have a feeling this might be a problem in the current code as well.

HoratiuVultur · 2025-08-02T12:34:14Z

Hi @ievenbach ,

Did you have the time to look at this?

Thanks,
/Horatiu

ievenbach · 2025-08-02T12:49:46Z

Oops. Forgot to write up my findings. I don't have the files and lines handy, but my reading of the code is that the current version also has the problem you described, for all non-PTP packets.
I can try to work on the fix.

HoratiuVultur · 2025-08-03T10:54:01Z

Hi @ievenbach,

Hm.. I tried to look at the code, but I can't find the problem with non-PTP frames. When you have the time, please let me know.

Thanks,
/Horatiu

ievenbach · 2025-08-04T14:46:52Z

If, as you say, sparx5_fdma_xmit returns an error before starting DMA, e.g. because skb_put_padto(), fdma_db_is_done or fdma_db_get fails:

linux/drivers/net/ethernet/microchip/sparx5/sparx5_fdma.c

Lines 283 to 287 in e0e964b

    if (skb_put_padto(skb, ETH_ZLEN))
    	return NETDEV_TX_OK;
    if (!fdma_db_is_done(fdma_db_get(fdma, fdma->dcb_index, 0)))
    	return NETDEV_TX_BUSY;

then, if the skb isn't freed in that case, it's not going to be freed for non-PTP packets in the current implementation either (PTP event packets will be freed by the timeout handler).

HoratiuVultur · 2025-08-07T06:39:36Z

Hi @ievenbach,

With the current code, it is not a problem if 'sparx5_fdma_xmit' returns NETDEV_TX_BUSY, because the frame will then be transmitted again. And if the frame was supposed to be timestamped, the timestamp is released when 'sparx5_fdma_xmit' returns busy.
With the current code, it is a problem if 'sparx5_fdma_xmit' returns NETDEV_TX_OK because of 'skb_put_padto'. If the frame is not supposed to be timestamped, the statistics counters will be wrong and the frame will not be released. If the frame was supposed to be timestamped, the statistics issue is still there and, in addition, the timestamp id is not released.

With your changes (if I read them correctly, so please correct me if I am wrong), if 'sparx5_fdma_xmit' returns NETDEV_TX_OK because of 'skb_put_padto', it behaves the same as before. That is fine, because we are not trying to fix that issue in this PR.
With your changes (again, please correct me if I am wrong), if 'sparx5_fdma_xmit' returns NETDEV_TX_BUSY, the frame is transmitted again. But if the frame is supposed to be timestamped, you increase the ref count of the frame in 'sparx5_ptp_txtstamp_request', and when the timestamp is released the ref count is not decreased. I think this is the problem.

Thanks,
/Horatiu

ievenbach · 2025-08-08T03:38:31Z

Should a retry of the transfer cause an additional request to timestamp the skb?

HoratiuVultur · 2025-08-08T05:44:17Z

I will need to double check the code but I would say so.

ievenbach · 2025-08-08T12:30:30Z

Actually, now that I think about it, if that is the case, the skb will end up on the queue twice and will be freed by the timeout thread twice, thus bringing the ref count to zero, so it's still OK.

HoratiuVultur · 2025-08-09T09:57:42Z

But if we return NETDEV_TX_BUSY, then we remove the skb from the queue and release the id.
With your changes we still do that, but we only increase the reference without decreasing it. What I think is missing is a decrease of the ref count in the function sparx5_ptp_txtstamp_release.
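
The accounting in this argument can be checked with a tiny userspace model. It is a sketch under assumptions rather than the driver logic: `users_after_retries()` is a hypothetical helper that only tracks the reference count, assuming each timestamp request takes one reference and each NETDEV_TX_BUSY return runs the release path once.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the reference count of one PTP frame that is
 * rejected with NETDEV_TX_BUSY `busy_attempts` times before it is finally
 * transmitted. Returns the count left at the end; 0 means the frame was
 * freed, anything larger means it leaked.
 */
static int users_after_retries(int busy_attempts, bool release_drops_ref)
{
    int users = 1; /* reference the driver receives from the stack */

    for (int i = 0; i < busy_attempts; i++) {
        users++; /* timestamp request takes a reference */
        /* xmit returns busy: the timestamp id is released ... */
        if (release_drops_ref)
            users--; /* ... and, with the fix, so is the reference */
    }

    users++; /* final attempt: timestamp request takes a reference */
    users--; /* TX path drops its reference after starting DMA */
    users--; /* PTP interrupt handler drops the queue's reference */
    return users;
}
```

With `release_drops_ref` set to false, every busy attempt leaves one reference behind, which is the leak described above. With it set to true, the count returns to zero however many retries happen.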

ievenbach · 2025-08-11T23:16:51Z

Ah! I missed that goto!
OK. Updated the ..._release() function to do an skb_free().

That should do the trick.

HoratiuVultur · 2025-08-12T13:23:09Z

Hi @ievenbach,

Thanks for updating this, I had a look and it seems to look OK. I will give it a try later this week and I will let you know.

Thanks,
/Horatiu

HoratiuVultur · 2025-08-13T09:08:34Z

Hi @ievenbach,

I have tried this on lan969x and it seems to be working fine. I have picked this commit and added it to our internal kernel tree. This will be part of the 2025.09 release.

Thanks,
/Horatiu

HoratiuVultur · 2025-08-14T13:05:03Z

Hi @ievenbach,

Unfortunately, I need to reopen this because I found a case on lan969x where it stops working.
If you start 2 ptp instances on the same interface, lan969x crashes because the skb reference count is different from 1 when pskb_expand_head is called.
Here is the stack trace:
[ 30.403520] ------------[ cut here ]------------
[ 30.408038] kernel BUG at net/core/skbuff.c:2266!
[ 30.412727] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 30.419496] Modules linked in:
[ 30.422537] CPU: 0 UID: 0 PID: 168 Comm: ptp4l Not tainted 6.12.34-00260-ga477c1243163-dirty #1543
[ 30.431474] Hardware name: lan969x ev23x71a (pcb8398) (DT)
[ 30.436943] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 30.443886] pc : pskb_expand_head+0x2b0/0x3a4
[ 30.448226] lr : lan969x_fdma_xmit+0x15c/0x364
[ 30.452653] sp : ffff80008192b8b0
[ 30.455951] x29: ffff80008192b8b0 x28: 0000000000000044 x27: ffff800080b7a0e0
[ 30.463069] x26: ffff80008192b9c4 x25: 0000000000000000 x24: 0000000000000140
[ 30.470186] x23: ffff000063b60080 x22: ffff800081153ad0 x21: ffff00006342e100
[ 30.477304] x20: 0000000000000000 x19: ffff00006342e100 x18: 0000000000000006
[ 30.484421] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008192b3a0
[ 30.491539] x14: 0000000000000000 x13: 3120302030207469 x12: 6d785f616d64665f
[ 30.498657] x11: ffff80008109e700 x10: ffff80008109e700 x9 : 0000000000000177
[ 30.505774] x8 : ffff8000810f6700 x7 : 0000000000017fe8 x6 : 00000000fffff000
[ 30.512892] x5 : 80000000fffff000 x4 : ffff000096164200 x3 : 0000000000000820
[ 30.520009] x2 : 0000000000000140 x1 : 0000000000000002 x0 : ffff00006342e100
[ 30.527128] Call trace:
[ 30.529558] pskb_expand_head+0x2b0/0x3a4
[ 30.533550] lan969x_fdma_xmit+0x15c/0x364
[ 30.537630] sparx5_port_xmit_impl+0xc8/0x288
[ 30.541970] dev_hard_start_xmit+0x98/0x118
[ 30.546136] sch_direct_xmit+0x88/0x370
[ 30.549955] __dev_queue_xmit+0x8b0/0xdf8
[ 30.553948] packet_xmit+0xc0/0x13c
[ 30.557420] packet_sendmsg+0x800/0x1360
[ 30.561326] __sys_sendto+0x114/0x170
[ 30.564972] __arm64_sys_sendto+0x28/0x38
[ 30.568965] invoke_syscall.constprop.0+0x50/0xe4
[ 30.573652] do_el0_svc+0x40/0xc8
[ 30.576950] el0_svc+0x38/0x158
[ 30.580075] el0t_64_sync_handler+0x120/0x12c
[ 30.584415] el0t_64_sync+0x190/0x194
[ 30.588066] Code: 52800021 97fffee7 17ffffb9 d4210000 (d4210000)
[ 30.594139] ---[ end trace 0000000000000000 ]---
[ 30.598740] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 30.606119] Kernel Offset: disabled
[ 30.609587] CPU features: 0x00,00000000,00200000,0200420b
[ 30.614969] Memory Limit: none
[ 30.618012] Rebooting in 1 seconds..
We didn't have this issue before because we didn't call skb_get when doing timestamping.

ievenbach · 2025-08-14T15:37:21Z

Hmm. Why is it caused by two instances on the same interface?

HoratiuVultur · 2025-08-14T19:03:25Z

Hi @ievenbach,

Apparently, when two instances are running on the same interface, the function 'skb_header_cloned' returns true. Because of that, the function 'lan969x_fdma_xmit' calls 'pskb_expand_head', and this function must not be called when the skb ref count is different from 1.
I will need to understand that part of the code better and see how it can be modified.

Thanks,
/Horatiu

ievenbach · 2025-08-26T16:55:18Z

Reproduced this locally on my Sparx5 HW. Will get back to you with a fix.

For PTP event packets, sparx5_ptp_txtstamp_request increases reference count, which makes skb_put_padto BUG() - reference count should be exactly one for that one.

ievenbach · 2025-08-26T18:54:04Z

I reproduced this on sparx5. There the code path was fdma_xmit -> skb_put_padto -> pskb_expand_head.
This code path also exists in the lanXXX versions. I fixed it by moving the skb_put_padto call out of fdma_xmit, so that it runs before the request for a PTP TX timestamp increases the ref count.

However, I see other code paths on lanXXX that lead to another call to pskb_expand_head. I am not sure whether these paths are taken in the PTP case. If you could test it on actual HW, that would be great.

In any case, with this patch applied, I can't reproduce it on sparx5 any more.
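
Why the ordering matters can be shown with a simplified userspace model. This is a sketch under assumptions: `struct pkt`, `pkt_pad_to()` and `xmit_model()` are hypothetical, padding is modelled as always needing to expand the buffer (the real skb_put_padto only does so in some cases), and a `false` return stands in for the BUG() that the kernel hits when a shared skb is expanded.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_FRAME_LEN 60 /* same value as ETH_ZLEN */

/* Hypothetical stand-in for an skb: reference count and data length. */
struct pkt {
    int users;
    int len;
};

/*
 * Models padding a short frame. Expanding the buffer is only legal while
 * the packet has a single owner; returns false where the kernel would BUG().
 */
static bool pkt_pad_to(struct pkt *p, int min_len)
{
    if (p->len >= min_len)
        return true; /* long enough already, nothing to expand */
    if (p->users != 1)
        return false; /* shared packet: the kernel would BUG() here */
    p->len = min_len;
    return true;
}

/*
 * Models one transmit of a PTP event frame of `len` bytes, padding either
 * before or after the timestamp request takes its extra reference.
 */
static bool xmit_model(int len, bool pad_before_ref)
{
    struct pkt p = { .users = 1, .len = len };
    bool ok = true;

    if (pad_before_ref)
        ok = pkt_pad_to(&p, MIN_FRAME_LEN);

    p.users++; /* timestamp request takes a reference */

    if (!pad_before_ref)
        ok = pkt_pad_to(&p, MIN_FRAME_LEN);

    return ok;
}
```

In this model a short frame only fails when it is padded after the extra reference is taken. Frames that are already long enough never reach the expand step.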

ievenbach force-pushed the vendor/microchip/bsp-6.12-2025/upstream/ptp-tx-oops branch from 7c079e6 to 895c0f6 on July 24, 2025 23:07

ievenbach force-pushed the vendor/microchip/bsp-6.12-2025/upstream/ptp-tx-oops branch from 895c0f6 to 1b6fd11 on August 11, 2025 23:14

HoratiuVultur closed this Aug 13, 2025

HoratiuVultur reopened this Aug 14, 2025

[sparx5] Pad the SKB to minimum size before calling request_tx_ts

ca3756d

For PTP event packets, sparx5_ptp_txtstamp_request increases reference count, which makes skb_put_padto BUG() - reference count should be exactly one for that one.
