Add reference tracking to netdev and release netdev reference in nondl on shutdown#255
Add reference tracking to netdev and release netdev reference in nondl on shutdown#255zhuyifei1999 wants to merge 2 commits intoXilinx-CNS:masterfrom
Conversation
Netdev refcounting was added a couple years back in 5.17 [1], but onload only uses the untracked versions. This patch adds support to the ones that are low-hanging fruit. For sfc, kernel compatibility for netdevice_tracker is already provided by its kernel_compat.h. Tracking in tc_encap_actions.c is already upstream [2] so this is backporting from there. For rx_common.c I just sent a patch upstream [3] and backported it here. Outside sfc I added the same support to ci/driver/kernel_compat.h. [1] https://lore.kernel.org/netdev/20211205042217.982127-4-eric.dumazet@gmail.com/ [2] torvalds/linux@7e5e7d800011ad#diff-9ffb1b01a8a11d0d7b6976a10eede3bebdcfaf256ef26d0d95714fe8aa03c89fR156 [3] https://lore.kernel.org/netdev/20241217224717.1711626-1-zhuyifei@google.com/T/ Signed-off-by: YiFei Zhu <zhuyifei@google.com>
There was a problem hiding this comment.
Hi, thanks for your PR!
Overall, I think the netdev reference counting is a helpful addition. Once your upstream changes are accepted and have been pulled into out our sfc-linux-net repo, we will import the net driver with your changes this way.
As for your changes to the reboot notifier, it makes sense to me but I think there is an opportunity for this to race with the delete_module syscall (this is the only place I could find in the kernel source where mod->exit() is called). You have mentioned that shutdown shouldn't be concerned with this but I imagine the race would cause kernel warnings at the least. I'd like some of the other members of my team to weigh in on this.
I had a bit of consideration for this. Both cleanup functions hold RTNL lock: onload/src/driver/linux_resource/nondl_driver.c Lines 95 to 100 in 467ee38 onload/src/driver/linux_resource/nondl_resource.c Lines 90 to 109 in 467ee38 onload/src/driver/linux_resource/nondl_resource.c Lines 193 to 214 in 467ee38 So data races aren't possible. Both functions are also in the format of So both functions are essentially no-ops if they have already been called. The only place I see that it could cause crash / warn is |
fdbe123 to
2958798
Compare
ligallag-amd
left a comment
There was a problem hiding this comment.
Good points.
Regarding EFRM_ASSERT(nondl_driver == driver);, you could check if nondl_driver == NULL and exit early.
On shutdown / reboot, if netdevs don't have a refcount of 1, the
kernel loops with message:
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
[...]
With netdev refcount tracking, the traces are visible:
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at
efrm_nic_add+0x29b/0x460 [sfc_resource]
efrm_nondl_add_device+0x124/0x1c0 [sfc_resource]
efrm_nondl_try_add_device+0x27/0xf0 [sfc_resource]
efrm_nondl_register_netdev+0x106/0x160 [sfc_resource]
nondl_register_store+0x174/0x1e0 [sfc_resource]
kernfs_fop_write_iter+0x128/0x1c0
vfs_write+0x308/0x420
ksys_write+0x5f/0xe0
do_syscall_64+0x7b/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
ref_tracker: eth%d@ff3cf768c8344548 has 1/2 users at
efrm_nondl_register_netdev+0xa9/0x160 [sfc_resource]
nondl_register_store+0x174/0x1e0 [sfc_resource]
kernfs_fop_write_iter+0x128/0x1c0
vfs_write+0x308/0x420
ksys_write+0x5f/0xe0
do_syscall_64+0x7b/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This patch makes sfc_resource register a reboot notifier that calls
efrm_nondl_unregister & efrm_nondl_shutdown that handles each of the two
refcounts. Considering that the only other caller of those functions
is the cleanup_sfc_resource module exit function, I don't think shutdown
should be concerned about module unload racing under normal circumstances.
In any case, efrm_nondl_unregister_driver is made to exit early before
the assert, in the event that the race does end up happening.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
2958798 to
c632bb4
Compare
Done. |
|
Hi all. Please take another look at this fix when you have time. Thanks. |
|
Hi @zhuyifei1999, I was doing some testing of this change but there seems to be a kernel crash when rebooting. I've been testing using a rebase of this branch onto master: https://github.com/ligallag-amd/onload/tree/yifeinetdevref (I managed to get your sfc_linux_net changes backported). I see the following fault on fault: |
|
Weird. What's the line number of the crash? It should show up in the line that has but the line is truncated. (I'll take a look when I'm more awake) |
|
Good spot - I was lazy when copy pasting from my terminal and must not have noticed this! I've updated my comment with the correct log. |
|
Huh, that would be onload/src/lib/efrm/driver_object.c Lines 178 to 179 in bbfd98d This means that by the time reboot is happening, onload is still being used. But systemd-shutdown literally broadcasts SIGKILL so nothing should be able to survive: Is there some sort of lingering state somewhere? I'll try to reproduce. |
I've tested again and I still see the panic. The machine has just come back up after reboot so should be in a clean state. All I've done prior to installing onload is load netconsole so the paired machine can see the dmesg output. Testing on master, I don't see this issue. |
|
Hmm, I think the reason I don't repro is because my assertions are off... I build with onload/src/include/ci/efrm/debug_linux.h Lines 96 to 113 in bbfd98d Anyways I looked at the code to see how it works. The list entry is removed upon the last reference to onload/src/lib/efrm/driver_object.c Line 497 in bbfd98d so I added tracking to when the reference count gets increased / decreased: diff --git a/src/lib/efrm/driver_object.c b/src/lib/efrm/driver_object.c
index 7e74afc6..437214d6 100644
--- a/src/lib/efrm/driver_object.c
+++ b/src/lib/efrm/driver_object.c
@@ -346,6 +346,8 @@ static struct efrm_client_callbacks efrm_null_callbacks = {
static void efrm_client_init_from_nic(struct efrm_nic *rnic,
struct efrm_client *client)
{
+ pr_emerg("efrm_client_init_from_nic: %px\n", client);
+ dump_stack();
client->nic = &rnic->efhw_nic;
client->ref_count = 1;
client->flags = 0;
@@ -489,6 +491,8 @@ void efrm_client_put(struct efrm_client *client)
EFRM_ASSERT(client->ref_count > 0);
spin_lock_bh(&efrm_nic_tablep->lock);
+ pr_emerg("efrm_client_put: %px %d->%d\n", client, client->ref_count, client->ref_count - 1);
+ dump_stack();
if (--client->ref_count > 0)
client = NULL;
else
@@ -503,6 +507,8 @@ void efrm_client_add_ref(struct efrm_client *client)
{
EFRM_ASSERT(client->ref_count > 0);
spin_lock_bh(&efrm_nic_tablep->lock);
+ pr_emerg("efrm_client_add_ref: %px %d->%d\n", client, client->ref_count, client->ref_count + 1);
+ dump_stack();
++client->ref_count;
spin_unlock_bh(&efrm_nic_tablep->lock);
}Normally, the initial reference happens with stack, when the nic is registered wit the sysfs file: And the final reference is when the onload module, not the sfc module, is unloaded: Because this patch only concerns itself with Anyhow, I was able to repro after adding |
What somewhat worries me is that, where And then maybe I could consider going for an It seems that the unregister would fail if the device is up: onload/src/driver/linux_resource/nondl_resource.c Lines 178 to 179 in bbfd98d ... yet when I grep for |
We are using onload in AF_XDP mode, and one thing we noticed was that as long as onload was loaded, shutdown / reboot would loop with console message:
This is because module cleanup functions only get called in the delete_module syscall and not reboot. I added reference counting and fixed the issue by adding a reboot notifier that calls the appropriate cleanup functions necessary to release the netdev references.