Do not allow hw filter stealing in no-5tuple support scenarios#279
Do not allow hw filter stealing in no-5tuple support scenarios#279mina wants to merge 1 commit intoXilinx-CNS:masterfrom
Conversation
Currently we're running into the following behavior in AF_XDP based deployment (which is 3-tuple only support): - Onload stack A starts and listens on port X. - Onload stack A exits; its cleanup of hw filters is delayed. - Onload stack B starts and listens on the same port X. - Onload stack B sees the 3-tuple hw filter inserted by stack A, which has not yet been cleaned. - In the current code, stack B will reuse the hw filter of stack A, but that 3-tuple hw filter points to the wrong stack, and the wrong RX queue. - This causes all the sockets created by stack B to become unaccelerated, because they land on the rxq that belonged to stack A, not the rxq that belongs to stack B. The root cause of this issue is that oof_socket_add_wild expects oof_local_port_addr_fixup_wild to fix hw filters that point to different stacks, but that will not actually happen if (!oof_socket_can_share_hw_filter && fm->fm_hwports_no5tuple). This creates the undesirable behavior that all sockets in this new stack being unaccelerated, even when the user passed EF_NO_FAIL=0 (!) Fix this by returning an error in this edge case. This means that a new onload stack will not reuse a hw filter that doesn't belong to it. The user can restart the workload when the old onload stack has finished cleanup of hw filters. Note that this patch will change the behavior for all fm->fm_hwports_no5tuple deployments. This seems beneficial as all those deployments should see the same bug, but other options are possible: - We could add a module parameter that configures this behavior. - We may be able to only disable this behavior for AF_XDP deployments. tested by running these commands with onload in a tight loop: onload tcp_rr -F 1000 onload udp_rr -F 1000 tcp_rr and udp_rr reuse the same ports by default. In all cases, onload will start successfully if there is no existing hw filter left over from the previous onload invocation, and will fail with -EEXIST if there is a left over filter from the previous stack. Change-Id: I6e196ac55cc81efbc537206cdc6e20b2068a7e3d
|
The way that sort of scenario is expected to be handled is through use of the existing onload module parameters |
|
Thanks for the suggestion, I've tried The behavior with
This all works as expected, except in the case where the new onload stack receives data between steps 2 and 3. In this case, I still see some unaccelerated sockets: What happened in this run is that 573 sockets were opened before oof_local_port_addr_fixup_wild ran, and those were accelerated, and then 428 sockets were open after oof_local_port_addr_fixup_wild and fixed up the hw filters, so those became accelerated. The solution in this patch still seems valid to me, let me know if you disagree. Basically we make it so that oof_socket_add_wild sees an existing filter pointing to another onload stack, then it will fail creating the new stack, rather than relying to a future call to oof_local_port_addr_fixup_wild to fix the hw filters (and there is a time window where the hw filter is not yet stolen, which means unaccelerated sockets). We will share the fix we found for the stealing in oof_local_port_addr_fixup_wild to work properly separately. |
Is #282 |
Currently we're running into the following behavior in AF_XDP based deployment (which is 3-tuple only support):
The root cause of this issue is that oof_socket_add_wild expects oof_local_port_addr_fixup_wild to fix hw filters that point to different stacks, but that will not actually happen if
(!oof_socket_can_share_hw_filter && fm->fm_hwports_no5tuple). This creates the undesirable behavior that all sockets in this new stack being unaccelerated, even when the user passed EF_NO_FAIL=0 (!)
Fix this by returning an error in this edge case. This means that a new onload stack will not reuse a hw filter that doesn't belong to it. The user can restart the workload when the old onload stack has finished cleanup of hw filters.
Note that this patch will change the behavior for all fm->fm_hwports_no5tuple deployments. This seems beneficial as all those deployments should see the same bug, but other options are possible:
tested by running these commands with onload in a tight loop:
onload tcp_rr -F 1000
onload udp_rr -F 1000
tcp_rr and udp_rr reuse the same ports by default. In all cases, onload will start successfully if there is no existing hw filter left over from the previous onload invocation, and will fail with -EEXIST if there is a left over filter from the previous stack.
Change-Id: I6e196ac55cc81efbc537206cdc6e20b2068a7e3d