Keep requested subchannels below maximum queue limit and add subchannel request failed logging #2430
…el request error logging
Pull Request Overview
This PR fixes a validation bug in the netvsp device's subchannel allocation logic and adds diagnostic logging for failed subchannel requests. The core issue was that the validation incorrectly allowed requesting a number of subchannels equal to max_queues, which would exceed the total queue capacity when accounting for the primary channel.
Key Changes
- Corrects the subchannel count validation from `<=` to `<` to ensure total channels (subchannels + 1 primary) don't exceed `max_queues`
- Adds warning-level logging when subchannel requests fail, including the operation type, requested count, and maximum allowed subchannels
You have a repro environment. Did you deploy these changes and see the guest performance improve due to better RSS spreading? My worry is that the guest will request more queues than are available, get a Failure, and give up on RSS (in which case the performance is still bad). As long as the guest comes back after the Failure with a smaller allocation that gets Success, we're fine. If not, then we need to determine the number of queues it is requesting and whether netvsp might be at fault for returning the incorrect number when the guest queries RSS capabilities.
Yes, I tested on the repro environment. According to the data, the synthetic path on ARM with MaxProcessorNumber = 63 reaches only 15–23% of the accelerated path’s throughput up to 8 threads, then flattens at around 16% beyond that point.
…penvmm into user/yuqliu/subchannelrequest
The number of requested subchannels has to stay below the maximum queue limit because one queue is always reserved for the primary channel. In other words, the subchannels plus the primary channel must fit within the max_queues value, which means subchannels + 1 ≤ max_queues, so the subchannel count must be strictly less than max_queues.
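The boundary case can be sketched as follows. This is only an illustration of the reasoning above, not the netvsp code: `old_check`, `new_check`, and `total_channels` are hypothetical names, and `max_queues = 32` in the usage note is an assumed value.

```rust
/// Hypothetical stand-in for the pre-fix validation: it accepts a request
/// where the subchannel count equals `max_queues`.
fn old_check(requested_subchannels: u32, max_queues: u32) -> bool {
    requested_subchannels <= max_queues
}

/// Hypothetical stand-in for the fixed validation: the subchannel count
/// must be strictly less than `max_queues`.
fn new_check(requested_subchannels: u32, max_queues: u32) -> bool {
    requested_subchannels < max_queues
}

/// Total channels needed for a request: the subchannels plus the one
/// primary channel, which always occupies a queue.
fn total_channels(requested_subchannels: u32) -> u32 {
    requested_subchannels + 1
}
```

With an assumed `max_queues` of 32, `old_check(32, 32)` accepts a request that needs `total_channels(32) == 33` queues, while `new_check(32, 32)` rejects it and `new_check(31, 32)` accepts the largest request that still fits.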
Test result:
Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 32
produced the following warning:
[4.903119] netvsp: WARN Subchannel request failed: request operation ALLOCATE, requested 32 subchannels, the maximum number of supported subchannels is 31
Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 63
produced the following warning:
[584.376225] netvsp: WARN Subchannel request failed: request operation ALLOCATE, requested 47 subchannels, the maximum number of supported subchannels is 31
Note: 48 is the maximum number of processors in a single CPU group, so netvsc trimmed 63 down to 47.
Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 31
produced no log output.
Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 15
produced no log output.
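For reference, the four results above are consistent with the following sketch. It is not the code in this PR: `check_subchannel_request` is a hypothetical helper, `max_queues = 32` is inferred from the reported maximum of 31 subchannels, and the requested counts for the two silent runs (31 and 15) are assumed to equal MaxProcessorNumber, as they did in the 32 case.

```rust
/// Hypothetical helper: returns Ok(()) when the request fits, or the
/// warning text (modelled on the log lines above) when it does not.
fn check_subchannel_request(
    operation: &str,
    requested: u32,
    max_queues: u32,
) -> Result<(), String> {
    // One queue is reserved for the primary channel.
    let max_subchannels = max_queues.saturating_sub(1);
    if requested < max_queues {
        Ok(())
    } else {
        Err(format!(
            "Subchannel request failed: request operation {}, requested {} subchannels, the maximum number of supported subchannels is {}",
            operation, requested, max_subchannels
        ))
    }
}
```

Under these assumptions, requests for 32 and 47 subchannels return the warning text, and requests for 31 and 15 return `Ok(())`, which matches the runs that produced no log output.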