
Conversation

@yuqiong6
Contributor

The number of requested subchannels has to stay below the maximum queue limit because one queue is always reserved for the primary channel. In other words, the subchannels plus the primary channel must fit within the max_queues value, which means subchannels + 1 ≤ max_queues, so the subchannel count must be strictly less than max_queues.
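
For reference, here is a minimal Rust sketch of the corrected bound check. The identifiers (validate_subchannel_request, num_subchannels, max_queues) are illustrative placeholders, not the actual netvsp code:

    // Illustrative sketch only; names do not mirror the real netvsp implementation.
    fn validate_subchannel_request(num_subchannels: u16, max_queues: u16) -> Result<(), &'static str> {
        // One queue is always reserved for the primary channel, so the total
        // channel count is num_subchannels + 1. Requiring num_subchannels + 1 <= max_queues
        // is the same as requiring num_subchannels < max_queues (strictly less than).
        if num_subchannels >= max_queues {
            return Err("requested subchannel count must be strictly less than max_queues");
        }
        Ok(())
    }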

Test result:

  1. Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 32
    produced the following warning:
    [4.903119] netvsp: WARN Subchannel request failed: request operation ALLOCATE, requested 32 subchannels, the maximum number of supported subchannels is 31

  2. Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 63
    produced the following warning:
    [584.376225] netvsp: WARN Subchannel request failed: request operation ALLOCATE, requested 47 subchannels, the maximum number of supported subchannels is 31
    Note: 48 is the maximum number of processors in a single CPU group, so netvsc trimmed 63 down to 47.

  3. Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 31
    produced no log output (the request succeeded: 31 subchannels plus the primary channel fit within the 32-queue limit).

  4. Running: Set-NetAdapterRss -Name "Ethernet" -MaxProcessorNumber 15
    produced no log output.

@yuqiong6 yuqiong6 requested a review from a team as a code owner November 18, 2025 16:45
Copilot AI review requested due to automatic review settings November 18, 2025 16:45
Copilot finished reviewing on behalf of yuqiong6 November 18, 2025 16:48
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a validation bug in the netvsp device's subchannel allocation logic and adds diagnostic logging for failed subchannel requests. The core issue was that the validation incorrectly allowed requesting a number of subchannels equal to max_queues, which would exceed the total queue capacity when accounting for the primary channel.

Key Changes

  • Corrects the subchannel count validation from <= to < to ensure total channels (subchannels + 1 primary) don't exceed max_queues
  • Adds warning-level logging when subchannel requests fail, including the operation type, requested count, and maximum allowed subchannels (see the sketch below)
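
A hedged sketch of how the failure path and warning might look, assuming a tracing-style structured logger; the operation enum, field names, and function shape are assumptions inferred from the warning text in the test results, not the actual implementation:

    // Minimal sketch assuming the tracing crate; all identifiers are illustrative.
    #[derive(Debug)]
    enum SubchannelOperation {
        Allocate,
    }

    fn handle_subchannel_request(
        operation: SubchannelOperation,
        num_subchannels: u16,
        max_queues: u16,
    ) -> bool {
        // max_queues includes the primary channel, so at most max_queues - 1
        // subchannels can be granted.
        let max_subchannels = max_queues.saturating_sub(1);
        if num_subchannels > max_subchannels {
            // Structured warning mirroring the shape of the log lines in the
            // test results above (field names are placeholders).
            tracing::warn!(
                ?operation,
                num_subchannels,
                max_subchannels,
                "subchannel request failed"
            );
            return false;
        }
        true
    }

With max_queues = 32, a request for 32 subchannels is rejected and logged, while a request for 31 succeeds silently, matching the test results above.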

@erfrimod
Contributor

You have a repro environment. Did you deploy these changes and see the Guest performance improve due to better RSS spreading? My worry is that the Guest will request more queues than are available, get a Failure, and give up on RSS (in which case the performance is still bad). As long as the Guest comes back after the Failure with a smaller allocation that gets Success, we're fine. If not, then we need to determine how many queues the Guest is requesting and whether it might be netvsp's fault for returning the incorrect number when the guest queries RSS capabilities.

@yuqiong6
Contributor Author

Yes, I did test on the repro environment. From the data, the synthetic path on ARM with MaxProcessorNumber = 63 reaches only 15–23% of the accelerated path’s throughput up to 8 threads, then flattens at around 16% beyond that point.
In contrast, with MaxProcessorNumber = 7, the synthetic path scales effectively from 303 MB/s at 1 thread to about 6.6 GB/s at 64–96 threads, rising steadily from 16% of the accelerated path to nearly full parity by 64 threads.
The low throughput with MaxProcessorNumber = 63 occurs because RSS fails to activate multiple queues, so all traffic flows through the single primary channel. The detailed data is listed in the email thread: "RE: Low throughput for Underhill on Windows Arm gen9.3".
