port BitArray.And, Or, Xor and Not to Vector<T> #72471
Tagging subscribers to this area: @dotnet/area-system-collections

Issue Details

I started the learning process by porting very simple BitArray APIs: And, Or, Xor and Not. To my surprise, there is a notable regression:

x64

BenchmarkDotNet=v0.13.1.1799-nightly, OS=Windows 11 (10.0.22000.795/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host] : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT

The assembly diff for Not can be found here.

arm64

BenchmarkDotNet=v0.13.1.1786-nightly, OS=ubuntu 20.04
Unknown processor
.NET SDK=7.0.100-rc.1.22368.24
  [Host] : .NET 7.0.0 (7.0.22.36704), Arm64 RyuJIT

@tannergooding I don't want to merge this PR as is, but rather find out whether I am doing everything right and what could be done to close the gap.

cc @stephentoub
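For readers following along, here is a minimal sketch of what a Vector<T> port of And and Not can look like. It is not the code from this PR: it assumes, for illustration, that the bits are stored in an int[] of 32-bit words, and AndWordsVectorT / NotWordsVectorT are hypothetical names. Because Vector<T> has no LoadUnsafe/StoreUnsafe in .NET 7, it goes through the span-based constructor and CopyTo.

```csharp
using System;
using System.Numerics;

// Hypothetical helpers (not the PR's code): a Vector<T> port of BitArray-style
// And and Not over an int[] of 32-bit words.

// left[i] &= right[i] for every word, Vector<int>.Count words per iteration.
static void AndWordsVectorT(int[] left, int[] right)
{
    if (left.Length != right.Length)
        throw new ArgumentException("Arrays must have the same length.");

    int i = 0;
    if (Vector.IsHardwareAccelerated)
    {
        // No LoadUnsafe/StoreUnsafe on Vector<T> in .NET 7, so use the
        // span-based constructor and CopyTo, which are bounds-checked.
        for (; i <= left.Length - Vector<int>.Count; i += Vector<int>.Count)
        {
            Vector<int> result = new Vector<int>(left.AsSpan(i)) & new Vector<int>(right.AsSpan(i));
            result.CopyTo(left.AsSpan(i));
        }
    }

    // Scalar tail: the words that did not fill a whole Vector<int>.
    for (; i < left.Length; i++)
        left[i] &= right[i];
}

// words[i] = ~words[i] for every word.
static void NotWordsVectorT(int[] words)
{
    int i = 0;
    if (Vector.IsHardwareAccelerated)
    {
        for (; i <= words.Length - Vector<int>.Count; i += Vector<int>.Count)
        {
            Vector<int> result = ~new Vector<int>(words.AsSpan(i));
            result.CopyTo(words.AsSpan(i));
        }
    }

    for (; i < words.Length; i++)
        words[i] = ~words[i];
}

// Usage: 10 words, so with a 32-byte Vector<T> 8 go through the vector loop and 2 through the tail.
int[] a = { 12, -1, 0, 7, 1, 2, 3, 4, 5, 6 };
int[] b = { 10, 0x0F0F, -1, 3, 1, 1, 1, 1, 1, 1 };
AndWordsVectorT(a, b);
Console.WriteLine(string.Join(", ", a)); // 8, 3855, 0, 3, 1, 0, 1, 0, 1, 0
```

Or and Xor follow the same pattern with the | and ^ operators on Vector<int>.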
Review comment on the diff lines that replace the else if (Vector128.IsHardwareAccelerated) branch with if (Vector.IsHardwareAccelerated):
Using Vector<T> is likely a non-starter right now.

Not only can it not be used with R2R (crossgen) code today, since it is variable-sized and therefore forces the code to be jitted, but it is also missing certain APIs that are available on Vector128<T> and Vector256<T>. A few of these functions, like ExtractMostSignificantBits, can't be exposed on Vector<T>; others, like LoadUnsafe, could be but aren't today.

There is a design being considered (dotnet/designs#268) that would extend Vector<T> to work better in such scenarios and enable its usage in others, but that isn't available at the moment and it's not 100% clear what shape it will end up taking.
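For comparison, here is a minimal sketch of the fixed-size shape that Vector256<T>/Vector128<T> allow in .NET 7, using LoadUnsafe and StoreUnsafe against a ref to the array data. It assumes the bits live in an int[] and that both arrays have the same length; AndWordsFixedSize is a hypothetical name, and this is not the actual BitArray implementation. Requires .NET 7 or later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper (not the BitArray source): left[i] &= right[i] over int[] words
// using the fixed-size Vector256<T>/Vector128<T> APIs. Requires .NET 7 or later.
static void AndWordsFixedSize(int[] left, int[] right)
{
    if (left.Length != right.Length)
        throw new ArgumentException("Arrays must have the same length.");

    ref int l = ref MemoryMarshal.GetArrayDataReference(left);
    ref int r = ref MemoryMarshal.GetArrayDataReference(right);
    nuint length = (nuint)left.Length;
    nuint i = 0;

    if (Vector256.IsHardwareAccelerated)
    {
        nuint count = (nuint)Vector256<int>.Count; // 8 ints = 32 bytes
        for (; i + count <= length; i += count)
        {
            // LoadUnsafe/StoreUnsafe take a ref plus an element offset and do no bounds
            // checks; the loop condition is what keeps the accesses in range.
            Vector256<int> result = Vector256.LoadUnsafe(ref l, i) & Vector256.LoadUnsafe(ref r, i);
            result.StoreUnsafe(ref l, i);
        }
    }

    if (Vector128.IsHardwareAccelerated)
    {
        // Handles one more 4-int chunk when 4..7 ints remain after the Vector256 loop,
        // or does all the vector work when only 128-bit vectors are accelerated.
        nuint count = (nuint)Vector128<int>.Count; // 4 ints = 16 bytes
        for (; i + count <= length; i += count)
        {
            Vector128<int> result = Vector128.LoadUnsafe(ref l, i) & Vector128.LoadUnsafe(ref r, i);
            result.StoreUnsafe(ref l, i);
        }
    }

    // Scalar tail for whatever the vector loops did not cover.
    for (int j = (int)i; j < left.Length; j++)
        left[j] &= right[j];
}

// Usage: 13 ints, which on AVX2 hardware splits into 8 (Vector256) + 4 (Vector128) + 1 (scalar).
int[] a = { 12, -1, 0, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int[] b = { 10, 0x0F0F, -1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
AndWordsFixedSize(a, b);
Console.WriteLine(string.Join(", ", a)); // 8, 3855, 0, 3, 1, 0, 1, 0, 1, 0, 1, 0, 1
```

Running the Vector128<T> loop after the Vector256<T> one is a design choice in this sketch: it means inputs of 16 to 31 bytes still take a vector path on x64 instead of dropping straight to scalar code.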
For this case in particular, Vector<T> would likely be similar in perf if we had LoadUnsafe and StoreUnsafe APIs available.

However, it would still come with the restriction that it could not participate in R2R (and therefore forces jitting on first use), and it may regress scenarios where the backing data is typically "small".

For example, Vector<T> on most modern x64 hardware is equivalent to only having the Vector256<T> path. On Arm64, it's equivalent to only having the Vector128<T> path. For x64, this means that inputs smaller than 32 bytes (and potentially 64 bytes in the future, should Vector<T> grow to that size) will behave "worse" than the equivalent on Arm64, as they'll execute as "scalar" rather than as "vector". Likewise, depending on data layout, alignment, and processor, it may behave "worse" for inputs up to ~256 bytes as well.

As indicated, these are scenarios that are being looked at and considered, but it's not something that we can easily do today.
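To make the small-input point concrete, the sketch below does only the arithmetic: it counts how many 4-byte words would be left for the scalar loop if each vector loop (widest first) runs until less than one vector remains. ScalarTail is a hypothetical helper; 32 and 16 bytes are the Vector256<T> and Vector128<T> sizes, with 32 bytes standing in for Vector<T> on AVX2-class x64 hardware, per the comparison above.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, arithmetic only: how many int words are left for the scalar
// loop when each vector loop (widest first) runs until less than one vector remains.
static int ScalarTail(int lengthInInts, params int[] vectorWidthsInBytes)
{
    int remaining = lengthInInts;
    foreach (int widthInBytes in vectorWidthsInBytes)
    {
        int intsPerVector = widthInBytes / sizeof(int);
        remaining %= intsPerVector; // full vectors are consumed, the remainder moves on
    }
    return remaining;
}

// The actual Vector<int>.Count depends on the machine (8 with AVX2, 4 on Arm64).
Console.WriteLine($"Vector<int>.Count here: {Vector<int>.Count}");

foreach (int ints in new[] { 3, 7, 15, 64 })
{
    int vectorTOnly = ScalarTail(ints, 32);    // Vector<T> as a single 32-byte path
    int fixedSizes = ScalarTail(ints, 32, 16); // Vector256<T> path, then Vector128<T>
    Console.WriteLine($"{ints * sizeof(int)} bytes: {vectorTOnly} scalar ints with Vector<T> only, {fixedSizes} with Vector256 + Vector128");
}
```

For a 28-byte input (7 ints), a 32-byte Vector<T> leaves all 7 ints on the scalar path, while the Vector256-then-Vector128 layout vectorizes 4 of them and leaves 3.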
I was able to reduce the gap using …

BenchmarkDotNet=v0.13.1.1828-nightly, OS=Windows 11 (10.0.22000.795/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host] : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT AVX2