Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: ARM64/NEON Support #769

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JVital2013
Copy link

Fixes NEON support on Windows ARM64. Most of the changes deal with the somewhat odd way in which Microsoft deals with the uint8x16_t, etc types.

Additionally, MSVC on ARM/ARM64 does not support any inline assembly. Therefore __VOLK_ASM does not work, so a few workarounds have been implemented. One of these workarounds is to assume NEON is present if you're using MSVC, and building for ARM64 instead of running a check. NEON is a requirement for Windows on ARM as shown here: https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170.

Note that NEONv7 on Windows is not supported in this PR, as it would require re-wiring all the kernels into MASM (or some other format) to build on MSVC. Windows on 32-bit ARM is being deprecated anyway, so I'm thinking the neonv7 kernels can be ignored on Windows.

Signed-off-by: Jamie Vital <[email protected]>
@@ -299,7 +299,11 @@ volk_32f_index_max_32u_neon(uint32_t* target, const float* src0, uint32_t num_po
if (maxValuesBuffer[number] > max) {
index = maxIndexesBuffer[number];
max = maxValuesBuffer[number];
#ifdef _MSC_VER
} else if (maxValues.n128_f32[number] == max) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I don't quite understand that; what type is float32x4_t on MSVC/aarch64/neon?

Comment on lines +224 to +227
if(MSVC)
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "ARM")
overrule_arch(neonv8 "Compiler doesn't support neonv8")
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

My comments are rather critical because the changes are specific to MSVC, which is notoriously non-standard compliant. This might affect the result for other compilers.

I'd really like to merge this PR but make sure, that we don't see a performance regression.

The whole assume NEON is present etc. reasoning is good. Maybe, you could add a comment next to the checks to make sure this knowledge doesn't get lost over time.

Comment on lines 238 to 242
have_neonv7_result)
check_c_source_compiles(
"#include <volk/volk_common.h>\n int main(){__VOLK_ASM(\"sub v1.4s,v1.4s,v1.4s\");}"
have_neonv8_result)

if(NOT have_neonv7_result)
overrule_arch(neonv7 "Compiler doesn't support neonv7")
endif()

if(NOT have_neonv8_result)
overrule_arch(neonv8 "Compiler doesn't support neonv8")
endif()
else(neon_compile_result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you want to remove the non-MSVC checks here? Looking at this section makes me think, it might be a good time to unify this section. Just a thought.

Comment on lines -294 to +303
#if defined(__aarch64__)
#ifdef _MSC_VER
#define DO_RBIT \
*out_ptr = _byteswap_ulong(*in_ptr); \
*out_ptr = ((*out_ptr & 0x55555555) << 1) | ((*out_ptr & 0xAAAAAAAA) >> 1); \
*out_ptr = ((*out_ptr & 0x33333333) << 2) | ((*out_ptr & 0xCCCCCCCC) >> 2); \
*out_ptr = ((*out_ptr & 0x0F0F0F0F) << 4) | ((*out_ptr & 0xF0F0F0F0) >> 4); \
in_ptr++; \
out_ptr++;
#elif defined(__aarch64__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section indicates that we already had special treatment for some conditions. Now, it looks like we exchange them for another set of conditions. This change would warrant a comment on the specifics of different platforms. This will help anyone looking at this in the future.

Comment on lines -265 to +267
const uint8x16_t idx = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
uint8x16_t idx;
const uint8_t idx_data[] = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
idx = vld1q_u8(idx_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? The data type implies 16 values of the same type. Does MSVC fail to perform the correct aggregate initialization? This indirection may negatively impact optimizations.
I'd like to see this unified in one line. Would that be possible?
Would it be possible to have a godbolt/compiler explorer comparison of this for GCC/Clang vs. MSVC?

int32x4_t toint_a = { 0, 0, 0, 0 }, toint_b = { 0, 0, 0, 0 };
int32x4_t toint_a = { 0 }, toint_b = { 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, could you change this to give every variable their own statement?

Comment on lines 231 to +235
float32x4_t in_vec;
float32x4_t out_vec0 = { 0.f, 0.f, 0.f, 0.f };
float32x4_t out_vec1 = { 0.f, 0.f, 0.f, 0.f };
float32x4_t out_vec2 = { 0.f, 0.f, 0.f, 0.f };
float32x4_t out_vec3 = { 0.f, 0.f, 0.f, 0.f };
float32x4_t out_vec0 = { 0.f };
float32x4_t out_vec1 = { 0.f };
float32x4_t out_vec2 = { 0.f };
float32x4_t out_vec3 = { 0.f };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this affects initialization. Are we sure all values are initialized correctly for all compilers?

Comment on lines +287 to +291
#ifdef _MSC_VER
} else if (minValues.n128_f32[number] == min) {
#else
} else if (minValues[number] == min) {
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like smth very special. Could you add a comment with some context? Maybe a link that explains the necessary change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants