We should create a simd_read_array intrinsic. When sizeof(Simd<T, N>) > sizeof([T; N]) (which can happen until #319 is fixed), read_unaligned is probably UB, because it reads sizeof(Simd<T, N>) bytes and so can read past the end of the input array: the extra bytes are the ones that correspond to the padding in the Simd<T, N>.
We need an intrinsic rather than just using memcpy because the intrinsic will generate LLVM's load instruction with a vector type (LLVM guarantees that a vector load won't read the padding if the load's alignment is small enough), whereas memcpy may end up using less efficient array-typed loads, which sometimes compile to scalar code.
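
To make the size mismatch concrete, here is a minimal sketch on stable Rust. It does not use the real Simd type or the proposed intrinsic: `PaddedVec3` is a hypothetical stand-in whose 16-byte alignment pads three f32 lanes out to 16 bytes, and `padding_bytes` and `read_array_memcpy` are illustrative names. The sketch shows only the memcpy-style fallback, which copies exactly the array's bytes; it says nothing about which instructions the compiler emits for it.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for a padded vector type (NOT the real `Simd<f32, 3>`):
/// 12 bytes of lanes, rounded up to 16 bytes by the alignment requirement.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug)]
pub struct PaddedVec3 {
    pub lanes: [f32; 3],
}

/// Number of trailing padding bytes in the stand-in vector type.
pub fn padding_bytes() -> usize {
    size_of::<PaddedVec3>() - size_of::<[f32; 3]>()
}

/// memcpy-style read: copies only the bytes that belong to the array.
///
/// The problematic alternative (shown as a comment only, never executed) would be
///     ptr::read_unaligned((src as *const [f32; 3]).cast::<PaddedVec3>())
/// which reads `size_of::<PaddedVec3>()` bytes from an allocation that is
/// only guaranteed to hold `size_of::<[f32; 3]>()` bytes.
pub fn read_array_memcpy(src: &[f32; 3]) -> PaddedVec3 {
    let mut out = MaybeUninit::<PaddedVec3>::uninit();
    // SAFETY: `src` is valid for reading 3 f32 values, `out` is valid for
    // writing 3 f32 values at offset 0 (`repr(C)` places `lanes` first), and
    // the two do not overlap. The padding bytes stay uninitialized, which is
    // permitted for padding, so `assume_init` is sound.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), out.as_mut_ptr().cast::<f32>(), 3);
        out.assume_init()
    }
}
```

With this stand-in, `padding_bytes()` is 4, which is exactly the number of bytes a full-width `read_unaligned` would read past the end of a `[f32; 3]`.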