granular parallel generic kernel for 64u_byteswap #679
Signed-off-by: Marcus Müller <[email protected]>
651abe0 to 0a17287
This is a good addition to make the byteswap somewhat performant on non-x86 platforms, especially in light of #680.
Does this PR touch the intent of #606? I know they concern different implementations.
No, it's unrelated. However, #680 addressed that quite directly.
This PR LGTM.
However, this whole kernel has issues. The include guards are confusing, to say the least: the _a include guard is around the _u kernels and vice versa. Tail handling creates copy-pasted code. Loop variables are defined outside of loops. All in all, this kernel needs even more cleanup. This is beyond the scope of this PR though.
One concern: we just removed all the _a_generic kernels. The diff looks like you renamed one of these kernels.
Could you rebase your PR first before we merge it?
Could you rebase this PR onto the current main? I'd say we can go with it then.
This simplifies (at least to me) understanding what the generic kernel does, and it's also about 1.5 times faster on my machines.
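For readers unfamiliar with what a portable 64-bit byteswap looks like, here is a minimal sketch of one common approach: three shift-and-mask rounds per word, with no intrinsics. This is an illustration only and not necessarily the code in this PR; the names `byteswap64_sketch` and `byteswap64_buffer_sketch` are hypothetical and do not exist in VOLK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not VOLK's actual kernel): reverse the byte order of
 * one 64-bit word using three shift-and-mask rounds. */
static inline uint64_t byteswap64_sketch(uint64_t x)
{
    /* Round 1: swap adjacent bytes within each 16-bit group. */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    /* Round 2: swap adjacent 16-bit groups within each 32-bit half. */
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    /* Round 3: swap the two 32-bit halves. */
    return (x << 32) | (x >> 32);
}

/* Hypothetical in-place buffer loop: one loop, loop variable scoped to the
 * loop, no separate tail handling needed. */
static void byteswap64_buffer_sketch(uint64_t* vec, size_t num_points)
{
    for (size_t i = 0; i < num_points; ++i) {
        vec[i] = byteswap64_sketch(vec[i]);
    }
}
```

Because each word is handled independently and every round works on all bytes of the word at once, a compiler is free to vectorize the loop or map it to a native byte-reverse instruction where one exists.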