make compress_block function public #13
Comments
Very interesting. Yes, we could totally expose it. The current private compression APIs include some complexity that might not be relevant for your use case:
It sounds like you're only looking to expose 1x compression, in which case the looping optimization isn't necessary, because there's nothing to transpose. And you don't need the stride parameter. You presumably still want to do runtime feature detection (whether to use the

One issue I notice in the spec you linked is that the number of rounds is a free parameter. That clashes with the implementation in this crate, which hardcodes all the rounds so that they all get inlined. There are a number of different SIMD tricks that get deployed to load message words efficiently, which differ from round to round (here's the code). It's possible we could split all that out into a big switch statement, but I also worry that branch prediction isn't going to work well there, and that performance would suffer. That's the sort of thing I'd want to test very carefully.

A side note: the API described in the spec you linked is a little less flexible than it could be, because while it takes a variable number of rounds, it always assumes it's starting from round 1. A more flexible API might be able to express something like "just do one round of compression, but make it round number 3."

Another side note: if you're exposing the BLAKE2b compression function, you might also want to consider exposing BLAKE2s. The relative performance of BLAKE2b and BLAKE2s on 64-bit CPUs is somewhat complicated once SIMD enters the picture, and BLAKE2b isn't always faster.
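To make the round-range idea concrete, here is a minimal portable sketch of the RFC 7693 compression function F for BLAKE2b, taking a range of rounds instead of a fixed count. The name `compress_rounds` and its signature are hypothetical and are not this crate's API. The range is 0-based here (so "round number 3" in the wording above is index 2). The loop also selects the message schedule at run time, which is exactly what the hardcoded, fully inlined SIMD rounds avoid, so this shows the interface only and says nothing about performance.

```rust
use std::ops::Range;

// BLAKE2b initialization vector, as given in RFC 7693.
pub const IV: [u64; 8] = [
    0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
    0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
];

// Message word schedule from RFC 7693; round r uses row r % 10.
const SIGMA: [[usize; 16]; 10] = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
];

// The G mixing function with the BLAKE2b rotation constants (32, 24, 16, 63).
#[inline(always)]
fn g(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize, x: u64, y: u64) {
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(x);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(y);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// Hypothetical API (an assumption for illustration): initialize the working
/// vector, run only the rounds in `rounds` (0-based), then fold back into `h`.
pub fn compress_rounds(
    h: &mut [u64; 8],
    m: &[u64; 16],
    t: u128,
    last_block: bool,
    rounds: Range<usize>,
) {
    let mut v = [0u64; 16];
    v[..8].copy_from_slice(&h[..]);
    v[8..].copy_from_slice(&IV);
    v[12] ^= t as u64; // low 64 bits of the byte counter
    v[13] ^= (t >> 64) as u64; // high 64 bits of the byte counter
    if last_block {
        v[14] = !v[14];
    }
    for r in rounds {
        let s = &SIGMA[r % 10];
        // Column step.
        g(&mut v, 0, 4, 8, 12, m[s[0]], m[s[1]]);
        g(&mut v, 1, 5, 9, 13, m[s[2]], m[s[3]]);
        g(&mut v, 2, 6, 10, 14, m[s[4]], m[s[5]]);
        g(&mut v, 3, 7, 11, 15, m[s[6]], m[s[7]]);
        // Diagonal step.
        g(&mut v, 0, 5, 10, 15, m[s[8]], m[s[9]]);
        g(&mut v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
        g(&mut v, 2, 7, 8, 13, m[s[12]], m[s[13]]);
        g(&mut v, 3, 4, 9, 14, m[s[14]], m[s[15]]);
    }
    for i in 0..8 {
        h[i] ^= v[i] ^ v[i + 8];
    }
}
```

Under these assumptions, passing `0..12` gives the standard BLAKE2b compression, while `2..3` runs a single round using the third row of the message schedule. A caller that wanted to chain partial calls would also need access to the intermediate working vector, which this sketch does not expose.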
Thank you for the quick and very insightful reply!
There is the ifunc trick to do CPU detection only on the first call.
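ifunc itself is a GNU/ELF linker feature, but the same detect-once idea can be approximated in plain Rust by caching a function pointer. The sketch below is only an illustration: `Backend`, `portable`, `avx2_stand_in`, `detect`, and `dispatch` are hypothetical names, and the "AVX2" backend is a stand-in that simply calls the portable code. The `DETECTIONS` counter exists only to make the once-only behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Signature shared by every backend. Hypothetical: a real compression
/// backend would take the state, message block, counter, and final flag.
type Backend = fn(&[u64; 16]) -> u64;

/// Placeholder "portable" backend: XOR-folds the message words.
fn portable(m: &[u64; 16]) -> u64 {
    m.iter().fold(0, |acc, w| acc ^ w)
}

/// Stand-in for a SIMD backend. It just reuses the portable code so the
/// sketch stays short; real code would use AVX2 intrinsics here.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_stand_in(m: &[u64; 16]) -> u64 {
    portable(m)
}

/// Counts how many times detection ran (for demonstration only).
pub static DETECTIONS: AtomicUsize = AtomicUsize::new(0);

/// Picks a backend based on runtime CPU features.
fn detect() -> Backend {
    DETECTIONS.fetch_add(1, Ordering::Relaxed);
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return avx2_stand_in;
        }
    }
    portable
}

/// Public entry point: detection runs on the first call only, and every
/// later call goes through the cached function pointer.
pub fn dispatch(m: &[u64; 16]) -> u64 {
    static CHOSEN: OnceLock<Backend> = OnceLock::new();
    let backend = CHOSEN.get_or_init(detect);
    backend(m)
}
```

After the first call, each `dispatch` costs one initialized check plus an indirect call. Whether that overhead matters for a single-block compression is something that would need measuring.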
I see, that's a valid concern and I agree the impact needs to be measured.
I think the current spec tries to expose the bare minimum and can be extended later.
There is some need to use the compression function from RFC 7693 directly. Would you be willing to expose it in your awesome crate, or extract it into a lower-level crate that other libs can use?