improve JPEG encode_block perf by about 25% with faster integer encoding #1912
Looks like NonZeroU32::unsigned_abs was stabilized in 1.64... let's see if there's a workaround
We can also bump MSRV for a perf win, it's not a set-in-stone restriction. Bump here and as indicated in the comment to do it with the PR: https://github.com/image-rs/image/blob/master/Cargo.toml#L7-L8
I see MSRV has been bumped to 1.67 since, which is higher than this PR's requirement of 1.64. I understand there is nothing blocking this, and it can be merged? The change is simple, covered by tests, and the performance gains are tantalizing.
Will look into reviving this PR once
…ed_abs. This has a significant speed impact on leading zero calc on x86
Rebased since this change had sat there for a while.
There's still a CI test_toolchain with 1.63 that needs to be upgraded before this can be integrated.
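For readers following the MSRV discussion, here is a minimal sketch of the kind of change being described. It is not the PR's actual code: the function names `encode_coefficient_loop` and `encode_coefficient_nonzero` are hypothetical, and the sketch assumes coefficients stay within the 16-bit range a JPEG encoder produces. The std method used is `NonZeroI32::unsigned_abs` (which returns a `NonZeroU32`), presumably the method the first comment refers to. Because the value is known to be non-zero, `leading_zeros` does not need a special case for zero, which is the likely source of the x86 speedup mentioned in the commit message.

```rust
use std::num::NonZeroI32;

/// Reference version (hypothetical name): counts magnitude bits with a loop.
/// Assumes |coefficient| < 2^16, as for JPEG DCT coefficients.
fn encode_coefficient_loop(coefficient: i32) -> (u8, u16) {
    let mut magnitude = coefficient.unsigned_abs();
    let mut num_bits = 0u8;
    while magnitude > 0 {
        magnitude >>= 1;
        num_bits += 1;
    }
    let mask = ((1u32 << num_bits) - 1) as u16;
    let val = if coefficient < 0 {
        (coefficient - 1) as u16 & mask
    } else {
        coefficient as u16 & mask
    };
    (num_bits, val)
}

/// Sketch of a loop-free version (hypothetical name) using non-zero integers.
/// Assumes |coefficient| < 2^16, as for JPEG DCT coefficients.
fn encode_coefficient_nonzero(coefficient: i32) -> (u8, u16) {
    match NonZeroI32::new(coefficient) {
        // Zero has size category 0 and no extra bits.
        None => (0, 0),
        Some(nz) => {
            // NonZeroI32::unsigned_abs needs Rust 1.64 or later.
            let magnitude = nz.unsigned_abs();
            // Bit length of the magnitude, with no zero check required.
            let num_bits = (u32::BITS - magnitude.leading_zeros()) as u8;
            let mask = ((1u32 << num_bits) - 1) as u16;
            // The arithmetic shift yields -1 for negative inputs and 0 otherwise,
            // so this subtracts 1 from negative coefficients without branching.
            let adjusted = coefficient + (coefficient >> 31);
            (num_bits, adjusted as u16 & mask)
        }
    }
}
```

Both functions return the size category and the extra bits for a coefficient, so the loop version can serve as a test oracle for the faster one over the whole coefficient range.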
Thanks for this PR!
I think it is kind of doing a lot and would be better split into a bunch of PRs. Perhaps:
- Adding benchmarks to give a baseline performance number and justify subsequent PRs
- Optimizations to BitWriter
- New version of write_block
- New version of encode_coefficient
@@ -765,7 +855,8 @@ fn build_quantization_segment(m: &mut Vec<u8>, precision: u8, identifier: u8, qt
     }
 }
 
-fn encode_coefficient(coefficient: i32) -> (u8, u16) {
+#[cfg(feature = "benchmarks")]
+fn encode_coefficient_old(coefficient: i32) -> (u8, u16) {
I don't think we should keep around an old implementation simply to benchmark against.
I license past and future contributions under the dual MIT/Apache-2.0 license,
allowing licensees to choose either at their option.
Also add benches for tracking performance of JPEG block encoding.
pre:
test codecs::jpeg::encoder::tests::bench_encode_block ... bench: 548 ns/iter (+/- 131) = 135 MB/s
post:
test codecs::jpeg::encoder::tests::bench_encode_block ... bench: 410 ns/iter (+/- 140) = 180 MB/s
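
To relate these figures to the "about 25%" in the title, the arithmetic can be sketched as below. The helper names `time_reduction` and `speedup` are hypothetical and only illustrate the calculation. Note that the reported spreads (+/- 131 and +/- 140 ns) are of the same order as the 138 ns difference, so the result should be read as approximate.

```rust
/// Fractional reduction in time per iteration (hypothetical helper).
fn time_reduction(pre_ns: f64, post_ns: f64) -> f64 {
    (pre_ns - post_ns) / pre_ns
}

/// Speedup factor, i.e. how many times faster the new code runs (hypothetical helper).
fn speedup(pre_ns: f64, post_ns: f64) -> f64 {
    pre_ns / post_ns
}
```

With the numbers above, `time_reduction(548.0, 410.0)` is roughly 0.25 (25% less time per block) and `speedup(548.0, 410.0)` is roughly 1.34, consistent with the throughput rising from 135 MB/s to 180 MB/s.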