Skip to content

NEON-optimize 9-7 IDWT#1629

Merged
rouault merged 1 commit into
uclouvain:masterfrom
nico:neon-irreversible
Apr 24, 2026
Merged

NEON-optimize 9-7 IDWT#1629
rouault merged 1 commit into
uclouvain:masterfrom
nico:neon-irreversible

Conversation

@nico

@nico nico commented Apr 12, 2026

Copy link
Copy Markdown
Contributor

Takes bin/bench_dwt -I from 0.865 s to 0.672 s on my system.


"My system" being a Apple M4 Max MacBook Pro.

This code is hot when rendering https://archive.org/details/disquisitionesa00gaus in pdfium. This speeds up pdfium_test --pages=1-50 ~/Downloads/disquisitionesa00gaus.pdf by ca 10%.

This is similar to the existing SSE fast path.

@nico nico mentioned this pull request Apr 12, 2026
@nico

nico commented Apr 23, 2026

Copy link
Copy Markdown
Contributor Author

I ran openjpeg's test suite with this, and the same 50 files fail with and without this PR. I also ran pdfium's test suite with this and everything still passes with it.

@rouault

rouault commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator

this breaks the build: see https://my.cdash.org/builds/3496357/build

@nico

nico commented Apr 23, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for triggering CI!

Is there a way to see the full build command, with all flags?

It builds fine on my system, using cmake 4.2.1 and Xcode 26.2's clang. Here's the compile command that runs on my machine for dwt.c:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -DMUTEX_pthread -I/Users/thakis/src/openjpeg/build/src/lib/openjp2 -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -mmacosx-version-min=15.1 -MD -MT src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o -MF src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o.d -o src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o -c /Users/thakis/src/openjpeg/src/lib/openjp2/dwt.c

If I manually add -Wdeclaration-after-statement, I do see the warning shown on https://my.cdash.org/builds/3496357/build (but as a warning, not an error). I'll fix it. It'd be good to learn how to build locally the same way that CI does :)

@rouault

rouault commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator

Is there a way to see the full build command, with all flags?

see

set( CCFLAGS_WARNING "-Wall -Wextra -Wconversion -Wno-unused-parameter -Wdeclaration-after-statement -Werror=declaration-after-statement")

@nico

nico commented Apr 23, 2026

Copy link
Copy Markdown
Contributor Author

Thanks! Any reason this isn't set in the cmake files themselves, so that people get that behavior locally automatically? :)

I have fixed the warnings locally. Do you squash or merge? I.e. should I amend my existing commit and push -f, or do you prefer a separate commit for the follow-up?

@rouault

rouault commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator

Thanks! Any reason this isn't set in the cmake files themselves, so that people get that behavior locally automatically?

I don't know. Wasn't involved yet in the project when this was set up. Openjpeg is a very slowly moving project nowadays

I have fixed the warnings locally. Do you squash or merge? I.e. should I amend my existing commit and push -f, or do you prefer a separate commit for the follow-up?

you can amend your commit and force push

@nico

nico commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

(Also, if I set CMAKE_C_FLAGS to -Wall -Wextra -Wconversion -Wno-unused-parameter -Wdeclaration-after-statement -Werror=declaration-after-statement as the CI config, I get the following existing warnings locally:

/Users/thakis/src/openjpeg/src/lib/openjp2/mqc.c:373:20: warning: unused function 'opj_mqc_renorme' [-Wunused-function]
  373 | static INLINE void opj_mqc_renorme(opj_mqc_t *mqc)
      |                    ^~~~~~~~~~~~~~~
1 warning generated.
[22/110] Building C object src/lib/openjp2/CMakeFiles/openjp2.dir/pi.c.o
/Users/thakis/src/openjpeg/src/lib/openjp2/pi.c:1415:23: warning: variable 'l_tccp' set but not used [-Wunused-but-set-variable]
 1415 |     const opj_tccp_t *l_tccp = 00;
      |                       ^
/Users/thakis/src/openjpeg/src/lib/openjp2/pi.c:1638:23: warning: variable 'l_tccp' set but not used [-Wunused-but-set-variable]
 1638 |     const opj_tccp_t *l_tccp = 00;
      |                       ^
2 warnings generated.
[23/110] Building C object src/lib/openjp2/CMakeFiles/openjp2_static.dir/mqc.c.o
/Users/thakis/src/openjpeg/src/lib/openjp2/mqc.c:373:20: warning: unused function 'opj_mqc_renorme' [-Wunused-function]
  373 | static INLINE void opj_mqc_renorme(opj_mqc_t *mqc)
      |                    ^~~~~~~~~~~~~~~
1 warning generated.
[38/110] Building C object src/lib/openjp2/CMakeFiles/openjp2_static.dir/pi.c.o
/Users/thakis/src/openjpeg/src/lib/openjp2/pi.c:1415:23: warning: variable 'l_tccp' set but not used [-Wunused-but-set-variable]
 1415 |     const opj_tccp_t *l_tccp = 00;
      |                       ^
/Users/thakis/src/openjpeg/src/lib/openjp2/pi.c:1638:23: warning: variable 'l_tccp' set but not used [-Wunused-but-set-variable]
 1638 |     const opj_tccp_t *l_tccp = 00;
      |                       ^
2 warnings generated.
[39/110] Building C object src/lib/openjp2/CMakeFiles/openjp2.dir/tcd.c.o
/Users/thakis/src/openjpeg/src/lib/openjp2/tcd.c:2474:24: warning: variable 'l_img_comp' set but not used [-Wunused-but-set-variable]
 2474 |     opj_image_comp_t * l_img_comp = 00;
      |                        ^
1 warning generated.
[52/110] Building C object src/lib/openjp2/CMakeFiles/openjp2_static.dir/tcd.c.o
/Users/thakis/src/openjpeg/src/lib/openjp2/tcd.c:2474:24: warning: variable 'l_img_comp' set but not used [-Wunused-but-set-variable]
 2474 |     opj_image_comp_t * l_img_comp = 00;
      |                        ^
1 warning generated.

Want me to make a PR fixing those, or should I leave it alone? Also, https://my.cdash.org/builds/3496357/build claims that -Werror is on, but the travis line you linked to doesn't enable -Werror except for one specific warning.)

Takes `bin/bench_dwt -I` from 0.865 s to 0.672 s on my system.
@nico nico force-pushed the neon-irreversible branch from 5e887a6 to 0fc6e24 Compare April 24, 2026 00:01
@nico

nico commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

you can amend your commit and force push

Done, thanks!

@rouault

rouault commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Want me to make a PR fixing those

(separate from this one) PR welcome

@rouault

rouault commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

what about ARMv7 ? I'm vaguely aware it handles a subset of ARMv8 NEON. We don't have CI testing for it, and I wouldn't want we discover build breakage when packagers start to package this. (in another project I've cross build testing of armhf: https://github.com/OSGeo/gdal/tree/master/.github/workflows/armhf)

@nico

nico commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

what about ARMv7 ? I'm vaguely aware it handles a subset of ARMv8 NEON. We don't have CI testing for it, and I wouldn't want we discover build breakage when packagers start to package this. (in another project I've cross build testing of armhf: https://github.com/OSGeo/gdal/tree/master/.github/workflows/armhf)

That's a good question. My upstream CL is here https://pdfium-review.googlesource.com/c/pdfium/+/144450 It does have green android_32 bots, and those use armv7a. So I think it should work.

Are there particular sysroots you'd like me to test against?

@rouault

rouault commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

So I think it should work.

ok fine, thanks

@rouault rouault merged commit 33d594d into uclouvain:master Apr 24, 2026
14 checks passed
@nico

nico commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

For completeness: I also tried with a linux sysroot, and the file builds fine there too:

% ~/src/chrome/src/third_party/llvm-build/Release+Asserts/bin/clang -c ~/src/chrome/src/hello.c --target=armv7a-linux-gnueabihf --sysroot ~/src/chrome/src/build/linux/debian_bullseye_armhf-sysroot
thakis@M4xBook-Pro build % ~/src/chrome/src/third_party/llvm-build/Release+Asserts/bin/clang -DMUTEX_pthread -I/Users/thakis/src/openjpeg/build/src/lib/openjp2 -O3 -DNDEBUG -MD -MT src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o -MF src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o.d -o src/lib/openjp2/CMakeFiles/openjp2_static.dir/dwt.c.o -c /Users/thakis/src/openjpeg/src/lib/openjp2/dwt.c -Wdeclaration-after-statement --target=armv7a-linux-gnueabihf --sysroot ~/src/chrome/src/build/linux/debian_bullseye_armhf-sysroot

@nico nico deleted the neon-irreversible branch April 24, 2026 15:12
@nico nico mentioned this pull request Apr 24, 2026
@nico

nico commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

(separate from this one) PR welcome

-> #1632

Apparently we also have a bunch of downstream patches in pdfium (see list here: https://source.chromium.org/chromium/chromium/src/+/main:third_party/pdfium/third_party/libopenjpeg/README.pdfium). Would you be interested in me trying to upstream some of those?

@rouault

rouault commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Would you be interested in me trying to upstream some of those?

I don't know what to answer to that. This code base / JPEG2000 topic is very hard to comprehend. Anything non trivial requires me time&energy that I don't have... There's effectively no other people left in the team, and I've never signed to be the sole maintainer.

@nico

nico commented Apr 29, 2026

Copy link
Copy Markdown
Contributor Author

Ah, the old open-source trap :) I can relate. As can many others, but we never talk to each other. There should be an open source maintainer support group somewhere.

I saw https://felixge.de/2013/03/11/the-pull-request-hack/ a while ago. I never tried it, but it's something you could consider. (To be clear, I'm not looking for commit access myself.)

If you want, I could only send out smaller patches that add early-outs for things found by fuzzers. All that code has been shipping in Chrome and Android for years and is hopefully not completely busted, at least. But if you prefer we keep things downstream, that's fine too of course.

@rouault

rouault commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

If you want, I could only send out smaller patches that add early-outs for things found by fuzzers.

sounds good. hopefully most of them might be straightforward to assess

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants