Remove excessive floating-point divides #4312
base: main
Conversation
Loft the loop-invariant divide outside the hot loop, and/or invert the variable to turn FDIV into FMUL.
Do you have timing values from your tests?
Co-authored-by: Stefan Weil <[email protected]>
It will be CPU specific, but I see +10% on my Ampere Altra.
That's a very significant improvement! I wonder how this ARM64 CPU compares to Intel / AMD CPUs with Tesseract recognition and training.
If there are standard tests that you run, please do share the results. I was using …
Does Ampere Altra offer additional opcodes which could be used to make Tesseract's neural network code faster? We currently use Neon code for ARM64 (see src/arch/*neon.cpp). |
You can run … Here are my results on a Mac mini M2 for running …
Loft the loop-invariant divide outside the hot loops, and/or invert the variable to turn FDIV into FMUL.
Most CPUs are slower at FP division compared to FP multiplication. This should provide some uplift in performance. I was testing with the integer models.