
Remove excessive floating-point divides #4312

Open · wants to merge 3 commits into main

Conversation

@heshpdx (Contributor) commented Sep 3, 2024

Hoist the loop-invariant divide outside the hot loops, and/or invert the divisor to turn FDIV into FMUL.

Most CPUs are slower at FP division compared to FP multiplication. This should provide some uplift in performance. I was testing with the integer models.

Files with resolved review threads:
src/textord/pithsync.cpp
src/lstm/networkio.cpp
@stweil (Contributor) commented Sep 3, 2024

Do you have timing values from your tests?

@heshpdx (Author) commented Sep 3, 2024

> Do you have timing values from your tests?

It will be CPU specific, but I see +10% on my Ampere Altra.

@stweil commented Sep 3, 2024

> It will be CPU specific, but I see +10% on my Ampere Altra.

That's a very significant improvement! I wonder how this ARM64 CPU compares to Intel / AMD CPUs for Tesseract recognition and training.

@heshpdx commented Sep 3, 2024

If there are standard tests that you run, please do share the results.

I was using -l deu --tessdata-dir ./tessdata_orig --oem 0 and -l Arabic --tessdata-dir ./tessdata_fast. The -l rus --tessdata-dir ./tessdata_orig --oem 2 run did not show much improvement.

@stweil commented Sep 3, 2024

Does Ampere Altra offer additional opcodes which could be used to make Tesseract's neural network code faster? We currently use Neon code for ARM64 (see src/arch/*neon.cpp).

@stweil commented Sep 3, 2024

> If there are standard tests that you run, please do share the results.

You can run make check (after installing the required packages, repositories and git submodules) and compare the times for the individual tests. lstm_test is the test with the longest execution time.

Here are my results on a Mac mini M2 for running time ./lstm_test:

# git main branch, extract from lstm_test.log and log message from `time`.
[==========] 11 tests from 1 test suite ran. (278833 ms total)
./lstm_test  274,78s user 2,87s system 99% cpu 4:38,88 total

# git main branch with PR applied, extract from lstm_test.log and log message from `time`.
[==========] 11 tests from 1 test suite ran. (276981 ms total)
./lstm_test  273,60s user 2,50s system 99% cpu 4:37,03 total
