
[Performance] Why Does Increasing the Number of CPU Cores Not Improve Performance? #23747

Open
Serenagirl opened this issue Feb 19, 2025 · 4 comments
Labels
performance issues related to performance regressions

Comments

@Serenagirl

Describe the issue

I have 128 CPU cores. When I run inference with ONNX Runtime, `OMP_NUM_THREADS=1 numactl -c 1 python xxx` and `python xxx` give the same performance. Why?

To reproduce

import time

start = time.perf_counter()
output = session.run(None, inputs)  # None = return all outputs; `inputs` is the input feed dict
print(time.perf_counter() - start)

Urgency

No response

Platform

Linux

OS Version

openEuler

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.2

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

@Serenagirl Serenagirl added the performance issues related to performance regressions label Feb 19, 2025
@tianleiwu
Contributor

ORT does not use OMP_NUM_THREADS. Try setting intra_op_num_threads instead.
See https://onnxruntime.ai/docs/performance/tune-performance/threading.html#set-number-of-intra-op-threads.

@Serenagirl
Author

Serenagirl commented Feb 19, 2025

@tianleiwu thank you.
I have another question. I run a single instance on a single core with 92.8 GFLOPS of compute. The model requires 654.66 GFLOPs, so the theoretical inference time is 654.66 / 92.8 ≈ 7 s. However, the actual test takes 15 s. Is that gap normal? Why?

@tianleiwu
Contributor

@Serenagirl, usually the latency includes both IO and computation. The GFLOPs figure only covers the computation part.

@Serenagirl
Author

> @Serenagirl, usually the latency includes both IO and computation. The GFLOPs figure only covers the computation part.

OK, thank you, I'll analyze it.
