
metal : increase GPU duty-cycle during inference #9507

Closed
ggerganov opened this issue Sep 16, 2024 · 1 comment · Fixed by #9698
Labels
Apple Metal https://en.wikipedia.org/wiki/Metal_(API) help wanted Extra attention is needed performance Speed related topics

Comments

@ggerganov
Member

ggerganov commented Sep 16, 2024

Apparently there is a significant GPU downtime between Metal compute encoders within a single ggml_metal_graph_compute():

[image: GPU trace]

See #6506 for instructions on how to generate the trace shown in the picture.

My expectation was that enqueuing the command buffers in parallel would make them execute without any downtime. The goal of this issue is to understand where this overhead comes from and if there is a way to avoid it.

Obviously, using a single command buffer will avoid all the GPU downtime, but it is much slower to construct it in a single thread. Ideally, we want to continue queuing multiple encoders, but not have the gaps in-between during execution.

@ggerganov ggerganov added help wanted Extra attention is needed performance Speed related topics Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Sep 16, 2024
@ggerganov ggerganov moved this to Todo in ggml : roadmap Sep 16, 2024
@ggerganov ggerganov moved this from Todo to In Progress in ggml : roadmap Sep 30, 2024
@ggerganov ggerganov self-assigned this Sep 30, 2024
@ggerganov
Member Author

My analysis shows that the gaps in these plots are some sort of Xcode profiler artifact and don't really exist. I guess they merely serve as a visual separator between the command buffers.

Anyway, the changes in #9698 should resolve most of the Metal encoding overhead, which was the goal of this issue.

@ggerganov ggerganov moved this from In Progress to Done in ggml : roadmap Oct 1, 2024

1 participant