Regarding llama-bench and llama-parallel commands #12106

Open
vineelabhinav opened this issue Feb 28, 2025 · 2 comments

vineelabhinav commented Feb 28, 2025

Hello @ggerganov @ngxson,
I have two questions about the following commands:

  1. llama-bench: How do I run a test that combines prompt processing (pp) and text generation (tg) in a single run? Currently the -p and -n options report separate evaluation values for pp and tg, but not a combined one. I see there is a -pg option, but I couldn't get it to work (it reports the option as not found, and I don't understand the correct format for it).
  2. llama-parallel: How is parallelism performed over the batch dimension? Assume I have an input of shape [batch_size, M, N] with a for loop running over each dimension. Does llama-parallel parallelize the batch_size loop with an OpenMP parallel-for pragma (see the sketch after this list for what I mean), or does it use a different mechanism? Could you point me to the file where this parallelism is implemented?
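
To make question 2 concrete, this is the kind of parallelization I have in mind: a plain OpenMP parallel-for over the batch dimension. The shapes and the process_slice function below are made up purely for illustration; I am asking whether llama-parallel does anything like this internally or uses a different mechanism.

```cpp
#include <omp.h>
#include <vector>

// Hypothetical per-slice work; stands in for whatever is done with one
// [M, N] slice of the input. Not part of llama.cpp.
void process_slice(std::vector<float> &slice, int M, int N) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            slice[i * N + j] *= 2.0f; // placeholder computation
        }
    }
}

int main() {
    const int batch_size = 8, M = 64, N = 64;
    std::vector<std::vector<float>> input(batch_size, std::vector<float>(M * N, 1.0f));

    // The pattern I am asking about: is the batch_size loop parallelized
    // like this (one OpenMP thread per batch element), or does
    // llama-parallel handle the batch dimension some other way?
    #pragma omp parallel for
    for (int b = 0; b < batch_size; ++b) {
        process_slice(input[b], M, N);
    }

    return 0;
}
```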

vineelabhinav (Author) commented

@ggerganov @ngxson any information regarding this?

ggerganov (Member) commented

You can use:

llama-bench -m model.gguf -pg 512,32
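
A note on the format (my reading of the llama-bench options, so verify with llama-bench --help on your build): -pg takes a comma-separated pp,tg pair and runs a combined test that processes pp prompt tokens and then generates tg tokens in the same run. It can also be combined with the usual separate -p/-n tests, for example:

```
# separate pp (512) and tg (128) tests plus one combined 512-prompt + 128-gen test
# (the -p/-n values here are illustrative, not required for -pg to work)
llama-bench -m model.gguf -p 512 -n 128 -pg 512,128
```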

You can also use llama-batched-bench:

llama-batched-bench --help
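
For a combined prompt-processing + generation sweep across several parallel sequence counts, an invocation along these lines should work (flag names as I understand them from the batched-bench example: -npp for prompt tokens, -ntg for generated tokens, -npl for the number of parallel sequences; please confirm against --help):

```
# hedged example: sweep 128/256-token prompts with 128 generated tokens
# over 1, 2, 4 and 8 parallel sequences; -c/-b/-ub sized to fit the largest case
llama-batched-bench -m model.gguf -c 8192 -b 2048 -ub 512 -npp 128,256 -ntg 128 -npl 1,2,4,8
```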
