Closed
Description
Hello @ggerganov @ngxson,
I have questions about two commands:
- llama-bench: How can I run a test that does prompt processing (pp) and text generation (tg) in the same run? As far as I can tell, the command supports only the -p and -n options, which report separate values for pp and tg but not a combined one. I see there is a -pg option, but it is not working for me (it says the option was not found, and I didn't understand the correct format for its argument).
- llama-parallel: How is parallelism done over the batch dimension? Assume I have an input of shape [batch_size, M, N] and a for loop running over each dimension. Does llama-parallel parallelize the batch_size dimension's for loop using an OpenMP parallel for pragma? If not, how is the parallelism done? Can you point me to the file where this parallelism code is written?
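To make the second question concrete, here is a minimal sketch of the kind of loop I have in mind. It is purely a hypothetical illustration: the function name sum_per_batch and the per-slice sum are made up for this example, and none of it is taken from llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT from llama.cpp): sums each [M, N] slice of a
// row-major [batch_size, M, N] tensor and returns one value per batch entry.
std::vector<float> sum_per_batch(const std::vector<float> & input,
                                 std::size_t batch_size, std::size_t M, std::size_t N) {
    assert(input.size() == batch_size * M * N);
    std::vector<float> out(batch_size, 0.0f);

    // Batch entries are independent, so the outer loop can be split across threads.
    // If the compiler is run without OpenMP support (e.g. no -fopenmp), the pragma
    // is ignored and the loop simply runs serially with the same result.
    #pragma omp parallel for
    for (long b = 0; b < static_cast<long>(batch_size); ++b) {
        const std::size_t base = static_cast<std::size_t>(b) * M * N;
        float acc = 0.0f;
        for (std::size_t m = 0; m < M; ++m) {
            for (std::size_t n = 0; n < N; ++n) {
                acc += input[base + m * N + n];
            }
        }
        out[static_cast<std::size_t>(b)] = acc;  // each thread writes a distinct element
    }
    return out;
}
```

Is the batch dimension in llama-parallel handled with something like the pragma above, or by a different mechanism?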