[WIP]Add Func: npugraph_batch_size auto-adjust to different model #739

Open: wants to merge 1 commit into base: main

chris668899

What this PR does / why we need it?

This PR adds a new feature: npugraph_batch_sizes is adjusted dynamically for different models. Before this PR, the npugraph_batch_sizes list passed from vllm to vllm-ascend was always too large, which could cause an error when running different models, with the message: "The resources are insufficient".
With this PR, the code dynamically adjusts npugraph_batch_sizes based on the model's hidden_layer_nums and the parallel config, for example:
a. for Qwen2.5-7B, npugraph_batch_sizes has 33 entries in total;
b. for Qwen2.5-72B, npugraph_batch_sizes has 11 entries in total.
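
To illustrate the idea, here is a minimal sketch of how a capture-size list could be trimmed as the layer count grows. It is not the code in this PR: the function name `adjust_batch_sizes`, the `resource_budget` and `resources_per_layer` parameters, the candidate list, and the layer counts are all hypothetical, and the sketch ignores the parallel config. It only assumes that each captured graph consumes resources in proportion to the number of hidden layers, so a deeper model can afford fewer captured batch sizes.

```python
def adjust_batch_sizes(candidate_sizes, num_hidden_layers,
                       resource_budget, resources_per_layer=1):
    """Trim a list of graph batch sizes to fit a resource budget.

    Hypothetical model: every captured graph costs
    num_hidden_layers * resources_per_layer units, and the total
    across all captured graphs must not exceed resource_budget.
    """
    if num_hidden_layers <= 0 or resources_per_layer <= 0:
        raise ValueError("layer count and per-layer cost must be positive")

    sizes = sorted(set(candidate_sizes))
    max_graphs = resource_budget // (num_hidden_layers * resources_per_layer)

    if max_graphs >= len(sizes):
        return sizes              # everything fits, keep the full list
    if max_graphs <= 0:
        return []                 # nothing fits, caller should skip graph capture
    if max_graphs == 1:
        return [sizes[-1]]        # keep only the largest batch size

    # Pick evenly spaced entries, always keeping the smallest and largest.
    last = len(sizes) - 1
    indices = [i * last // (max_graphs - 1) for i in range(max_graphs)]
    return [sizes[i] for i in indices]


# Illustrative inputs only: a 67-entry candidate list and an assumed budget.
candidates = [1, 2, 4] + list(range(8, 513, 8))
shallow = adjust_batch_sizes(candidates, num_hidden_layers=28, resource_budget=1800)
deep = adjust_batch_sizes(candidates, num_hidden_layers=80, resource_budget=1800)
print(len(shallow), len(deep))
```

Under these assumed inputs the deeper model ends up with a shorter list, which mirrors the behavior described above; the actual lengths (33 and 11) depend on the real resource limit and the parallel config, which this sketch does not model.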