Does Whisper support in-flight batching? #710

From reading the code it does not seem to be supported, but there appears to be a commit related to in-flight batching.

Comments
It does look like this has not been implemented yet. @yuekaizhang, are there plans to implement it?
The current code does support it. The default is the Python bindings version of in-flight batching; the cpp version of in-flight batching is described here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/whisper.md
I tested the Python bindings version, and it seems that even after an eot_id is generated the request does not return; it keeps decoding. Is that expected? If that is expected behavior, this in-flight batching seems to be pointless. @yuekaizhang
Right, the current sherpa/triton/whisper implementation does not return immediately, but the finished request no longer participates in subsequent batch computation, so it still helps throughput. You can try this version: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/whisper/whisper_bls/1/model.py#L219. To return a result as soon as its decoding finishes, you need Triton's decoupled mode feature.
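For reference, a minimal sketch of what a decoupled-mode Triton Python backend model could look like. This is not the actual model.py code: the `run_whisper_decoding` helper and the `TRANSCRIPTS` output name are placeholder assumptions, and the real model would run the TensorRT-LLM decoding loop per request.

```python
import numpy as np
import triton_python_backend_utils as pb_utils


def run_whisper_decoding(request):
    # Placeholder: in a real model this would run the Whisper decoder
    # for this single request until eot_id is produced.
    return "dummy transcript"


class TritonPythonModel:
    """Sketch of a decoupled-mode model. Decoupled mode must be enabled
    in config.pbtxt with: model_transaction_policy { decoupled: true }"""

    def execute(self, requests):
        # In decoupled mode, execute() returns None; each request is answered
        # through its own response sender whenever its decoding finishes,
        # instead of waiting for every request in the batch to finish.
        for request in requests:
            sender = request.get_response_sender()
            text = run_whisper_decoding(request)
            out = pb_utils.Tensor(
                "TRANSCRIPTS",  # placeholder output name
                np.array([text.encode("utf-8")], dtype=np.object_),
            )
            # Send the result and mark this request complete; a request that
            # finishes early no longer waits for the slowest one in the batch.
            sender.send(
                pb_utils.InferenceResponse(output_tensors=[out]),
                flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL,
            )
        return None
```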
Does "no longer participates in subsequent batch computation" refer to the iteration-level batching process? For example, if the generated token list is [1, 2, 3, eot_id, eot_id, eot_id, eot_id], the second eot_id is not produced by inference but is padded in directly, right? @yuekaizhang
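If I understand the claim above correctly, the behavior being asked about would look something like this toy sketch. It is pure illustration, not the actual TensorRT-LLM scheduler, and `EOT_ID = 50257` assumes the multilingual Whisper vocabulary's `<|endoftext|>` id:

```python
EOT_ID = 50257  # assumption: <|endoftext|> in the multilingual Whisper vocab


def decode_batch(step_fn, prompts, max_new_tokens):
    """Toy iteration-level batching loop.

    step_fn(active_sequences) -> one next token per active sequence.
    """
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)
    for _ in range(max_new_tokens):
        active = [i for i, done in enumerate(finished) if not done]
        if not active:
            break
        # Only unfinished sequences go through the model at this iteration;
        # a request that already hit eot_id stops consuming compute.
        next_tokens = step_fn([sequences[i] for i in active])
        for i, tok in zip(active, next_tokens):
            sequences[i].append(tok)
            if tok == EOT_ID:
                finished[i] = True
    # Pad sequences that finished early up to a common length with eot_id;
    # these trailing eot_ids are filled in directly, not produced by inference.
    max_len = max(len(s) for s in sequences)
    for s in sequences:
        s.extend([EOT_ID] * (max_len - len(s)))
    return sequences
```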