
Does whisper support in-flight batching? #710

Open
wangjxxxhi12 opened this issue Feb 13, 2025 · 5 comments

Comments

@wangjxxxhi12

Looking at the code it doesn't seem to be supported, but there does appear to be an in-flight related commit.

@wangjxxxhi12 (Author)

It does look like it hasn't been implemented yet. @yuekaizhang, are there plans to implement it?

@yuekaizhang (Collaborator)

The current code does support it. The default is the python bindings version of in-flight batching; the cpp version of in-flight batching is described here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/whisper.md

@wangjxxxhi12 (Author)

I tested the python bindings version, and it seems that even after hitting eot_id a request does not return and keeps decoding. Is that expected? If so, this in-flight batching seems pointless. @yuekaizhang

@yuekaizhang (Collaborator)

> I tested the python bindings version, and it seems that even after hitting eot_id a request does not return and keeps decoding. Is that expected? If so, this in-flight batching seems pointless. @yuekaizhang

Right, the current sherpa/triton/whisper implementation does not return immediately, but a finished request no longer participates in the computation of subsequent batches, so it still helps throughput.

You can try this version: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/whisper/whisper_bls/1/model.py#L219. To return a result as soon as its decoding finishes, you need the Triton decoupled mode feature.
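For anyone landing here, a minimal sketch of what decoupled mode looks like in a Triton Python backend (illustrative only, not the actual whisper_bls code; the `TRANSCRIPTS` output name and the decode step are placeholders). The model's config.pbtxt must also set `model_transaction_policy { decoupled: true }`:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            # In decoupled mode each request exposes a response sender, so a
            # response can be pushed the moment this request finishes decoding,
            # independently of the other requests in the batch.
            sender = request.get_response_sender()

            # ... run decoding for this request here; `transcript` is a
            # hypothetical placeholder for the finished result ...
            transcript = b"example transcript"

            out = pb_utils.Tensor(
                "TRANSCRIPTS", np.array([transcript], dtype=np.object_)
            )
            sender.send(
                pb_utils.InferenceResponse(output_tensors=[out]),
                flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL,
            )

        # Decoupled models return None; all responses go through the senders.
        return None
```

On the client side, a decoupled model is consumed with a streaming inference call, which is what makes "return as soon as this request finishes" possible.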

@wangjxxxhi12 (Author)

wangjxxxhi12 commented Feb 17, 2025

> Right, the current sherpa/triton/whisper implementation does not return immediately, but a finished request no longer participates in the computation of subsequent batches, so it still helps throughput.

Does "no longer participates in the computation of subsequent batches" refer to the iteration-level batching loop? For example, if the generated token list is [1, 2, 3, eot_id, eot_id, eot_id, eot_id], the eot_ids after the first are not produced by inference but padded in directly, right? @yuekaizhang
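The pattern being asked about is easiest to see in code. Below is a minimal sketch of iteration-level batching (illustrative pseudocode, not the sherpa/triton implementation; `step_fn` and the `EOT_ID` value are assumptions): rows that have emitted eot_id are dropped from the decoder step, and the trailing eot_ids in the final token list come from padding, not from inference.

```python
import numpy as np

EOT_ID = 50257  # illustrative end-of-text token id, not whisper's actual value


def greedy_decode(batch_states, max_steps, step_fn):
    """Iteration-level batching sketch: finished rows are excluded from
    further decoder compute and padded with EOT_ID in the output."""
    batch = len(batch_states)
    finished = np.zeros(batch, dtype=bool)
    tokens = [[] for _ in range(batch)]

    for _ in range(max_steps):
        active = np.flatnonzero(~finished)
        if active.size == 0:
            break  # every request is done; nothing left to compute

        # Only the still-active rows go through the (expensive) decoder step;
        # step_fn is assumed to return one next token per active row.
        next_ids = step_fn([batch_states[i] for i in active])

        for row, tok in zip(active, next_ids):
            tokens[row].append(tok)
            if tok == EOT_ID:
                finished[row] = True

    # Pad finished sequences to a rectangular output, mirroring the
    # [1, 2, 3, eot, eot, eot, eot] shape from the question: the extra
    # eot_ids are written here, not produced by inference.
    width = max(len(t) for t in tokens)
    return [t + [EOT_ID] * (width - len(t)) for t in tokens]
```

If the implementation follows this pattern, as the reply above suggests, then everything after the first eot_id is padding, and a finished request costs no further decoder compute even though its result is not returned until the whole batch completes.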
