Add Spec dec Bench example #474

Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #474      +/-  ##
==========================================
+ Coverage   73.46%   73.52%   +0.06%
==========================================
  Files         180      181       +1
  Lines       18161    18207      +46
==========================================
+ Hits        13342    13387      +45
- Misses       4819     4820       +1
```
force-pushed from de9df92 to 7cbcc0c
Need to mention this new example in https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst under the 0.40 release section (new one).
Missing requirements.txt and README.md in
force-pushed from 7cbcc0c to 8402f45
force-pushed from 8402f45 to 4111724
I added a section in the README on how to install. Creating a requirements.txt would get nasty, since it technically supports vLLM, SGLang, and TRT-LLM. It is simpler to say: run in an environment that already has one of these installed.
```python
self.out["TTFT Time"] = compute_statistics(ttft_time)
if tpot_time:
    self.out["Generation Step Time"] = compute_statistics(tpot_time)
    self.out["Generation Tokens Per Second"] = compute_statistics(gen_tp_time)
```
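The excerpt above reduces each list of timing samples to summary statistics. As a rough illustration, a helper of the shape `compute_statistics` might look like the following (a hypothetical sketch, not the PR's actual implementation):

```python
import statistics

def compute_statistics(samples: list[float]) -> dict[str, float]:
    """Summarize a list of timing samples (hypothetical helper).

    Returns mean, median, and p99 so downstream reports can compare
    TTFT and per-step generation latency distributions.
    """
    ordered = sorted(samples)
    # Nearest-rank style p99 index, clamped to the last element.
    p99_index = min(len(ordered) - 1, round(0.99 * (len(ordered) - 1)))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
    }
```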
Questions: Is there a way to get per_user_tps and per_gpu_tps using specbench?
The Generation TPS is technically per_user_tps. I can rename these to add a Request prefix.
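To make the per-user versus per-GPU distinction concrete, here is a hedged sketch (function names and signatures are illustrative, not the benchmark's actual API):

```python
def request_per_user_tps(gen_tokens: int, gen_seconds: float) -> float:
    # Throughput experienced by a single request (one "user"):
    # tokens that request generated divided by its own generation time.
    return gen_tokens / gen_seconds

def per_gpu_tps(total_tokens: int, wall_seconds: float, num_gpus: int) -> float:
    # Aggregate throughput across all concurrent requests in the run,
    # normalized by GPU count.
    return total_tokens / (wall_seconds * num_gpus)

# e.g. one request generating 256 tokens in 2 s -> 128 tok/s per user,
# while 10,000 total tokens in 10 s on 4 GPUs -> 250 tok/s per GPU.
```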
Updated
A high-level question: trtllm-bench seems to have the same metrics available, and vLLM also has a similar vllm bench: link.
Is there any difference between the results from specbench and trtllm-bench, or shall we reuse existing functionality from trtllm-bench?
Great question! Both of those benchmarks try to do the same thing, but they are not unified. With this one you are guaranteed to send exactly the same tokens to the engine in all cases, and to respect the same chat template and tokenizer. trtllm-bench tends to ignore_eos (which is bad for specdec) and requires the input to already be tokenized (a big source of error depending on how you actually do it).
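To illustrate why pre-tokenizing inputs yourself is error-prone, here is a self-contained toy: a greedy longest-match tokenizer (purely illustrative, with a made-up vocabulary, unrelated to any real BPE) where tokenizing two text pieces separately does not match tokenizing the joined string:

```python
# Made-up vocabulary for demonstration only.
VOCAB = {"hello", " world", "hello wor", "ld", "h", "e", "l", "o", " ", "w", "r", "d"}

def encode(text: str) -> list[str]:
    """Greedy longest-match tokenization over VOCAB."""
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocab entry starting at position i.
        for end in range(len(text), i, -1):
            if text[i:end] in VOCAB:
                tokens.append(text[i:end])
                i = end
                break
    return tokens

# Encoding pieces separately vs. encoding the joined string diverges:
separate = encode("hello") + encode(" world")  # ['hello', ' world']
joined = encode("hello world")                 # ['hello wor', 'ld']
assert separate != joined
```

Real subword tokenizers exhibit the same boundary effects, which is why having the harness drive tokenization (and the chat template) for every backend keeps the comparison apples-to-apples.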
force-pushed from 172f919 to 0d55bb0
```python
# limitations under the License.

try:
    import tiktoken
```
Let's add a non-deployment-related requirements.txt for dependencies not part of modelopt (setup.py), like tiktoken and maybe others.
Fine, since all three base Docker images already have everything installed.
Signed-off-by: Izzy Putterman <[email protected]>
force-pushed from 0d55bb0 to 81c6e15
Signed-off-by: Keval Morabia <[email protected]>
## What does this PR do?

**Type of change:** New feature

**Overview:** Specdec bench example

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

## Additional Information

---

Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: mxin <[email protected]>


What does this PR do?
Type of change: New feature
Overview: Specdec bench example
Usage
# Add a code snippet demonstrating how to use thisTesting
Before your PR is "Ready for review"
Additional Information