@IzzyPutterman
Contributor

@IzzyPutterman IzzyPutterman commented Oct 28, 2025

What does this PR do?

Type of change: New feature

Overview: Specdec bench example

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

@IzzyPutterman IzzyPutterman requested a review from a team as a code owner October 28, 2025 05:52
@codecov

codecov bot commented Oct 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.52%. Comparing base (8cf516e) to head (17f968e).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #474      +/-   ##
==========================================
+ Coverage   73.46%   73.52%   +0.06%     
==========================================
  Files         180      181       +1     
  Lines       18161    18207      +46     
==========================================
+ Hits        13342    13387      +45     
- Misses       4819     4820       +1     

@kevalmorabia97
Collaborator

Need to mention this new example in https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst under 0.40 release section (new one)

@kevalmorabia97
Collaborator

Missing requirements.txt and README.md in examples/specdec_bench

@IzzyPutterman IzzyPutterman force-pushed the iputterman/specdec-bench branch from 7cbcc0c to 8402f45 Compare November 5, 2025 02:24
@IzzyPutterman IzzyPutterman requested a review from a team as a code owner November 5, 2025 02:24
@IzzyPutterman IzzyPutterman force-pushed the iputterman/specdec-bench branch from 8402f45 to 4111724 Compare November 5, 2025 02:32
@IzzyPutterman
Contributor Author

> Missing requirements.txt and README.md in examples/specdec_bench

I added a section to the README on how to install. Creating a requirements.txt would get messy, since the benchmark technically supports vLLM, SGLang, and TRTLLM. It is simpler to say: run in an environment that already has one of these installed.
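A startup check is one way to make the "run in an environment that already has an engine" expectation explicit. The sketch below is not taken from this PR: the function name `detect_installed_engines` and the label-to-module mapping are assumptions for illustration, using the usual top-level import names of the three projects.

```python
import importlib.util

# Assumed mapping from engine label to top-level import name (illustrative only).
ENGINE_MODULES = {
    "VLLM": "vllm",
    "SGLANG": "sglang",
    "TRTLLM": "tensorrt_llm",
}


def detect_installed_engines(modules=None):
    """Return the labels whose package can be located, without importing it."""
    modules = ENGINE_MODULES if modules is None else modules
    return [
        label
        for label, module_name in modules.items()
        if importlib.util.find_spec(module_name) is not None
    ]


if __name__ == "__main__":
    engines = detect_installed_engines()
    if engines:
        print("Available engines:", ", ".join(engines))
    else:
        print("No supported engine found; install vLLM, SGLang, or TensorRT-LLM first.")
```

Using `find_spec` rather than a real import keeps the check cheap, since importing an inference engine can be slow and may initialize the GPU.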

self.out["TTFT Time"] = compute_statistics(ttft_time)
if tpot_time:
    self.out["Generation Step Time"] = compute_statistics(tpot_time)
    self.out["Generation Tokens Per Second"] = compute_statistics(gen_tp_time)
Contributor

Question: Is there a way to get per_user_tps and per_gpu_tps using specbench?

Contributor Author

Technically, the Generation TPS is per_user_tps. I can rename these metrics to add a Request prefix.
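As a rough illustration of how these per-request numbers relate, here is a minimal sketch. It assumes each request records a start time plus one `(timestamp, tokens_emitted)` pair per generation step, since a speculative decoding step can emit several tokens at once. The helper `request_metrics` and its input layout are hypothetical and are not the PR's implementation.

```python
def request_metrics(start_time, steps):
    """Per-request timing metrics.

    start_time: request submission time in seconds.
    steps: list of (timestamp_seconds, tokens_emitted), one per generation step.
    Hypothetical helper for illustration only.
    """
    if not steps:
        raise ValueError("need at least one generation step")
    first_time = steps[0][0]
    metrics = {"ttft": first_time - start_time}  # time to first token
    gen_time = steps[-1][0] - first_time
    if len(steps) > 1 and gen_time > 0:
        # Decode phase only: everything after the first step.
        gen_tokens = sum(n for _, n in steps[1:])
        metrics["step_time"] = gen_time / (len(steps) - 1)
        metrics["gen_tps"] = gen_tokens / gen_time  # per-user tokens per second
    return metrics
```

With these assumptions, `gen_tps` is a per-request (per-user) rate, which is why a Request prefix on the metric name describes it accurately.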

Contributor

@h-guo18 h-guo18 Nov 6, 2025

Thanks. Is per_gpu_tps feasible to implement in specbench? It would help to plot figures like this, where we can understand the serving performance directly:
[Two attached images, not reproduced here]
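For context, a sketch of how a per-GPU figure could sit next to the per-user one. It assumes per_gpu_tps means total output tokens divided by the wall-clock duration of the run and by the number of GPUs; that definition, the function name `throughput_summary`, and the request record layout are all assumptions, not taken from the PR.

```python
def throughput_summary(requests, num_gpus):
    """Aggregate throughput for a batch of finished requests.

    requests: list of dicts with 'start' and 'end' (seconds) and 'output_tokens'.
    num_gpus: number of GPUs serving the run.
    Hypothetical helper; the per-GPU definition here is an assumption.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not requests:
        raise ValueError("need at least one request")
    wall_time = max(r["end"] for r in requests) - min(r["start"] for r in requests)
    total_tokens = sum(r["output_tokens"] for r in requests)
    system_tps = total_tokens / wall_time
    # Simplification: the per-user rate here includes prefill time.
    per_user = [r["output_tokens"] / (r["end"] - r["start"]) for r in requests]
    return {
        "per_gpu_tps": system_tps / num_gpus,
        "mean_per_user_tps": sum(per_user) / len(per_user),
    }
```

Sweeping the concurrency level and recording both values per run gives the points for a per-GPU versus per-user throughput plot.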

Contributor Author

Updated

Contributor

@h-guo18 h-guo18 left a comment

A high-level question: trtllm-bench seems to have the same metrics available. vLLM also has a similar tool, vllm bench: link.

Is there any difference between the results from specbench and trtllm-bench, or should we reuse existing functionality from trtllm-bench?

@IzzyPutterman
Contributor Author

> A high-level question: trtllm-bench seems to have the same metrics available. vLLM also has a similar tool, vllm bench: link.
>
> Is there any difference between the results from specbench and trtllm-bench, or should we reuse existing functionality from trtllm-bench?

Great question! Both of those benchmarks try to do the same thing, but they are not unified. With this one you are guaranteed to send the exact same tokens to the engine in all cases, and to have it respect the same chat template and tokenizer. trtllm-bench tends to set ignore_eos (which is bad for specdec) and requires the input to be tokenized already (a big source of error, depending on how you actually do it).
This also provides easier ways to get more advanced metrics, such as how AR/AL changes over the course of a request.
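To illustrate the kind of per-position metric mentioned above, here is a minimal sketch. It assumes AL is acceptance length (accepted draft tokens plus one token from the target model per step) and AR is acceptance rate (accepted draft tokens divided by proposed draft tokens). Those definitions, the function name `acceptance_by_step`, and the input layout are assumptions for illustration, not the PR's code.

```python
from collections import defaultdict


def acceptance_by_step(requests, draft_len, bucket=1):
    """Average AL and AR as a function of step position within a request.

    requests: list of per-request lists; each entry is the number of draft
        tokens accepted at that generation step.
    draft_len: number of draft tokens proposed per step (assumed constant).
    bucket: number of consecutive steps grouped into one bin.
    """
    totals = defaultdict(lambda: [0, 0])  # bin index -> [accepted_sum, step_count]
    for accepted_per_step in requests:
        for step, accepted in enumerate(accepted_per_step):
            entry = totals[step // bucket]
            entry[0] += accepted
            entry[1] += 1
    result = {}
    for bin_index in sorted(totals):
        accepted_sum, step_count = totals[bin_index]
        result[bin_index] = {
            "al": accepted_sum / step_count + 1,  # +1 for the target model's token
            "ar": accepted_sum / (step_count * draft_len),
        }
    return result
```

A larger `bucket` smooths the curve when requests are long and per-step counts are noisy.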

@IzzyPutterman IzzyPutterman force-pushed the iputterman/specdec-bench branch 3 times, most recently from 172f919 to 0d55bb0 Compare November 6, 2025 03:00
# limitations under the License.

try:
    import tiktoken
Collaborator

@kevalmorabia97 kevalmorabia97 Nov 6, 2025

Let's add a non-deployment requirements.txt for dependencies that are not part of modelopt (setup.py), like tiktoken and maybe others.

Collaborator

Fine, since all three base Docker images already have everything installed.
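For readers unfamiliar with the pattern in the snippet under review, a guarded import typically looks like the sketch below. This is a generic illustration, not the PR's code: the helper `count_tokens` and the choice of the `cl100k_base` encoding are assumptions.

```python
try:
    import tiktoken
except ImportError:  # optional dependency; only needed by some code paths
    tiktoken = None


def count_tokens(text, encoding_name="cl100k_base"):
    """Count tokens with tiktoken, failing with a clear message if it is absent.

    Hypothetical helper for illustration only.
    """
    if tiktoken is None:
        raise ImportError(
            "tiktoken is required for this feature; install it with `pip install tiktoken`"
        )
    # get_encoding may fetch the encoding data on first use.
    return len(tiktoken.get_encoding(encoding_name).encode(text))
```

Deferring the error to the point of use keeps the module importable in environments where the optional package is missing.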

@kevalmorabia97 kevalmorabia97 changed the title from "Draft: Specdec Bench: Initial" to "Add Spec dec Bench example" Nov 6, 2025
Signed-off-by: Izzy Putterman <[email protected]>
@IzzyPutterman IzzyPutterman force-pushed the iputterman/specdec-bench branch from 0d55bb0 to 81c6e15 Compare November 6, 2025 20:44
@IzzyPutterman IzzyPutterman requested a review from a team as a code owner November 6, 2025 20:44
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) November 6, 2025 20:54
@kevalmorabia97 kevalmorabia97 merged commit 5adb9ba into main Nov 6, 2025
26 checks passed
@kevalmorabia97 kevalmorabia97 deleted the iputterman/specdec-bench branch November 6, 2025 21:21
mxinO pushed a commit that referenced this pull request Nov 11, 2025
## What does this PR do?

**Type of change:** New feature <!-- Use one of the following: Bug fix,
new feature, new example, new tests, documentation. -->

**Overview:** Specdec bench example

## Usage
<!-- You can potentially add a usage example below. -->

```python
# Add a code snippet demonstrating how to use this
```

## Testing
<!-- Mention how have you tested your change if applicable. -->

## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes <!--- If No, explain why.
-->
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes <!--- Only for new features, API changes, critical bug fixes or bw
breaking changes. -->

## Additional Information
<!-- E.g. related issue. -->

---------

Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: mxin <[email protected]>

5 participants