
Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Nov 4, 2025

What does this PR do?

Type of change: Bug fix

Overview:

  1. Remove kv_cache_config from the generate API. It is no longer used anywhere in the code; KV cache usage is now estimated from other parameters instead.
  2. Add max_seq_len to the generate API to better estimate the real KV cache usage (see the sketch after this list).
  3. Assume a default lm_eval max input sequence length of 4096.
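
For intuition on items 2 and 3, here is a minimal back-of-envelope sketch of how KV cache usage scales with max_seq_len. This is illustrative only: the function name, formula, and Llama-2-7B-like model shape below are assumptions for the sketch, not modelopt's internal estimator.

```python
# Back-of-envelope KV cache estimate driven by max_seq_len.
# NOT modelopt's internal formula; the model shape is assumed (Llama-2-7B-like).

def estimate_kv_cache_bytes(
    max_seq_len: int,
    batch_size: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # fp16/bf16 KV cache
) -> int:
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x seq x batch x dtype bytes."""
    return 2 * num_layers * num_kv_heads * head_dim * max_seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # 4096 is the default lm_eval max input sequence length assumed by this PR.
    size_gib = estimate_kv_cache_bytes(
        max_seq_len=4096, batch_size=1, num_layers=32, num_kv_heads=32, head_dim=128
    ) / 2**30
    print(f"Estimated KV cache at max_seq_len=4096: {size_gib:.2f} GiB")  # ~2.00 GiB
```

Because the estimate is linear in max_seq_len, passing the real sequence length to generate gives a much tighter bound than a fixed kv_cache_config.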

@cjluo-nv cjluo-nv requested review from a team as code owners November 4, 2025 07:01
@cjluo-nv cjluo-nv changed the title Update LLM generate API for modelopt LLM eval [NVBUG: 5617733] Update LLM generate API for modelopt LLM eval Nov 4, 2025
@codecov

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.43%. Comparing base (009bd1a) to head (997a111).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #498   +/-   ##
=======================================
  Coverage   73.43%   73.43%           
=======================================
  Files         180      180           
  Lines       18149    18149           
=======================================
  Hits        13328    13328           
  Misses       4821     4821           

☔ View full report in Codecov by Sentry.


@cjluo-nv cjluo-nv merged commit ce8ce22 into main Nov 4, 2025
26 checks passed
@cjluo-nv cjluo-nv deleted the chenjiel/update_generate branch November 4, 2025 19:08
kevalmorabia97 pushed a commit that referenced this pull request Nov 4, 2025
mxinO pushed a commit that referenced this pull request Nov 11, 2025