Pull requests: NVIDIA/TensorRT-LLM
Pull requests list
feat: Cohere2ForCausalLM support (Command-A, Command-R7B)
#3128 opened Mar 27, 2025 by aikitoria
feat: FP8 Rowwise quantization support for Cohere models
#3127 opened Mar 27, 2025 by aikitoria
[draft]chore: update libcutlass library with FP4 quantize linear layout change
#3126 opened Mar 27, 2025 by nv-guomingz
chore: Stabilize ABI boundary for internal kernel library
#3117 opened Mar 27, 2025 by tongyuantongyu • Draft
fix: Early exit cmake if find_library() does not find any lib
#3113 opened Mar 26, 2025 by WilliamTambellini
feat: Optionally split MoE inputs into chunks to reduce GPU memory usage
#3104 opened Mar 26, 2025 by jinyangyuan-nvidia • Draft
refactor: Simplify disableLookahead and improve numDecodingEngineTokens handling
#3103 opened Mar 26, 2025 by Funatiq
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache
#3095 opened Mar 26, 2025 by Tabrizian
perf: Add optimizations for deepseek in min latency mode
#3093 opened Mar 26, 2025 by zongfeijing
feat: Run PyExecutor's inference flow to estimate max_num_tokens for kv_cache_manager
#3092 opened Mar 26, 2025 by HuiGao-NV