Pull requests: vllm-project/vllm
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len
Labels: ready, structured-output
#13691 opened Feb 22, 2025 by WangErXiao

[V1][Minor] Use FakeAttentionMetadata for dummy run
Labels: ready, v1
#13689 opened Feb 22, 2025 by WoosukKwon

[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms
#13688 opened Feb 21, 2025 by njhill

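For context on the transport named in #13688 above, here is a minimal pyzmq sketch of an ipc:// (Unix domain socket) endpoint, the kind of local-only ZMQ channel the title refers to. The endpoint path and message payload are illustrative assumptions, not vLLM's actual implementation.

```python
# Minimal pyzmq sketch of an ipc:// (Unix domain socket) transport.
# The endpoint path and payload below are hypothetical, for illustration only.
import zmq

IPC_ENDPOINT = "ipc:///tmp/example_local_comms.sock"  # hypothetical path

ctx = zmq.Context()

# Producer side: bind a PUSH socket on the domain-socket endpoint.
producer = ctx.socket(zmq.PUSH)
producer.bind(IPC_ENDPOINT)

# Consumer side (another process on the same host): connect a PULL socket.
consumer = ctx.socket(zmq.PULL)
consumer.connect(IPC_ENDPOINT)

producer.send(b"hello over a unix domain socket")
print(consumer.recv())  # b'hello over a unix domain socket'
```
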
[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA
Labels: ready
#13687 opened Feb 21, 2025 by 2015aroras

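As background for #13687 above, a generic PyTorch sketch of why a fused qkv projection must be split by head counts under GQA/MQA rather than into three equal chunks. The shapes and names are assumptions for illustration, not OLMo 2's actual code.

```python
# Generic sketch: splitting a fused qkv projection output for GQA/MQA.
# With grouped-query attention, k and v have fewer heads than q, so the fused
# tensor cannot be split into three equal chunks. Shapes are illustrative.
import torch

num_heads = 32      # query heads
num_kv_heads = 8    # key/value heads (GQA); 1 would be MQA
head_dim = 128

qkv_dim = (num_heads + 2 * num_kv_heads) * head_dim
qkv = torch.randn(2, 16, qkv_dim)  # (batch, seq_len, fused qkv)

q, k, v = qkv.split(
    [num_heads * head_dim, num_kv_heads * head_dim, num_kv_heads * head_dim],
    dim=-1,
)
assert q.shape[-1] == num_heads * head_dim
assert k.shape[-1] == v.shape[-1] == num_kv_heads * head_dim
```
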
enable users to select triton fa for MLA backend
Labels: needs-rebase, rocm
#13685 opened Feb 21, 2025 by qli88

[Model] GPTBigCodeForEmbedding supporting token span classification
#13684 opened Feb 21, 2025 by michaelrglass

[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid…
Labels: frontend, ready
#13672 opened Feb 21, 2025 by WangErXiao

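The truncated title of #13672 above refers to 'ge' and 'le' bounds in port validation. Below is a generic pydantic sketch of range-constrained validation; the model name and bounds are assumed for illustration and are not taken from vLLM's actual server config.

```python
# Generic pydantic sketch of range-validating a port number with ge/le.
# The ServerArgs model and the bounds are hypothetical, for illustration only.
from pydantic import BaseModel, Field, ValidationError

class ServerArgs(BaseModel):
    # ge/le attach numeric bounds; values outside [1, 65535] are rejected.
    port: int = Field(default=8000, ge=1, le=65535)

print(ServerArgs(port=8080).port)  # 8080

try:
    ServerArgs(port=70000)
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])
```
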
[Misc] Capture and log the time of loading weights
Labels: ready, v1
#13666 opened Feb 21, 2025 by waltforme

Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size
Labels: ready
#13660 opened Feb 21, 2025 by fabianlim

[model][refactor] remove cuda hard code in models and layers
Labels: speculative-decoding
#13658 opened Feb 21, 2025 by MengqingCao

[ROCM] fix native attention function call
Labels: ready
#13650 opened Feb 21, 2025 by gongdao123

docs: Add a note on full CI run in contributing guide
Labels: documentation
#13646 opened Feb 21, 2025 by terrytangyuan

[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict
Labels: speculative-decoding
#13626 opened Feb 20, 2025 by benchislett

[Bugfix] Flush TunableOp results before worker processes are destroyed.
Labels: rocm
#13623 opened Feb 20, 2025 by naromero77amd

[Frontend] [Minor] Fix tqdm progress bar for n > 1
Labels: frontend
#13621 opened Feb 20, 2025 by franzscherr

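As a rough illustration of the progress-bar issue named in #13621 above, here is a generic tqdm sketch where the bar total counts prompts rather than the n sequences sampled per prompt. The generate_n_outputs helper is hypothetical, standing in for real decoding.

```python
# Generic tqdm sketch: tracking progress per request when each prompt
# yields n sampled outputs, so the bar total counts prompts, not sequences.
from tqdm import tqdm

def generate_n_outputs(prompt: str, n: int) -> list[str]:
    # Hypothetical stand-in for real decoding of n samples per prompt.
    return [f"{prompt} -> sample {i}" for i in range(n)]

prompts = ["a", "b", "c"]
n = 4

with tqdm(total=len(prompts), desc="Processed prompts") as pbar:
    for prompt in prompts:
        outputs = generate_n_outputs(prompt, n)  # n sequences per request
        pbar.update(1)  # advance once per finished request, not per sequence
```
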
[Misc] Bump compressed-tensors
Labels: ci/build, ready
#13619 opened Feb 20, 2025 by dsikka