Pull request: Closed
Changes from all commits — 149 commits
93a41eb
Cross-wave splitk skinny gemm optimization
amd-hhashemi Nov 26, 2025
4d962d3
correction to enable cases
amd-hhashemi Nov 26, 2025
9e90868
remove debug overhead
amd-hhashemi Nov 26, 2025
bf31236
Adjustments to B[] staging and add M=64 tests
amd-hhashemi Nov 27, 2025
35a8c5a
optimize bias addition
amd-hhashemi Nov 27, 2025
d896a91
perf adjustments
amd-hhashemi Nov 28, 2025
1ad4865
Make race-hazard free by zero-initing, for minor penalty.
amd-hhashemi Nov 29, 2025
47dba69
Drop next B[] fetch at end of k-shard.
amd-hhashemi Nov 29, 2025
20c1f4c
fix parametrization of GrpsShrB
amd-hhashemi Nov 29, 2025
8a14af8
cleanup
amd-hhashemi Dec 1, 2025
8e6b283
cleanup2
amd-hhashemi Dec 1, 2025
1f56a41
cleanup3
amd-hhashemi Dec 1, 2025
6a73423
lint fixes
amd-hhashemi Dec 1, 2025
584dd8b
Fix min() double upcast due to mixed signs, without asm.
amd-hhashemi Dec 1, 2025
acdab3f
[TPU] add tpu_inference (#27277)
jcyang43 Nov 26, 2025
83c1648
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
HDCharles Nov 27, 2025
869370c
[Attention][Async] Eliminate `seq_lens_cpu` in FlashAttention metadat…
MatthewBonanni Nov 27, 2025
c8c3a1e
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28…
jinzhen-lin Nov 27, 2025
f97b092
add xpu supported model and model id for cpu (#29380)
louie-tsai Nov 27, 2025
e6f56f5
[Model Runner V2] Minor code cleanup (#29570)
WoosukKwon Nov 27, 2025
41120f8
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
WoosukKwon Nov 27, 2025
91a3822
[DOC] Add vLLM Bangkok Meetup info (#29561)
tjtanaa Nov 27, 2025
bdd7d24
[cpu][fix] Fix Arm CI tests (#29552)
fadara01 Nov 27, 2025
0564ba2
[Model Runner V2] Refactor CudaGraphManager (#29583)
WoosukKwon Nov 27, 2025
02691a9
[Bugfix] Fix getting device for MoE LoRA (#29475)
jeejeelee Nov 27, 2025
34eae21
Fix tpu-inference platform path (#29554)
jcyang43 Nov 27, 2025
c66bce0
[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548)
micah-wil Nov 27, 2025
9dab536
[Model Runner V2] Implement multi-step Eagle with CUDA graph (#29559)
WoosukKwon Nov 27, 2025
7da53ba
[Bugfix] Update Ultravox compatibility (#29588)
DarkLight1337 Nov 27, 2025
6ee4f6c
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up…
morrison-turnansky Nov 27, 2025
c7f20db
[Docs] Improve `priority` parameter documentation (#29572)
maang-h Nov 27, 2025
182047b
[Bugfix] Fix pre-commit (#29601)
DarkLight1337 Nov 27, 2025
3ad2f7c
[CI] Auto label CPU related issues (#29602)
bigPYJ1151 Nov 27, 2025
b535b80
[Bugfix] Fix HunyuanVL XD-RoPE (#29593)
ywang96 Nov 27, 2025
eb629d5
[LoRA] Continue optimizing MoE LoRA weight loading (#29322)
jeejeelee Nov 27, 2025
b4e519b
[CI/Build][Bugfix] Fix auto label issues for CPU (#29610)
bigPYJ1151 Nov 27, 2025
5c29515
[CI/Build] Skip ray tests on ROCm (#29556)
rjrock Nov 27, 2025
59ae3b0
[Doc]: fixing typos in diverse files (#29492)
didier-durand Nov 27, 2025
951d14b
[bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Pref…
hasB4K Nov 27, 2025
ae6dcad
[Attention] Update attention imports (#29540)
MatthewBonanni Nov 27, 2025
ffd4c7a
Update Transformers pin in CI to 4.57.3 (#29418)
hmellor Nov 27, 2025
c819f62
[BugFix] Optional tokenizer argument when loading GGUF models (#29582)
sts07142 Nov 27, 2025
d19fe7e
[Bugfix] Fix doc build on main (#29619)
DarkLight1337 Nov 27, 2025
d2e1f21
add skip_reading_prefix_cache in repr for PoolingParams (#29620)
guodongxiaren Nov 27, 2025
e24d35b
[Misc] Remove unused code from `protocol.py` (#29616)
DarkLight1337 Nov 27, 2025
c571ec5
[Deprecation] Advance deprecation status (#29617)
DarkLight1337 Nov 27, 2025
8720a2f
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing G…
Isotr0py Nov 27, 2025
50bd795
[CI] Add batched audios Whisper test (#29308)
NickLucche Nov 27, 2025
04dcf94
[BugFix] Fix `plan` API Mismatch when using latest FlashInfer (#29426)
askliar Nov 27, 2025
afdb311
[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutpu…
WoosukKwon Nov 27, 2025
6bb2377
[BugFix] Fix new nightly failures (#29578)
LucasWilkinson Nov 27, 2025
c10c19c
[CPU]Update CPU PyTorch to 2.9.0 (#29589)
scydas Nov 28, 2025
15eb103
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
xyang16 Nov 28, 2025
df22623
[Docs] Update supported models for Olmo 3 in tool calling documentati…
wilsonwu Nov 28, 2025
1722828
[BugFix] Fix ValueError in NewRequestData repr methods (#29392)
maang-h Nov 28, 2025
a13b52a
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qw…
EanWang211123 Nov 28, 2025
1347c35
Improve enable chunked_prefill & prefix_caching logic. (#26623)
noooop Nov 28, 2025
4c5eb9f
Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647)
DarkLight1337 Nov 28, 2025
0e62be8
[Feature][Bench] Add pareto visualization (#29477)
lengrongfu Nov 28, 2025
1d9f1fd
[Doc] Improve abnormal information string (#29655)
maang-h Nov 28, 2025
4b625b7
[BUGFIX] MistralTokenizer.__call__ adds an invalid EOS token (#29607)
juliendenize Nov 28, 2025
339ae53
Fix parameter order in GPT-OSS weight loading function for non-MXFP4 …
qGentry Nov 28, 2025
2128e4d
[Doc] Reorganize benchmark docs (#29658)
DarkLight1337 Nov 28, 2025
68a7544
[Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disab…
zhyajie Nov 28, 2025
699633a
[Docs] Add SPLADE and Ultravox models to supported models documentati…
wilsonwu Nov 28, 2025
c72f07e
[Misc] Remove redundant attention var constants (#29650)
DarkLight1337 Nov 28, 2025
35b3103
[mypy] Pass type checking for `vllm/utils` and `vllm/v1/pool` (#29666)
DarkLight1337 Nov 28, 2025
48c03b3
[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542)
njhill Nov 28, 2025
d7f5b75
[Optimization] Early return for `_apply_matches` and `_iter_placehold…
DarkLight1337 Nov 28, 2025
e4e5550
Revert "Supress verbose logs from model_hosting_container_standards (…
HappyAmazonian Nov 28, 2025
e270503
[CPU] Update torch 2.9.1 for CPU backend (#29664)
bigPYJ1151 Nov 28, 2025
a794ab7
Remove upstream fa checks (#29471)
Victor49152 Nov 28, 2025
04e2436
[Misc] Remove `yapf` directives (#29675)
DarkLight1337 Nov 28, 2025
ccc460d
Guard FlashInfer sampler using the same check as FlashInfer attention…
hmellor Nov 28, 2025
672a5ec
[mypy] Enable type checking for more directories (#29674)
DarkLight1337 Nov 28, 2025
a39c71e
add add_truncate_prompt_tokens in repr for PoolingParams (#29683)
guodongxiaren Nov 28, 2025
4a4403c
[Doc]: fixing typos in multiple files. (#29685)
didier-durand Nov 28, 2025
7ecdbdc
[V0 deprecation] Clean up legacy paged attention helper functions (#2…
Isotr0py Nov 28, 2025
ed63b9c
[Chore]: Reorganize model repo operating functions in `transformers_u…
Isotr0py Nov 28, 2025
41dc1ff
[Docs] Add CLI reference doc for `vllm bench sweep plot_pareto` (#29689)
hmellor Nov 28, 2025
88a6d68
[CI/Build] Rework CPU multimodal processor test (#29684)
Isotr0py Nov 28, 2025
4fec072
[Chore] Rename `Processor` to `InputProcessor` (#29682)
DarkLight1337 Nov 28, 2025
62a75e8
Remove `all_special_tokens_extended` from tokenizer code (#29686)
hmellor Nov 28, 2025
0fc6f9e
[Frontend] Remap -O to -cc commandline flag (#29557)
gmagogsfm Nov 28, 2025
a93c0a5
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
benchislett Nov 28, 2025
8ec465f
[CI/Build]: make it possible to build with a free-threaded interprete…
rgommers Nov 28, 2025
5c20729
[Misc] Remove redundant `ClassRegistry` (#29681)
DarkLight1337 Nov 28, 2025
eda5692
[Bugfix] fix dots.llm1.inst (#29687)
ZJY0516 Nov 28, 2025
6d8bb73
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)…
hl475 Nov 28, 2025
94e570a
bugfix: correct attn output with base 2 or e (#28840)
staugust Nov 28, 2025
ce9d611
[compile] Include `enable_sleep_mode` into caching factors. (#29696)
zhxchen17 Nov 28, 2025
6bceba6
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
mertunsall Nov 29, 2025
8e84d41
[ROCm][Bugfix] Patch for the `Multi-Modal Processor Test` group (#29702)
AndreasKaratzas Nov 29, 2025
06744a8
[Model Runner V2] Support penalties using bin counts (#29703)
WoosukKwon Nov 29, 2025
0657dc7
[Bugfix] Fix wrong mock attribute (#29704)
DarkLight1337 Nov 29, 2025
d9866ed
[Frontend] Perform offline path replacement to `tokenizer` (#29706)
a4lg Nov 29, 2025
730f5c4
[Model Runner V2] Refactor prefill token preparation (#29712)
WoosukKwon Nov 29, 2025
aba348f
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not ite…
LucasWilkinson Nov 29, 2025
635c242
Add gpu memory wait before test_async_tp (#28893)
angelayi Nov 29, 2025
7c2aac0
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
WoosukKwon Nov 29, 2025
36bb438
[LoRA] Cleanup LoRA unused code (#29611)
jeejeelee Nov 29, 2025
824b6ed
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
WoosukKwon Nov 29, 2025
77f184f
[Doc]: fixing typos in various files. (#29717)
didier-durand Nov 29, 2025
a3ed133
[Model Runner V2] Fuse penalties and temperature into single kernel (…
WoosukKwon Nov 29, 2025
d8d85bb
[Misc] Refactor tokenizer interface (#29693)
DarkLight1337 Nov 29, 2025
a9febb3
[Doc]: fix code block rendering (#29728)
dublc Nov 29, 2025
529fbc4
hfrunner.classify should return list[list[float]] not list[str] (#29671)
nwaughachukwuma Nov 29, 2025
ee44787
[Chore] Enable passing `tokenizer=None` into MM processor (#29724)
DarkLight1337 Nov 29, 2025
a49be91
[Chore] Move `detokenizer_utils` to `vllm/tokenizers` (#29727)
DarkLight1337 Nov 29, 2025
339801a
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
jinzhen-lin Nov 29, 2025
c917590
[Bugfix] Revert test_tokenization.py (#29729)
jeejeelee Nov 29, 2025
0258d3e
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
xyang16 Nov 30, 2025
c54e0d9
[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732)
Isotr0py Nov 30, 2025
ed31a75
Fix AttributeError about _use_fi_prefill (#29734)
hl475 Nov 30, 2025
d7f2968
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) …
Flink-ddd Nov 30, 2025
8010d52
[Doc]: Fix typo in fused_moe layer (#29731)
BowTen Nov 30, 2025
7ca1757
[Misc] Update `TokenizerLike` interface and move `get_cached_tokenize…
DarkLight1337 Nov 30, 2025
e2dfa9c
[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Isotr0py Nov 30, 2025
5590253
[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741)
DarkLight1337 Nov 30, 2025
1e56f61
[ROCm][Attention] Sliding window support for `AiterFlashAttentionBack…
ganyi1996ppo Nov 30, 2025
c6ecf95
Fix RoPE failures in Transformers nightly (#29700)
hmellor Nov 30, 2025
2d27859
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
omera-nv Nov 30, 2025
7a940a5
[Misc]Remove redundant hidden_size property in ModelConfig (#29749)
charlotte12l Nov 30, 2025
2f1fa67
[Model Runner V2] Use packed mask for prompt bin counts (#29756)
WoosukKwon Nov 30, 2025
81f8af5
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
wenscarl Dec 1, 2025
5708618
Make PyTorch profiler gzip and CUDA time dump configurable (#29568)
zhangruoxu Dec 1, 2025
1e2c183
[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758)
hl475 Dec 1, 2025
2ed5a38
[Frontend] Resettle pooling entrypoints (#29634)
noooop Dec 1, 2025
c465977
[Frontend] Add tool filtering support to ToolServer (#29224)
daniel-salib Dec 1, 2025
0208a3d
[crashfix] Eagle + multimodal can crash on mm cache miss (#29750)
mickaelseznec Dec 1, 2025
a200481
[Misc] Unify tokenizer registration (#29767)
DarkLight1337 Dec 1, 2025
6255b07
[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774)
faaany Dec 1, 2025
e222576
[multimodal][test] Reduce memory utilization for test_siglip to avoid…
zhxchen17 Dec 1, 2025
73e617d
[v1] Add real sliding window calculation to FlexAttention direct Bloc…
Isotr0py Dec 1, 2025
f7c34d5
[Bugfix] TypeError: 'NoneType' object is not callable (#29414)
mostrowskix Dec 1, 2025
c267517
[CI] Renovation of nightly wheel build & generation (#29690)
Harry-Chen Dec 1, 2025
9a55742
[Doc] fix heading levels (#29783)
KKKZOZ Dec 1, 2025
0c663e2
[CI] fix url-encoding behavior in nightly metadata generation (#29787)
Harry-Chen Dec 1, 2025
9cade32
[Doc] Update description disable_any_whitespace (#29784)
FredericOdermatt Dec 1, 2025
7686bdf
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Tran…
sangbumlikeagod Dec 1, 2025
54fd2b1
[Hardware][AMD] Remove ROCm skip conditions for transformers backend …
Abdennacer-Badaoui Dec 1, 2025
3cc1894
[Bugfix] Missing cached item in the MultiModalReceiverCache (#28525)
knlnguyen1802 Dec 1, 2025
45dbac4
[ci] Make distributed 8 gpus test optional (#29801)
khluu Dec 1, 2025
d87431e
[Core][Observability] Add KV cache residency metrics (#27793)
shivampr Dec 1, 2025
1baadae
Update FAQ on interleaving sliding windows support (#29796)
finbarrtimbers Dec 1, 2025
58507ee
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not en…
leo-pony Dec 1, 2025
9504f29
lint fix
amd-hhashemi Dec 1, 2025
9953038
lint fix2
amd-hhashemi Dec 1, 2025
f4c2bf9
Target K=2880 case only.
amd-hhashemi Dec 2, 2025
46 changes: 0 additions & 46 deletions .buildkite/generate_index.py

This file was deleted.

16 changes: 1 addition & 15 deletions .buildkite/release-pipeline.yaml
@@ -8,7 +8,7 @@ steps:
   commands:
     # #NOTE: torch_cuda_arch_list is derived from upstream PyTorch build files here:
     # https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/aarch64_ci_build.sh#L7
-    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
+    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
     - "mkdir artifacts"
     - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
     - "bash .buildkite/scripts/upload-wheels.sh"
@@ -30,19 +30,6 @@ steps:
     DOCKER_BUILDKIT: "1"
 
 # x86 + CUDA builds
-- label: "Build wheel - CUDA 12.8"
-  depends_on: ~
-  id: build-wheel-cuda-12-8
-  agents:
-    queue: cpu_queue_postmerge
-  commands:
-    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.8.1 --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
-    - "mkdir artifacts"
-    - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
-    - "bash .buildkite/scripts/upload-wheels.sh"
-  env:
-    DOCKER_BUILDKIT: "1"
 
 - label: "Build wheel - CUDA 12.9"
   depends_on: ~
   id: build-wheel-cuda-12-9
@@ -109,7 +96,6 @@ steps:
 - label: "Annotate release workflow"
   depends_on:
     - create-multi-arch-manifest
-    - build-wheel-cuda-12-8
   id: annotate-release-workflow
   agents:
     queue: cpu_queue_postmerge