We provide portable builds of vLLM with AMD ROCm 7.12 acceleration. Each release is a self-contained archive that bundles a relocatable CPython interpreter, vLLM, PyTorch, and all required ROCm user-space libraries as pip packages — no system Python, PyTorch, or ROCm install required. Our automated pipeline targets integration with Lemonade.
> [!IMPORTANT]
> **Early development:** This project is in active development. ROCm support for consumer AMD GPUs (RDNA) in vLLM is experimental. We welcome issue reports and contributions.
| GPU Target | Architecture | Devices |
|---|---|---|
| gfx1151 | STX Halo APU | Ryzen AI MAX+ Pro 395 |
| gfx1150 | STX Point APU | Ryzen AI 300 |
| gfx120X | RDNA4 GPUs | RX 9070 XT, RX 9070, RX 9060 XT, RX 9060 |
| gfx110X | RDNA3 GPUs | RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT/7600 |
All builds include ROCm 7.12 user-space built-in — no separate ROCm installation required. You still need a Linux kernel with a working amdgpu driver for your GPU; for gfx1151 specifically this means kernel 6.18.4+ (see Lemonade's gfx1151 notes).
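Before extracting a build, it can be worth confirming those kernel prerequisites. A quick preflight sketch (the `/sys/module/amdgpu` path assumes a typical Linux setup; whether the driver is loaded varies by host):

```shell
# Preflight: check the kernel version and whether the amdgpu driver is loaded.
uname -r    # gfx1151 needs 6.18.4 or newer
if [ -d /sys/module/amdgpu ]; then
  echo "amdgpu driver loaded"
else
  echo "amdgpu driver not loaded"
fi
```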
- Download both parts of the build for your GPU from the latest release. Releases are split into `.part00.tar.gz` + `.part01.tar.gz` because each build exceeds GitHub's 2 GB per-asset limit.
- Extract the archive (concatenate the parts and pipe into tar):

  ```shell
  mkdir -p ~/vllm-rocm
  cat vllm0.19.0-rocm7.12.0-gfx1151-x64.part00.tar.gz \
      vllm0.19.0-rocm7.12.0-gfx1151-x64.part01.tar.gz \
    | tar xz -C ~/vllm-rocm
  ```

- Run the server:

  ```shell
  ~/vllm-rocm/bin/vllm-server --model meta-llama/Llama-3.2-1B --port 8000
  ```

- Test with curl:

  ```shell
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Hello", "max_tokens": 50}'
  ```
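When scripting against the server, it helps to wait for readiness before the first request. A minimal sketch: `wait_for_vllm` is a hypothetical helper (not part of the release archives), and `/v1/models` is the standard OpenAI-compatible model-listing endpoint the server exposes:

```shell
# Poll the OpenAI-compatible /v1/models endpoint until the server answers.
# wait_for_vllm is a hypothetical helper, not shipped in the archives.
wait_for_vllm() {
  port="${1:-8000}"; tries="${2:-60}"
  for _ in $(seq 1 "$tries"); do
    curl -sf "http://localhost:${port}/v1/models" >/dev/null && return 0
    sleep 1
  done
  return 1
}
```

Usage: `wait_for_vllm 8000 60 && echo ready`.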
Lemonade Integration: These builds are designed to work as a backend for Lemonade, which manages downloading, launching, and routing requests to vLLM automatically.
Each release archive extracts to a relocatable CPython 3.12 distribution with all deps pre-installed into site-packages:
```
bin/
  vllm-server                          # Launcher shim (sets LD_LIBRARY_PATH, execs api_server)
  python3.12                           # Bundled CPython interpreter (python-build-standalone)
lib/
  libpython3.12.so                     # Python runtime
  python3.12/
    site-packages/
      vllm/                            # pip-installed from wheels.vllm.ai/rocm/
      torch/                           # pip-installed from repo.amd.com/rocm/whl/<arch>/
      _rocm_sdk_core/lib/              # ROCm core user-space (hip, hsa, comgr, clang, llvm)
      _rocm_sdk_libraries_gfx<arch>/lib/   # Per-arch ROCm math libs (rocblas, hipblas, rccl, MIOpen, ...)
      transformers/, numpy/, ...       # Python deps
```
The top-level lib/ holds the Python stdlib and libpython3.12.so; ROCm libraries (e.g. libamdhip64.so, librocblas.so) live under the bundled site-packages. The bin/vllm-server shim puts those directories on LD_LIBRARY_PATH before exec-ing python3 -m vllm.entrypoints.openai.api_server.
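The shim itself is generated by the build workflow, so its exact contents may differ; a hypothetical reconstruction of its shape (paths taken from the tree above) could look like:

```shell
#!/bin/sh
# Hypothetical sketch of bin/vllm-server; the generated shim may differ.
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
SP="$ROOT/lib/python3.12/site-packages"
# Put the bundled ROCm user-space libraries ahead of any system copies:
export LD_LIBRARY_PATH="$ROOT/lib:$SP/_rocm_sdk_core/lib:${LD_LIBRARY_PATH:-}"
exec "$ROOT/bin/python3.12" -m vllm.entrypoints.openai.api_server "$@"
```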
Our GitHub Actions workflow:
- Downloads a relocatable CPython 3.12 from `astral-sh/python-build-standalone`
- Installs PyTorch ROCm from AMD's pip index (`https://repo.amd.com/rocm/whl/<target>/`)
- Installs vLLM ROCm (pre-built wheel) from AMD's vLLM wheel index (`https://wheels.vllm.ai/rocm/`), which pulls the matching `rocm-sdk-core` and `rocm-sdk-libraries-gfx<target>` wheels as transitive deps
- Generates a `bin/vllm-server` shim that wires up `LD_LIBRARY_PATH`/`PYTHONPATH` at startup
- Tars the result, splits it into < 2 GB parts, and tests on self-hosted AMD GPU hardware before releasing
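The tar-and-split step can be sketched as follows (the payload and archive names are illustrative, not the workflow's actual commands; `split -d` produces the `00`/`01` numeric suffixes seen in the release assets):

```shell
# Create a stand-in payload, archive it, and split under GitHub's 2 GB cap.
mkdir -p build-dir && echo demo > build-dir/file.txt
tar czf build.tar.gz -C build-dir .
split -b 1900M -d --additional-suffix=.tar.gz build.tar.gz build.part
# Reassembly is just concatenation, matching the install step above:
cat build.part*.tar.gz | tar tz
```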
| GPU Target | Ubuntu |
|---|---|
| gfx1151 | |
| gfx1150 | |
| gfx120X | |
| gfx110X | |
**Linux (gfx1150/APU):** OOM despite free VRAM? Add `ttm.pages_limit=12582912` (48 GB) to the kernel cmdline (e.g. via GRUB), run `update-grub`, then reboot. See TheRock FAQ.
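The value works out to 48 GiB at the standard 4 KiB page size, and `/proc/cmdline` shows whether the parameter actually took effect after the reboot:

```shell
# 12582912 pages * 4096 bytes/page = 48 GiB:
echo $(( 12582912 * 4096 / 1024 / 1024 / 1024 ))   # prints 48
# After rebooting, confirm the parameter is active:
grep -o 'ttm.pages_limit=[0-9]*' /proc/cmdline || echo "ttm.pages_limit not set"
```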
- vLLM — high-throughput LLM serving engine (ROCm wheel from `wheels.vllm.ai/rocm/`)
- PyTorch — tensor compute (ROCm wheel from `repo.amd.com/rocm/whl/<target>/`)
- ROCm SDK wheels — AMD's pip-packaged ROCm user-space (`rocm-sdk-core`, `rocm-sdk-libraries-gfx<target>`), published via TheRock
- python-build-standalone — relocatable CPython 3.12
- Ubuntu 22.04 GitHub Actions runner
- `pip` (no `cmake`, `ninja`, or `patchelf` involved — everything comes from pre-built wheels)
This project is licensed under the MIT License — see the LICENSE file for details.