vllm-rocm

We provide portable builds of vLLM with AMD ROCm 7.12 acceleration. Each release is a self-contained archive that bundles a relocatable CPython interpreter, vLLM, PyTorch, and all required ROCm user-space libraries as pip packages — no system Python, PyTorch, or ROCm install required. Our automated pipeline targets integration with Lemonade.

Important

Early Development: This project is in active development. ROCm support for consumer AMD GPUs (RDNA) in vLLM is experimental. We welcome issue reports and contributions.

Supported Devices

GPU Target	Architecture	Devices
gfx1151	STX Halo APU	Ryzen AI MAX+ Pro 395
gfx1150	STX Point APU	Ryzen AI 300
gfx120X	RDNA4 GPUs	RX 9070 XT, RX 9070, RX 9060 XT, RX 9060
gfx110X	RDNA3 GPUs	RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT/7600

All builds bundle the ROCm 7.12 user-space libraries, so no separate ROCm installation is required. You still need a Linux kernel with a working amdgpu driver for your GPU; for gfx1151 specifically this means kernel 6.18.4+ (see Lemonade's gfx1151 notes).
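One way to check the kernel requirement before downloading anything is to compare `uname -r` against the minimum version. The sketch below is illustrative only: `version_at_least` is a hypothetical helper (not part of these builds), and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
  # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Strip any distro suffix such as "-generic" from `uname -r` before comparing.
running="$(uname -r | cut -d- -f1)"
if version_at_least "$running" "6.18.4"; then
  echo "kernel $running: meets the 6.18.4 minimum for gfx1151"
else
  echo "kernel $running: older than 6.18.4, gfx1151 may not work"
fi
```

The check only looks at the version string; it does not confirm that the amdgpu driver is actually loaded for your GPU.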

Quick Start

Download both parts of the build for your GPU from the latest release. Releases are split into .part00.tar.gz + .part01.tar.gz because each build exceeds GitHub's 2 GB per-asset limit.

Extract the archive (concatenate the parts and pipe into tar):

mkdir -p ~/vllm-rocm
cat vllm0.19.0-rocm7.12.0-gfx1151-x64.part00.tar.gz \
    vllm0.19.0-rocm7.12.0-gfx1151-x64.part01.tar.gz \
  | tar xz -C ~/vllm-rocm

Run the server:

~/vllm-rocm/bin/vllm-server --model meta-llama/Llama-3.2-1B --port 8000

Test with curl:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Hello", "max_tokens": 50}'

Lemonade Integration: These builds are designed to work as a backend for Lemonade, which manages downloading, launching, and routing requests to vLLM automatically.

What's Included

Each release archive extracts to a relocatable CPython 3.12 distribution with all deps pre-installed into site-packages:

bin/
  vllm-server                 # Launcher shim (sets LD_LIBRARY_PATH, execs api_server)
  python3.12                  # Bundled CPython interpreter (python-build-standalone)
lib/
  libpython3.12.so            # Python runtime
  python3.12/
    site-packages/
      vllm/                   # pip-installed from wheels.vllm.ai/rocm/
      torch/                  # pip-installed from repo.amd.com/rocm/whl/<arch>/
      _rocm_sdk_core/lib/     # ROCm core user-space (hip, hsa, comgr, clang, llvm)
      _rocm_sdk_libraries_gfx<arch>/lib/
                              # Per-arch ROCm math libs (rocblas, hipblas, rccl, MIOpen, ...)
      transformers/, numpy/, ...  # Python deps

The top-level lib/ holds the Python stdlib and libpython3.12.so; ROCm libraries (e.g. libamdhip64.so, librocblas.so) live under the bundled site-packages. The bin/vllm-server shim puts those directories on LD_LIBRARY_PATH before exec-ing python3 -m vllm.entrypoints.openai.api_server.
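To make the shim's role concrete, here is a minimal sketch of how a launcher could assemble the library path from the layout above. This is not the actual `bin/vllm-server` script: `rocm_lib_path` is a hypothetical helper, and the demo runs against a fake directory tree rather than a real install.

```shell
# Hypothetical helper: collect every site-packages/_rocm_sdk_*/lib directory
# under an install root and join them with ':' for use in LD_LIBRARY_PATH.
rocm_lib_path() {
  root="$1"
  result=""
  for dir in "$root"/lib/python3.12/site-packages/_rocm_sdk_*/lib; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob when nothing matches
    result="${result:+$result:}$dir"
  done
  printf '%s\n' "$result"
}

# Demo against a fake tree that mirrors the documented layout.
demo_root="$(mktemp -d)"
sp="$demo_root/lib/python3.12/site-packages"
mkdir -p "$sp/_rocm_sdk_core/lib" "$sp/_rocm_sdk_libraries_gfx1151/lib"
rocm_lib_path "$demo_root"

# A launcher could then do something like the following (not executed here):
#   export LD_LIBRARY_PATH="$(rocm_lib_path "$root")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
#   exec "$root/bin/python3.12" -m vllm.entrypoints.openai.api_server "$@"
```

Deriving the paths from the install root at startup, rather than hard-coding them, is what lets the extracted archive be moved to any directory.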

Automated Builds

Our GitHub Actions workflow:

Downloads a relocatable CPython 3.12 from astral-sh/python-build-standalone
Installs PyTorch ROCm from AMD's pip index (https://repo.amd.com/rocm/whl/<target>/)
Installs vLLM ROCm (pre-built wheel) from AMD's vLLM wheel index (https://wheels.vllm.ai/rocm/), which pulls the matching rocm-sdk-core and rocm-sdk-libraries-gfx<target> wheels as transitive deps
Generates a bin/vllm-server shim that wires up LD_LIBRARY_PATH / PYTHONPATH at startup
Tars the result, splits it into < 2 GB parts, and tests on self-hosted AMD GPU hardware before releasing
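The split-and-reassemble step can be tried on a toy tree. The following is a sketch under stated assumptions, not the workflow's actual script: the file names and the 1024-byte part size are made up for the demo, and `split -d` with `--additional-suffix` assumes GNU coreutils.

```shell
# Round-trip demo in a scratch directory: pack, split, reassemble, extract.
work="$(mktemp -d)"
mkdir -p "$work/build/bin" "$work/out"
seq 1 2000 > "$work/build/bin/payload.txt"   # stand-in for the real tree

# Pack the tree and split the compressed stream into fixed-size parts.
# 1024 bytes keeps the demo small; a real pipeline would pick a size under 2 GB.
tar -C "$work/build" -czf - . \
  | split -b 1024 -d -a 2 --additional-suffix=.tar.gz - "$work/demo-x64.part"

# Reassemble: the shell glob expands in sorted order (part00, part01, ...),
# so cat restores the original compressed stream.
cat "$work"/demo-x64.part*.tar.gz | tar -xzf - -C "$work/out"

cmp "$work/build/bin/payload.txt" "$work/out/bin/payload.txt" && echo "round trip OK"
```

Because the parts are plain byte ranges of one gzip stream, no single part can be extracted on its own; they must be concatenated in order first, which is why the Quick Start pipes `cat` into `tar`.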

Build status is tracked per GPU target on Ubuntu: gfx1151, gfx1150, gfx120X, gfx110X.

Linux (gfx1150/APU): If you hit out-of-memory errors even though VRAM is free, add ttm.pages_limit=12582912 (48 GiB at 4 KiB per page) to the kernel command line (e.g. via GRUB), run update-grub, then reboot. See TheRock FAQ.
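The value 12582912 is a page count, not a byte count. As a quick sanity check, and to derive a value for a different memory size, the arithmetic can be sketched as below. `pages_for_gib` is a hypothetical helper, and the calculation assumes 4 KiB pages (you can confirm your page size with `getconf PAGESIZE`).

```shell
# Hypothetical helper: convert a size in GiB to a ttm.pages_limit value,
# assuming 4 KiB pages. GiB -> KiB is * 1024 * 1024; each page is 4 KiB.
pages_for_gib() {
  echo $(( $1 * 1024 * 1024 / 4 ))
}

pages_for_gib 48   # prints 12582912
```

For example, `pages_for_gib 32` gives the page count for a 32 GiB limit; pick a size that leaves enough system RAM for the rest of the machine.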

Dependencies

Runtime (bundled in the release)

vLLM — high-throughput LLM serving engine (ROCm wheel from wheels.vllm.ai/rocm/)
PyTorch — tensor compute (ROCm wheel from repo.amd.com/rocm/whl/<target>/)
ROCm SDK wheels — AMD's pip-packaged ROCm user-space (rocm-sdk-core, rocm-sdk-libraries-gfx<target>, published alongside via TheRock)
python-build-standalone — relocatable CPython 3.12

Build (CI only)

Ubuntu 22.04 GitHub Actions runner
pip (no cmake, ninja, or patchelf involved — everything comes from pre-built wheels)

License

This project is licensed under the MIT License — see the LICENSE file for details.
