GitHub - qualcomm/GenieX: Run frontier LLMs and VLMs locally on Qualcomm devices across NPU, GPU, and CPU with a few lines of code

The easiest way to run frontier LLMs & VLMs locally on Qualcomm devices

Documentation · Quickstart · Models · Community

GenieX is an on-device generative AI inference runtime for Qualcomm devices. Bring almost any GGUF model from Hugging Face, or a pre-compiled bundle from Qualcomm AI Hub, and run it locally on the Hexagon NPU, Adreno GPU, or CPU in a few lines of code. A single C SDK sits underneath, exposed through a CLI, Python, Kotlin/Java, Docker, and an OpenAI-compatible server. GenieX is the community version of Qualcomm GENIE.

GenieX architecture: CLI, Python, Java, Docker, and OpenAI-compatible Serve interfaces sit on a single GenieX SDK, which dispatches to the llama.cpp runtime (GGML over CPU / GPU / Hexagon HTP kernels) or the Qualcomm AI Engine Direct runtime on the NPU — across Windows, Android, and Linux.

Supported platforms

GenieX runs only on Qualcomm platforms (Snapdragon and Dragonwing). Find your platform, then jump straight to the interface you want to use.

Platform	Example devices	Jump to a quickstart
🪟 Windows ARM64 (Compute)	Snapdragon X · X Elite	CLI · Python · Local server
🤖 Android (Mobile)	Snapdragon 8 Elite · 8 Elite Gen 5	Android SDK
🐧 Linux ARM64 (IoT)	Dragonwing QCS9075	CLI · Docker · Python

No device on hand? Spin up a remote session on Qualcomm Device Cloud.
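
As a rough preflight check, the platform table above can be encoded as a lookup from operating system and CPU architecture to the listed quickstarts. This is a minimal sketch and not part of GenieX: the names `quickstarts_for` and `describe_host` are hypothetical, and the check only looks at OS and architecture, so it cannot confirm that the chip is actually a Qualcomm part. Android is left out because Python is not among the interfaces listed for it.

```python
import platform

# Quickstart interfaces per platform, copied from the table above.
QUICKSTARTS = {
    ("windows", "arm64"): ["CLI", "Python", "Local server"],
    ("linux", "arm64"): ["CLI", "Docker", "Python"],
}

# Common spellings of 64-bit ARM reported by platform.machine().
ARM64_NAMES = {"arm64", "aarch64"}


def quickstarts_for(system: str, machine: str) -> list[str]:
    """Return the quickstarts listed for this OS/architecture, or [] if unlisted."""
    arch = "arm64" if machine.lower() in ARM64_NAMES else machine.lower()
    return QUICKSTARTS.get((system.lower(), arch), [])


def describe_host() -> str:
    """Summarize whether the current host matches a listed platform."""
    found = quickstarts_for(platform.system(), platform.machine())
    if not found:
        return "This host does not match a listed platform."
    return "Listed quickstarts: " + ", ".join(found)
```

Calling `describe_host()` on the target machine gives a quick hint about which quickstart to follow; a match still does not guarantee a supported chipset.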

Quickstart

Pick your interface below. Each one follows the same three steps — Install, Run, and Docs — and shows both runtimes: a GGUF model from Hugging Face (llama_cpp) and a pre-compiled bundle from Qualcomm AI Hub (qairt, NPU).

CLI

Install

Windows ARM64 — download the installer, run it, then open a new terminal.

Linux ARM64 — one line, no sudo:

curl -fsSL https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/install.sh | sh

Run — chat with any model in one line (drag in an image for VLMs):

# GGUF from Hugging Face → llama.cpp (NPU / GPU / CPU)
geniex infer google/gemma-4-E4B-it-qat-q4_0-gguf

# Pre-compiled bundle from Qualcomm AI Hub → Qualcomm AI Engine Direct (NPU)
geniex infer ai-hub-models/Qwen2.5-VL-7B-Instruct

📖 Docs — Install · Quickstart · Command reference
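
If you want to launch the CLI from a script, the sketch below checks that `geniex` is visible on `PATH` (a new terminal is needed after installing, as noted above) and builds the `geniex infer` command from the examples. The helper names are hypothetical; only the `infer` subcommand followed by a model ID, as shown above, is assumed, and the session is assumed to be interactive.

```python
import shutil
import subprocess


def geniex_on_path() -> bool:
    """True if a `geniex` executable is visible on PATH."""
    return shutil.which("geniex") is not None


def build_infer_command(model_id: str) -> list[str]:
    """Build the argument list for `geniex infer <model_id>`."""
    if not model_id.strip():
        raise ValueError("model_id must not be empty")
    return ["geniex", "infer", model_id]


def run_infer(model_id: str) -> int:
    """Start a chat session with the CLI and return its exit code."""
    if not geniex_on_path():
        raise FileNotFoundError(
            "geniex not found on PATH; open a new terminal after installing"
        )
    # stdin/stdout are inherited, so the session stays interactive.
    return subprocess.run(build_infer_command(model_id)).returncode
```

For example, `run_infer("google/gemma-4-E4B-it-qat-q4_0-gguf")` would start the same session as the first command above.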

Python

Install

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple geniex

Run — mirrors Hugging Face transformers (from_pretrained() → .generate()):

# GGUF from Hugging Face → llama.cpp
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-2B-GGUF", precision="Q4_0")

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for chunk in model.generate(prompt, max_new_tokens=256, stream=True):
    print(chunk, end="", flush=True)

model.close()

# Pre-compiled bundle from Qualcomm AI Hub → Qualcomm AI Engine Direct (NPU)
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ai-hub-models/Qwen3-4B")

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for chunk in model.generate(prompt, max_new_tokens=256, stream=True):
    print(chunk, end="", flush=True)

model.close()

📖 Docs — Install · Quickstart · API reference
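
The single-turn examples above extend to a multi-turn chat by keeping the `messages` list and appending each reply to it. The sketch below uses only the calls shown above (`apply_chat_template` and `generate` with `stream=True`). `chat_turn` and `run_demo` are hypothetical helper names, and the code assumes that streamed chunks are strings and that the chat template accepts an `assistant` role for earlier replies.

```python
def chat_turn(model, history, user_text, max_new_tokens=256):
    """Append a user message, stream the reply, and record it in history."""
    history.append({"role": "user", "content": user_text})
    prompt = model.tokenizer.apply_chat_template(history, add_generation_prompt=True)

    chunks = []
    for chunk in model.generate(prompt, max_new_tokens=max_new_tokens, stream=True):
        print(chunk, end="", flush=True)  # show text as it arrives
        chunks.append(chunk)
    print()

    reply = "".join(chunks)
    # Assumption: the chat template accepts an "assistant" role for past replies.
    history.append({"role": "assistant", "content": reply})
    return reply


def run_demo():
    """Two-turn chat; requires a supported device with geniex installed."""
    from geniex import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("ai-hub-models/Qwen3-4B")
    try:
        history = []
        chat_turn(model, history, "What is 2+2?")
        chat_turn(model, history, "Now multiply that by 3.")
    finally:
        model.close()  # release the model even if generation fails
```

Wrapping the session in `try`/`finally` keeps the `model.close()` call from the examples above from being skipped when generation raises.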

OpenAI-compatible server

Install — ships with the CLI (install above).

Run — pull any model (GGUF or Qualcomm AI Hub bundle), then serve an OpenAI-compatible API:

geniex pull ai-hub-models/Qwen3-4B-Instruct-2507
geniex serve   # serves http://127.0.0.1:18181/v1

curl http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-hub-models/Qwen3-4B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Point any OpenAI client at http://127.0.0.1:18181/v1 by changing its base URL; the rest of your client code stays the same.
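
The curl call above can be reproduced from Python with only the standard library. This is a minimal sketch that assumes `geniex serve` is already running at the address shown above and that the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`); the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18181/v1"  # address from the `geniex serve` example above


def build_chat_request(model, user_text, base_url=BASE_URL):
    """Build a POST request equivalent to the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of a chat-completions response.

    Assumption: the server returns the usual OpenAI shape,
    choices[0].message.content.
    """
    return response_body["choices"][0]["message"]["content"]


def chat(model, user_text, base_url=BASE_URL, timeout=120):
    """Send one message to a running server and return the reply text."""
    request = build_chat_request(model, user_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running, `chat("ai-hub-models/Qwen3-4B-Instruct-2507", "Hello!")` sends the same request as the curl command.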

📖 Docs — Local server guide

Android (Kotlin / Java)

Install — add the SDK to your app module's build.gradle.kts:

dependencies {
    implementation("com.qualcomm.qti:geniex-android:0.3.1")
}

Run — fastest path is the sample app (chat UI, model picker for GGUF + Qualcomm AI Hub bundles, VLM support):

The Android demo app lives in qualcomm/ai-hub-apps. Clone it, open the sample app in Android Studio, and hit Run.

📖 Docs — Install · Quickstart · API reference

Docker

Install

docker pull docker.io/qualcomm/geniex:latest

Run — the container wraps the CLI, so geniex infer … works exactly as above.

📖 Docs — Docker guide

C / C++ SDK

Install: include the single C header sdk/include/geniex.h and link against the SDK; every other interface is a thin wrapper over this C API.

📖 Docs — sdk/README.md · notes/build.md

Models

GenieX has two runtimes so you get broad model coverage and peak Snapdragon performance in one stack. Both LLMs and VLMs are supported.

	llama.cpp (`llama_cpp`)	Qualcomm AI Engine Direct (`qairt`)
Get models from	Hugging Face (any GGUF)	Qualcomm AI Hub (pre-compiled)
Format	GGUF	Per-chipset bundle
Compute units	NPU · GPU · CPU	NPU only
Best for	Bringing your own GGUF	Highest NPU performance

For llama.cpp, pick the Q4_0 precision when prompted; it has the best Hexagon NPU support. See the Models guide for the full model list, the supported precisions, and how to run a local model.
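
The runtime comparison can also be written down as data, which makes it easy to ask which runtime covers a given compute unit. This is an illustrative sketch only: the dictionary is transcribed from the table above, and `RUNTIMES` and `runtimes_for` are hypothetical names, not part of the GenieX API.

```python
# Runtime properties, transcribed from the comparison table above.
RUNTIMES = {
    "llama_cpp": {
        "source": "Hugging Face (any GGUF)",
        "format": "GGUF",
        "compute_units": ("NPU", "GPU", "CPU"),
    },
    "qairt": {
        "source": "Qualcomm AI Hub (pre-compiled)",
        "format": "Per-chipset bundle",
        "compute_units": ("NPU",),
    },
}


def runtimes_for(compute_unit):
    """Return the runtime names that list the given compute unit."""
    unit = compute_unit.upper()
    return [name for name, info in RUNTIMES.items() if unit in info["compute_units"]]
```

Under this encoding, a GPU or CPU target leaves only `llama_cpp`, while an NPU target can use either runtime.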

🤝 Contributing

Contributions are welcome! Before opening a PR, please read CONTRIBUTING.md for branch naming, commit / PR title format, pre-commit checks, and the FFI-update rule for public SDK headers.


🏗️ Build the CLI, SDK, or Python bindings	notes/build.md
▶️ Run & select compute units / pull models	notes/run.md
🏷️ Release — SemVer tags, channels, HTP signing	notes/release.md
📚 All developer docs	docs/README.md

💬 Community & Contact

Questions, ideas, or want to show off what you built? Come say hi.

💬 Slack — ask questions and chat with the community in real time.
🐛 GitHub Issues — report a bug or request a feature.
🔗 LinkedIn — follow Qualcomm AI Hub for news and updates.

Contributors

Thanks to everyone building GenieX 💙

📄 License

BSD 3-Clause — see LICENSE and NOTICE.

Use of this project is also subject to Qualcomm's Terms of Use.

