
qualcomm/GenieX

Qualcomm AI Hub GenieX

The easiest way to run frontier LLMs & VLMs locally on Qualcomm devices


Documentation · Quickstart · Models · Community


GenieX is an on-device Gen AI inference runtime for Qualcomm devices. Bring almost any GGUF model from Hugging Face, or a pre-compiled bundle from Qualcomm AI Hub, and run it locally in a few lines of code: GGUF models run on the Hexagon NPU, Adreno GPU, or CPU, and Qualcomm AI Hub bundles run on the NPU. A single C SDK sits underneath and is exposed through a CLI, Python, Kotlin/Java, Docker, and an OpenAI-compatible server. It is the community version of Qualcomm GENIE.

GenieX architecture: CLI, Python, Java, Docker, and OpenAI-compatible Serve interfaces sit on a single GenieX SDK, which dispatches to the llama.cpp runtime (GGML over CPU / GPU / Hexagon HTP kernels) or the Qualcomm AI Engine Direct runtime on the NPU, across Windows, Android, and Linux.

Supported platforms

GenieX runs only on Qualcomm hardware: Snapdragon devices and Dragonwing IoT platforms. Find your platform, then jump straight to the interface you want to use.

| Platform | Example devices | Jump to a quickstart |
|---|---|---|
| 🪟 Windows ARM64 (Compute) | Snapdragon X · X Elite | CLI · Python · Local server |
| 🤖 Android (Mobile) | Snapdragon 8 Elite · 8 Elite Gen 5 | Android SDK |
| 🐧 Linux ARM64 (IoT) | Dragonwing QCS9075 | CLI · Docker · Python |

No device on hand? Spin up a remote session on Qualcomm Device Cloud.
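GenieX targets the ARM64 builds listed in the table. As a rough pre-install check, the sketch below compares the host OS and CPU architecture against the Windows ARM64 and Linux ARM64 rows. It is an illustration, not part of GenieX: the helper name is hypothetical, it cannot tell whether the chip is actually a Snapdragon or Dragonwing part, and an x64 Python running under emulation on Windows on ARM typically reports AMD64 and is therefore rejected.

```python
import platform
import sys

# OS / CPU pairs taken from the Windows ARM64 and Linux ARM64 rows of the
# platform table. Android is reached through the Android SDK instead.
_SUPPORTED = {
    "win32": {"arm64"},
    "linux": {"aarch64", "arm64"},
}


def is_supported_host(sys_platform: str, machine: str) -> bool:
    """Return True if the OS/CPU pair matches an ARM64 row of the table.

    Only the CPU architecture is checked; this cannot confirm that the SoC
    is a Qualcomm Snapdragon or Dragonwing part.
    """
    return machine.lower() in _SUPPORTED.get(sys_platform, set())


if __name__ == "__main__":
    machine = platform.machine()
    ok = is_supported_host(sys.platform, machine)
    print(f"{sys.platform}/{machine}: {'ARM64 host' if ok else 'unsupported architecture'}")
```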


Quickstart

Pick your interface below. Each section follows the same three steps (Install, Run, and Docs). Where both runtimes apply, the examples show a GGUF model from Hugging Face (llama_cpp) and a pre-compiled bundle from Qualcomm AI Hub (qairt, NPU).

CLI

Windows ARM64 Linux ARM64

Install

  • Windows ARM64: download the installer, run it, then open a new terminal.
  • Linux ARM64: install with one command (no sudo needed):
    curl -fsSL https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-geniex/install.sh | sh

Run: chat with any model in one line (drag in an image for VLMs):

# GGUF from Hugging Face → llama.cpp (NPU / GPU / CPU)
geniex infer google/gemma-4-E4B-it-qat-q4_0-gguf

# Pre-compiled bundle from Qualcomm AI Hub → Qualcomm AI Engine Direct (NPU)
geniex infer ai-hub-models/Qwen2.5-VL-7B-Instruct

📖 Docs: Install · Quickstart · Command reference

Python

Windows ARM64 Linux ARM64

Install

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple geniex

Run: the API mirrors Hugging Face transformers (from_pretrained() → .generate()):

# GGUF from Hugging Face → llama.cpp
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-2B-GGUF", precision="Q4_0")

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for chunk in model.generate(prompt, max_new_tokens=256, stream=True):
    print(chunk, end="", flush=True)

model.close()

# Pre-compiled bundle from Qualcomm AI Hub → Qualcomm AI Engine Direct (NPU)
from geniex import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ai-hub-models/Qwen3-4B")

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = model.tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for chunk in model.generate(prompt, max_new_tokens=256, stream=True):
    print(chunk, end="", flush=True)

model.close()

📖 Docs: Install · Quickstart · API reference
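The Python snippets above are single-turn. For a multi-turn chat, one approach is to keep a running message list and pass the whole history through the chat template on every turn. The sketch below uses only the calls shown above (tokenizer.apply_chat_template() and generate(..., stream=True)); it assumes that streamed chunks are strings and that the model's chat template accepts assistant messages. chat_turn and main are hypothetical helper names, and the model is passed in as a parameter so the logic can be exercised without a device.

```python
from typing import Dict, List

Message = Dict[str, str]


def chat_turn(model, history: List[Message], user_text: str, max_new_tokens: int = 256) -> str:
    """Run one chat turn: stream the reply to stdout and record both turns in history."""
    history.append({"role": "user", "content": user_text})
    # Re-render the full conversation so the model sees every earlier turn.
    prompt = model.tokenizer.apply_chat_template(history, add_generation_prompt=True)

    chunks: List[str] = []
    for chunk in model.generate(prompt, max_new_tokens=max_new_tokens, stream=True):
        print(chunk, end="", flush=True)
        chunks.append(chunk)
    print()

    reply = "".join(chunks)
    # Store the assistant turn so the next prompt includes it.
    history.append({"role": "assistant", "content": reply})
    return reply


def main() -> None:
    # Requires the geniex package and a supported device.
    from geniex import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("ai-hub-models/Qwen3-4B")
    history: List[Message] = []
    try:
        chat_turn(model, history, "What is 2+2?")
        chat_turn(model, history, "Now multiply that by 10.")
    finally:
        model.close()  # release the model even if generation raises
```

On a device with geniex installed, call main(). The try/finally makes sure model.close() runs even when a turn fails partway through.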

OpenAI-compatible server

Windows ARM64 Linux ARM64

Install: ships with the CLI (see the CLI install steps above).

Run: pull any model (GGUF or Qualcomm AI Hub bundle), then serve it through an OpenAI-compatible API:

geniex pull ai-hub-models/Qwen3-4B-Instruct-2507
geniex serve   # serves http://127.0.0.1:18181/v1
curl http://127.0.0.1:18181/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-hub-models/Qwen3-4B-Instruct-2507",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Point any OpenAI-compatible client at http://127.0.0.1:18181/v1; apart from the base URL and model name, no code changes are needed.
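Because the server speaks the OpenAI chat-completions protocol, a client needs nothing beyond an HTTP library. The sketch below uses only the Python standard library: it waits for the port from the example above to accept connections (handy when geniex serve was just launched from a script), then posts the same request as the curl command. It assumes the response follows the standard OpenAI shape (choices[0].message.content); wait_for_server, build_chat_request, and chat are hypothetical helper names.

```python
import json
import socket
import time
import urllib.request

BASE_URL = "http://127.0.0.1:18181/v1"  # address from the geniex serve example


def wait_for_server(host: str = "127.0.0.1", port: int = 18181, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def build_chat_request(model: str, messages: list, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, [{"role": "user", "content": prompt}], base_url)
    with urllib.request.urlopen(req, timeout=300) as resp:
        payload = json.load(resp)
    # Assumes the standard OpenAI response shape.
    return payload["choices"][0]["message"]["content"]
```

For example, once geniex serve is running and the model has been pulled, chat("ai-hub-models/Qwen3-4B-Instruct-2507", "Hello!") returns the reply text.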

📖 Docs: Local server guide

Android (Kotlin / Java)

Android

Install: add the SDK to your app module's build.gradle.kts:

dependencies {
    implementation("com.qualcomm.qti:geniex-android:0.3.1")
}

Run: the fastest path is the sample app (chat UI, model picker for GGUF and Qualcomm AI Hub bundles, VLM support):

The Android demo app lives in qualcomm/ai-hub-apps. Clone it, open the sample app in Android Studio, and hit Run.

📖 Docs: Install · Quickstart · API reference

Docker

Linux ARM64

Install

docker pull docker.io/qualcomm/geniex:latest

Run: the container wraps the CLI, so geniex infer … works exactly as above.

📖 Docs: Docker guide

C / C++ SDK

Windows ARM64 Linux ARM64 Android

Install: link against the single C header sdk/include/geniex.h; every other interface is a thin wrapper over it.

📖 Docs: sdk/README.md · notes/build.md


Models

GenieX ships two runtimes in one stack: llama.cpp for broad model coverage and Qualcomm AI Engine Direct for peak Snapdragon NPU performance. Both LLMs and VLMs are supported.

| Runtime | llama.cpp (llama_cpp) | Qualcomm AI Engine Direct (qairt) |
|---|---|---|
| Get models from | Hugging Face (any GGUF) | Qualcomm AI Hub (pre-compiled) |
| Format | GGUF | Per-chipset bundle |
| Compute units | NPU · GPU · CPU | NPU only |
| Best for | Bringing your own GGUF | Highest NPU performance |
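The runtime table can be read as a small lookup: each runtime exposes a fixed set of compute units. The sketch below encodes that mapping. The runtime guess is a hypothetical heuristic based only on the examples in this README, where every Qualcomm AI Hub bundle is named ai-hub-models/<name>; it is not how GenieX itself dispatches, so treat the Models guide as the authority.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Runtime:
    name: str
    compute_units: Tuple[str, ...]


# Compute units per runtime, copied from the runtime table.
LLAMA_CPP = Runtime("llama_cpp", ("NPU", "GPU", "CPU"))
QAIRT = Runtime("qairt", ("NPU",))


def guess_runtime(model_id: str) -> Runtime:
    """Hypothetical heuristic: ai-hub-models/ IDs are pre-compiled AI Hub
    bundles (qairt); anything else is assumed to be a GGUF repo (llama_cpp)."""
    if model_id.startswith("ai-hub-models/"):
        return QAIRT
    return LLAMA_CPP


def can_run_on(model_id: str, unit: str) -> bool:
    """Check whether the guessed runtime supports a compute unit (case-insensitive)."""
    return unit.upper() in guess_runtime(model_id).compute_units
```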

For llama.cpp, pick the Q4_0 precision when prompted; it has the best Hexagon NPU support. See the Models guide for the full model list, available precisions, and how to run a local model.
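GGUF repositories on Hugging Face usually ship one file per precision, with the precision tag at the end of the filename (for example, a name ending in -Q4_0.gguf). If you fetch files yourself instead of answering the CLI prompt, a helper like the hypothetical pick_gguf below selects the Q4_0 file. The naming convention is an assumption about the repository, and split (multi-part) GGUF files are not handled.

```python
import re
from typing import Iterable, Optional


def pick_gguf(filenames: Iterable[str], precision: str = "Q4_0") -> Optional[str]:
    """Return the first .gguf file whose name ends with the precision tag, or None.

    The tag must be preceded by '-', '_', '.', or the start of the name, so
    'Q4_0' does not match names such as '...-IQ4_0.gguf' or '...-Q4_0_4_8.gguf'.
    """
    pattern = re.compile(rf"(?:^|[-_.]){re.escape(precision)}\.gguf$", re.IGNORECASE)
    for name in filenames:
        if pattern.search(name):
            return name
    return None
```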

🤝 Contributing

Contributions are welcome! Before opening a PR, please read CONTRIBUTING.md for branch naming, commit / PR title format, pre-commit checks, and the FFI-update rule for public SDK headers.

πŸ—οΈ Build the CLI, SDK, or Python bindings notes/build.md
▢️ Run & select compute units / pull models notes/run.md
🏷️ Release β€” SemVer tags, channels, HTP signing notes/release.md
πŸ“š All developer docs docs/README.md

💬 Community & Contact

Questions, ideas, or want to show off what you built? Come say hi.

  • 💬 Slack: ask questions and chat with the community in real time.
  • 🐛 GitHub Issues: report a bug or request a feature.
  • 🔗 LinkedIn: follow Qualcomm AI Hub for news and updates.

Contributors

Thanks to everyone building GenieX 💙


📄 License

BSD 3-Clause β€” see LICENSE and NOTICE.

Use of this project is also subject to Qualcomm's Terms of Use.
