Hi,
I've been maintaining Box, a community fork of AI Edge Gallery, and wanted to
introduce it and gauge interest in contributing some of its features back upstream.
What is Box?
Box layers additional capabilities on top of AI Edge Gallery. It ships as two
builds, one for stock Android and one for GrapheneOS and other custom ROMs, and
has a growing user base.
What Box adds on top of upstream:
| Area | What Box adds |
| --- | --- |
| Inference engines | llama.cpp (GGUF LLMs), stable-diffusion.cpp (image gen), whisper.cpp (STT) alongside LiteRT |
| Model import | Import any local GGUF file, not limited to the curated download list |
| NPU / TPU | All Snapdragon / Tensor / MediaTek variants bundled in one APK |
| Voice mode | Free talk (continuous hands-free loop) and Vision talk (live camera + voice) |
| Image generation | On-device Stable Diffusion via GGUF |
| Speech-to-text | On-device Whisper STT |
| Document analysis | Attach text files directly in chat |
| Chat history | Persisted to a SQLCipher-encrypted Room database, resumable across sessions |
| Security | Biometric app lock, hard offline mode, prompt sanitisation, audit log |
| Agent skills | 20 built-in skills (upstream has 9) |
| Math rendering | LaTeX expressions rendered as Unicode in chat |
Question for the team:
Some of these are Box-specific (GGUF, security, GrapheneOS support) but others
feel like natural fits for upstream — particularly voice-to-voice (Whisper STT +
streaming TTS), vision input in AI Chat, and document attachment. Would any of
these be welcome as pull requests? Happy to discuss scope and implementation
before opening anything formally.
Thanks for building AI Edge Gallery — it's been a great foundation.