[Discussion] Box — community fork with voice, vision, image gen & more — interest in upstream contributions? #779

@jegly

Hi,

I've been maintaining Box, a community fork of AI Edge Gallery, and wanted to
introduce it and gauge interest in contributing some of its features back upstream.

What is Box?
Box layers additional capabilities on top of AI Edge Gallery. It ships as two
builds, one for stock Android and one for GrapheneOS and other custom ROMs, and
has a growing user base.


What Box adds on top of upstream:

| Area | What Box adds |
| --- | --- |
| Inference engines | llama.cpp (GGUF LLMs), stable-diffusion.cpp (image gen), whisper.cpp (STT) alongside LiteRT |
| Model import | Import any local GGUF file, not limited to the curated download list |
| NPU / TPU | All Snapdragon / Tensor / MediaTek variants bundled in one APK |
| Voice mode | Free talk (continuous hands-free loop) and Vision talk (live camera + voice) |
| Image generation | On-device Stable Diffusion via GGUF |
| Speech-to-text | On-device Whisper STT |
| Document analysis | Attach text files directly in chat |
| Chat history | Persisted to a SQLCipher-encrypted Room database, resumable across sessions |
| Security | Biometric app lock, hard offline mode, prompt sanitisation, audit log |
| Agent skills | 20 built-in skills (upstream has 9) |
| Math rendering | LaTeX expressions rendered as Unicode in chat |

Question for the team:
Some of these are Box-specific (GGUF, security, GrapheneOS support) but others
feel like natural fits for upstream — particularly voice-to-voice (Whisper STT +
streaming TTS), vision input in AI Chat, and document attachment. Would any of
these be welcome as pull requests? Happy to discuss scope and implementation
before opening anything formally.

Thanks for building AI Edge Gallery — it's been a great foundation.

Labels

type:feature (Request for new functionality or enhancement.)
