[Discussion] Box — community fork with voice, vision, image gen & more — interest in upstream contributions? #779

 Hi,                  

  I've been maintaining Box, a community fork of AI Edge Gallery, and wanted to                                                                                                           
  introduce it and gauge interest in contributing some of its features back upstream.
                                                                                                                                                                                          
  **What is Box?**     
  Box layers additional capabilities on top of AI Edge Gallery. It ships as two
  builds, one for stock Android and one for GrapheneOS/custom ROMs, and has a
  growing user base.
                                                                                                                                                                                          
                                                                                                                                                                                          
  ---
                                                                                                                                                                                          
  **What Box adds on top of upstream:**
                                       
  | Area | What Box adds |
  |---|---|               
  | Inference engines | llama.cpp (GGUF LLMs), stable-diffusion.cpp (image gen), whisper.cpp (STT) alongside LiteRT |
  | Model import | Import any local GGUF file — not limited to the curated download list |                           
  | NPU / TPU | All Snapdragon / Tensor / MediaTek variants bundled in one APK |                                                                                                          
  | Voice mode | Free talk (continuous hands-free loop) and Vision talk (live camera + voice) |
  | Image generation | On-device Stable Diffusion via GGUF |                                                                                                                              
  | Speech-to-text | On-device Whisper STT |                
  | Document analysis | Attach text files directly in chat |                                                                                                                              
  | Chat history | Persisted to a SQLCipher-encrypted Room database, resumable across sessions |
  | Security | Biometric app lock, hard offline mode, prompt sanitisation, audit log |                                                                                                    
  | Agent skills | 20 built-in skills (upstream has 9) |                              
  | Math rendering | LaTeX expressions rendered as Unicode in chat |                                                                                                                      
                                                                    
  ---                                                                                                                                                                                     
                       
  **Question for the team:**
  Some of these are Box-specific (GGUF, security, GrapheneOS support), but others
  feel like natural fits for upstream, particularly voice-to-voice (Whisper STT +
  streaming TTS), vision input in AI Chat, and document attachment. Would any of
  these be welcome as pull requests? Happy to discuss scope and implementation
  before opening anything formally.
                                                                                                                                                                                          
  Thanks for building AI Edge Gallery — it's been a great foundation.

