Add comprehensive mobile on-device AI tester app planning document #1
Conversation
Research and architecture plan for a cross-platform (Flutter) mobile app that enables users to:

- Download and manage on-device AI models from Hugging Face
- Run LLMs locally using llama.cpp (GGUF format)
- Test vision models, audio/speech, and embeddings
- Benchmark model performance on device

Includes tech stack decisions, architecture diagrams, implementation phases, and references to key frameworks (TensorFlow Lite, Core ML, MLC LLM).
Pull request overview
This PR adds a comprehensive 830-line planning document for a cross-platform mobile application focused on on-device AI model testing. However, there is a critical discrepancy: the existing repository is for a web-based Gemini API Tester, while this document describes a completely different Flutter mobile app for local AI model management.
Key additions in the document:
- Technology stack analysis comparing Flutter vs React Native for on-device AI
- Detailed architecture with native bridges for llama.cpp integration on Android/iOS
- Code examples for model downloading, inference services, and platform-specific implementations
MOBILE_APP_PLAN.md (outdated)
```yaml
hive_flutter: ^1.1.0

# ML/AI
flutter_llama: ^0.1.0  # llama.cpp binding
```
The package 'flutter_llama: ^0.1.0' appears to be specified with a very early version (0.1.0). This version may not be stable or feature-complete for production use. Consider verifying that this is the correct and most recent version available, and note in the documentation that this is an early-stage dependency that may require updates or have breaking changes.
Suggested change:

```diff
- flutter_llama: ^0.1.0 # llama.cpp binding
+ flutter_llama: ^0.1.0 # llama.cpp binding (early-stage; verify latest stable version and expect breaking changes before production)
```
```dart
Future<List<ModelInfo>> searchGGUFModels(String query) async {
  final response = await _dio.get(
    '$_baseUrl/models',
    queryParameters: {
      'search': query,
      'filter': 'gguf',
      'sort': 'downloads',
      'direction': -1,
      'limit': 50,
    },
  );
```
The base URL 'https://huggingface.co/api' with the '/models' endpoint is correct, but the query parameters may not match the actual Hugging Face API specification. In particular, 'filter' with the value 'gguf' may not behave as a format filter; the intended approach could be to filter by repository tags or to search within model card content. Please verify this call against the current Hugging Face Hub API specification.
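One way to check this against the spec is to look at the exact URL the Dart snippet would issue. A small offline sketch (Python used here for brevity; no request is sent, and note that as of writing the Hub appears to treat `filter` as a repository-tag filter, so `filter=gguf` would select repos tagged `gguf` — worth confirming against current API docs):

```python
from urllib.parse import urlencode

def build_models_query(query: str, base_url: str = "https://huggingface.co/api") -> str:
    """Build the Hub model-search URL the Dart snippet corresponds to."""
    params = {
        "search": query,       # free-text search over model names
        "filter": "gguf",      # interpreted by the Hub as a repo-tag filter
        "sort": "downloads",   # order results by download count
        "direction": -1,       # descending
        "limit": 50,
    }
    return f"{base_url}/models?{urlencode(params)}"

print(build_models_query("llama"))
# → https://huggingface.co/api/models?search=llama&filter=gguf&sort=downloads&direction=-1&limit=50
```

Pasting the printed URL into a browser is a quick way to confirm whether the parameter combination returns the expected GGUF repositories.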
```dart
Future<void> downloadModel(ModelInfo model) async {
  final task = DownloadTask(
    id: uuid.v4(),
    modelId: model.id,
    url: model.downloadUrl,
    totalSize: model.sizeBytes,
    downloadedSize: 0,
    status: DownloadStatus.pending,
  );

  await _taskBox.put(task.id, task);

  final response = await _dio.download(
    model.downloadUrl,
    model.localPath,
    onReceiveProgress: (received, total) {
      _updateProgress(task.id, received, total);
    },
    options: Options(
      headers: {
        'Range': 'bytes=${task.downloadedSize}-', // Resume support
      },
    ),
  );
}
}
```
The download manager code lacks error handling for network failures, disk space issues, or invalid URLs. The download method should include try-catch blocks to handle DioError exceptions, check available disk space before downloading, and validate the downloaded file. Additionally, there's no cleanup logic if the download fails partway through.
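The pieces this comment calls out — retrying on failure, resuming from the bytes already on disk, and leaving a partial file the caller can clean up — are language-agnostic. A minimal Python skeleton of that control flow (`fetch_range` is a hypothetical transport callback standing in for Dio's Range-header request, not a real library API):

```python
import os

def resume_download(url: str, dest_path: str, fetch_range, max_retries: int = 3) -> int:
    """Download with resume: `fetch_range(url, start)` must return the bytes
    from offset `start` onward, or raise OSError on a network failure."""
    attempt = 0
    while True:
        # Resume from whatever is already on disk (0 on a fresh download).
        start = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
        try:
            chunk = fetch_range(url, start)  # conceptually: 'Range: bytes={start}-'
            with open(dest_path, "ab") as f:  # append, never truncate partial data
                f.write(chunk)
            return os.path.getsize(dest_path)
        except OSError:
            attempt += 1
            if attempt >= max_retries:
                # Surface the error; the caller decides whether to delete
                # the partial file or retry later from the saved offset.
                raise
```

The same shape applies in Dart: wrap `_dio.download` in a try/catch, recompute the resume offset from the partial file rather than from the stored task record, and bound the retries.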
```kotlin
val modelPath = call.argument<String>("path")!!
val nCtx = call.argument<Int>("contextSize") ?: 2048

llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
result.success(llamaContext != 0L)
}
"startGeneration" -> {
    val prompt = call.argument<String>("prompt")!!
    val config = GenerationConfig.fromMap(call.arguments as Map<*, *>)
```
The Kotlin plugin code uses non-null assertions ('!!') which will crash the app if the arguments are null or missing. While Flutter should send the correct arguments, using non-null assertions is risky in plugin code. Consider using safe calls with proper error handling instead, or at minimum add validation before the assertion.
Suggested change:

```kotlin
val modelPath = call.argument<String>("path")
if (modelPath.isNullOrEmpty()) {
    result.error(
        "INVALID_ARGUMENT",
        "Argument 'path' is required for loadModel",
        null
    )
    return
}
val nCtx = call.argument<Int>("contextSize") ?: 2048
llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
result.success(llamaContext != 0L)
}
"startGeneration" -> {
    val prompt = call.argument<String>("prompt")
    if (prompt.isNullOrEmpty()) {
        result.error(
            "INVALID_ARGUMENT",
            "Argument 'prompt' is required for startGeneration",
            null
        )
        return
    }
    val arguments = call.arguments
    val config = if (arguments is Map<*, *>) {
        GenerationConfig.fromMap(arguments)
    } else {
        result.error(
            "INVALID_ARGUMENT",
            "Expected map arguments for startGeneration",
            null
        )
        return
    }
```
MOBILE_APP_PLAN.md (outdated)
```dart
Stream<String> generateText(String prompt, GenerationConfig config) {
  _channel.invokeMethod('startGeneration', {
    'prompt': prompt,
    'temperature': config.temperature,
    'maxTokens': config.maxTokens,
    'topP': config.topP,
  });

  return _eventChannel.receiveBroadcastStream().map((token) => token as String);
```
The LlamaInferenceService implementation invokes 'startGeneration' but doesn't wait for a result or handle potential errors from the method call. The stream will be returned immediately, but if the native method fails, the stream might not emit any events or error signals. Consider checking the method call result and handling failures before returning the stream.
Suggested change:

```dart
Stream<String> generateText(String prompt, GenerationConfig config) async* {
  try {
    await _channel.invokeMethod('startGeneration', {
      'prompt': prompt,
      'temperature': config.temperature,
      'maxTokens': config.maxTokens,
      'topP': config.topP,
    });
  } catch (e) {
    yield* Stream.error(e);
    return;
  }
  yield* _eventChannel
      .receiveBroadcastStream()
      .map((token) => token as String);
}
```
MOBILE_APP_PLAN.md (outdated)
```swift
llamaContext = llama_load_model(modelPath, contextSize)
result(llamaContext != nil)

case "startGeneration":
    guard let args = call.arguments as? [String: Any],
          let prompt = args["prompt"] as? String else {
        result(FlutterError(code: "INVALID_ARGS", message: nil, details: nil))
        return
    }

    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        llama_generate(self?.llamaContext, prompt) { token in
            DispatchQueue.main.async {
                self?.eventSink?(token)
            }
        }
        DispatchQueue.main.async {
            self?.eventSink?(FlutterEndOfEventStream)
        }
    }
    result(true)
```
The Swift code references 'llama_load_model' and 'llama_generate' functions that are assumed to exist but are not defined. These function signatures and their callback mechanisms need to be properly documented or defined, especially the closure parameter for 'llama_generate'. The implementation should clarify how the C++ bridge exposes these functions and their expected signatures.
```sql
    FOREIGN KEY (model_id) REFERENCES models(id)
);

-- Messages table
CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,              -- user, assistant, system
    content TEXT NOT NULL,
    tokens INTEGER,
    generation_time_ms INTEGER,
    created_at INTEGER NOT NULL,
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- Benchmarks table
CREATE TABLE benchmarks (
    id TEXT PRIMARY KEY,
    model_id TEXT NOT NULL,
    device_info TEXT NOT NULL,       -- JSON
    prompt_eval_tps REAL,
    generation_tps REAL,
    memory_mb REAL,
    load_time_ms INTEGER,
    created_at INTEGER NOT NULL,
    FOREIGN KEY (model_id) REFERENCES models(id)
);
```
The database schema lacks proper cascade delete behavior for foreign key constraints. When a model, conversation, or other parent entity is deleted, related records could become orphaned. Consider adding 'ON DELETE CASCADE' or 'ON DELETE SET NULL' clauses to the foreign key definitions to maintain referential integrity.
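The cascade behavior being suggested can be demonstrated on a trimmed-down version of the schema. Note one SQLite-specific pitfall: foreign-key enforcement is off by default and must be enabled per connection with `PRAGMA foreign_keys = ON`, or the cascade clause has no effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
conn.execute("CREATE TABLE models (id TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE benchmarks (
        id TEXT PRIMARY KEY,
        model_id TEXT NOT NULL,
        FOREIGN KEY (model_id) REFERENCES models(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO models VALUES ('phi-3')")
conn.execute("INSERT INTO benchmarks VALUES ('b1', 'phi-3')")

# Deleting the parent model automatically removes its benchmark rows.
conn.execute("DELETE FROM models WHERE id = 'phi-3'")
remaining = conn.execute("SELECT COUNT(*) FROM benchmarks").fetchone()[0]
print(remaining)  # → 0
```

Without the `ON DELETE CASCADE` clause (and with FK enforcement on), the `DELETE` would instead fail with a foreign-key constraint error, which may also be an acceptable design if the app should refuse to delete models that still have history.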
MOBILE_APP_PLAN.md (outdated)
```markdown
# On-Device AI Model Tester - Mobile App Planning Document

## Executive Summary

A cross-platform mobile application for Android and iOS that enables users to download, manage, and run on-device AI models locally. The app provides a unified interface for testing various model types (LLMs, vision, audio, embeddings) without requiring cloud connectivity.
```
There is a significant discrepancy between the PR content and the existing repository. The repository README indicates this is a "Gemini API Tester" for web-based testing of Google's Gemini and Veo APIs, but this planning document describes a completely different mobile app for on-device AI model testing. This planning document appears to be unrelated to the current project's purpose. Please clarify if this is intentional or if the document was added to the wrong repository.
```dart
abstract class InferenceService {
  Future<void> loadModel(String modelPath, ModelConfig config);
  Future<void> unloadModel();
  Stream<String> generateText(String prompt, GenerationConfig config);
  Future<List<double>> generateEmbedding(String text);
```
The InferenceService interface is missing critical lifecycle methods and error handling. Consider adding methods for: checking if a model is currently loaded, getting model information/metadata, handling memory warnings, and canceling ongoing inference operations. Additionally, all methods should specify what exceptions they might throw for proper error handling by consumers.
Suggested change:

```dart
abstract class InferenceService {
  /// Loads a model from [modelPath] with the given [config].
  ///
  /// Throws:
  /// - [ArgumentError] if [modelPath] or [config] is invalid.
  /// - [StateError] if another model is already loaded and cannot be replaced.
  /// - [Exception] for I/O or platform-specific errors while loading the model.
  Future<void> loadModel(String modelPath, ModelConfig config);

  /// Unloads the currently loaded model and frees associated resources.
  ///
  /// Safe to call even if no model is currently loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime fails to release resources.
  Future<void> unloadModel();

  /// Returns `true` if a model is currently loaded and ready for inference.
  bool get isModelLoaded;

  /// Returns metadata for the currently loaded model, or `null` if none is loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime cannot provide model information.
  Future<ModelInfo?> getLoadedModelInfo();

  /// Notifies the service of memory pressure so it can proactively release
  /// caches or other non-critical data.
  ///
  /// Implementations should make a best effort to reduce memory usage without
  /// impacting correctness.
  ///
  /// Throws:
  /// - [Exception] if memory cleanup fails in a recoverable but noteworthy way.
  Future<void> handleMemoryWarning();

  /// Cancels any ongoing inference operations managed by this service.
  ///
  /// Implementations should make this method idempotent.
  ///
  /// Throws:
  /// - [Exception] if cancellation fails or is not supported by the backend.
  Future<void> cancelAllInference();

  /// Starts text generation for the given [prompt] and [config].
  ///
  /// The returned [Stream] emits tokens (or partial text chunks) as they are
  /// produced by the underlying model.
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [prompt] or [config] is invalid.
  /// - [Exception] for runtime/platform-level inference errors.
  Stream<String> generateText(String prompt, GenerationConfig config);

  /// Computes an embedding vector for the given [text].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [text] is invalid or empty when not allowed.
  /// - [Exception] for runtime/platform-level inference errors.
  Future<List<double>> generateEmbedding(String text);

  /// Classifies the provided image data and returns a [ClassificationResult].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [imageData] is invalid or corrupted.
  /// - [Exception] for runtime/platform-level inference errors.
  Future<ClassificationResult> classifyImage(Uint8List imageData);
}
```
```kotlin
thread {
    LlamaCpp.generate(llamaContext, prompt, config) { token ->
        mainHandler.post {
            eventSink?.success(token)
        }
    }
    mainHandler.post {
        eventSink?.endOfStream()
    }
}
result.success(true)
```
The Kotlin code uses 'thread { }' to launch a background thread, but this is not a safe or recommended approach for Flutter plugins. Consider using a proper executor or coroutines instead. Additionally, 'mainHandler' is referenced but never defined in this code snippet. For production code, you should use 'Handler(Looper.getMainLooper())' or migrate to Kotlin Coroutines with proper dispatchers.
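The underlying pattern the comment recommends — generate tokens on a worker, marshal each one back to a single consumer thread, then signal end-of-stream — is the same in any language. A minimal Python analog of that control flow, with `queue.Queue` standing in for `Handler(Looper.getMainLooper())` and the queue's sentinel playing the role of `endOfStream()` (`generate_tokens` is a stand-in for `LlamaCpp.generate`, not a real binding):

```python
import queue
import threading

def generate_tokens(prompt, emit):
    """Stand-in for a native generate call: invokes `emit` once per token."""
    for token in prompt.split():
        emit(token)

def stream_generation(prompt):
    """Run generation on a worker thread; yield tokens on the caller's thread."""
    events = queue.Queue()
    end = object()  # sentinel marking end-of-stream

    def worker():
        generate_tokens(prompt, events.put)  # like mainHandler.post { eventSink?.success(token) }
        events.put(end)                      # like eventSink?.endOfStream()

    threading.Thread(target=worker, daemon=True).start()
    while (item := events.get()) is not end:
        yield item

print(list(stream_generation("hello on device world")))
# → ['hello', 'on', 'device', 'world']
```

In the Kotlin plugin, the equivalent fix is to create the handler explicitly (`val mainHandler = Handler(Looper.getMainLooper())`) or to replace the raw `thread { }` with a coroutine on `Dispatchers.Default` that posts tokens via `Dispatchers.Main`.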
- Research Apple Foundation Models framework (iOS 26+, ~3B on-device LLM)
- Document guided generation, tool calling, and Swift macros
- Recommend two separate apps due to platform constraints:
  1. On-Device AI Tester (Flutter) - cross-platform, open-source models
  2. Apple Intelligence Studio (Swift) - Apple-only, Foundation Models
- Add App Builder concept for future iOS app generation
- Include development timeline and tech stack for Apple app
- Add comprehensive Apple Intelligence references and sources
Separate the combined planning document into two focused plans:

1. ON_DEVICE_AI_TESTER_FLUTTER.md (cross-platform Flutter app)
   - Android + iOS support
   - Open-source models (Llama, Gemma, Phi, Whisper)
   - llama.cpp / GGUF format
   - Hugging Face Hub integration
   - Complete architecture and implementation details
2. APPLE_INTELLIGENCE_STUDIO.md (native Swift app)
   - iOS/macOS only (requires iOS 26+)
   - Apple Foundation Models (~3B on-device LLM)
   - Guided generation with @generable macros
   - Tool calling playground
   - Writing Tools & Image Playground integration
   - Experimental App Builder for SwiftUI code generation
   - Siri/Shortcuts via App Intents

Research confirmed:

- Flutter CAN access Apple Foundation Models via platform packages
- However, two apps make sense due to different target audiences
- Apple-exclusive features require native Swift integration