Skip to content

Add comprehensive mobile on-device AI tester app planning document #1

Open

zzfadi wants to merge 3 commits into main from claude/ai-model-app-planning-L6pRI

Conversation

zzfadi (Owner) commented Dec 24, 2025

Research and architecture plan for a cross-platform (Flutter) mobile app that enables users to:

  • Download and manage on-device AI models from Hugging Face
  • Run LLMs locally using llama.cpp (GGUF format)
  • Test vision models, audio/speech, and embeddings
  • Benchmark model performance on device

Includes tech stack decisions, architecture diagrams, implementation phases,
and references to key frameworks (TensorFlow Lite, Core ML, MLC LLM).


Copilot AI left a comment


Pull request overview

This PR adds a comprehensive 830-line planning document for a cross-platform mobile application focused on on-device AI model testing. However, there is a critical discrepancy: the existing repository is for a web-based Gemini API Tester, while this document describes a completely different Flutter mobile app for local AI model management.

Key additions in the document:

  • Technology stack analysis comparing Flutter vs React Native for on-device AI
  • Detailed architecture with native bridges for llama.cpp integration on Android/iOS
  • Code examples for model downloading, inference services, and platform-specific implementations


hive_flutter: ^1.1.0

# ML/AI
flutter_llama: ^0.1.0 # llama.cpp binding

Copilot AI Dec 24, 2025


The package 'flutter_llama: ^0.1.0' appears to be specified with a very early version (0.1.0). This version may not be stable or feature-complete for production use. Consider verifying that this is the correct and most recent version available, and note in the documentation that this is an early-stage dependency that may require updates or have breaking changes.

Suggested change
flutter_llama: ^0.1.0 # llama.cpp binding
flutter_llama: ^0.1.0 # llama.cpp binding (early-stage; verify latest stable version and expect breaking changes before production)
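One lightweight way to act on this review note is to check the latest published version programmatically. A minimal Python sketch parsing the response body of pub.dev's public package endpoint (`https://pub.dev/api/packages/<name>`, which returns JSON with a `latest.version` field); `latest_version` and the sample body are illustrative, not part of the plan:

```python
import json

def latest_version(pub_dev_json: str) -> str:
    """Extract the latest published version from a pub.dev
    /api/packages/<name> response body."""
    return json.loads(pub_dev_json)["latest"]["version"]

# A captured-style sample body; a real check would fetch
# https://pub.dev/api/packages/flutter_llama instead.
sample = '{"name": "flutter_llama", "latest": {"version": "0.1.0"}}'
```

Running such a check in CI would surface new releases of an early-stage dependency before they break the build.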

Comment on lines +467 to +477
Future<List<ModelInfo>> searchGGUFModels(String query) async {
  final response = await _dio.get(
    '$_baseUrl/models',
    queryParameters: {
      'search': query,
      'filter': 'gguf',
      'sort': 'downloads',
      'direction': -1,
      'limit': 50,
    },
  );

Copilot AI Dec 24, 2025


The base URL 'https://huggingface.co/api' is correct, but the query parameters should be checked against the Hugging Face Hub API specification. In particular, 'filter' matches repository tags rather than file formats, so 'filter': 'gguf' only returns repositories that actually carry the 'gguf' tag; otherwise the search would need to use tags or model card content. Please verify this API call against the actual specification.
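As a sanity check on the parameters, here is a minimal Python sketch of how the request URL could be assembled, assuming the documented Hub behavior that `filter` matches repository tags (`build_gguf_search_url` is a hypothetical helper, not part of the plan):

```python
from urllib.parse import urlencode

HF_API_BASE = "https://huggingface.co/api"

def build_gguf_search_url(query: str, limit: int = 50) -> str:
    """Build a Hub model-search URL; 'filter' is a tag filter,
    and GGUF repositories conventionally carry the 'gguf' tag."""
    params = {
        "search": query,
        "filter": "gguf",   # tag filter, not a file-format switch
        "sort": "downloads",
        "direction": "-1",  # descending
        "limit": str(limit),
    }
    return f"{HF_API_BASE}/models?{urlencode(params)}"
```

Issuing the built URL with any HTTP client and inspecting the JSON response is a quick way to confirm the parameter semantics before committing to them in the Dart service.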

Comment on lines +402 to +427
  Future<void> downloadModel(ModelInfo model) async {
    final task = DownloadTask(
      id: uuid.v4(),
      modelId: model.id,
      url: model.downloadUrl,
      totalSize: model.sizeBytes,
      downloadedSize: 0,
      status: DownloadStatus.pending,
    );

    await _taskBox.put(task.id, task);

    final response = await _dio.download(
      model.downloadUrl,
      model.localPath,
      onReceiveProgress: (received, total) {
        _updateProgress(task.id, received, total);
      },
      options: Options(
        headers: {
          'Range': 'bytes=${task.downloadedSize}-', // Resume support
        },
      ),
    );
  }
}

Copilot AI Dec 24, 2025


The download manager code lacks error handling for network failures, disk space issues, or invalid URLs. The download method should include try-catch blocks to handle DioError exceptions, check available disk space before downloading, and validate the downloaded file. Additionally, there's no cleanup logic if the download fails partway through.

Comment on lines +507 to +516
        val modelPath = call.argument<String>("path")!!
        val nCtx = call.argument<Int>("contextSize") ?: 2048

        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")!!
        val config = GenerationConfig.fromMap(call.arguments as Map<*, *>)


Copilot AI Dec 24, 2025


The Kotlin plugin code uses non-null assertions ('!!') which will crash the app if the arguments are null or missing. While Flutter should send the correct arguments, using non-null assertions is risky in plugin code. Consider using safe calls with proper error handling instead, or at minimum add validation before the assertion.

Suggested change

        val modelPath = call.argument<String>("path")!!
        val nCtx = call.argument<Int>("contextSize") ?: 2048
        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")!!
        val config = GenerationConfig.fromMap(call.arguments as Map<*, *>)

        val modelPath = call.argument<String>("path")
        if (modelPath.isNullOrEmpty()) {
          result.error(
            "INVALID_ARGUMENT",
            "Argument 'path' is required for loadModel",
            null
          )
          return
        }
        val nCtx = call.argument<Int>("contextSize") ?: 2048
        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")
        if (prompt.isNullOrEmpty()) {
          result.error(
            "INVALID_ARGUMENT",
            "Argument 'prompt' is required for startGeneration",
            null
          )
          return
        }
        val arguments = call.arguments
        val config = if (arguments is Map<*, *>) {
          GenerationConfig.fromMap(arguments)
        } else {
          result.error(
            "INVALID_ARGUMENT",
            "Expected map arguments for startGeneration",
            null
          )
          return
        }

Comment on lines +447 to +455
  Stream<String> generateText(String prompt, GenerationConfig config) {
    _channel.invokeMethod('startGeneration', {
      'prompt': prompt,
      'temperature': config.temperature,
      'maxTokens': config.maxTokens,
      'topP': config.topP,
    });

    return _eventChannel.receiveBroadcastStream().map((token) => token as String);

Copilot AI Dec 24, 2025


The LlamaInferenceService implementation invokes 'startGeneration' but doesn't wait for a result or handle potential errors from the method call. The stream will be returned immediately, but if the native method fails, the stream might not emit any events or error signals. Consider checking the method call result and handling failures before returning the stream.

Suggested change

  Stream<String> generateText(String prompt, GenerationConfig config) {
    _channel.invokeMethod('startGeneration', {
      'prompt': prompt,
      'temperature': config.temperature,
      'maxTokens': config.maxTokens,
      'topP': config.topP,
    });
    return _eventChannel.receiveBroadcastStream().map((token) => token as String);

  Stream<String> generateText(String prompt, GenerationConfig config) async* {
    try {
      await _channel.invokeMethod('startGeneration', {
        'prompt': prompt,
        'temperature': config.temperature,
        'maxTokens': config.maxTokens,
        'topP': config.topP,
      });
    } catch (e) {
      yield* Stream.error(e);
      return;
    }
    yield* _eventChannel
        .receiveBroadcastStream()
        .map((token) => token as String);

Comment on lines +552 to +572
      llamaContext = llama_load_model(modelPath, contextSize)
      result(llamaContext != nil)

    case "startGeneration":
      guard let args = call.arguments as? [String: Any],
            let prompt = args["prompt"] as? String else {
        result(FlutterError(code: "INVALID_ARGS", message: nil, details: nil))
        return
      }

      DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        llama_generate(self?.llamaContext, prompt) { token in
          DispatchQueue.main.async {
            self?.eventSink?(token)
          }
        }
        DispatchQueue.main.async {
          self?.eventSink?(FlutterEndOfEventStream)
        }
      }
      result(true)

Copilot AI Dec 24, 2025


The Swift code references 'llama_load_model' and 'llama_generate' functions that are assumed to exist but are not defined. These function signatures and their callback mechanisms need to be properly documented or defined, especially the closure parameter for 'llama_generate'. The implementation should clarify how the C++ bridge exposes these functions and their expected signatures.
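Whatever the final C bridge looks like, the callback-to-stream adaptation it needs can be prototyped independently. A minimal Python sketch of the pattern (the `generate_with_callback` parameter stands in for the assumed `llama_generate(ctx, prompt, onToken)` bridge, which the document does not define):

```python
import queue
import threading

_END = object()  # sentinel marking end-of-stream

def stream_tokens(generate_with_callback, prompt):
    """Adapt a callback-style generate function into an iterator,
    the same shape the Flutter EventChannel ultimately exposes."""
    q: "queue.Queue" = queue.Queue()

    def worker():
        generate_with_callback(prompt, q.put)  # callback pushes tokens
        q.put(_END)                            # signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        tok = q.get()
        if tok is _END:
            return
        yield tok
```

The Swift side follows the same shape: the bridge's token closure plays the role of the queue's producer, and the event sink plays the consumer, so documenting the closure's signature is the one piece the plan still has to pin down.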

Comment on lines +610 to +636
  FOREIGN KEY (model_id) REFERENCES models(id)
);

-- Messages table
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  role TEXT NOT NULL, -- user, assistant, system
  content TEXT NOT NULL,
  tokens INTEGER,
  generation_time_ms INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- Benchmarks table
CREATE TABLE benchmarks (
  id TEXT PRIMARY KEY,
  model_id TEXT NOT NULL,
  device_info TEXT NOT NULL, -- JSON
  prompt_eval_tps REAL,
  generation_tps REAL,
  memory_mb REAL,
  load_time_ms INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (model_id) REFERENCES models(id)
);

Copilot AI Dec 24, 2025


The database schema lacks proper cascade delete behavior for foreign key constraints. When a model, conversation, or other parent entity is deleted, related records could become orphaned. Consider adding 'ON DELETE CASCADE' or 'ON DELETE SET NULL' clauses to the foreign key definitions to maintain referential integrity.
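The effect of adding the clause can be demonstrated with a small runnable sketch against SQLite, using a simplified two-table version of the schema (note that SQLite also requires `PRAGMA foreign_keys = ON` per connection before it enforces foreign keys at all):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
CREATE TABLE conversations (id TEXT PRIMARY KEY);
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  content TEXT NOT NULL,
  FOREIGN KEY (conversation_id) REFERENCES conversations(id)
    ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO conversations VALUES ('c1')")
conn.execute("INSERT INTO messages VALUES ('m1', 'c1', 'hi')")

# Deleting the parent conversation removes its messages automatically.
conn.execute("DELETE FROM conversations WHERE id = 'c1'")
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Without the `ON DELETE CASCADE` clause the same delete would either fail (with enforcement on) or leave the message orphaned (with enforcement off), which is exactly the risk the review raises.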

Comment on lines +1 to +5
# On-Device AI Model Tester - Mobile App Planning Document

## Executive Summary

A cross-platform mobile application for Android and iOS that enables users to download, manage, and run on-device AI models locally. The app provides a unified interface for testing various model types (LLMs, vision, audio, embeddings) without requiring cloud connectivity.

Copilot AI Dec 24, 2025


There is a significant discrepancy between the PR content and the existing repository. The repository README indicates this is a "Gemini API Tester" for web-based testing of Google's Gemini and Veo APIs, but this planning document describes a completely different mobile app for on-device AI model testing. This planning document appears to be unrelated to the current project's purpose. Please clarify if this is intentional or if the document was added to the wrong repository.

Comment on lines +434 to +438
abstract class InferenceService {
  Future<void> loadModel(String modelPath, ModelConfig config);
  Future<void> unloadModel();
  Stream<String> generateText(String prompt, GenerationConfig config);
  Future<List<double>> generateEmbedding(String text);

Copilot AI Dec 24, 2025


The InferenceService interface is missing critical lifecycle methods and error handling. Consider adding methods for: checking if a model is currently loaded, getting model information/metadata, handling memory warnings, and canceling ongoing inference operations. Additionally, all methods should specify what exceptions they might throw for proper error handling by consumers.

Suggested change

abstract class InferenceService {
  Future<void> loadModel(String modelPath, ModelConfig config);
  Future<void> unloadModel();
  Stream<String> generateText(String prompt, GenerationConfig config);
  Future<List<double>> generateEmbedding(String text);

abstract class InferenceService {
  /// Loads a model from [modelPath] with the given [config].
  ///
  /// Throws:
  /// - [ArgumentError] if [modelPath] or [config] is invalid.
  /// - [StateError] if another model is already loaded and cannot be replaced.
  /// - [Exception] for I/O or platform-specific errors while loading the model.
  Future<void> loadModel(String modelPath, ModelConfig config);

  /// Unloads the currently loaded model and frees associated resources.
  ///
  /// Safe to call even if no model is currently loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime fails to release resources.
  Future<void> unloadModel();

  /// Returns `true` if a model is currently loaded and ready for inference.
  bool get isModelLoaded;

  /// Returns metadata for the currently loaded model, or `null` if none is loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime cannot provide model information.
  Future<ModelInfo?> getLoadedModelInfo();

  /// Notifies the service of memory pressure so it can proactively release
  /// caches or other non-critical data.
  ///
  /// Implementations should make a best effort to reduce memory usage without
  /// impacting correctness.
  ///
  /// Throws:
  /// - [Exception] if memory cleanup fails in a recoverable but noteworthy way.
  Future<void> handleMemoryWarning();

  /// Cancels any ongoing inference operations managed by this service.
  ///
  /// Implementations should make this method idempotent.
  ///
  /// Throws:
  /// - [Exception] if cancellation fails or is not supported by the backend.
  Future<void> cancelAllInference();

  /// Starts text generation for the given [prompt] and [config].
  ///
  /// The returned [Stream] emits tokens (or partial text chunks) as they are
  /// produced by the underlying model.
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [prompt] or [config] is invalid.
  /// - [Exception] for runtime/platform-level inference errors.
  Stream<String> generateText(String prompt, GenerationConfig config);

  /// Computes an embedding vector for the given [text].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [text] is invalid or empty when not allowed.
  /// - [Exception] for runtime/platform-level inference errors.
  Future<List<double>> generateEmbedding(String text);

  /// Classifies the provided image data and returns a [ClassificationResult].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [imageData] is invalid or corrupted.
  /// - [Exception] for runtime/platform-level inference errors.
Comment on lines +517 to +527
        thread {
          LlamaCpp.generate(llamaContext, prompt, config) { token ->
            mainHandler.post {
              eventSink?.success(token)
            }
          }
          mainHandler.post {
            eventSink?.endOfStream()
          }
        }
        result.success(true)

Copilot AI Dec 24, 2025


The Kotlin code uses 'thread { }' to launch a background thread, but this is not a safe or recommended approach for Flutter plugins. Consider using a proper executor or coroutines instead. Additionally, 'mainHandler' is referenced but never defined in this code snippet. For production code, you should use 'Handler(Looper.getMainLooper())' or migrate to Kotlin Coroutines with proper dispatchers.

- Research Apple Foundation Models framework (iOS 26+, ~3B on-device LLM)
- Document guided generation, tool calling, and Swift macros
- Recommend two separate apps due to platform constraints:
  1. On-Device AI Tester (Flutter) - cross-platform, open-source models
  2. Apple Intelligence Studio (Swift) - Apple-only, Foundation Models
- Add App Builder concept for future iOS app generation
- Include development timeline and tech stack for Apple app
- Add comprehensive Apple Intelligence references and sources
Separate the combined planning document into two focused plans:

1. ON_DEVICE_AI_TESTER_FLUTTER.md (Cross-platform Flutter app)
   - Android + iOS support
   - Open-source models (Llama, Gemma, Phi, Whisper)
   - llama.cpp / GGUF format
   - Hugging Face Hub integration
   - Complete architecture and implementation details

2. APPLE_INTELLIGENCE_STUDIO.md (Native Swift app)
   - iOS/macOS only (requires iOS 26+)
   - Apple Foundation Models (~3B on-device LLM)
   - Guided generation with @generable macros
   - Tool calling playground
   - Writing Tools & Image Playground integration
   - Experimental App Builder for SwiftUI code generation
   - Siri/Shortcuts via App Intents

Research confirmed:
- Flutter CAN access Apple Foundation Models via platform packages
- However, two apps make sense due to different target audiences
- Apple-exclusive features require native Swift integration