Skip to content

Add comprehensive mobile on-device AI tester app planning document #1

Open

zzfadi wants to merge 3 commits into main from claude/ai-model-app-planning-L6pRI

Conversation

zzfadi (Owner) commented Dec 24, 2025

Research and architecture plan for a cross-platform (Flutter) mobile app that enables users to:

  • Download and manage on-device AI models from Hugging Face
  • Run LLMs locally using llama.cpp (GGUF format)
  • Test vision models, audio/speech, and embeddings
  • Benchmark model performance on device

Includes tech stack decisions, architecture diagrams, implementation phases,
and references to key frameworks (TensorFlow Lite, Core ML, MLC LLM).


Copilot AI left a comment


Pull request overview

This PR adds a comprehensive 830-line planning document for a cross-platform mobile application focused on on-device AI model testing. However, there is a critical discrepancy: the existing repository is for a web-based Gemini API Tester, while this document describes a completely different Flutter mobile app for local AI model management.

Key additions in the document:

  • Technology stack analysis comparing Flutter vs React Native for on-device AI
  • Detailed architecture with native bridges for llama.cpp integration on Android/iOS
  • Code examples for model downloading, inference services, and platform-specific implementations


hive_flutter: ^1.1.0

# ML/AI
flutter_llama: ^0.1.0 # llama.cpp binding

Copilot AI Dec 24, 2025


The package 'flutter_llama: ^0.1.0' appears to be specified with a very early version (0.1.0). This version may not be stable or feature-complete for production use. Consider verifying that this is the correct and most recent version available, and note in the documentation that this is an early-stage dependency that may require updates or have breaking changes.

Suggested change
flutter_llama: ^0.1.0 # llama.cpp binding
flutter_llama: ^0.1.0 # llama.cpp binding (early-stage; verify latest stable version and expect breaking changes before production)
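One lightweight way to act on this review note is to check the latest published version programmatically. A minimal Python sketch parsing the response body of pub.dev's public package endpoint (`https://pub.dev/api/packages/<name>`, which returns JSON with a `latest.version` field); `latest_version` and the sample body are illustrative, not part of the plan:

```python
import json

def latest_version(pub_dev_json: str) -> str:
    """Extract the latest published version from a pub.dev
    /api/packages/<name> response body."""
    return json.loads(pub_dev_json)["latest"]["version"]

# A captured-style sample body; a real check would fetch
# https://pub.dev/api/packages/flutter_llama instead.
sample = '{"name": "flutter_llama", "latest": {"version": "0.1.0"}}'
```

Running such a check in CI would surface new releases of an early-stage dependency before they break the build.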

Comment on lines +467 to +477
Future<List<ModelInfo>> searchGGUFModels(String query) async {
  final response = await _dio.get(
    '$_baseUrl/models',
    queryParameters: {
      'search': query,
      'filter': 'gguf',
      'sort': 'downloads',
      'direction': -1,
      'limit': 50,
    },
  );

Copilot AI Dec 24, 2025


The base URL 'https://huggingface.co/api' is correct, but the query parameters should be checked against the Hugging Face Hub API specification. In particular, 'filter' matches repository tags rather than file formats, so 'filter': 'gguf' only returns repositories that actually carry the 'gguf' tag; otherwise the search would need to use tags or model card content. Please verify this API call against the actual specification.
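As a sanity check on the parameters, here is a minimal Python sketch of how the request URL could be assembled, assuming the documented Hub behavior that `filter` matches repository tags (`build_gguf_search_url` is a hypothetical helper, not part of the plan):

```python
from urllib.parse import urlencode

HF_API_BASE = "https://huggingface.co/api"

def build_gguf_search_url(query: str, limit: int = 50) -> str:
    """Build a Hub model-search URL; 'filter' is a tag filter,
    and GGUF repositories conventionally carry the 'gguf' tag."""
    params = {
        "search": query,
        "filter": "gguf",   # tag filter, not a file-format switch
        "sort": "downloads",
        "direction": "-1",  # descending
        "limit": str(limit),
    }
    return f"{HF_API_BASE}/models?{urlencode(params)}"
```

Issuing the built URL with any HTTP client and inspecting the JSON response is a quick way to confirm the parameter semantics before committing to them in the Dart service.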

Comment on lines +402 to +427
  Future<void> downloadModel(ModelInfo model) async {
    final task = DownloadTask(
      id: uuid.v4(),
      modelId: model.id,
      url: model.downloadUrl,
      totalSize: model.sizeBytes,
      downloadedSize: 0,
      status: DownloadStatus.pending,
    );

    await _taskBox.put(task.id, task);

    final response = await _dio.download(
      model.downloadUrl,
      model.localPath,
      onReceiveProgress: (received, total) {
        _updateProgress(task.id, received, total);
      },
      options: Options(
        headers: {
          'Range': 'bytes=${task.downloadedSize}-', // Resume support
        },
      ),
    );
  }
}

Copilot AI Dec 24, 2025


The download manager code lacks error handling for network failures, disk space issues, or invalid URLs. The download method should include try-catch blocks to handle DioError exceptions, check available disk space before downloading, and validate the downloaded file. Additionally, there's no cleanup logic if the download fails partway through.

Comment on lines +507 to +516
        val modelPath = call.argument<String>("path")!!
        val nCtx = call.argument<Int>("contextSize") ?: 2048

        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")!!
        val config = GenerationConfig.fromMap(call.arguments as Map<*, *>)


Copilot AI Dec 24, 2025


The Kotlin plugin code uses non-null assertions ('!!') which will crash the app if the arguments are null or missing. While Flutter should send the correct arguments, using non-null assertions is risky in plugin code. Consider using safe calls with proper error handling instead, or at minimum add validation before the assertion.

Suggested change

        val modelPath = call.argument<String>("path")!!
        val nCtx = call.argument<Int>("contextSize") ?: 2048
        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")!!
        val config = GenerationConfig.fromMap(call.arguments as Map<*, *>)

        val modelPath = call.argument<String>("path")
        if (modelPath.isNullOrEmpty()) {
          result.error(
            "INVALID_ARGUMENT",
            "Argument 'path' is required for loadModel",
            null
          )
          return
        }
        val nCtx = call.argument<Int>("contextSize") ?: 2048
        llamaContext = LlamaCpp.loadModel(modelPath, nCtx)
        result.success(llamaContext != 0L)
      }
      "startGeneration" -> {
        val prompt = call.argument<String>("prompt")
        if (prompt.isNullOrEmpty()) {
          result.error(
            "INVALID_ARGUMENT",
            "Argument 'prompt' is required for startGeneration",
            null
          )
          return
        }
        val arguments = call.arguments
        val config = if (arguments is Map<*, *>) {
          GenerationConfig.fromMap(arguments)
        } else {
          result.error(
            "INVALID_ARGUMENT",
            "Expected map arguments for startGeneration",
            null
          )
          return
        }

Comment on lines +447 to +455
  Stream<String> generateText(String prompt, GenerationConfig config) {
    _channel.invokeMethod('startGeneration', {
      'prompt': prompt,
      'temperature': config.temperature,
      'maxTokens': config.maxTokens,
      'topP': config.topP,
    });

    return _eventChannel.receiveBroadcastStream().map((token) => token as String);

Copilot AI Dec 24, 2025


The LlamaInferenceService implementation invokes 'startGeneration' but doesn't wait for a result or handle potential errors from the method call. The stream will be returned immediately, but if the native method fails, the stream might not emit any events or error signals. Consider checking the method call result and handling failures before returning the stream.

Suggested change

  Stream<String> generateText(String prompt, GenerationConfig config) {
    _channel.invokeMethod('startGeneration', {
      'prompt': prompt,
      'temperature': config.temperature,
      'maxTokens': config.maxTokens,
      'topP': config.topP,
    });
    return _eventChannel.receiveBroadcastStream().map((token) => token as String);

  Stream<String> generateText(String prompt, GenerationConfig config) async* {
    try {
      await _channel.invokeMethod('startGeneration', {
        'prompt': prompt,
        'temperature': config.temperature,
        'maxTokens': config.maxTokens,
        'topP': config.topP,
      });
    } catch (e) {
      yield* Stream.error(e);
      return;
    }
    yield* _eventChannel
        .receiveBroadcastStream()
        .map((token) => token as String);

Comment on lines +552 to +572
      llamaContext = llama_load_model(modelPath, contextSize)
      result(llamaContext != nil)

    case "startGeneration":
      guard let args = call.arguments as? [String: Any],
            let prompt = args["prompt"] as? String else {
        result(FlutterError(code: "INVALID_ARGS", message: nil, details: nil))
        return
      }

      DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        llama_generate(self?.llamaContext, prompt) { token in
          DispatchQueue.main.async {
            self?.eventSink?(token)
          }
        }
        DispatchQueue.main.async {
          self?.eventSink?(FlutterEndOfEventStream)
        }
      }
      result(true)

Copilot AI Dec 24, 2025


The Swift code references 'llama_load_model' and 'llama_generate' functions that are assumed to exist but are not defined. These function signatures and their callback mechanisms need to be properly documented or defined, especially the closure parameter for 'llama_generate'. The implementation should clarify how the C++ bridge exposes these functions and their expected signatures.
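Whatever the final C bridge looks like, the callback-to-stream adaptation it needs can be prototyped independently. A minimal Python sketch of the pattern (the `generate_with_callback` parameter stands in for the assumed `llama_generate(ctx, prompt, onToken)` bridge, which the document does not define):

```python
import queue
import threading

_END = object()  # sentinel marking end-of-stream

def stream_tokens(generate_with_callback, prompt):
    """Adapt a callback-style generate function into an iterator,
    the same shape the Flutter EventChannel ultimately exposes."""
    q: "queue.Queue" = queue.Queue()

    def worker():
        generate_with_callback(prompt, q.put)  # callback pushes tokens
        q.put(_END)                            # signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        tok = q.get()
        if tok is _END:
            return
        yield tok
```

The Swift side follows the same shape: the bridge's token closure plays the role of the queue's producer, and the event sink plays the consumer, so documenting the closure's signature is the one piece the plan still has to pin down.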

Comment on lines +610 to +636
  FOREIGN KEY (model_id) REFERENCES models(id)
);

-- Messages table
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  role TEXT NOT NULL, -- user, assistant, system
  content TEXT NOT NULL,
  tokens INTEGER,
  generation_time_ms INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- Benchmarks table
CREATE TABLE benchmarks (
  id TEXT PRIMARY KEY,
  model_id TEXT NOT NULL,
  device_info TEXT NOT NULL, -- JSON
  prompt_eval_tps REAL,
  generation_tps REAL,
  memory_mb REAL,
  load_time_ms INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (model_id) REFERENCES models(id)
);

Copilot AI Dec 24, 2025


The database schema lacks proper cascade delete behavior for foreign key constraints. When a model, conversation, or other parent entity is deleted, related records could become orphaned. Consider adding 'ON DELETE CASCADE' or 'ON DELETE SET NULL' clauses to the foreign key definitions to maintain referential integrity.
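The effect of adding the clause can be demonstrated with a small runnable sketch against SQLite, using a simplified two-table version of the schema (note that SQLite also requires `PRAGMA foreign_keys = ON` per connection before it enforces foreign keys at all):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
CREATE TABLE conversations (id TEXT PRIMARY KEY);
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  content TEXT NOT NULL,
  FOREIGN KEY (conversation_id) REFERENCES conversations(id)
    ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO conversations VALUES ('c1')")
conn.execute("INSERT INTO messages VALUES ('m1', 'c1', 'hi')")

# Deleting the parent conversation removes its messages automatically.
conn.execute("DELETE FROM conversations WHERE id = 'c1'")
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Without the `ON DELETE CASCADE` clause the same delete would either fail (with enforcement on) or leave the message orphaned (with enforcement off), which is exactly the risk the review raises.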

Comment on lines +1 to +5
# On-Device AI Model Tester - Mobile App Planning Document

## Executive Summary

A cross-platform mobile application for Android and iOS that enables users to download, manage, and run on-device AI models locally. The app provides a unified interface for testing various model types (LLMs, vision, audio, embeddings) without requiring cloud connectivity.

Copilot AI Dec 24, 2025


There is a significant discrepancy between the PR content and the existing repository. The repository README indicates this is a "Gemini API Tester" for web-based testing of Google's Gemini and Veo APIs, but this planning document describes a completely different mobile app for on-device AI model testing. This planning document appears to be unrelated to the current project's purpose. Please clarify if this is intentional or if the document was added to the wrong repository.

Comment on lines +434 to +438
abstract class InferenceService {
  Future<void> loadModel(String modelPath, ModelConfig config);
  Future<void> unloadModel();
  Stream<String> generateText(String prompt, GenerationConfig config);
  Future<List<double>> generateEmbedding(String text);

Copilot AI Dec 24, 2025


The InferenceService interface is missing critical lifecycle methods and error handling. Consider adding methods for: checking if a model is currently loaded, getting model information/metadata, handling memory warnings, and canceling ongoing inference operations. Additionally, all methods should specify what exceptions they might throw for proper error handling by consumers.

Suggested change

abstract class InferenceService {
  Future<void> loadModel(String modelPath, ModelConfig config);
  Future<void> unloadModel();
  Stream<String> generateText(String prompt, GenerationConfig config);
  Future<List<double>> generateEmbedding(String text);

abstract class InferenceService {
  /// Loads a model from [modelPath] with the given [config].
  ///
  /// Throws:
  /// - [ArgumentError] if [modelPath] or [config] is invalid.
  /// - [StateError] if another model is already loaded and cannot be replaced.
  /// - [Exception] for I/O or platform-specific errors while loading the model.
  Future<void> loadModel(String modelPath, ModelConfig config);

  /// Unloads the currently loaded model and frees associated resources.
  ///
  /// Safe to call even if no model is currently loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime fails to release resources.
  Future<void> unloadModel();

  /// Returns `true` if a model is currently loaded and ready for inference.
  bool get isModelLoaded;

  /// Returns metadata for the currently loaded model, or `null` if none is loaded.
  ///
  /// Throws:
  /// - [Exception] if the underlying runtime cannot provide model information.
  Future<ModelInfo?> getLoadedModelInfo();

  /// Notifies the service of memory pressure so it can proactively release
  /// caches or other non-critical data.
  ///
  /// Implementations should make a best effort to reduce memory usage without
  /// impacting correctness.
  ///
  /// Throws:
  /// - [Exception] if memory cleanup fails in a recoverable but noteworthy way.
  Future<void> handleMemoryWarning();

  /// Cancels any ongoing inference operations managed by this service.
  ///
  /// Implementations should make this method idempotent.
  ///
  /// Throws:
  /// - [Exception] if cancellation fails or is not supported by the backend.
  Future<void> cancelAllInference();

  /// Starts text generation for the given [prompt] and [config].
  ///
  /// The returned [Stream] emits tokens (or partial text chunks) as they are
  /// produced by the underlying model.
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [prompt] or [config] is invalid.
  /// - [Exception] for runtime/platform-level inference errors.
  Stream<String> generateText(String prompt, GenerationConfig config);

  /// Computes an embedding vector for the given [text].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [text] is invalid or empty when not allowed.
  /// - [Exception] for runtime/platform-level inference errors.
  Future<List<double>> generateEmbedding(String text);

  /// Classifies the provided image data and returns a [ClassificationResult].
  ///
  /// Throws:
  /// - [StateError] if no model is loaded.
  /// - [ArgumentError] if [imageData] is invalid or corrupted.
  /// - [Exception] for runtime/platform-level inference errors.
Comment on lines +517 to +527
        thread {
          LlamaCpp.generate(llamaContext, prompt, config) { token ->
            mainHandler.post {
              eventSink?.success(token)
            }
          }
          mainHandler.post {
            eventSink?.endOfStream()
          }
        }
        result.success(true)

Copilot AI Dec 24, 2025


The Kotlin code uses 'thread { }' to launch a background thread, but this is not a safe or recommended approach for Flutter plugins. Consider using a proper executor or coroutines instead. Additionally, 'mainHandler' is referenced but never defined in this code snippet. For production code, you should use 'Handler(Looper.getMainLooper())' or migrate to Kotlin Coroutines with proper dispatchers.

- Research Apple Foundation Models framework (iOS 26+, ~3B on-device LLM)
- Document guided generation, tool calling, and Swift macros
- Recommend two separate apps due to platform constraints:
  1. On-Device AI Tester (Flutter) - cross-platform, open-source models
  2. Apple Intelligence Studio (Swift) - Apple-only, Foundation Models
- Add App Builder concept for future iOS app generation
- Include development timeline and tech stack for Apple app
- Add comprehensive Apple Intelligence references and sources
Separate the combined planning document into two focused plans:

1. ON_DEVICE_AI_TESTER_FLUTTER.md (Cross-platform Flutter app)
   - Android + iOS support
   - Open-source models (Llama, Gemma, Phi, Whisper)
   - llama.cpp / GGUF format
   - Hugging Face Hub integration
   - Complete architecture and implementation details

2. APPLE_INTELLIGENCE_STUDIO.md (Native Swift app)
   - iOS/macOS only (requires iOS 26+)
   - Apple Foundation Models (~3B on-device LLM)
   - Guided generation with @generable macros
   - Tool calling playground
   - Writing Tools & Image Playground integration
   - Experimental App Builder for SwiftUI code generation
   - Siri/Shortcuts via App Intents

Research confirmed:
- Flutter CAN access Apple Foundation Models via platform packages
- However, two apps make sense due to different target audiences
- Apple-exclusive features require native Swift integration