Production-Ready Python Implementation with modern tooling (Hatch, Ruff, Mypy)
62/62 Tests Passing • 81.76% Coverage • Zero Security Issues
State-of-the-Art Algorithms & Trends for Edge AI and Embedded Systems
This project uses modern Python tooling with Hatch for dependency management and development workflows.
# Clone the repository
git clone https://github.com/umitkacar/ai-edge-computing-tiny-embedded.git
cd ai-edge-computing-tiny-embedded
# Install Hatch (it creates the project environment and installs dependencies on first run)
pip install hatch
# Run tests
hatch run test
# Run full CI pipeline
hatch run ci
Modern Python Stack:
- Build System: Hatch - Modern Python project manager
- Linting: Ruff - Very fast Python linter (its documentation reports 10-100x speedups over linters such as Flake8)
- Formatting: Black - The uncompromising code formatter
- Type Checking: Mypy - Static type checker (strict mode)
- Testing: Pytest - Comprehensive test framework
- Security: Bandit - Security vulnerability scanner
- Pre-commit: Automated quality checks on commit/push
Available Commands:
# Linting & Formatting
hatch run lint # Run Ruff linter
hatch run format # Format code with Black
hatch run format-check # Check formatting without changes
# Type Checking
hatch run type-check # Run Mypy strict type checking
# Testing
hatch run test # Run tests (sequential)
hatch run test-parallel # Run tests with auto workers
hatch run test-parallel-cov # Parallel tests with coverage
# Security
hatch run security # Run Bandit security audit
# Complete CI Pipeline
hatch run ci # Run all checks (format, lint, type-check, security, test)
ai-edge-computing-tiny-embedded/
├── src/ai_edge_tinyml/ # Source code (src layout)
│ ├── __init__.py # Package initialization
│ ├── quantization.py # INT8/INT4/FP16 quantization
│ ├── model_optimizer.py # Model optimization pipeline
│ ├── utils.py # Utility functions
│ └── py.typed # PEP 561 marker (typed package)
├── tests/ # Test suite (62 tests, 81.76% coverage)
│ ├── conftest.py # Pytest configuration & fixtures
│ ├── test_quantization.py # Quantization tests (21 tests)
│ ├── test_model_optimizer.py # Optimizer tests (19 tests)
│ └── test_utils.py # Utility tests (22 tests)
├── pyproject.toml # Project configuration (single source of truth)
├── .pre-commit-config.yaml # Pre-commit hooks configuration
├── CHANGELOG.md # Detailed change history
├── LESSONS-LEARNED.md # Best practices & insights
├── DEVELOPMENT.md # Development guidelines
└── README.md # This file
This project maintains production-ready code quality:
| Check | Status | Details |
|---|---|---|
| Ruff Linting | ✅ PASS | 50+ rules, zero errors |
| Black Formatting | ✅ PASS | Line length: 100 |
| Mypy Type Check | ✅ PASS | Strict mode enabled |
| Bandit Security | ✅ PASS | 0 vulnerabilities |
| Test Suite | ✅ PASS | 62/62 tests passing |
| Code Coverage | ✅ PASS | 81.76% (exceeds 80%) |
| Pre-commit Hooks | ✅ PASS | 15+ automated checks |
Test Results:
tests/test_quantization.py 21 passed
tests/test_model_optimizer.py 19 passed
tests/test_utils.py 22 passed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total: 62 passed in 0.50s ✅
Coverage: 81.76% (exceeds 80% threshold) ✅
- Bandit Security Audit: Zero vulnerabilities detected
- Type Safety: Full type annotations with mypy strict mode
- Dependency Scanning: Automated security checks in CI
- Pre-commit Hooks: Security validations before commit
- CHANGELOG.md - Detailed version history and changes
- LESSONS-LEARNED.md - Best practices, insights, and technical decisions
- DEVELOPMENT.md - Comprehensive development guidelines
- API Documentation: Auto-generated from Google-style docstrings
Quantization Support:
- ✅ INT8 Quantization (8-bit integers)
- ✅ INT4 Quantization (4-bit integers)
- ✅ FP16 Quantization (16-bit floats)
- ✅ Dynamic Quantization
- ✅ Symmetric & Asymmetric modes
- ✅ Per-tensor & per-channel quantization
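To make the symmetric and asymmetric modes listed above concrete, here is a minimal NumPy sketch of per-tensor 8-bit quantization. The function names are hypothetical and are not the `ai_edge_tinyml` API; the package's own implementation may differ in range, rounding, and zero-point handling.

```python
import numpy as np


def quantize_symmetric_int8(x):
    """Symmetric per-tensor INT8: zero-point fixed at 0, range [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def quantize_asymmetric_uint8(x):
    """Asymmetric per-tensor 8-bit: maps [min, max] onto [0, 255]."""
    lo, hi = float(np.min(x)), float(np.max(x))
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integer codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale


rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)

q_sym, s_sym = quantize_symmetric_int8(weights)
q_asym, s_asym, zp = quantize_asymmetric_uint8(weights)

# Worst-case round-trip error; expected to stay near half a quantization step.
err_sym = float(np.max(np.abs(weights - dequantize(q_sym, s_sym))))
err_asym = float(np.max(np.abs(weights - dequantize(q_asym, s_asym, zp))))
print(f"symmetric max error:  {err_sym:.5f} (step {s_sym:.5f})")
print(f"asymmetric max error: {err_asym:.5f} (step {s_asym:.5f})")
```

Symmetric mode needs no zero-point at inference time, while asymmetric mode spends its 256 levels on the actual value range, which tends to help for skewed distributions.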
Model Optimization:
- ✅ Weight quantization with 6 different modes
- ✅ Compression ratio analysis
- ✅ Model size calculation
- ✅ Type-safe APIs with full annotations
- ✅ Comprehensive error handling
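As a rough illustration of the size and compression-ratio analysis listed above, the arithmetic can be sketched in plain Python. The helper names below are hypothetical (they are not the package's `calculate_compression_ratio`), and the figures ignore per-tensor metadata such as scales and zero-points.

```python
def model_size_bytes(num_params: int, bits_per_param: int) -> float:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_param / 8


def compression_ratio(original_bits: int, quantized_bits: int) -> float:
    """How many times smaller the quantized weights are."""
    return original_bits / quantized_bits


num_params = 1_000_000  # illustrative model size, not a measured value
bit_widths = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}

sizes = {name: model_size_bytes(num_params, bits) for name, bits in bit_widths.items()}
ratios = {name: compression_ratio(32, bits) for name, bits in bit_widths.items()}

for name in bit_widths:
    print(f"{name}: {sizes[name] / 1e6:.2f} MB, {ratios[name]:.0f}x vs FP32")
```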
Example Usage:
import numpy as np

from ai_edge_tinyml import Quantizer, QuantizationConfig, QuantizationMode
from ai_edge_tinyml.utils import calculate_compression_ratio

# Create quantization config
config = QuantizationConfig(
    mode=QuantizationMode.INT8,
    symmetric=True,
    per_channel=False,
)

# Initialize quantizer
quantizer = Quantizer(config)

# Quantize weights
weights = np.random.randn(100, 100).astype(np.float32)
quantized = quantizer.quantize(weights)

# Dequantize for inference
dequantized = quantizer.dequantize(quantized)

# Calculate compression
ratio = calculate_compression_ratio(weights, quantized)
print(f"Compression ratio: {ratio:.2f}x")
✨ Key Features:
📚 Resources:
- 📖 Ultralytics Docs → https://docs.ultralytics.com/models/
- 📄 YOLO Evolution → https://arxiv.org/html/2510.09653v2
📊 Performance Metrics:
📚 Resources:
- 📄 Paper → https://arxiv.org/pdf/2405.14458
- 📖 Docs → https://docs.ultralytics.com/models/yolov10/
🎯 First practical real-time detection transformer
| Model | AP Score | FPS | Device |
|---|---|---|---|
| RT-DETR | 53.1% | 108 | NVIDIA T4 |
| RT-DETRv2 | >55% | 108+ | NVIDIA T4 |
🔗 Resources:
graph LR
A[🖼️ Input Image] --> B[📱 MobileNetV4]
A --> C[⚡ EfficientViT]
B --> D[🎯 87% Accuracy]
C --> E[🔥 3.8ms Latency]
D --> F[📲 Edge TPU]
E --> F
style A fill:#e1f5ff
style B fill:#ffe1f5
style C fill:#f5ffe1
style D fill:#ffe1e1
style E fill:#e1ffe1
style F fill:#ffd700
🎨 Innovations:
📚 Resources:
✨ Features:
📊 Variants:
- Model: Phi-3-mini
- Parameters: 3.8B
- Context: Up to 128K tokens
- Deployment: GPU, CPU, Mobile
- Status: ✅ Production Ready
🎯 Optimized For:
🔗 Resources:
📊 Specifications:
- Parameters: 1.1B
- Target: Mobile/Edge devices
- Performance: High for size class
- Year: 2024
- Status: ✅ Active
✨ Highlights:
📱 On-device AI for Smartphones
Variants:
🎯 Capabilities:
🖼️ Edge AI & Vision Capabilities
Features:
🔗 Resources:
🎨 Efficient vision-language model for mobile devices
Specifications:
- 🔹 mobileLLaMA: 2.7B parameters
- 🔹 Trained from scratch on open datasets
- 🔹 Fully optimized for mobile deployment
- 🔹 Vision + Language capabilities
🚀 Performance Highlights:
📊 Advantages:
- ✅ Linear time complexity
- ✅ 5x throughput improvement
- ✅ Efficient long sequences
- ✅ Lower memory footprint
- ❌ Newer architecture (less tested)
📚 Resources:
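The linear time complexity claimed above comes from processing a sequence with a fixed-size recurrent state. The sketch below is a plain, non-selective diagonal state-space recurrence in NumPy, intended only to show the O(L) loop; it is an assumption-laden illustration, not the selective scan or the hardware-aware kernel used by real state space model implementations. The name `ssm_scan` and the parameter values are hypothetical.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Diagonal linear state-space recurrence over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c . h_t

    One constant-cost update per step, so the total cost grows linearly
    with sequence length and the state size never grows.
    """
    h = np.zeros_like(a)
    y = np.empty(len(x), dtype=np.float64)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = float(np.dot(c, h))
    return y


# Illustrative parameters for a 2-dimensional state.
a = np.array([0.9, 0.5])  # per-state decay
b = np.array([1.0, 1.0])  # input projection
c = np.array([0.5, 0.5])  # output projection

x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse
y = ssm_scan(x, a, b, c)
print(y)
```

Memory use is independent of sequence length here, in contrast to attention, whose cost grows with the number of token pairs.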
✨ Features:
- Design: End-to-end hardware acceleration
- Target: Edge platforms
- Complexity: Linear time
- Status: 2024 Release
🎯 Optimizations:
📚 Resources:
📊 Performance:
- 70% faster than llama.cpp on RTX 4090
- State-of-the-art optimizations
- Quality maintained across precisions
✨ Features:
🔗 Resources:
🎯 Innovations:
🖥️ Supported Hardware:
- AMD: GPU support
- Google: TPU support
- AWS: Inferentia support
- Base: PyTorch
🔗 Resources:
Features:
💻 Hardware Support:
🔗 Resources:
Advantages:
- ✅ Lower memory usage
- ✅ No GPU required
- ✅ Fast generation
- ✅ Cross-platform
- ✅ Wide model support
🔗 Comparison:
Activation-aware Weight Quantization
Key Concept:
# Not all weights are equal!
if is_salient(weight):
    skip_quantization()
else:
    quantize_weight()
Features:
🔗 Resources:
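The key-concept pseudocode above can be turned into a small runnable sketch. It follows the simplified "skip salient weights" idea literally: input channels with the largest average activation magnitude stay in full precision and the rest are fake-quantized to 4 bits. This is an assumption-based illustration with hypothetical function names; the published AWQ method goes further and protects salient channels through per-channel scaling rather than mixed precision.

```python
import numpy as np


def quantize_int4_symmetric(w):
    """Per-row symmetric 4-bit fake quantization (quantize, then dequantize)."""
    max_abs = np.max(np.abs(w), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    return np.clip(np.round(w / scale), -7, 7) * scale


def salient_aware_quantize(w, activations, keep_fraction=0.01):
    """Fake-quantize w, keeping the most activation-salient input channels intact.

    w:           (out_features, in_features) weight matrix
    activations: (num_samples, in_features) calibration activations
    """
    importance = np.mean(np.abs(activations), axis=0)  # one score per input channel
    n_keep = max(1, int(round(keep_fraction * w.shape[1])))
    salient = np.argsort(importance)[-n_keep:]

    w_q = quantize_int4_symmetric(w)
    w_q[:, salient] = w[:, salient]  # "skip_quantization" for salient channels
    return w_q, salient


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
activations = rng.standard_normal((32, 8))
activations[:, 3] *= 50.0  # make input channel 3 clearly dominant

w_q, salient = salient_aware_quantize(w, activations, keep_fraction=0.125)
print("salient input channels:", salient)
```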
GPU-Focused Quantization
Features:
Achievements:
- Models: BLOOM, OPT-175B
- Precision: 4-bit
- Platform: GPU optimized
Efficient Fine-tuning
Innovations:
Capability:
- Fine-tune 65B model
- On single GPU
- Maintain quality
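A back-of-the-envelope calculation shows why a 65B model can fit on a single GPU once the base weights are stored in 4 bits. The sketch below assumes 4-bit storage for the frozen base weights and counts weights only; activations, adapter parameters, optimizer state, and quantization constants are deliberately left out, so real memory use is higher. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_65b = 65_000_000_000

fp16_gb = weight_memory_gb(params_65b, 16)
four_bit_gb = weight_memory_gb(params_65b, 4)

print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {four_bit_gb:.1f} GB")
```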
🔥 Latest quantization innovation
Features:
- Built on bitsandbytes
- Dynamic parameter quantization
- Per-parameter optimization
📚 Comprehensive Guides:
🤖 Automate neural network architecture design
Concept: Train once, deploy everywhere
graph TD
A[🌐 Supernet Training] --> B[📦 Weight Sharing]
B --> C[📱 Mobile]
B --> D[💻 Desktop]
B --> E[⚡ Edge]
style A fill:#e1f5ff
style B fill:#ffe1f5
style C fill:#f5ffe1
style D fill:#ffe1e1
style E fill:#ffd700
Features:
- 🔹 Weight-sharing supernetwork
- 🔹 Represents any architecture in search space
- 🔹 Massive computational savings
- 🔹 Applied to ImageNet with ProxylessNAS & MobileNetV3
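The weight-sharing idea in the list above can be sketched with a single layer: every smaller sub-network reuses a slice of the largest layer's weights instead of owning its own copy. This is a toy NumPy illustration under that assumption; the class name `SharedLinear` is hypothetical, and real supernets also vary depth, kernel size, and resolution and need a dedicated training schedule.

```python
import numpy as np


class SharedLinear:
    """A linear layer whose narrower variants reuse the top-left weight slice."""

    def __init__(self, max_in, max_out, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((max_out, max_in)).astype(np.float32)

    def forward(self, x, out_features):
        in_features = x.shape[-1]
        # Basic slicing returns a view, so no weights are copied per sub-network.
        w = self.weight[:out_features, :in_features]
        return x @ w.T


layer = SharedLinear(max_in=8, max_out=16)
x = np.ones((2, 8), dtype=np.float32)

y_edge = layer.forward(x, out_features=4)      # small sub-network
y_desktop = layer.forward(x, out_features=16)  # full-width network

print(y_edge.shape, y_desktop.shape)
```

Because the small variant's outputs are the first channels of the large variant's outputs, one set of trained weights can serve several deployment targets.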
🔗 Resources:
Performance Metrics:
- Accuracy: 96.8% of BERT-base
- Size: 7.5x smaller (4 layers)
- Energy: Lowest variability (0.1032 kWh SD)
- Stages: Task-agnostic + Task-specific
Advantages:
Performance Metrics:
- Accuracy: 97% of BERT
- Size Reduction: 40% smaller
- Speed: 60% faster
- Use Case: General-purpose
Recent Research (2025):
📚 Resources:
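The distilled models above are trained so that a small student imitates a larger teacher. A minimal sketch of the generic soft-label distillation loss (temperature-softened KL divergence) is shown below in NumPy. It is an illustration only, with hypothetical function names and made-up logits: the actual training objectives of specific distilled BERT variants combine further terms that are not shown here.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    z = logits / temperature
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on softened distributions, scaled by T^2.

    Assumes logits of moderate magnitude so no probability underflows to zero.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


teacher = np.array([[2.0, 1.0, 0.1]])
student = np.array([[1.0, 1.0, 1.0]])

loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
print(f"mismatched: {loss_mismatch:.4f}, matched: {loss_match:.4f}")
```

A higher temperature flattens the teacher distribution, so the student also learns from the relative scores of the non-top classes.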
Foundation:
Achievements:
- ImageNet: 71.8% accuracy
- Visual Wake Words: >90% accuracy (32kB SRAM)
- Capability: Object detection
- Platform: Tiny devices
Latest:
🔧 TinyTL
⚙️ PockEngine
📚 Resources:
🎯 Evolution from TinyML to deep learning on edge
Focus Areas:
- 🔹 Deep learning on ultra-constrained hardware
- 🔹 Power consumption in mW range
- 🔹 On-device sensor analytics
- 🔹 Real-time inference
📄 Resources:
Specifications:
- Compute: 67 INT8 TOPS
- Performance: 1.7x vs previous Orin
- Price: $249
- Release: Late 2024
- Status: ✅ Available
Features:
Hardware Platforms:
| Platform | Architecture | Use Case |
|---|---|---|
| 🔧 ARM CPUs | ARM Cortex | General compute |
| 📡 Mobile DSPs | Qualcomm/MediaTek | Signal processing |
| 🎮 Mobile GPUs | Mali/Adreno | Graphics + AI |
| 🧠 NPUs | Custom ASICs | Neural processing |
Cross-platform inference with ONNX models
Tools & Resources:
🔥 YOLO implementations:
- ⚡ YOLOv8-TensorRT-CPP
- 🔧 TensorRT C++ API
- 🐍 YOLOv8-TensorRT (Python + C++)
- 🤸 YOLO Pose C++
- 📚 TensorRT Samples
- 📺 YOLOv8 TensorRT Tutorial
🚀 NVIDIA's high-performance deep learning inference optimizer
Resources:
Features:
🎨 Machine learning framework for iOS/macOS
📦 CoreML resources:
🎯 AI-Driven Optimizer for Deep Neural Networks
Focus:
- ⚡ Faster Inference
- 📦 Smaller Models
- 🔋 Energy Efficient
- ☁️ Cloud to Edge
- 🎯 Maintain Accuracy
🔗 Resources:
🔍 Research papers:
- 📄 Deep Learning With Edge Computing: A Review
- 📄 Convergence of Edge Computing and Deep Learning
- 📄 Machine Learning at the Network Edge
- 📄 Edge Deep Learning in CV & Medical Diagnostics
- 📄 From Tiny ML to Tiny DL: A Survey (2024)
- 📄 EtinyNet: Extremely Tiny Network
- 📄 Ultra-low Power TinyML System
- 📄 Mamba: Linear-Time Sequence Modeling
- 📄 Mamba-360: Survey of SSMs
- 📄 eMamba: Efficient Edge Acceleration
- 📄 MobileNetV4 (ECCV 2024)
- 📄 ViT for Mobile/Edge Devices
- 📄 YOLO Evolution: v5 to YOLO26
- 📄 YOLOv10: Real-Time Detection
This repository serves as a comprehensive resource for AI edge computing and TinyML practitioners.
Contributions, updates, and corrections are welcome! 🚀
TinyML • Edge AI • Embedded ML • Model Compression • Quantization • Neural Architecture Search • YOLO • MobileNet • Transformer • State Space Models • ONNX Runtime • TensorRT • Inference Optimization • MCU • IoT • Real-Time AI
January 2025
