A comprehensive survey on Edge AI, covering hardware, software, frameworks, applications, performance optimization, and the deployment of LLMs on edge devices.
The listed models are base models that meet either of the following criteria:
- Parameters ≤ 10B
- Officially claimed edge models
| Model | Size | Org | Time | Download | Paper |
|---|---|---|---|---|---|
| SmolLM3 | 3B | Hugging Face | 2025.7.9 | 🤗 | 📖 |
| MiniCPM4 | 8B | OpenBMB | 2025.6.6 | 🤗 | |
| Qwen2.5-Omni | 7B | Qwen | 2025.3.26 | 🤗 | |
| MiniCPM-o 2.6 | 8B | OpenBMB | 2025.1.14 | 🤗 | - |
| Phi-4 | 14B | Microsoft | 2025.1.9 / 2024.12.12 (release) | 🤗 | |
| VITA-1.5 | 7B | VITA | 2025.1.6 | - | |
| Megrez-3B-Omni | 3B | Infinigence | 2024.12.16 | 🤗 | - |
| OmniAudio | 2.6B | Nexa AI | 2024.12.12 | 🤗 | 📖 |
| InternVL 2.5 | 8B | OpenGVLab | 2024.12.5 | 🤗 | - |
| GLM-Edge | 1.5B 2B 4B 5B | THUDM | 2024.11.29 | 🤗 | - |
| SmolVLM | 2B | Hugging Face | 2024.11.26 | 🤗 | 📖 |
| SmolLM2 | 135M 360M 1.7B | Hugging Face | 2024.11.1 | 🤗 | 📖 |
| Ministral | 3B 8B | Mistral AI | 2024.10.16 | 🤗 | 📖 |
| Qwen2.5 | 0.5B 1.5B 3B 7B | Qwen | 2024.9.19 | 🤗 | 📖 |
| Pixtral 12B | 12B | Mistral AI | 2024.9.17 | 🤗 | 📖 |
| Qwen2-VL | 2B 7B | Qwen | 2024.8.30 | 🤗 | 📖 |
| Phi 3.5 | 3.8B 4.1B | Microsoft | 2024.8.21 | 🤗 | - |
| MiniCPM-V 2.6 | 8B | OpenBMB | 2024.8.6 | 🤗 | - |
| SmolLM | 135M 360M 1.7B | Hugging Face | 2024.8.2 | 🤗 | 📖 |
| Gemma2 | 2B 9B | Google | 2024.7.31 | 🤗 | 📖 |
| DCLM 7B | 7B | Apple | 2024.7.18 | 🤗 | |
| Phi-3 | 3.8B 7B | Microsoft | 2024.4.23 | 🤗 | |
| Mistral NeMo | 12B | Mistral AI | 2024.6.18 | 🤗 | 📖 |
| Gemma | 2B 7B | Google | 2024.2.21 | 🤗 | 📖 |
| Mistral 7B | 7B | Mistral AI | 2023.9.27 | 🤗 | 📖 |
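
The inclusion rule stated above the table (at most 10B parameters, or officially positioned as an edge model) can be sketched as a small filter. This is only an illustration: the `Model` class, the `claimed_edge` flag values, and the two `Example-*` entries are hypothetical and are not taken from the list's tooling.

```python
from dataclasses import dataclass

MAX_PARAMS_B = 10.0  # size threshold from the criteria, in billions of parameters


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float     # largest variant, in billions of parameters
    claimed_edge: bool   # assumed flag: publisher officially calls it an edge model


def is_listed(model: Model) -> bool:
    """A model qualifies if it meets either criterion."""
    return model.params_b <= MAX_PARAMS_B or model.claimed_edge


candidates = [
    Model("SmolLM3", 3.0, False),            # flag value is illustrative
    Model("GLM-Edge", 5.0, True),            # flag value is illustrative
    Model("Example-30B", 30.0, False),       # hypothetical entry
    Model("Example-Edge-12B", 12.0, True),   # hypothetical entry
]

listed = [m.name for m in candidates if is_listed(m)]
print(listed)
```

Because the two criteria are joined with "or", a model above 10B can still appear in the table when its publisher positions it as an edge model.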
Embodied Model
| Title | Date | Org | Paper |
|---|---|---|---|
| DashInfer-VLM | 2025.1 | ModelScope | 📖 |
| SparseInfer | 2024.11 | University of Seoul, etc. | |
| Mooncake | 2024.6 | Moonshot AI | 📖 |
| FlashInfer | 2024.2 | flashinfer-ai | 📖 |
| Inferflow | 2024.2 | Tencent AI Lab | |
| PowerInfer | 2023.12 | SJTU | |
| PETALS | 2023.12 | HSE University, etc. | |
| TensorRT-LLM | 2023.10 | NVIDIA | - |
| LightSeq | 2023.10 | UC Berkeley, etc. | |
| vLLM | 2023.9 | UC Berkeley, etc. | |
| StreamingLLM | 2023.9 | Meta AI, etc. | |
| MLC-LLM | 2023.5 | mlc-ai | 📖 |
| Medusa | 2023.9 | Tianle Cai, etc. | 📖 |
| LightLLM | 2023.8 | ModelTC | - |
| FastServe | 2023.5 | Peking University | |
| SpecInfer | 2023.5 | Peking University, etc. | |
| Ollama | 2023.8 | Ollama Inc | - |
| LMDeploy | 2023.6 | InternLM | 📖 |
| Megatron-LM | 2020.5 | NVIDIA | |
50 Series @2025
| GPU Specs | GeForce RTX 5090 | GeForce RTX 5080 | GeForce RTX 5070 Ti | GeForce RTX 5070 |
|---|---|---|---|---|
| NVIDIA CUDA Cores | 21760 | 10752 | 8960 | 6144 |
| Shader Cores | Blackwell | Blackwell | Blackwell | Blackwell |
| Tensor Cores (AI) | 5th Generation 3352 AI TOPS | 5th Generation 1801 AI TOPS | 5th Generation 1406 AI TOPS | 5th Generation 988 AI TOPS |
| Ray Tracing Cores | 4th Generation 318 TFLOPS | 4th Generation 171 TFLOPS | 4th Generation 133 TFLOPS | 4th Generation 94 TFLOPS |
| Boost Clock (GHz) | 2.41 | 2.62 | 2.45 | 2.51 |
| Base Clock (GHz) | 2.01 | 2.30 | 2.30 | 2.16 |
| Standard Memory Config | 32 GB GDDR7 | 16 GB GDDR7 | 16 GB GDDR7 | 12 GB GDDR7 |
| Memory Interface Width | 512-bit | 256-bit | 256-bit | 192-bit |
| Price | $1999 | $999 | $749 | $549 |
40 Super Series @2024
| GPU Specs | GeForce RTX 4080 Super | GeForce RTX 4070 Ti Super | GeForce RTX 4070 Super |
|---|---|---|---|
| CUDA Cores | 10240 | 8448 | 7168 |
| Memory Configuration | 16 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X |
| Memory Interface Width | 256-bit | 256-bit | 192-bit |
| Memory Bandwidth | 736 GB/s | 672 GB/s | 504 GB/s |
| Base Clock (GHz) | 2.29 | 2.34 | 1.98 |
| Boost Clock (GHz) | 2.55 | 2.61 | 2.48 |
| Graphics Card Power | 320W | 285W | 220W |
| Recommended PSU | 750W | 700W | 650W |
| Price | $999 | $799 | $599 |
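
The Memory Interface Width and Memory Bandwidth rows are related by simple arithmetic: peak bandwidth is the bus width in bytes multiplied by the per-pin data rate. The sketch below uses the RTX 4080 Super column (256-bit, 736 GB/s) to back out the implied data rate. The function names are hypothetical, and the per-pin rate is derived here rather than stated in the table.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin rate (Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bandwidth: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) implied by a bandwidth (GB/s) and bus width (bits)."""
    return bandwidth * 8 / bus_width_bits


# RTX 4080 Super column from the table: 256-bit bus, 736 GB/s.
rate = implied_data_rate_gbps(736, 256)
print(rate)                        # 23.0
print(bandwidth_gb_s(256, rate))   # 736.0
```

The same relation gives a quick sanity check for any row pair: at a fixed per-pin rate, a narrower bus has to yield proportionally lower bandwidth.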
40 Series @2022
| GPU Specs | GeForce RTX 4090 | GeForce RTX 4080 | GeForce RTX 4070 Ti | GeForce RTX 4070 | GeForce RTX 4060 Ti | GeForce RTX 4060 |
|---|---|---|---|---|---|---|
| NVIDIA CUDA Cores | 16384 | 9728 | 7680 | 5888 | 4352 | 3072 |
| Shader Cores | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace |
| Tensor Cores (AI) | 4th Gen 330 AI TFLOPS | 4th Gen 200 AI TFLOPS | 4th Gen 150 AI TFLOPS | 4th Gen 100 AI TFLOPS | 4th Gen 90 AI TFLOPS | 4th Gen 60 AI TFLOPS |
| Ray Tracing Cores | 3rd Gen 191 TFLOPS | 3rd Gen 112 TFLOPS | 3rd Gen 92 TFLOPS | 3rd Gen 64 TFLOPS | 3rd Gen 54 TFLOPS | 3rd Gen 35 TFLOPS |
| Boost Clock (GHz) | 2.52 | 2.51 | 2.61 | 2.48 | 2.54 | 2.42 |
| Base Clock (GHz) | 2.23 | 2.21 | 2.31 | 1.92 | 2.31 | 1.83 |
| Standard Memory Config | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8 GB GDDR6 | 8 GB GDDR6 |
| Memory Interface Width | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Graphics Card Power (W) | 450 | 320 | 285 | 200 | 160 | 115 |
| Recommended PSU (W) | 850 | 750 | 700 | 650 | 550 | 450 |
| Price | $1,599 | $1,199 | $799 | $599 | $399 (8GB) / $499 (16GB) | $299 |
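
When pairing a model with one of the GPUs above, a rough first check is whether the weights alone fit in the card's memory. The sketch below is a back-of-the-envelope estimate, not a measured figure. It assumes weight storage is parameter count times bits per parameter, and it reserves a hypothetical 20% of memory (`headroom=0.8`) for the KV cache, activations, and runtime overhead. The function names are illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, vram_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit within the assumed usable share of GPU memory."""
    return weight_memory_gb(params_billion, bits_per_param) <= vram_gb * headroom


# An 8B model at 16-bit precision needs about 16 GB for weights alone.
print(weight_memory_gb(8, 16))   # 16.0
print(fits(8, 16, 16))           # False: no room left on a 16 GB card
# The same model quantized to 4 bits needs about 4 GB.
print(weight_memory_gb(8, 4))    # 4.0
print(fits(8, 4, 8))             # True on an 8 GB card under these assumptions
```

Real memory use depends on context length, batch size, and the inference framework, so treat the result as a lower bound.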
| Name | Company | Model | Time | Price |
|---|---|---|---|---|
| RayNeo V3 (雷鸟V3) | RayNeo (雷鸟创新) | Qwen | 2025.1.7 | ¥1799+ |
| Sharge AI Glasses (闪极拍拍镜) | Sharge (闪极科技) | Qwen, Kimi, GLM, etc. | 2024.12.19 | ¥999+ |
| INMO GO2 | INMO (影目科技) | - | 2024.11.29 | ¥3999 |
| Rokid Glasses | Rokid | Qwen | 2024.11.18 | ¥2499 |
| Looktech | Looktech | ChatGPT, Claude, Gemini | 2024.11.16 | $199 |
| Ray-Ban | Meta | Meta AI | 2023.9 | $299 |