Lynncc6/Awesome-Edge-LLMs

๐Ÿ” Awesome Edge LLMs

A comprehensive survey of Edge AI, covering hardware, software, frameworks, applications, performance optimization, and the deployment of LLMs on edge devices.

Open Source Edge Models

The models listed are base models that meet at least one of the following criteria:

  • Parameters ≤ 10B
  • Officially claimed edge models

| Model | Size | Org | Time | Download | Paper |
|---|---|---|---|---|---|
| SmolLM3 | 3B | Hugging Face | 2025.7.9 | 🤗 | 📖 |
| MiniCPM4 | 8B | OpenBMB | 2025.6.6 | 🤗 | arXiv |
| Qwen2.5-Omni | 7B | Qwen | 2025.3.26 | 🤗 | arXiv |
| MiniCPM-o 2.6 | 8B | OpenBMB | 2025.1.14 | 🤗 | - |
| Phi-4 | 14B | Microsoft | 2025.1.9; 2024.12.12 (release) | 🤗 | arXiv |
| VITA-1.5 | 7B | VITA | 2025.1.6 | - | arXiv |
| Megrez-3B-Omni | 3B | Infinigence | 2024.12.16 | 🤗 | - |
| OmniAudio | 2.6B | Nexa AI | 2024.12.12 | 🤗 | 📖 |
| InternVL 2.5 | 8B | OpenGVLab | 2024.12.5 | 🤗 | - |
| GLM-Edge | 1.5B, 2B, 4B, 5B | THUDM | 2024.11.29 | 🤗 | - |
| SmolVLM | 2B | Hugging Face | 2024.11.26 | 🤗 | 📖 |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | 2024.11.1 | 🤗 | 📖 |
| Ministral | 3B, 8B | Mistral AI | 2024.10.16 | 🤗 | 📖 |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B | Qwen | 2024.9.19 | 🤗 | 📖 |
| Pixtral 12B | 12B | Mistral AI | 2024.9.17 | 🤗 | 📖 |
| Qwen2-VL | 2B, 7B | Qwen | 2024.8.30 | 🤗 | 📖 |
| Phi 3.5 | 3.8B, 4.1B | Microsoft | 2024.8.21 | 🤗 | - |
| MiniCPM-V 2.6 | 8B | OpenBMB | 2024.8.6 | 🤗 | - |
| SmolLM | 135M, 360M, 1.7B | Hugging Face | 2024.8.2 | 🤗 | 📖 |
| Gemma2 | 2B, 9B | Google | 2024.7.31 | 🤗 | 📖 |
| DCLM 7B | 7B | Apple | 2024.7.18 | 🤗 | arXiv |
| Phi-3 | 3.8B, 7B | Microsoft | 2024.4.23 | 🤗 | arXiv |
| Mistral NeMo | 12B | Mistral AI | 2024.6.18 | 🤗 | 📖 |
| Gemma | 2B, 7B | Google | 2024.2.21 | 🤗 | 📖 |
| Mistral 7B | 2B, 7B | Mistral AI | 2023.9.27 | 🤗 | 📖 |
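
The inclusion rule stated at the top of this section (at most 10B parameters, or officially claimed as an edge model) can be sketched as a small predicate. This is only an illustrative sketch: `ModelEntry`, `qualifies`, and the `claimed_edge` flag are hypothetical names introduced here, not part of this repository, and the 14B entries below are made-up examples rather than statements about any listed model.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    """Hypothetical record for one row of the model table."""
    name: str
    params_b: float             # parameter count, in billions
    claimed_edge: bool = False  # assumption: True if the vendor officially calls it an edge model


def qualifies(entry: ModelEntry, limit_b: float = 10.0) -> bool:
    """Return True if the entry meets either listing criterion."""
    return entry.params_b <= limit_b or entry.claimed_edge


if __name__ == "__main__":
    candidates = [
        ModelEntry("MiniCPM4", 8),                               # size taken from the table
        ModelEntry("hypothetical-14B", 14),                      # too large, no edge claim
        ModelEntry("hypothetical-14B-edge", 14, claimed_edge=True),
    ]
    for c in candidates:
        print(c.name, qualifies(c))
```

Because the two criteria are joined with "or", a model above 10B can still be listed when it carries an official edge claim, which is how entries larger than 10B can appear in the table.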

Embodied Model

LLM Inference

| Title | Date | Org | Paper |
|---|---|---|---|
| DashInfer-VLM | 2025.1 | ModelScope | 📖 |
| SparseInfer | 2024.11 | University of Seoul, etc. | arXiv |
| Mooncake | 2024.6 | Moonshot AI | 📖 |
| flashinfer | 2024.2 | flashinfer-ai | 📖 |
| inferflow | 2024.2 | Tencent AI Lab | arXiv |
| PowerInfer | 2023.12 | SJTU | |
| PETALS | 2023.12 | HSE University, etc. | arXiv |
| TensorRT-LLM | 2023.10 | NVIDIA | - |
| LightSeq | 2023.10 | UC Berkeley, etc. | arXiv |
| vLLM | 2023.9 | UC Berkeley, etc. | arXiv |
| StreamingLLM | 2023.9 | Meta AI, etc. | arXiv |
| MLC-LLM | 2023.5 | mlc-ai | 📖 |
| Medusa | 2023.9 | Tianle Cai, etc. | 📖 |
| LightLLM | 2023.8 | ModelTC | - |
| FastServe | 2023.5 | Peking University | arXiv |
| SpecInfer | 2023.5 | Peking University, etc. | arXiv |
| Ollama | 2023.8 | Ollama Inc | - |
| LMDeploy | 2023.6 | InternLM | 📖 |
| Megatron-LM | 2020.5 | NVIDIA | arXiv |

Processor

NVIDIA

✅ 50 Series @2025

| GPU Specs | GeForce RTX 5090 | GeForce RTX 5080 | GeForce RTX 5070 Ti | GeForce RTX 5070 |
|---|---|---|---|---|
| NVIDIA CUDA Cores | 21760 | 10752 | 8960 | 6144 |
| Shader Cores | Blackwell | Blackwell | Blackwell | Blackwell |
| Tensor Cores (AI) | 5th Generation, 3352 AI TOPS | 5th Generation, 1801 AI TOPS | 5th Generation, 1406 AI TOPS | 5th Generation, 988 AI TOPS |
| Ray Tracing Cores | 4th Generation, 318 TFLOPS | 4th Generation, 171 TFLOPS | 4th Generation, 133 TFLOPS | 4th Generation, 94 TFLOPS |
| Boost Clock (GHz) | 2.41 | 2.62 | 2.45 | 2.51 |
| Base Clock (GHz) | 2.01 | 2.30 | 2.30 | 2.16 |
| Standard Memory Config | 32 GB GDDR7 | 16 GB GDDR7 | 16 GB GDDR7 | 12 GB GDDR7 |
| Memory Interface Width | 512-bit | 256-bit | 256-bit | 192-bit |
| Price | $1999 | $999 | $749 | $549 |

✅ 40 Super Series @2024

| GPU Specs | GeForce RTX 4080 Super | GeForce RTX 4070 Ti Super | GeForce RTX 4070 Super |
|---|---|---|---|
| CUDA Cores | 10240 | 8448 | 7168 |
| Memory Configuration | 16 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X |
| Memory Interface Width | 256-bit | 256-bit | 256-bit |
| Memory Bandwidth | 736 GB/s | 736 GB/s | 736 GB/s |
| Base Clock (GHz) | 2.21 | 2.31 | 1.92 |
| Boost Clock (GHz) | 2.55 | 2.61 | 2.48 |
| Graphics Card Power (W) | 320 | 285 | 200 |
| Recommended PSU (W) | 750 | 700 | 650 |
| Price | $999 | $799 | $599 |

✅ 40 Series @2022

| GPU Specs | GeForce RTX 4090 | GeForce RTX 4080 | GeForce RTX 4070 Ti | GeForce RTX 4070 | GeForce RTX 4060 Ti | GeForce RTX 4060 |
|---|---|---|---|---|---|---|
| NVIDIA CUDA Cores | 16384 | 9728 | 7680 | 5888 | 4352 | 3072 |
| Shader Cores | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace |
| Tensor Cores (AI) | 4th Gen, 330 AI TFLOPS | 4th Gen, 200 AI TFLOPS | 4th Gen, 150 AI TFLOPS | 4th Gen, 100 AI TFLOPS | 4th Gen, 90 AI TFLOPS | 4th Gen, 60 AI TFLOPS |
| Ray Tracing Cores | 3rd Gen, 191 TFLOPS | 3rd Gen, 112 TFLOPS | 3rd Gen, 92 TFLOPS | 3rd Gen, 64 TFLOPS | 3rd Gen, 54 TFLOPS | 3rd Gen, 35 TFLOPS |
| Boost Clock (GHz) | 2.52 | 2.51 | 2.61 | 2.48 | 2.54 | 2.42 |
| Base Clock (GHz) | 2.23 | 2.21 | 2.31 | 1.92 | 2.31 | 1.83 |
| Standard Memory Config | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8 GB GDDR6 | 8 GB GDDR6 |
| Memory Interface Width | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Graphics Card Power (W) | 450 | 320 | 285 | 200 | 160 | 115 |
| Recommended PSU (W) | 850 | 750 | 700 | 650 | 550 | 450 |
| Price | $1,599 | $1,199 | $799 | $599 | $399 (8GB) / $499 (16GB) | $299 |
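
A rough way to relate the model sizes listed earlier to the memory configurations in these GPU tables is to estimate the footprint of the weights alone: parameter count times bits per weight, divided by 8. The sketch below makes that arithmetic explicit. It is a back-of-the-envelope estimate under stated assumptions: it ignores the KV cache, activations, and runtime overhead, treats 1 GB as 10^9 bytes, and the function names are hypothetical, so a real deployment needs additional headroom.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    params_billion * 1e9 weights * (bits_per_weight / 8) bytes, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


def weights_fit(params_billion: float, bits_per_weight: int, vram_gb: float) -> bool:
    """True if the weights alone fit in the given memory (no headroom assumed)."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # A 14B model at 16 bits per weight versus the 32 GB and 24 GB cards above.
    print(weight_footprint_gb(14, 16))   # 28.0
    print(weights_fit(14, 16, 32))       # True
    print(weights_fit(14, 16, 24))       # False
    # An 8B model quantized to 4 bits per weight versus an 8 GB card.
    print(weight_footprint_gb(8, 4))     # 4.0
    print(weights_fit(8, 4, 8))          # True
```

The estimate is a lower bound on memory use, so a result of `True` only says the weights are not the blocker; context length and batch size decide the rest.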

Hardware Applications

AI Glasses

| Name | Company | Model | Time | Price |
|---|---|---|---|---|
| RayNeo V3 (雷鸟V3) | RayNeo (雷鸟创新) | Qwen | 2025.1.7 | ¥1799+ |
| Sharge AI Glasses (闪极拍拍镜) | Sharge (闪极科技) | Qwen, Kimi, GLM, etc. | 2024.12.19 | ¥999+ |
| INMO GO2 | INMO (影目科技) | - | 2024.11.29 | ¥3999 |
| Rokid Glasses | Rokid | Qwen | 2024.11.18 | ¥2499 |
| Looktech | Looktech | ChatGPT, Claude, Gemini | 2024.11.16 | $199 |
| Ray-Ban Meta | Meta AI | | 2023.9 | $299 |

Reference

Awesome-LLMs-on-device

Awesome-LLM-Inference

ๆ•ฐๅญ—็”Ÿๅ‘ฝๅกๅ…นๅ…‹- AI็กฌไปถๅคงๅ…จ
