Awesome Local LLMs Guide

Run AI models privately on your own hardware — offline, free, under your control.

Why Run Locally?

Concern	Cloud API	Local
Privacy	Data leaves machine	100% local
Cost	Per-token billing	Hardware only
Internet	Required	Not needed

Hardware by Model Size

Size	Min VRAM (Q4)	Speed (t/s)
3B	2GB	60-120
7B	4-5GB	30-60
13B	8GB	15-30
70B	40GB	2-8

Tools

Tool	Best For	API
Ollama	Easiest setup	OpenAI-compatible REST
LM Studio	GUI desktop	OpenAI-compatible REST
llama.cpp	Max performance	CLI
vLLM	Production serving	REST

Quick Start with Ollama

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
ollama run llama3.2

Model Recommendations

Use Case	Model	Size
Fast chat	Llama 3.2 3B	2GB
Quality chat	Mistral 7B	4GB
Code	DeepSeek Coder 6.7B	4GB

Quantization Guide

Format	Size vs FP16	Quality
Q8_0	50%	Minimal loss
Q4_K_M	28%	Sweet spot
Q2_K	14%	Noticeable loss

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
guides		guides
scripts		scripts
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome Local LLMs Guide

Why Run Locally?

Hardware by Model Size

Tools

Quick Start with Ollama

Model Recommendations

Quantization Guide

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Awesome Local LLMs Guide

Why Run Locally?

Hardware by Model Size

Tools

Quick Start with Ollama

Model Recommendations

Quantization Guide

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages