Run Large Language Models locally - better than edge, it's already in your browser
- No Fees: No keys, no costs, no quotas
- Fast Inference: Runs on WASM with WebGPU acceleration
- Privacy First: Pure client-side processing
- Offline Ready: Download a model once, use it anywhere
- Streaming: Token-by-token output with minimal latency (see the sketch after this list)
- Device Agnostic: Just needs a modern browser with sufficient memory for the model
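As an illustration of the streaming flow, a minimal sketch with transformers.js v3 (`@huggingface/transformers`) could look like the following. The model id, the `#output` element, and the generation options are placeholders, not this project's actual configuration:

```js
// Minimal streaming sketch (illustrative only): the model id, the #output
// element, and the generation options are assumptions, not this app's config.
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Load a small chat model with WebGPU and 4-bit weights, mirroring the setup
// described in the feature list above.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { device: "webgpu", dtype: "q4" }
);

// TextStreamer calls back with each decoded chunk, so the UI can append text
// as tokens arrive instead of waiting for the full completion.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (chunk) => {
    document.querySelector("#output").textContent += chunk;
  },
});

await generator(
  [{ role: "user", content: "Explain WebGPU in one sentence." }],
  { max_new_tokens: 128, streamer }
);
```

This assumes a `<script type="module">` context; the model id is only an example of a small ONNX-converted chat model.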
The application is built with vanilla JavaScript and uses emerging web standards alongside browser-side inference libraries:
- WebAssembly (WASM): Core runtime for model inference
- WebGPU: Hardware acceleration for supported devices
- Web Workers: Offload model inference to a background thread so the UI never blocks
- transformers.js: Runs transformer models directly in the browser
- onnxruntime-web: Optimized inference engine (the WASM/WebGPU build of ONNX Runtime) used under the hood by transformers.js
- Model Loading: LRU caching system (max 3 models) with quantization fallback (4-bit → 8-bit)
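The worker and caching pieces from the list above can be combined into a single worker script. The following is only a sketch under stated assumptions: the `loadModel` helper, the message shape, and the `MAX_MODELS` constant are hypothetical, and it assumes transformers.js pipelines that accept `dtype: "q4"` / `"q8"`:

```js
// worker.js - hypothetical sketch of the caching strategy described above:
// keep at most 3 loaded pipelines, evict the least recently used one, and
// fall back from 4-bit to 8-bit weights if the q4 load fails.
import { pipeline } from "@huggingface/transformers";

const MAX_MODELS = 3;
const cache = new Map(); // Map preserves insertion order, which gives easy LRU

async function loadModel(modelId) {
  // Cache hit: re-insert to mark the entry as most recently used.
  if (cache.has(modelId)) {
    const entry = cache.get(modelId);
    cache.delete(modelId);
    cache.set(modelId, entry);
    return entry;
  }

  // Evict the least recently used pipeline once the cache is full.
  if (cache.size >= MAX_MODELS) {
    const [oldestId, oldest] = cache.entries().next().value;
    await oldest.dispose?.();
    cache.delete(oldestId);
  }

  // Try 4-bit weights first, fall back to 8-bit if the q4 load fails.
  let generator;
  try {
    generator = await pipeline("text-generation", modelId, { dtype: "q4" });
  } catch {
    generator = await pipeline("text-generation", modelId, { dtype: "q8" });
  }
  cache.set(modelId, generator);
  return generator;
}

// Inference requests arrive as messages, so the main thread's UI never blocks.
self.onmessage = async ({ data }) => {
  const generator = await loadModel(data.modelId);
  const output = await generator(data.prompt, { max_new_tokens: 256 });
  self.postMessage(output);
};
```

On the main thread, such a worker would be created with `new Worker("worker.js", { type: "module" })` so the `import` statement is allowed.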
| Feature | Chrome | Firefox | Safari | Edge |
|---|---|---|---|---|
| WASM | ✅ | ✅ | ✅ | ✅ |
| WebGPU | ✅ | 🚧 | 🚧 | ✅ |

✅ supported · 🚧 partial / experimental
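Because WebGPU availability differs across the browsers in this table, the app can probe for it at startup and fall back to the WASM backend. A sketch of that detection using the standard `navigator.gpu` API (the `detectDevice` helper is a made-up name for illustration):

```js
// Probe for a usable WebGPU adapter; fall back to the plain WASM backend
// when the API is missing or the adapter request fails.
async function detectDevice() {
  if ("gpu" in navigator) {
    try {
      const adapter = await navigator.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Some browsers expose navigator.gpu but cannot provide an adapter.
    }
  }
  return "wasm";
}

// Example: feed the result into a transformers.js pipeline option.
// const generator = await pipeline("text-generation", modelId,
//   { device: await detectDevice() });
```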
