Run Large Language Models locally - better than edge, it's already in your browser
- No Fees: No keys, no costs, no quotas
- Fast Inference: Runs on WASM with WebGPU acceleration
- Privacy First: Pure client-side processing
- Offline Ready: Download a model once, use it anywhere
- Streaming: Token-by-token output with minimal latency (see the sketch after this list)
- Device Agnostic: Just needs a modern browser with sufficient memory for the model
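As an illustration of the streaming flow, a minimal sketch with transformers.js v3 (`@huggingface/transformers`) could look like the following. The model id, the `#output` element, and the generation options are placeholders, not this project's actual configuration:

```js
// Minimal streaming sketch (illustrative only): the model id, the #output
// element, and the generation options are assumptions, not this app's config.
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Load a small chat model with WebGPU and 4-bit weights, mirroring the setup
// described in the feature list above.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { device: "webgpu", dtype: "q4" }
);

// TextStreamer calls back with each decoded chunk, so the UI can append text
// as tokens arrive instead of waiting for the full completion.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (chunk) => {
    document.querySelector("#output").textContent += chunk;
  },
});

await generator(
  [{ role: "user", content: "Explain WebGPU in one sentence." }],
  { max_new_tokens: 128, streamer }
);
```

This assumes a `<script type="module">` context; the model id is only an example of a small ONNX-converted chat model.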
The application is built with vanilla JavaScript and uses emerging web standards alongside browser-side inference libraries:
- WebAssembly (WASM): Core runtime for model inference
- WebGPU: Hardware acceleration for supported devices
- Web Workers: Offload model inference to a background thread so the UI never blocks
- transformers.js: Runs transformer models directly in the browser
- onnxruntime-web: Optimized inference engine (the WASM/WebGPU build of ONNX Runtime) used under the hood by transformers.js
- Model Loading: LRU caching system (max 3 models) with quantization fallback (4-bit → 8-bit)
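The worker and caching pieces from the list above can be combined into a single worker script. The following is only a sketch under stated assumptions: the `loadModel` helper, the message shape, and the `MAX_MODELS` constant are hypothetical, and it assumes transformers.js pipelines that accept `dtype: "q4"` / `"q8"`:

```js
// worker.js - hypothetical sketch of the caching strategy described above:
// keep at most 3 loaded pipelines, evict the least recently used one, and
// fall back from 4-bit to 8-bit weights if the q4 load fails.
import { pipeline } from "@huggingface/transformers";

const MAX_MODELS = 3;
const cache = new Map(); // Map preserves insertion order, which gives easy LRU

async function loadModel(modelId) {
  // Cache hit: re-insert to mark the entry as most recently used.
  if (cache.has(modelId)) {
    const entry = cache.get(modelId);
    cache.delete(modelId);
    cache.set(modelId, entry);
    return entry;
  }

  // Evict the least recently used pipeline once the cache is full.
  if (cache.size >= MAX_MODELS) {
    const [oldestId, oldest] = cache.entries().next().value;
    await oldest.dispose?.();
    cache.delete(oldestId);
  }

  // Try 4-bit weights first, fall back to 8-bit if the q4 load fails.
  let generator;
  try {
    generator = await pipeline("text-generation", modelId, { dtype: "q4" });
  } catch {
    generator = await pipeline("text-generation", modelId, { dtype: "q8" });
  }
  cache.set(modelId, generator);
  return generator;
}

// Inference requests arrive as messages, so the main thread's UI never blocks.
self.onmessage = async ({ data }) => {
  const generator = await loadModel(data.modelId);
  const output = await generator(data.prompt, { max_new_tokens: 256 });
  self.postMessage(output);
};
```

On the main thread, such a worker would be created with `new Worker("worker.js", { type: "module" })` so the `import` statement is allowed.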
| Feature | Chrome | Firefox | Safari | Edge |
|---|---|---|---|---|
| WASM | ✅ | ✅ | ✅ | ✅ |
| WebGPU | ✅ | 🚧 | 🚧 | ✅ |

✅ supported · 🚧 partial / experimental
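Because WebGPU availability differs across the browsers in this table, the app can probe for it at startup and fall back to the WASM backend. A sketch of that detection using the standard `navigator.gpu` API (the `detectDevice` helper is a made-up name for illustration):

```js
// Probe for a usable WebGPU adapter; fall back to the plain WASM backend
// when the API is missing or the adapter request fails.
async function detectDevice() {
  if ("gpu" in navigator) {
    try {
      const adapter = await navigator.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Some browsers expose navigator.gpu but cannot provide an adapter.
    }
  }
  return "wasm";
}

// Example: feed the result into a transformers.js pipeline option.
// const generator = await pipeline("text-generation", modelId,
//   { device: await detectDevice() });
```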
