Inline is a privacy-first VS Code extension that delivers intelligent, AI-powered code completion entirely offline. All processing happens locally on your machine; your code never leaves your device.
## Features

- One-click downloads for recommended models (DeepSeek, CodeLlama, Phi-3)
- Drag & drop `.gguf` file import
- URL-based model downloads from Hugging Face (see the sketch after this list)
- Automatic model validation and metadata detection
- Completions: Context-aware inline suggestions
- Hover Info: Type information and documentation
- Code Actions: AI-powered quick fixes and refactoring
- Diagnostics: Error detection and vulnerability scanning
- Caching: Smart LRU cache for performance
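
As a sketch of what a URL-based download involves, the commands below fetch a GGUF file directly from Hugging Face and verify its header; the repository and file names are illustrative, not defaults bundled with Inline:

```bash
# Hugging Face serves raw model files from .../resolve/<branch>/<filename>.
curl -L -o deepseek-coder-6.7b-instruct.Q4_K_M.gguf \
  "https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF/resolve/main/deepseek-coder-6.7b-instruct.Q4_K_M.gguf"

# Every valid GGUF file starts with the 4-byte magic "GGUF" -- the same
# kind of check that automatic model validation can run before loading.
head -c 4 deepseek-coder-6.7b-instruct.Q4_K_M.gguf && echo
```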

## Prerequisites

- Node.js 18+
- pnpm (`npm install -g pnpm`)
- Rust (for native modules): [rustup.rs](https://rustup.rs)
- C++ build tools:
  - macOS: `xcode-select --install`
  - Windows: Visual Studio Build Tools (C++ workload)
  - Linux: `build-essential` and `cmake`
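
A quick way to confirm the toolchain is in place before building (exact version output will vary by machine):

```bash
# Verify each prerequisite is installed and on PATH.
node --version    # expect v18 or newer
pnpm --version
rustc --version
cmake --version   # needed for native builds on Linux
```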

## Getting Started

Quick setup on macOS: `./scripts/setup.sh`. Or manually:

```bash
# Install dependencies
pnpm install

# Build everything
pnpm run build

# Run extension (Press F5 in VS Code)
```

## Development

```bash
# Watch mode (auto-recompile)
pnpm run watch
# Lint code
pnpm run lint
# Type check
pnpm run check-types
```

## Testing

```bash
# All tests
pnpm test
# Unit tests only
pnpm run test:unit
# E2E tests only
pnpm run test:e2e
```

## Troubleshooting

If native modules fail to build, check that Rust is installed:

```bash
rustc --version
# If missing, install via rustup:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

If C++ build tools are missing:

- macOS: `xcode-select --install`
- Windows: Install Visual Studio Build Tools with the C++ workload
- Linux: `sudo apt-get install build-essential cmake`

If the build fails, reinstall dependencies and rebuild:

```bash
pnpm install
pnpm run build
```

If the extension still isn't working:

- Check the VS Code version (1.85.0+; see the one-liner after this list)
- View logs: `Ctrl+Shift+P` → "Inline: Show Logs"
- Rebuild: `pnpm run build`
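
If the `code` command-line tool is on your PATH, checking the VS Code version is a one-liner:

```bash
# The first line of output is the VS Code version (must be 1.85.0 or newer).
code --version
```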

## Supported Models

| Tier | VRAM | Recommended Models |
|---|---|---|
| Lightweight | 2-4 GB | CodeGemma-2B, StableCode-3B, Phi-3-mini, TinyLlama-1.1B, Qwen1.5-0.5B-Chat |
| Mid-Tier | 6-8 GB | DeepSeek-Coder-6.7B, StarCoder2-7B, CodeLlama-7B, WizardCoder-Python-7B, Phind-CodeLlama-34B (Q4_0) |
| Heavy | 12 GB+ | CodeLlama-13B, Mixtral (Quantized), Dolphin-Mixtral-8x7B (Quantized), CodeLlama-34B |
| Ultra | 24GB+ | CodeLlama-34B, Llama-3-70B (Quantized), Yi-34B-200K, StarCoder2-15B |
All models use the GGUF format via llama.cpp.
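
To separate model problems from extension problems, a GGUF file can be smoke-tested directly with llama.cpp's CLI. The model file name here is a placeholder, and the binary name depends on your llama.cpp build (recent releases ship `llama-cli`; older ones call it `main`):

```bash
# Run a short completion against the model outside the extension.
# -m: model path, -p: prompt, -n: number of tokens to generate.
llama-cli -m ./your-model.Q4_K_M.gguf \
  -p "// TypeScript function that reverses a string" -n 64
```

If this produces sensible output but the extension misbehaves, the problem is likely in the extension setup rather than the model file.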

## Commands

Press `Ctrl+Shift+P` and search for "Inline":
- Model Manager - Manage and download models
- Toggle Offline Mode - Switch offline/online
- Clear Cache - Free memory
- Download Model from URL - Download from Hugging Face
- Show Logs - View debug information

## Documentation

- Folder Structure: Explanation of the codebase layout.
- Contributing: Guidelines for submitting PRs.
- Testing: How to run the comprehensive test suite.

## License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.