JARVIS is a professional-grade, GPU-accelerated desktop assistant and 3D audio visualizer. It integrates high-speed AI grounding with a translucent user interface to provide a seamless bridge between human interaction and computer control.
Screen.Recording.2026-01-01.030346.mp4
Screen.Recording.2026-01-01.123536.mp4
JARVIS is designed to act as a central nervous system for your desktop environment, leveraging the Gemini 2.0 Flash Live API for real-time multi-modal communication. It combines specialized system-level tools with web-grounded intelligence to handle everything from file management to complex research queries instantly.
- GPU-Accelerated Rendering: Utilizes Three.js for butter-smooth visual performance at 60+ FPS.
- Floating HUD Aesthetics: A frameless, transparent UI that hovers non-intrusively over any open application.
- Real-Time Audio Reactivity: The visualizer sphere pulses and transforms dynamically based on your voice or Jarvis's responses.
- Ambient Color Cycling: Features smooth HSL color transitions during idle states to maintain an active visual presence.
- Google Search Grounding: Uses native Google Search retrieval to provide factual, up-to-date information instantly.
- Streaming Speech-to-Text: Real-time user transcription displayed in the HUD as you speak, powered by Gemini Live.
- Low-Latency Neural Link: Optimized 16ms audio capture with industry-grade buffering for near-instant conversational turn-around.
- Grounded Intel HUD: A dedicated side-panel manifests to display source links and citations for all web-retrieved facts.
- Image Generation: Create high-fidelity illustrations directly through voice commands via the integrated image generation tool.
- Visual Intelligence: Jarvis analyzed your live camera feed or current screen contents to provide context-aware insights.
- Visual Cortex Preview: All generated media and camera feeds are displayed in a persistent, non-intrusive container on the right side of the screen.
- Security Hardened Execution: A whitelist-based application launcher and sanitized command handler to ensure safe OS interactions.
- OS Hardware Control: Voice-activated control over volume, brightness, and system power states (Shutdown/Restart).
- File & Navigation Support: Robust handling for folder creation, file deletion, and directory listing across your system.
- Clipboard & Logic: Read from or write to the system clipboard and handle natural language scheduling/reminders.
- Natural Language Scheduling: Set reminders and add events using natural phrasing (e.g., "Remind me to call John in 10 minutes").
- Schedule Management: Retrieve your today's agenda verbally.
- Core Framework: Electron for native OS integration and transparency.
- Build Tool: Vite for high-speed frontend development and HMR.
- AI Logic: Google Generative AI SDK (Gemini 2.0 Flash Live).
- 3D Engine: Three.js for WebGL-based visuals.
- Audio Processing: Web Audio API with custom AudioWorklets for low-latency streaming.
- Node.js (Version 18 or higher).
- A valid Gemini API Key.
- Clone the repository to your local machine.
- Install dependencies using npm install.
- Configure your API credentials in src/config.js (Refer to src/config.example.js).
- Run npm run dev to start the local Vite server.
- Run npm run electron:dev to launch the Jarvis application in development mode.
- Run npm run electron:build to generate the standalone Windows installer.
- The executable will be located in the release folder.
AI Assistant, Audio Visualizer, Electron, Three.js, Gemini AI, Productivity
Developed with 💖 by Ashutosh Kumar Singh (Ashutosh0x)