Real-time video chat with AI — it can see you and hear you, then talks back.
Built on Groq APIs for blazing-fast inference. Single-file server, no frontend frameworks, runs locally.
🎤 You speak → Groq Whisper (STT)
📷 Camera frame → Groq Llama 4 Scout (Vision)
↓ (parallel)
🧠 Groq Llama 3.3 70B (Conversation) → combines what it heard + saw
↓
🔊 edge-tts (Text-to-Speech) → AI speaks back
All processing runs through Groq's API — no local GPU needed. Typical round-trip: 2-4 seconds.
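The STT and vision steps run side by side, so the round trip pays for the slower of the two rather than both. A minimal sketch of that pattern with `asyncio.gather` (the stub coroutines below stand in for the real Groq calls and are assumptions, not the actual server code):

```python
import asyncio

# Stub coroutines standing in for the real Groq STT / vision API calls.
async def transcribe(audio_bytes: bytes) -> str:
    await asyncio.sleep(0.1)  # simulate network latency
    return "hello, what do you see?"

async def describe_frame(image_bytes: bytes) -> str:
    await asyncio.sleep(0.1)  # runs concurrently with transcribe()
    return "a person sitting at a desk"

async def perceive(audio: bytes, image: bytes) -> list[str]:
    # gather() awaits both coroutines concurrently, so total wall time
    # is max(stt, vision) rather than their sum.
    return await asyncio.gather(transcribe(audio), describe_frame(image))

transcript, scene = asyncio.run(perceive(b"...", b"..."))
```

With both stubs sleeping 100 ms, the pair finishes in roughly 100 ms instead of 200 ms, which is where the parallel arrow in the flow above comes from.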
Sign up at console.groq.com and create an API key.
git clone https://github.com/littleshuai-bot/ai-video-chat.git
cd ai-video-chat
# Set your API key
export GROQ_API_KEY=gsk_your_key_here
# Install dependencies
pip install -r requirements.txt
# Run
python server.py

Go to http://localhost:8765 → allow camera & microphone → click 🎤 to talk.
Copy .env.example to .env and customize:
cp .env.example .env

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (required) | Your Groq API key |
| `AGENT_NAME` | AI Assistant | Name displayed on the AI avatar |
| `USER_NAME` | You | Name displayed on your video |
| `PORT` | 8765 | Server port |
| `LANGUAGE` | zh | STT language code (en, zh, ja, ko, es, fr, etc.) |
| `TTS_VOICE` | zh-CN-XiaoxiaoNeural | edge-tts voice (list voices) |
| `LLM_MODEL` | llama-3.3-70b-versatile | Groq LLM model for conversation |
| `VISION_MODEL` | meta-llama/llama-4-scout-17b-16e-instruct | Groq vision model |
| `AGENT_PERSONA` | (auto-generated) | Custom system prompt override |
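One way these variables might be read, with the defaults from the table above baked in (the `load_config` helper is illustrative, not the actual `server.py` implementation):

```python
def load_config(env: dict[str, str]) -> dict:
    """Read settings with the documented defaults; GROQ_API_KEY is required."""
    if "GROQ_API_KEY" not in env:
        raise RuntimeError("GROQ_API_KEY is required — see console.groq.com")
    return {
        "api_key": env["GROQ_API_KEY"],
        "agent_name": env.get("AGENT_NAME", "AI Assistant"),
        "user_name": env.get("USER_NAME", "You"),
        "port": int(env.get("PORT", "8765")),
        "language": env.get("LANGUAGE", "zh"),
        "tts_voice": env.get("TTS_VOICE", "zh-CN-XiaoxiaoNeural"),
        "llm_model": env.get("LLM_MODEL", "llama-3.3-70b-versatile"),
        "vision_model": env.get(
            "VISION_MODEL", "meta-llama/llama-4-scout-17b-16e-instruct"
        ),
    }

# In the real server this would be load_config(dict(os.environ)).
config = load_config({"GROQ_API_KEY": "gsk_example", "LANGUAGE": "en"})
```

Anything not set in `.env` falls back to the Default column, so a bare `GROQ_API_KEY` is enough to start.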
English:

LANGUAGE=en
TTS_VOICE=en-US-AriaNeural

Chinese:

LANGUAGE=zh
TTS_VOICE=zh-CN-XiaoxiaoNeural

Japanese:

LANGUAGE=ja
TTS_VOICE=ja-JP-NanamiNeural

- Python 3.10+
- ffmpeg — for audio conversion (`brew install ffmpeg` / `apt install ffmpeg`)
- Groq API key — free tier at console.groq.com
- Modern browser with camera & microphone support
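ffmpeg's role is converting the browser's recorded audio into something Whisper accepts. A hypothetical sketch of how the server might shell out to it (the exact flags, and 16 kHz mono WAV as the target format, are assumptions):

```python
import subprocess

def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command converting browser audio (e.g. webm) to WAV.

    16 kHz mono PCM is a common Whisper-friendly format — an assumption here,
    not necessarily what server.py uses.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

cmd = ffmpeg_to_wav_cmd("clip.webm", "clip.wav")
# With ffmpeg installed, the server would then run:
# subprocess.run(cmd, check=True, capture_output=True)
```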
┌─────────────────────────────────────────────────┐
│ Browser (UI) │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Camera │ │ AI Avatar │ │
│ │ (user) │ │ + Subtitles │ │
│ └──────────┘ └──────────────────┘ │
│ 🎤 Record → POST /api/chat (audio+image) │
│ ← { text, audio_url } │
└─────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────┐
│ Python Server (FastAPI) │
│ │
│ Audio ──→ [ffmpeg] ──→ Groq Whisper (STT) │
│  Image ──→ Groq Llama 4 Scout (Vision) parallel│
│ ↓ │
│ transcript + scene ──→ Groq Llama 3.3 (LLM) │
│ ↓ │
│ reply text ──→ edge-tts (TTS) ──→ MP3 │
└─────────────────────────────────────────────────┘
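The "transcript + scene" merge in the middle of the diagram amounts to building one chat-completion message list from the two perception results. A hypothetical helper showing the idea (the prompt wording is an assumption; the real `server.py` may phrase it differently):

```python
def build_messages(transcript: str, scene: str, persona: str) -> list[dict]:
    """Combine what the AI heard (STT) and saw (vision) into one LLM request."""
    return [
        {"role": "system", "content": persona},
        {
            "role": "user",
            "content": (
                f"[What you can currently see: {scene}]\n"
                f"The user said: {transcript}"
            ),
        },
    ]

messages = build_messages(
    transcript="what am I holding?",
    scene="a person holding a red mug",
    persona="You are a friendly video-chat companion.",
)
```

The resulting list is what gets sent to the conversation model, which is why the reply can reference both what was said and what is on camera.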
The frontend is a single HTML file with no build step. The backend is a single Python file with FastAPI.
- 🎤 Voice Input — press to record, release to send
- 📷 Vision — AI can see your camera feed
- 🔊 Voice Output — AI speaks its replies
- 💬 Subtitles — typewriter-style text animation
- ⏱️ Call Timer — FaceTime-style UI
- 📱 Responsive — works on mobile & desktop
- 🌍 Multi-language — configurable STT language and TTS voice
- 🎭 Custom Persona — fully customizable AI personality
| Component | Technology | Why |
|---|---|---|
| STT | Groq Whisper Large v3 Turbo | Fastest Whisper inference available |
| Vision | Groq Llama 4 Scout | Multimodal understanding |
| LLM | Groq Llama 3.3 70B | Fast, high-quality conversation |
| TTS | edge-tts | Free, many voices, low latency |
| Server | FastAPI + uvicorn | Async Python, minimal overhead |
| Frontend | Vanilla HTML/CSS/JS | No build step, just works |
MIT
Built by ExtraSmall ✨