A video content search system that allows users to upload videos, automatically transcribe audio to text using Deepgram, and perform natural language queries to find specific segments within the video.
- Upload your video and transcribe it with Deepgram
- Search using natural language questions, not just keywords
- Click on results to jump directly to that moment in the video
- 🎥 Video Upload: Support for multiple video formats (MP4, MOV, AVI, MKV, etc.)
- 🎤 Audio Transcription: Automatic transcription with word-level timestamps using Deepgram
- 🔍 Semantic Search: Natural language search powered by Claude AI
- ⏱️ Precise Timestamps: Jump directly to relevant video segments
- 🚀 Simple Approach: Direct LLM matching (Approach A) - perfect for videos < 30 minutes
- 🖥️ Web Interface: Easy-to-use frontend with drag-and-drop upload and integrated video player
- Python 3.11+
- Conda (Anaconda or Miniconda)
- FFmpeg
- Deepgram API key
- Anthropic API key
```bash
git clone git@github.com:darwin-ye/videoRAG.git
cd videoRAG
./setup.sh
```

This will:

- Create a conda environment called `videoRAG_env`
- Install all Python dependencies
- Create necessary directories
- Generate a `.env` template file
Edit the `.env` file and add your API keys:

```bash
DEEPGRAM_API_KEY=your_deepgram_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

Get your API keys:

- Deepgram: https://console.deepgram.com/
- Anthropic: https://console.anthropic.com/
```bash
conda activate videoRAG_env
python main.py
```

The application will be available at http://localhost:8000
Open your browser and navigate to http://localhost:8000
The web interface provides a simple 3-step workflow:
1. Upload Video: Drag and drop or browse for your video file
2. Transcribe: Click to transcribe the video (uses the Deepgram API)
3. Search: Enter natural language queries to find relevant segments
Click on any search result to jump directly to that timestamp in the video!
```bash
curl -X POST "http://localhost:8000/api/upload" \
  -F "video=@your_video.mp4"
```

Response:

```json
{
  "video_id": "abc123...",
  "filename": "your_video.mp4",
  "duration": 120.5,
  "size": 15728640,
  "status": "uploaded"
}
```

```bash
curl -X POST "http://localhost:8000/api/transcribe" \
  -H "Content-Type: application/json" \
  -d '{"video_id": "abc123..."}'
```

Response:

```json
{
  "video_id": "abc123...",
  "transcript_id": "abc123...",
  "status": "completed",
  "transcript": {
    "text": "Full transcript text...",
    "words": [
      {
        "word": "Hello",
        "start": 0.5,
        "end": 1.2,
        "confidence": 0.98
      }
    ]
  }
}
```

```bash
curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "abc123...",
    "query": "find segments about travel",
    "max_results": 5
  }'
```

Response:

```json
{
  "video_id": "abc123...",
  "query": "find segments about travel",
  "segments": [
    {
      "text": "Today we went to Beijing for travel",
      "start_time": 45.2,
      "end_time": 52.8,
      "relevance_score": 0.95,
      "reason": "Explicitly mentions travel"
    }
  ],
  "total_matches": 1
}
```

Once the server is running, visit:

- Web Interface: http://localhost:8000
- API Docs (Swagger): http://localhost:8000/docs
- API Docs (ReDoc): http://localhost:8000/redoc
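The transcribe endpoint returns word-level timestamps, so a client can stitch words back into playable segments before (or instead of) querying. A minimal sketch of that idea — `group_words` and the 10-second window are illustrative helpers, not part of this repo:

```python
# Group Deepgram-style word timestamps into fixed-length segments.
# Each word dict mirrors the /api/transcribe response: word/start/end/confidence.

def group_words(words, window=10.0):
    """Merge consecutive words into segments no longer than `window` seconds."""
    segments = []
    current = None
    for w in words:
        if current is None or w["end"] - current["start"] > window:
            # Start a new segment when the next word would overflow the window.
            current = {"start": w["start"], "end": w["end"], "text": w["word"]}
            segments.append(current)
        else:
            current["end"] = w["end"]
            current["text"] += " " + w["word"]
    return segments

words = [
    {"word": "Hello", "start": 0.5, "end": 1.2, "confidence": 0.98},
    {"word": "world", "start": 1.3, "end": 1.8, "confidence": 0.97},
    {"word": "again", "start": 12.0, "end": 12.4, "confidence": 0.95},
]
print(group_words(words))
```

Segments built this way carry the same `start`/`end` fields the frontend uses to seek the video player.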
```
videoRAG/
├── app/
│   ├── config.py              # Configuration settings
│   ├── models.py              # Pydantic models
│   ├── routes/
│   │   └── video.py           # API endpoints
│   ├── services/
│   │   ├── audio_extractor.py # FFmpeg audio extraction
│   │   ├── transcription.py   # Deepgram transcription
│   │   └── search.py          # Claude AI search
│   └── utils/
├── frontend/
│   ├── index.html             # Main web interface
│   └── static/
│       ├── css/
│       │   └── styles.css     # Frontend styles
│       └── js/
│           └── app.js         # Frontend JavaScript
├── uploads/                   # Uploaded video files
├── transcripts/               # Saved transcripts
├── logs/                      # Application logs
├── main.py                    # Application entry point
├── requirements.txt           # Python dependencies
├── setup.sh                   # Setup script
├── .env                       # Environment variables (create from template)
└── README.md                  # This file
```
This implementation uses Approach A (Simple Version) - Direct LLM matching:
- Video Upload: Accept video files and save to storage
- Audio Extraction: Extract audio using FFmpeg
- Transcription: Send audio to Deepgram for word-level transcription
- Query Processing: Send full transcript + query to Claude for segment matching
- Results: Return matched segments with timestamps and relevance scores
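The query-processing step above (full transcript + query sent to Claude) can be sketched as a single prompt. The actual prompt lives in `app/services/search.py` and may differ; `build_search_prompt` is a hypothetical name used for illustration:

```python
def build_search_prompt(transcript_text, query, max_results=5):
    """Assemble a direct-matching prompt: the whole transcript plus the
    user's question, asking for JSON that mirrors the /api/query response."""
    return (
        "You are given a video transcript with timestamps.\n"
        f"Transcript:\n{transcript_text}\n\n"
        f"Question: {query}\n"
        f"Return up to {max_results} matching segments as a JSON list of "
        'objects with "text", "start_time", "end_time", '
        '"relevance_score", and "reason".'
    )

prompt = build_search_prompt(
    "[45.2-52.8] Today we went to Beijing for travel",
    "find segments about travel",
)
print(prompt)
```

Because the whole transcript rides along on every query, this keeps the code simple at the price of re-sending the same tokens each time — exactly the trade-off listed below.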
- ✅ Simple implementation (~200 lines of core logic)
- ✅ No additional services needed
- ✅ Fast development
- ✅ Good for videos < 30 minutes
- ❌ Video length limit (~10-20 hours due to context window)
- ❌ Slower query response (3-8 seconds)
- ❌ Higher cost for repeated queries on long videos
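The context-window limit above can be sanity-checked before querying. A rough estimator, assuming ~0.75 words per token (a common rule of thumb, not an exact tokenizer count):

```python
def estimate_tokens(transcript_text, words_per_token=0.75):
    """Rough token estimate for a transcript; exact counts need a tokenizer."""
    return int(len(transcript_text.split()) / words_per_token)

# A 1-hour talk at ~150 words/minute is about 9,000 words.
one_hour = " ".join(["word"] * 9000)
print(estimate_tokens(one_hour))  # 12000
```

That 12K-token figure for an hour of speech lines up with the per-query estimate in the cost section below.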
```bash
conda activate videoRAG_env
uvicorn main:app --reload
```

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests (when implemented)
pytest
```

For a 1-hour video with 100 queries:
- Deepgram transcription: ~$0.10 (one-time)
- Claude queries: ~$1.50 (12K tokens per query)
- Total: ~$1.60 for 100 queries
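The total above can be reproduced from its own figures (one-time transcription cost plus a flat per-query cost; actual provider pricing varies over time):

```python
def estimate_cost(transcription_cost, cost_per_query, num_queries):
    """Total = one-time transcription + per-query LLM cost."""
    return transcription_cost + cost_per_query * num_queries

# Figures from the estimate above: $0.10 transcription, $1.50 per 100 queries.
total = estimate_cost(0.10, 1.50 / 100, 100)
print(f"${total:.2f}")  # $1.60
```

Note the per-query cost scales with transcript length, since the whole transcript is sent on every query in Approach A.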
- Upgrade to Approach B (Vector Search + LLM) for longer videos
- Multi-language support
- Speaker identification
- React frontend with video player
- Clip export functionality
- Real-time transcription for live streams
If FFmpeg is missing, install it:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg
```

API key errors: make sure your `.env` file has valid API keys and no extra spaces.

Port already in use: change the `PORT` in the `.env` file or kill the process using port 8000.
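The configuration issues above can also be caught programmatically at startup. A small self-check — `check_setup` is an illustrative helper, not part of this repo:

```python
import shutil

def check_setup(env):
    """Return a list of likely configuration problems for this app."""
    problems = []
    for key in ("DEEPGRAM_API_KEY", "ANTHROPIC_API_KEY"):
        value = env.get(key, "")
        if not value or "your_" in value:
            problems.append(f"{key} is missing or still the template value")
        elif value != value.strip():
            problems.append(f"{key} has leading/trailing whitespace")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

# Example: one key with a trailing space, one left empty.
print(check_setup({"DEEPGRAM_API_KEY": "dg_abc ", "ANTHROPIC_API_KEY": ""}))
```

Running this against `os.environ` before starting the server surfaces the two most common failure modes (bad keys, missing FFmpeg) with clear messages instead of opaque API errors.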
MIT
Contributions are welcome! Please feel free to submit a Pull Request.