
Video RAG System

A video content search system that lets you upload a video, automatically transcribe its audio with Deepgram, and run natural language queries to find specific segments within the video.

Demo

  • Upload and Transcribe: Upload your video and transcribe it with Deepgram
  • Search with Natural Language: Search using natural language questions, not just keywords
  • Jump to Timestamp: Click on results to jump directly to that moment in the video

Features

  • 🎥 Video Upload: Support for multiple video formats (MP4, MOV, AVI, MKV, etc.)
  • 🎤 Audio Transcription: Automatic transcription with word-level timestamps using Deepgram
  • 🔍 Semantic Search: Natural language search powered by Claude AI
  • ⏱️ Precise Timestamps: Jump directly to relevant video segments
  • 🚀 Simple Approach: Direct LLM matching (Approach A) - perfect for videos < 30 minutes
  • 🖥️ Web Interface: Easy-to-use frontend with drag-and-drop upload and integrated video player

Prerequisites

  • Python 3.11+
  • Conda (Anaconda or Miniconda)
  • FFmpeg
  • Deepgram API key
  • Anthropic API key

Quick Start

1. Clone the Repository

git clone git@github.com:darwin-ye/videoRAG.git
cd videoRAG

2. Run Setup Script

./setup.sh

This will:

  • Create a conda environment called videoRAG_env
  • Install all Python dependencies
  • Create necessary directories
  • Generate a .env template file

3. Configure API Keys

Edit the .env file and add your API keys:

DEEPGRAM_API_KEY=your_deepgram_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

Get your API keys:

  • Deepgram: https://console.deepgram.com
  • Anthropic: https://console.anthropic.com

4. Activate Environment and Run

conda activate videoRAG_env
python main.py

The application will be available at http://localhost:8000

5. Use the Web Interface

Open your browser and navigate to http://localhost:8000

The web interface provides a simple 3-step workflow:

  1. Upload Video: Drag and drop or browse for your video file
  2. Transcribe: Click to transcribe the video (uses Deepgram API)
  3. Search: Enter natural language queries to find relevant segments

Click on any search result to jump directly to that timestamp in the video!

API Usage (Optional - use Web Interface instead)

1. Upload Video

curl -X POST "http://localhost:8000/api/upload" \
  -F "video=@your_video.mp4"

Response:

{
  "video_id": "abc123...",
  "filename": "your_video.mp4",
  "duration": 120.5,
  "size": 15728640,
  "status": "uploaded"
}

2. Transcribe Video

curl -X POST "http://localhost:8000/api/transcribe" \
  -H "Content-Type: application/json" \
  -d '{"video_id": "abc123..."}'

Response:

{
  "video_id": "abc123...",
  "transcript_id": "abc123...",
  "status": "completed",
  "transcript": {
    "text": "Full transcript text...",
    "words": [
      {
        "word": "Hello",
        "start": 0.5,
        "end": 1.2,
        "confidence": 0.98
      }
    ]
  }
}
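The word-level timestamps in the response above are what make timestamp jumping possible. A minimal sketch of how such words can be grouped into playable segments (the group_words helper is illustrative, not part of the repo):

```python
# Illustrative helper (not from the repo): group Deepgram-style word
# timestamps into segments of roughly `window` seconds.
def group_words(words, window=15.0):
    """Take word dicts ({word, start, end, ...}) and return a list of
    (start, end, text) tuples covering roughly `window` seconds each."""
    segments = []
    current, seg_start = [], None
    for w in words:
        if seg_start is None:
            seg_start = w["start"]
        current.append(w)
        if w["end"] - seg_start >= window:
            segments.append((seg_start, w["end"],
                             " ".join(x["word"] for x in current)))
            current, seg_start = [], None
    if current:  # flush the trailing partial segment
        segments.append((seg_start, current[-1]["end"],
                         " ".join(x["word"] for x in current)))
    return segments
```

Each returned segment carries its own start time, so a frontend can seek the video player to it directly.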

3. Query Segments

curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "abc123...",
    "query": "find segments about travel",
    "max_results": 5
  }'

Response:

{
  "video_id": "abc123...",
  "query": "find segments about travel",
  "segments": [
    {
      "text": "Today we went to Beijing for travel",
      "start_time": 45.2,
      "end_time": 52.8,
      "relevance_score": 0.95,
      "reason": "Explicitly mentions travel"
    }
  ],
  "total_matches": 1
}

Documentation

Once the server is running, visit:

  • http://localhost:8000/docs for the interactive API docs (Swagger UI)
  • http://localhost:8000/redoc for the alternative ReDoc documentation
Project Structure

videoRAG/
├── app/
│   ├── config.py              # Configuration settings
│   ├── models.py              # Pydantic models
│   ├── routes/
│   │   └── video.py           # API endpoints
│   ├── services/
│   │   ├── audio_extractor.py # FFmpeg audio extraction
│   │   ├── transcription.py   # Deepgram transcription
│   │   └── search.py          # Claude AI search
│   └── utils/
├── frontend/
│   ├── index.html             # Main web interface
│   └── static/
│       ├── css/
│       │   └── styles.css     # Frontend styles
│       └── js/
│           └── app.js         # Frontend JavaScript
├── uploads/                   # Uploaded video files
├── transcripts/               # Saved transcripts
├── logs/                      # Application logs
├── main.py                    # Application entry point
├── requirements.txt           # Python dependencies
├── setup.sh                   # Setup script
├── .env                       # Environment variables (create from template)
└── README.md                  # This file

Architecture

This implementation uses Approach A (Simple Version) - Direct LLM matching:

  1. Video Upload: Accept video files and save to storage
  2. Audio Extraction: Extract audio using FFmpeg
  3. Transcription: Send audio to Deepgram for word-level transcription
  4. Query Processing: Send full transcript + query to Claude for segment matching
  5. Results: Return matched segments with timestamps and relevance scores
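Step 2 (audio extraction) typically amounts to a single FFmpeg invocation. A sketch of how that command might be built (the exact flags in app/services/audio_extractor.py may differ; 16 kHz mono WAV is just a common choice for speech APIs):

```python
# Illustrative audio-extraction sketch; the repo's implementation may differ.
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes
    16 kHz mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg", "-y",          # overwrite output if it exists
        "-i", video_path,
        "-vn",                   # discard the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        audio_path,
    ]

def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg, raising CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path), check=True)
```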

Pros

  • ✅ Simple implementation (~200 lines of core logic)
  • ✅ No additional services needed
  • ✅ Fast development
  • ✅ Good for videos < 30 minutes

Cons

  • ❌ Video length limit (~10-20 hours due to context window)
  • ❌ Slower query response (3-8 seconds)
  • ❌ Higher cost for repeated queries on long videos

Development

Running in Development Mode

conda activate videoRAG_env
uvicorn main:app --reload

Testing

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests (when implemented)
pytest

Cost Estimates

For a 1-hour video with 100 queries:

  • Deepgram transcription: ~$0.10 (one-time)
  • Claude queries: ~$1.50 (12K tokens per query)
  • Total: ~$1.60 for 100 queries
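The estimates above can be folded into a quick back-of-the-envelope calculator. The per-unit rates below are derived from those figures, not from current provider price sheets, so treat the output as rough:

```python
# Rough cost model using the per-unit figures from the estimates above
# (~$0.10 transcription per video hour, ~$1.50 per 100 Claude queries).
# Real pricing varies by provider, model, and transcript length.
DEEPGRAM_PER_HOUR = 0.10
CLAUDE_PER_QUERY = 1.50 / 100

def estimate_cost(video_hours: float, num_queries: int) -> float:
    """Estimated dollar cost: one-time transcription plus per-query LLM calls."""
    return video_hours * DEEPGRAM_PER_HOUR + num_queries * CLAUDE_PER_QUERY
```

Note that the per-query cost grows with transcript length, since the full transcript is sent to Claude on every query; this model assumes a roughly 1-hour (~12K token) transcript.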

Future Enhancements

  • Upgrade to Approach B (Vector Search + LLM) for longer videos
  • Multi-language support
  • Speaker identification
  • React frontend with video player
  • Clip export functionality
  • Real-time transcription for live streams

Troubleshooting

FFmpeg not found

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

API Key errors

Make sure your .env file has valid API keys and no extra spaces.

Port already in use

Change the PORT value in your .env file, or stop the process currently using port 8000.
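To check whether port 8000 is actually taken before changing anything, a small stdlib helper (illustrative, not part of the repo):

```python
# Illustrative check for whether a local port already has a listener.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0
```

If port_in_use(8000) returns True, either free the port or set a different PORT before starting the server.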

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
