
Video RAG System

A video content search system that lets you upload a video, automatically transcribe its audio with Deepgram, and run natural language queries to find specific segments within the video.

Demo

  • Upload and Transcribe: Upload your video and transcribe it with Deepgram
  • Search with Natural Language: Search using natural language questions, not just keywords
  • Jump to Timestamp: Click on results to jump directly to that moment in the video

Features

  • 🎥 Video Upload: Support for multiple video formats (MP4, MOV, AVI, MKV, etc.)
  • 🎤 Audio Transcription: Automatic transcription with word-level timestamps using Deepgram
  • 🔍 Semantic Search: Natural language search powered by Claude AI
  • ⏱️ Precise Timestamps: Jump directly to relevant video segments
  • 🚀 Simple Approach: Direct LLM matching (Approach A) - perfect for videos < 30 minutes
  • 🖥️ Web Interface: Easy-to-use frontend with drag-and-drop upload and integrated video player

Prerequisites

  • Python 3.11+
  • Conda (Anaconda or Miniconda)
  • FFmpeg
  • Deepgram API key
  • Anthropic API key

Quick Start

1. Clone the Repository

git clone git@github.com:darwin-ye/videoRAG.git
cd videoRAG

2. Run Setup Script

./setup.sh

This will:

  • Create a conda environment called videoRAG_env
  • Install all Python dependencies
  • Create necessary directories
  • Generate a .env template file

3. Configure API Keys

Edit the .env file and add your API keys:

DEEPGRAM_API_KEY=your_deepgram_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

Get your API keys:

  • Deepgram: https://console.deepgram.com
  • Anthropic: https://console.anthropic.com

4. Activate Environment and Run

conda activate videoRAG_env
python main.py

The application will be available at http://localhost:8000

5. Use the Web Interface

Open your browser and navigate to http://localhost:8000

The web interface provides a simple 3-step workflow:

  1. Upload Video: Drag and drop or browse for your video file
  2. Transcribe: Click to transcribe the video (uses Deepgram API)
  3. Search: Enter natural language queries to find relevant segments

Click on any search result to jump directly to that timestamp in the video!

API Usage (Optional - use Web Interface instead)

1. Upload Video

curl -X POST "http://localhost:8000/api/upload" \
  -F "video=@your_video.mp4"

Response:

{
  "video_id": "abc123...",
  "filename": "your_video.mp4",
  "duration": 120.5,
  "size": 15728640,
  "status": "uploaded"
}

2. Transcribe Video

curl -X POST "http://localhost:8000/api/transcribe" \
  -H "Content-Type: application/json" \
  -d '{"video_id": "abc123..."}'

Response:

{
  "video_id": "abc123...",
  "transcript_id": "abc123...",
  "status": "completed",
  "transcript": {
    "text": "Full transcript text...",
    "words": [
      {
        "word": "Hello",
        "start": 0.5,
        "end": 1.2,
        "confidence": 0.98
      }
    ]
  }
}
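The word-level timestamps in the response above are what make timestamp jumping possible. A minimal sketch of how such words can be grouped into playable segments (the group_words helper is illustrative, not part of the repo):

```python
# Illustrative helper (not from the repo): group Deepgram-style word
# timestamps into segments of roughly `window` seconds.
def group_words(words, window=15.0):
    """Take word dicts ({word, start, end, ...}) and return a list of
    (start, end, text) tuples covering roughly `window` seconds each."""
    segments = []
    current, seg_start = [], None
    for w in words:
        if seg_start is None:
            seg_start = w["start"]
        current.append(w)
        if w["end"] - seg_start >= window:
            segments.append((seg_start, w["end"],
                             " ".join(x["word"] for x in current)))
            current, seg_start = [], None
    if current:  # flush the trailing partial segment
        segments.append((seg_start, current[-1]["end"],
                         " ".join(x["word"] for x in current)))
    return segments
```

Each returned segment carries its own start time, so a frontend can seek the video player to it directly.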

3. Query Segments

curl -X POST "http://localhost:8000/api/query" \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "abc123...",
    "query": "find segments about travel",
    "max_results": 5
  }'

Response:

{
  "video_id": "abc123...",
  "query": "find segments about travel",
  "segments": [
    {
      "text": "Today we went to Beijing for travel",
      "start_time": 45.2,
      "end_time": 52.8,
      "relevance_score": 0.95,
      "reason": "Explicitly mentions travel"
    }
  ],
  "total_matches": 1
}

Documentation

Once the server is running, visit:

  • http://localhost:8000/docs for the interactive API docs (Swagger UI)
  • http://localhost:8000/redoc for the alternative ReDoc documentation
Project Structure

videoRAG/
├── app/
│   ├── config.py              # Configuration settings
│   ├── models.py              # Pydantic models
│   ├── routes/
│   │   └── video.py           # API endpoints
│   ├── services/
│   │   ├── audio_extractor.py # FFmpeg audio extraction
│   │   ├── transcription.py   # Deepgram transcription
│   │   └── search.py          # Claude AI search
│   └── utils/
├── frontend/
│   ├── index.html             # Main web interface
│   └── static/
│       ├── css/
│       │   └── styles.css     # Frontend styles
│       └── js/
│           └── app.js         # Frontend JavaScript
├── uploads/                   # Uploaded video files
├── transcripts/               # Saved transcripts
├── logs/                      # Application logs
├── main.py                    # Application entry point
├── requirements.txt           # Python dependencies
├── setup.sh                   # Setup script
├── .env                       # Environment variables (create from template)
└── README.md                  # This file

Architecture

This implementation uses Approach A (Simple Version) - Direct LLM matching:

  1. Video Upload: Accept video files and save to storage
  2. Audio Extraction: Extract audio using FFmpeg
  3. Transcription: Send audio to Deepgram for word-level transcription
  4. Query Processing: Send full transcript + query to Claude for segment matching
  5. Results: Return matched segments with timestamps and relevance scores
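Step 2 (audio extraction) typically amounts to a single FFmpeg invocation. A sketch of how that command might be built (the exact flags in app/services/audio_extractor.py may differ; 16 kHz mono WAV is just a common choice for speech APIs):

```python
# Illustrative audio-extraction sketch; the repo's implementation may differ.
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes
    16 kHz mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg", "-y",          # overwrite output if it exists
        "-i", video_path,
        "-vn",                   # discard the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        audio_path,
    ]

def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg, raising CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path), check=True)
```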

Pros

  • ✅ Simple implementation (~200 lines of core logic)
  • ✅ No additional services needed
  • ✅ Fast development
  • ✅ Good for videos < 30 minutes

Cons

  • ❌ Video length limit (~10-20 hours due to context window)
  • ❌ Slower query response (3-8 seconds)
  • ❌ Higher cost for repeated queries on long videos

Development

Running in Development Mode

conda activate videoRAG_env
uvicorn main:app --reload

Testing

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests (when implemented)
pytest

Cost Estimates

For a 1-hour video with 100 queries:

  • Deepgram transcription: ~$0.10 (one-time)
  • Claude queries: ~$1.50 (12K tokens per query)
  • Total: ~$1.60 for 100 queries
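The estimates above can be folded into a quick back-of-the-envelope calculator. The per-unit rates below are derived from those figures, not from current provider price sheets, so treat the output as rough:

```python
# Rough cost model using the per-unit figures from the estimates above
# (~$0.10 transcription per video hour, ~$1.50 per 100 Claude queries).
# Real pricing varies by provider, model, and transcript length.
DEEPGRAM_PER_HOUR = 0.10
CLAUDE_PER_QUERY = 1.50 / 100

def estimate_cost(video_hours: float, num_queries: int) -> float:
    """Estimated dollar cost: one-time transcription plus per-query LLM calls."""
    return video_hours * DEEPGRAM_PER_HOUR + num_queries * CLAUDE_PER_QUERY
```

Note that the per-query cost grows with transcript length, since the full transcript is sent to Claude on every query; this model assumes a roughly 1-hour (~12K token) transcript.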

Future Enhancements

  • Upgrade to Approach B (Vector Search + LLM) for longer videos
  • Multi-language support
  • Speaker identification
  • React frontend with video player
  • Clip export functionality
  • Real-time transcription for live streams

Troubleshooting

FFmpeg not found

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

API Key errors

Make sure your .env file has valid API keys and no extra spaces.

Port already in use

Change the PORT value in your .env file, or stop the process currently using port 8000.
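To check whether port 8000 is actually taken before changing anything, a small stdlib helper (illustrative, not part of the repo):

```python
# Illustrative check for whether a local port already has a listener.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0
```

If port_in_use(8000) returns True, either free the port or set a different PORT before starting the server.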

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
