Whisp Media Processor 🎬

A modern, scalable media processing pipeline built with Flask that downloads, processes, transcribes, and uploads video and audio chunks from Cloudflare R2 storage. Designed for processing real-time video conference recordings with AI-powered transcription.

🚀 Overview

Whisp Media Processor is a production-ready Flask web service that processes video and audio chunks into professional-quality media files with embedded transcriptions. It exposes a RESTful API for seamless integration with video conferencing platforms and runs its processing pipeline asynchronously for real-time use.

Key Features

  • 🎥 Professional Video Processing: Converts WebM chunks to high-quality MP4 with H.264 encoding
  • 🎤 AI-Powered Transcription: OpenAI Whisper integration with multiple model sizes
  • 🌐 RESTful API: Easy integration with existing systems
  • ☁️ Cloud Storage: Seamless Cloudflare R2 integration
  • 🔄 Asynchronous Processing: Non-blocking pipeline execution
  • 📱 Soft Subtitles: Embedded captions in MP4 containers
  • 🛡️ Error Handling: Robust error recovery and logging

πŸ—οΈ Architecture

├── Flask Web Service
│   ├── RESTful API Endpoints
│   ├── Asynchronous Processing
│   └── Configuration Management
├── Video Processing Pipeline
│   ├── Chunk Download & Validation
│   ├── FFmpeg-based Processing
│   ├── Whisper AI Transcription
│   └── Cloud Upload
└── Storage Layer
    ├── Cloudflare R2 (Primary)
    └── Local Temporary Storage

🔧 Tech Stack

  • Flask 3.1+: Modern Python web framework
  • Python 3.12+: Core programming language
  • FFmpeg: Professional video/audio processing
  • OpenAI Whisper: State-of-the-art speech recognition
  • Cloudflare R2: S3-compatible object storage
  • boto3: AWS SDK for Python (R2 integration)
  • Threading: Asynchronous task processing

βš™οΈ Installation

Prerequisites

  • Python 3.12 or higher
  • FFmpeg installed and accessible in PATH
  • Cloudflare R2 account and credentials

Setup

  1. Clone the repository:

    git clone https://github.com/dampdigits/whisp-media-processor.git
    cd whisp-media-processor
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install FFmpeg:

    # Ubuntu/Debian
    sudo apt update && sudo apt install ffmpeg
    
    # macOS
    brew install ffmpeg
    
    # Windows (using chocolatey)
    choco install ffmpeg
  5. Configure environment variables: Create a .env file in the project root:

    S3_ACCESS_KEY_ID=your_r2_access_key
    S3_SECRET_ACCESS_KEY=your_r2_secret_key
    ACCOUNT_ID=your_cloudflare_account_id
    S3_BUCKET_NAME=your_r2_bucket_name
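
As a rough sketch of how these variables wire into an R2 client (the endpoint URL format is Cloudflare's standard S3 API; load_dotenv assumes the python-dotenv package is installed):

import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # read the .env file from the project root

# Cloudflare R2 exposes an S3-compatible API at <account_id>.r2.cloudflarestorage.com
s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
)

# Sanity check: list a few objects in the configured bucket
response = s3.list_objects_v2(Bucket=os.environ["S3_BUCKET_NAME"], MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])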

🚀 Quick Start

Start the Flask Service

# Development mode
flask run

# Custom host/port (note: Flask's built-in server is not intended for production)
flask run --host=0.0.0.0 --port=5000

# Using Python directly
python run.py

API Usage

Submit Processing Job

curl -X POST http://localhost:5000/submit \
  -H "Content-Type: application/json" \
  -d '{
    "meeting_id": "meeting_123",
    "take": "1",
    "user_id": "user_456",
    "whisper_model": "base",
    "cleanup": true,
    "skip_transcription": false
  }'
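
The same request from Python, for services that integrate programmatically (a minimal sketch using the requests library; the values are the placeholders from the curl example above):

import requests

payload = {
    "meeting_id": "meeting_123",
    "take": "1",
    "user_id": "user_456",
    "whisper_model": "base",
    "cleanup": True,
    "skip_transcription": False,
}

response = requests.post("http://localhost:5000/submit", json=payload, timeout=30)
response.raise_for_status()
print(response.json())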

Check Service Status

curl http://localhost:5000/status

📡 API Reference

POST /submit

Initiates video processing pipeline for specified meeting chunks.

Request Body:

{
  "meeting_id": "string (required)",
  "take": "string (required)", 
  "user_id": "string (required)",
  "whisper_model": "string (optional, default: 'base')",
  "cleanup": "boolean (optional, default: true)",
  "skip_transcription": "boolean (optional, default: false)"
}

Response:

{
  "status": "success",
  "message": "Video processing pipeline started",
  "meeting_id": "meeting_123",
  "take": "1",
  "user_id": "user_456",
  "config": {
    "REMOTE_DIR": "recordings/meeting_123/1/user_456",
    "LOCAL_DIR": "../chunks/meeting_123/1/user_456",
    "OUTPUT_DIR": "../recordings/meeting_123/1/user_456",
    "UPLOAD_DIR": "recordings/meeting_123/1"
  },
  "options": {
    "whisper_model": "base",
    "cleanup": true,
    "skip_transcription": false
  }
}
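
Because the pipeline runs asynchronously, the endpoint validates the request and returns immediately while a background thread does the work. A simplified sketch of that pattern (not the exact code in app/routes.py; process_pipeline is a hypothetical stand-in for the worker in app/worker.py):

import threading

from flask import Flask, jsonify, request

app = Flask(__name__)

def process_pipeline(meeting_id, take, user_id, **options):
    """Hypothetical stand-in for the download -> process -> transcribe -> upload worker."""
    ...

@app.post("/submit")
def submit():
    data = request.get_json(silent=True) or {}
    missing = [k for k in ("meeting_id", "take", "user_id") if k not in data]
    if missing:
        return jsonify({"status": "error", "message": f"Missing fields: {missing}"}), 400

    # Fire-and-forget worker thread so the HTTP response is not blocked
    threading.Thread(
        target=process_pipeline,
        args=(data["meeting_id"], data["take"], data["user_id"]),
        kwargs={
            "whisper_model": data.get("whisper_model", "base"),
            "cleanup": data.get("cleanup", True),
            "skip_transcription": data.get("skip_transcription", False),
        },
        daemon=True,
    ).start()

    return jsonify({"status": "success", "message": "Video processing pipeline started"})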

GET /status

Returns service health status.

Response:

{
  "status": "running",
  "message": "Video processing service is running"
}

🔄 Processing Pipeline

1. Initialization & Configuration

  • Validate API request parameters
  • Configure directory structures
  • Initialize processing components

2. Chunk Download

  • Connect to Cloudflare R2 storage
  • Download video/audio chunks by prefix
  • Organize files by type (video/audio)
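
Downloading by prefix amounts to listing every key under recordings/<meeting_id>/<take>/<user_id>/ and fetching each object (a sketch reusing the boto3 client from the Installation section; the prefix matches REMOTE_DIR in the /submit response):

import os

def download_chunks(s3, bucket, prefix, local_dir):
    """Download every object under `prefix` into `local_dir`, keeping file names."""
    os.makedirs(local_dir, exist_ok=True)
    local_paths = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            local_path = os.path.join(local_dir, os.path.basename(obj["Key"]))
            s3.download_file(bucket, obj["Key"], local_path)
            local_paths.append(local_path)
    return sorted(local_paths)  # chunk order matters for concatenation

# e.g. download_chunks(s3, bucket, "recordings/meeting_123/1/user_456",
#                      "../chunks/meeting_123/1/user_456")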

3. Video Processing

  • Concatenate WebM video chunks
  • Fix timestamp inconsistencies
  • Convert to H.264 MP4 format
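
A typical way to do this with FFmpeg is the concat demuxer plus timestamp regeneration, followed by an H.264 re-encode (a sketch; the exact flags used in app/worker.py may differ):

import subprocess
import tempfile

def concat_and_encode(chunk_paths, output_mp4):
    """Concatenate WebM chunks and re-encode the result as H.264 MP4."""
    # The concat demuxer reads a text file listing one chunk per line
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(f"file '{path}'\n" for path in chunk_paths)
        list_file = f.name

    subprocess.run(
        [
            "ffmpeg", "-y",
            "-fflags", "+genpts",            # regenerate timestamps across chunk joins
            "-f", "concat", "-safe", "0", "-i", list_file,
            "-an",                           # audio is handled by its own path
            "-c:v", "libx264", "-preset", "fast", "-crf", "23",
            "-movflags", "+faststart",       # place metadata up front for web playback
            output_mp4,
        ],
        check=True,
    )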

4. Audio Processing

  • Concatenate WebM audio chunks
  • Extract to WAV format for transcription
  • Encode to AAC for final output
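
The audio path mirrors the video path: concatenate, then derive a 16 kHz mono WAV for Whisper and an AAC track for the final container (a sketch; the sample rate and bitrate are illustrative choices):

import subprocess

def prepare_audio(concatenated_webm, wav_out, aac_out):
    """Produce a transcription-friendly WAV and a delivery-friendly AAC track."""
    # Whisper operates on 16 kHz mono audio internally, so resample and downmix
    subprocess.run(
        ["ffmpeg", "-y", "-i", concatenated_webm, "-ar", "16000", "-ac", "1", wav_out],
        check=True,
    )
    # AAC audio for the final MP4 container
    subprocess.run(
        ["ffmpeg", "-y", "-i", concatenated_webm, "-c:a", "aac", "-b:a", "128k", aac_out],
        check=True,
    )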

5. AI Transcription (Optional)

  • Load Whisper model (tiny/base/small/medium/large)
  • Generate timestamped transcription
  • Create SRT subtitle files
  • Export JSON metadata
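
The openai-whisper package returns timestamped segments that map one-to-one onto SRT cues (a minimal sketch; the real pipeline also writes JSON metadata, shown here as a plain segment dump):

import json

import whisper  # pip install openai-whisper

def transcribe_to_srt(wav_path, srt_path, model_name="base"):
    model = whisper.load_model(model_name)  # tiny / base / small / medium / large
    result = model.transcribe(wav_path)

    def srt_time(seconds):
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02}:{int(m):02}:{int(s):02},{int(s % 1 * 1000):03}"

    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
            f.write(f"{seg['text'].strip()}\n\n")

    # Timestamped metadata alongside the subtitles
    with open(srt_path.replace(".srt", ".json"), "w", encoding="utf-8") as f:
        json.dump(result["segments"], f, indent=2)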

6. Final Assembly

  • Mux video, audio, and subtitles
  • Embed soft captions in MP4 container
  • Optimize for web delivery
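
Soft subtitles in an MP4 container use the mov_text codec, so the final mux copies the already-encoded streams and only converts the SRT track (a sketch of the usual FFmpeg invocation):

import subprocess

def mux_final(video_mp4, audio_aac, subs_srt, output_mp4):
    """Combine video, audio, and soft subtitles into one web-ready MP4."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_mp4, "-i", audio_aac, "-i", subs_srt,
            "-map", "0:v", "-map", "1:a", "-map", "2:s",
            "-c:v", "copy", "-c:a", "copy",   # streams are already encoded
            "-c:s", "mov_text",               # soft captions for MP4 containers
            "-metadata:s:s:0", "language=eng",
            "-movflags", "+faststart",
            output_mp4,
        ],
        check=True,
    )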

7. Upload & Cleanup

  • Upload processed files to R2
  • Standardized naming convention
  • Clean temporary files (optional)
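
Upload is the inverse of the download step: each artifact goes under the UPLOAD_DIR prefix with a standardized name (a sketch reusing the same boto3 client):

import os

def upload_outputs(s3, bucket, upload_prefix, file_paths):
    """Upload processed artifacts to R2 under a standardized key prefix."""
    for path in file_paths:
        key = f"{upload_prefix}/{os.path.basename(path)}"
        s3.upload_file(path, bucket, key)  # upload_file(filename, bucket, key)
        print(f"uploaded {path} -> {key}")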

πŸŽ›οΈ Configuration Options

Whisper Models

  • tiny: Fastest, lowest accuracy (~1GB VRAM)
  • base: Balanced performance (default, ~1GB VRAM)
  • small: Better accuracy (~2GB VRAM)
  • medium: High accuracy (~5GB VRAM)
  • large: Best accuracy (~10GB VRAM)

Processing Options

  • cleanup: Remove temporary files after processing
  • skip_transcription: Skip AI transcription step
  • Custom output directories and naming

πŸ“ Project Structure

whisp-media-processor/
├── app/                    # Flask application package
│   ├── __init__.py        # Flask app initialization
│   ├── routes.py          # API endpoint definitions
│   ├── worker.py          # Core processing pipeline
│   ├── driver.py          # Configuration management
│   └── chunksToVideo.py   # Standalone video processor
├── run.py                 # Flask application entry point
├── config.py              # Application configuration
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not tracked)
├── README.md              # This file
└── .gitignore             # Git ignore rules

🐳 Docker Deployment

FROM python:3.12-slim

# Install FFmpeg
RUN apt-get update && apt-get install -y ffmpeg

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 5000

# Run Flask application
CMD ["flask", "run", "--host=0.0.0.0"]

🔧 Development

Local Development Setup

# Install development dependencies
pip install -r requirements.txt

# Run with debug mode
flask run --debug

# Run tests (if available)
python -m pytest

Standalone Processing

For local testing without the Flask API:

# Process local chunks directly
python app/chunksToVideo.py

# Test R2 connectivity
python accessR2.py

🚨 Troubleshooting

Common Issues

  1. FFmpeg not found: Ensure FFmpeg is installed and in PATH
  2. R2 connection failed: Verify credentials in .env file
  3. Whisper model loading: Check available VRAM for larger models
  4. Chunk not found: Verify correct meeting_id/take/user_id combination

Debug Mode

Enable detailed logging:

flask run --debug

📊 Performance

Processing Times (Approximate)

  • 10 minutes of video: ~2-5 minutes processing
  • Whisper transcription: +30-60 seconds per minute of audio
  • Upload speed: Depends on bandwidth and file size

Resource Requirements

  • CPU: Multi-core recommended for FFmpeg
  • RAM: 2-4GB base + Whisper model size
  • Storage: 3x source file size during processing
  • Network: Stable connection for R2 operations

πŸ›‘οΈ Security

  • Environment variables for sensitive credentials
  • Input validation on all API endpoints
  • Secure temporary file handling
  • Automatic cleanup of processed files

👥 Contributors

Team Bolts

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

For support, feature requests, or bug reports, please open an issue on GitHub.
