A modern, scalable video processing pipeline built with Flask that downloads, processes, transcribes, and uploads video chunks from Cloudflare R2 storage. Designed for real-time video conference recording processing with AI-powered transcription.
Whisp Media Processor is a production-ready Flask web service that processes video and audio chunks into professional-quality media files with embedded transcriptions. The system features a RESTful API for seamless integration with video conferencing platforms and real-time processing capabilities.
- **Professional Video Processing**: Converts WebM chunks to high-quality MP4 with H.264 encoding
- **AI-Powered Transcription**: OpenAI Whisper integration with multiple model sizes
- **RESTful API**: Easy integration with existing systems
- **Cloud Storage**: Seamless Cloudflare R2 integration
- **Asynchronous Processing**: Non-blocking pipeline execution
- **Soft Subtitles**: Embedded captions in MP4 containers
- **Error Handling**: Robust error recovery and logging
```
├── Flask Web Service
│   ├── RESTful API Endpoints
│   ├── Asynchronous Processing
│   └── Configuration Management
├── Video Processing Pipeline
│   ├── Chunk Download & Validation
│   ├── FFmpeg-based Processing
│   ├── Whisper AI Transcription
│   └── Cloud Upload
└── Storage Layer
    ├── Cloudflare R2 (Primary)
    └── Local Temporary Storage
```
- Flask 3.1+: Modern Python web framework
- Python 3.12+: Core programming language
- FFmpeg: Professional video/audio processing
- OpenAI Whisper: State-of-the-art speech recognition
- Cloudflare R2: S3-compatible object storage
- boto3: AWS SDK for Python (R2 integration)
- Threading: Asynchronous task processing
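
Since asynchrony comes from plain threads rather than a task queue, the core pattern presumably looks something like the sketch below; `run_pipeline` is a placeholder standing in for the real worker:

```python
# Sketch of thread-based async processing; run_pipeline is a
# placeholder for the real pipeline in app/worker.py
import threading

def run_pipeline(config: dict) -> None:
    # Download, process, transcribe, and upload (placeholder)
    print(f"processing {config['meeting_id']} ...")

def start_pipeline_async(config: dict) -> threading.Thread:
    # Daemon thread: the HTTP response can return immediately
    # while processing continues in the background
    worker = threading.Thread(target=run_pipeline, args=(config,), daemon=True)
    worker.start()
    return worker
```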
- Python 3.12 or higher
- FFmpeg installed and accessible in PATH
- Cloudflare R2 account and credentials
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/hexafalls2k25.git
  cd hexafalls2k25
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install FFmpeg:

  ```bash
  # Ubuntu/Debian
  sudo apt update && sudo apt install ffmpeg

  # macOS
  brew install ffmpeg

  # Windows (using Chocolatey)
  choco install ffmpeg
  ```

- Configure environment variables: create a `.env` file in the project root:

  ```
  S3_ACCESS_KEY_ID=your_r2_access_key
  S3_SECRET_ACCESS_KEY=your_r2_secret_key
  ACCOUNT_ID=your_cloudflare_account_id
  S3_BUCKET_NAME=your_r2_bucket_name
  ```
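
For reference, a minimal sketch of how these variables might be read at startup. The use of python-dotenv is an assumption here; the project's own `config.py` may load them differently:

```python
# Hypothetical config loading; assumes the python-dotenv package
import os
from dotenv import load_dotenv

load_dotenv()  # read .env from the project root

S3_ACCESS_KEY_ID = os.environ["S3_ACCESS_KEY_ID"]
S3_SECRET_ACCESS_KEY = os.environ["S3_SECRET_ACCESS_KEY"]
ACCOUNT_ID = os.environ["ACCOUNT_ID"]
S3_BUCKET_NAME = os.environ["S3_BUCKET_NAME"]
```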
Run the application:

```bash
# Development mode
flask run

# Custom host/port (note: the built-in server is not meant for production)
flask run --host=0.0.0.0 --port=5000

# Using Python directly
python run.py
```
Submit a processing job:

```bash
curl -X POST http://localhost:5000/submit \
  -H "Content-Type: application/json" \
  -d '{
    "meeting_id": "meeting_123",
    "take": "1",
    "user_id": "user_456",
    "whisper_model": "base",
    "cleanup": true,
    "skip_transcription": false
  }'
```

Check service health:

```bash
curl http://localhost:5000/status
```
`POST /submit` initiates the video processing pipeline for the specified meeting chunks.
Request body:

```json
{
  "meeting_id": "string (required)",
  "take": "string (required)",
  "user_id": "string (required)",
  "whisper_model": "string (optional, default: 'base')",
  "cleanup": "boolean (optional, default: true)",
  "skip_transcription": "boolean (optional, default: false)"
}
```
Response:

```json
{
  "status": "success",
  "message": "Video processing pipeline started",
  "meeting_id": "meeting_123",
  "take": "1",
  "user_id": "user_456",
  "config": {
    "REMOTE_DIR": "recordings/meeting_123/1/user_456",
    "LOCAL_DIR": "../chunks/meeting_123/1/user_456",
    "OUTPUT_DIR": "../recordings/meeting_123/1/user_456",
    "UPLOAD_DIR": "recordings/meeting_123/1"
  },
  "options": {
    "whisper_model": "base",
    "cleanup": true,
    "skip_transcription": false
  }
}
```
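
The same request from Python, as a sketch using the `requests` package:

```python
# Minimal client sketch; endpoint and fields match the docs above
import requests

payload = {
    "meeting_id": "meeting_123",
    "take": "1",
    "user_id": "user_456",
    "whisper_model": "base",
    "cleanup": True,
    "skip_transcription": False,
}
resp = requests.post("http://localhost:5000/submit", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json()["status"])  # "success" once the pipeline has started
```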
`GET /status` returns the service health status.
Response:

```json
{
  "status": "running",
  "message": "Video processing service is running"
}
```
The pipeline runs in seven stages (see the FFmpeg sketch after this list):

1. Initialization
   - Validate API request parameters
   - Configure directory structures
   - Initialize processing components
2. Chunk download
   - Connect to Cloudflare R2 storage
   - Download video/audio chunks by prefix
   - Organize files by type (video/audio)
3. Video processing
   - Concatenate WebM video chunks
   - Fix timestamp inconsistencies
   - Convert to H.264 MP4 format
4. Audio processing
   - Concatenate WebM audio chunks
   - Extract to WAV format for transcription
   - Encode to AAC for final output
5. Transcription
   - Load Whisper model (tiny/base/small/medium/large)
   - Generate timestamped transcription
   - Create SRT subtitle files
   - Export JSON metadata
6. Final assembly
   - Mux video, audio, and subtitles
   - Embed soft captions in the MP4 container
   - Optimize for web delivery
7. Upload & cleanup
   - Upload processed files to R2
   - Apply the standardized naming convention
   - Clean temporary files (optional)
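
The FFmpeg steps above roughly correspond to commands like the following. This is an illustrative sketch, not the exact invocation in `app/worker.py`; file names (`chunks.txt`, `captions.srt`) are placeholders:

```python
# Illustrative FFmpeg steps: concatenate, transcode, mux soft subtitles
import subprocess

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

# 1. Concatenate WebM chunks listed in chunks.txt
#    (one line per chunk: file 'chunk_000.webm')
run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-fflags", "+genpts",          # regenerate timestamps across chunk boundaries
     "-i", "chunks.txt", "-c", "copy", "joined.webm"])

# 2. Transcode to H.264/AAC MP4, optimized for web delivery
run(["ffmpeg", "-y", "-i", "joined.webm",
     "-c:v", "libx264", "-crf", "23", "-preset", "medium",
     "-c:a", "aac",
     "-movflags", "+faststart",     # move the moov atom up front for streaming
     "video.mp4"])

# 3. Mux the SRT as a soft mov_text subtitle track in the MP4 container
run(["ffmpeg", "-y", "-i", "video.mp4", "-i", "captions.srt",
     "-c", "copy", "-c:s", "mov_text", "final.mp4"])
```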
Supported Whisper models (used as shown in the sketch below):

- `tiny`: Fastest, lowest accuracy (~1GB VRAM)
- `base`: Balanced performance (default, ~1GB VRAM)
- `small`: Better accuracy (~2GB VRAM)
- `medium`: High accuracy (~5GB VRAM)
- `large`: Best accuracy (~10GB VRAM)
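
A minimal transcription sketch with openai-whisper, showing where the model name goes (the audio file name is a placeholder):

```python
# Minimal openai-whisper usage; "base" can be swapped for any model above
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.wav")   # returns text plus timed segments
for seg in result["segments"]:
    print(f"{seg['start']:7.2f} --> {seg['end']:7.2f}  {seg['text']}")
```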
Processing options:

- `cleanup`: Remove temporary files after processing
- `skip_transcription`: Skip the AI transcription step
- Custom output directories and naming
```
hexafalls2k25/
├── app/                   # Flask application package
│   ├── __init__.py        # Flask app initialization
│   ├── routes.py          # API endpoint definitions
│   ├── worker.py          # Core processing pipeline
│   ├── driver.py          # Configuration management
│   └── chunksToVideo.py   # Standalone video processor
├── run.py                 # Flask application entry point
├── config.py              # Application configuration
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not tracked)
├── README.md              # This file
└── .gitignore             # Git ignore rules
```
```dockerfile
FROM python:3.12-slim

# Install FFmpeg
RUN apt-get update && apt-get install -y ffmpeg

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Tell Flask where the app lives so `flask run` can find it
ENV FLASK_APP=run.py

# Expose port
EXPOSE 5000

# Run Flask application
CMD ["flask", "run", "--host=0.0.0.0"]
```
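
A typical build-and-run flow might be `docker build -t whisp-media-processor .` followed by `docker run --env-file .env -p 5000:5000 whisp-media-processor` (the image tag is illustrative).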
```bash
# Install development dependencies
pip install -r requirements.txt

# Run with debug mode (Flask 3.x ignores FLASK_ENV; use --debug instead)
flask run --debug

# Run tests (if available)
python -m pytest
```
For local testing without the Flask API:
```bash
# Process local chunks directly
python app/chunksToVideo.py

# Test R2 connectivity
python accessR2.py
```
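
Connectivity can also be checked with a few lines of boto3. This sketch assumes the `.env` variables above are set in the environment and uses R2's S3-compatible endpoint:

```python
# R2 connectivity check via boto3's S3 client (env var names match .env)
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
)
resp = s3.list_objects_v2(Bucket=os.environ["S3_BUCKET_NAME"], MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```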
- **FFmpeg not found**: Ensure FFmpeg is installed and on the PATH
- **R2 connection failed**: Verify the credentials in the `.env` file
- **Whisper model loading**: Check available VRAM for larger models
- **Chunk not found**: Verify the meeting_id/take/user_id combination
Enable detailed logging:
```bash
flask run --debug
```
- 10 minutes of video: ~2-5 minutes processing
- Whisper transcription: +30-60 seconds per minute of audio
- Upload speed: Depends on bandwidth and file size
- CPU: Multi-core recommended for FFmpeg
- RAM: 2-4GB base + Whisper model size
- Storage: 3x source file size during processing
- Network: Stable connection for R2 operations
- Environment variables for sensitive credentials
- Input validation on all API endpoints (see the sketch after this list)
- Secure temporary file handling
- Automatic cleanup of processed files
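
An illustrative sketch of the validation pattern for the `/submit` endpoint; this shows the shape of the check, not the actual code in `app/routes.py`:

```python
# Hypothetical request validation for /submit
from flask import Flask, jsonify, request

app = Flask(__name__)
REQUIRED_FIELDS = ("meeting_id", "take", "user_id")

@app.post("/submit")
def submit():
    data = request.get_json(silent=True) or {}
    missing = [f for f in REQUIRED_FIELDS if not data.get(f)]
    if missing:
        # Reject malformed requests before any processing starts
        return jsonify({
            "status": "error",
            "message": f"Missing required fields: {', '.join(missing)}",
        }), 400
    # ... hand off to the background pipeline here ...
    return jsonify({"status": "success",
                    "message": "Video processing pipeline started"})
```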
Team Bolts
- Sk Sameer Salam - Lead Developer
- Tushar Daiya - Backend Engineer
- Sougata Mandal - DevOps Engineer
- Aquib Alam - Frontend Integration
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For support, feature requests, or bug reports, please open an issue on GitHub.