Welcome to Manim Studio, a full-stack, distributed application that automatically transforms natural language prompts into programmatic, high-quality educational animations.
By chaining Large Language Models (LLMs) with the Manim mathematical animation engine, this system automates the traditionally labor-intensive process of scripting, animating, and rendering technical concepts into seamless video presentations.
Frontend & API:
- Next.js 16 (App Router)
- React 19, Tailwind CSS v4, Lucide React, Three.js
Infrastructure & Data:
- PostgreSQL via Prisma ORM
- Redis via ioredis
- BullMQ for Job Queues
- AWS S3 for Blob Storage
- Railway (Docker Containerization)
Media Processing:
- Python 3.11
- Manim v0.20.0
- FFmpeg
The platform is designed around a microservices-inspired architecture, decoupling the fast, user-facing Next.js application from the heavy, asynchronous video rendering workloads. The main workflow is as follows.
When a client submits a generation request via the Next.js API (/api/generate), the system first validates the user's session using NextAuth.js. Authentication is strictly enforced before any compute resources are allocated. Once verified, a new VideoJob is initialized in the PostgreSQL database (managed via Prisma) with a pending state.
Instead of processing the video synchronously—which would lead to HTTP timeouts and blocked threads—the API immediately offloads the task to a Redis-backed BullMQ queue.
Redis is the backbone of this asynchronous pipeline. It provides a durable, high-throughput message broker that ensures job persistence, allows for automatic retries, and enables the system to scale its worker nodes independently from the frontend API.
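The request flow described above (authenticate, create a pending record, enqueue, return immediately) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual route handler: `handleGenerate`, the `Deps` interface, the 202 response code, and the response shape are all hypothetical, and the three injected functions stand in for the NextAuth.js session lookup, the Prisma insert, and the BullMQ enqueue call.

```typescript
// Hypothetical shapes; the real VideoJob model lives in the Prisma schema.
type Session = { userId: string } | null;
interface VideoJob { id: string; userId: string; prompt: string; status: "pending" | "completed"; }
interface ApiResult { status: number; body: { jobId?: string; error?: string }; }

// Injected stand-ins for NextAuth.js, Prisma, and the BullMQ queue (all assumed names).
interface Deps {
  getSession: () => Promise<Session>;
  createJob: (userId: string, prompt: string) => Promise<VideoJob>;
  enqueue: (jobId: string) => Promise<void>;
}

async function handleGenerate(prompt: string, deps: Deps): Promise<ApiResult> {
  // 1. Reject unauthenticated requests before touching the database or the queue.
  const session = await deps.getSession();
  if (!session) {
    return { status: 401, body: { error: "Unauthorized" } };
  }
  // 2. Record the job in a pending state.
  const job = await deps.createJob(session.userId, prompt);
  // 3. Hand the heavy work to the queue and return without waiting for the render.
  await deps.enqueue(job.id);
  return { status: 202, body: { jobId: job.id } };
}
```

Passing the dependencies in as functions keeps the sketch runnable without a database or Redis; in the real application these would be direct calls into the respective libraries.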
A dedicated Node.js worker (deployed via a Docker container) continuously polls the Redis queue. When a job is dequeued, it undergoes a multi-stage pipeline:
- Semantic Structuring: The raw prompt is passed to an LLM to generate a structured JSON blueprint mapping out distinct visual scenes.
- Code Generation: The worker orchestrates parallel LLM calls to translate each scene from the blueprint into executable Python code using the Manim library.
- Local Rendering: The worker spawns child processes (execSync) to run Manim against the generated Python scripts, rendering out individual .mp4 chunks at 480p/15fps.
- Asset Concatenation: Once all chunks are successfully rendered, FFmpeg concatenates the segments into a single, seamless video file.
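The rendering and concatenation stages come down to two command invocations, which the sketch below builds as argument arrays. It assumes Manim's `-ql` low-quality preset (480p at 15 fps) and FFmpeg's concat demuxer with stream copy; the flags the worker actually uses are not stated here, and the function names and file paths are hypothetical.

```typescript
// Arguments for rendering one scene with the Manim CLI.
// Assumption: "-ql" (low quality, 480p at 15 fps) is how the 480p/15fps target is reached.
function manimArgs(scriptPath: string, sceneClass: string, mediaDir: string): string[] {
  return ["-ql", "--media_dir", mediaDir, scriptPath, sceneClass];
}

// Contents of the list file read by FFmpeg's concat demuxer:
// one "file '<path>'" line per chunk, with single quotes in paths escaped.
function concatListFile(chunkPaths: string[]): string {
  return chunkPaths
    .map((p) => `file '${p.replace(/'/g, "'\\''")}'`)
    .join("\n") + "\n";
}

// Arguments for joining the chunks without re-encoding ("-c copy").
// "-safe 0" lets the list file reference absolute or otherwise "unsafe" paths.
function ffmpegConcatArgs(listPath: string, outputPath: string): string[] {
  return ["-y", "-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outputPath];
}

// Illustrative paths only; the real chunk locations depend on Manim's output layout.
const exampleChunks = ["chunks/scene_0.mp4", "chunks/scene_1.mp4"];
console.log(["manim", ...manimArgs("scene_0.py", "Scene0", "media")].join(" "));
console.log(concatListFile(exampleChunks));
console.log(["ffmpeg", ...ffmpegConcatArgs("chunks.txt", "final.mp4")].join(" "));
```

Building argument arrays rather than a single shell string means they can be handed to `child_process.execFileSync`, which avoids shell quoting problems with generated file names.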
Relying on the container's ephemeral filesystem is dangerous for long-term storage. Therefore, the worker immediately uploads the final merged .mp4 to AWS S3. The PostgreSQL record is updated with the persistent S3 URL, and the client (which has been polling /api/job-status) receives the final payload to display the video.
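On the client side, the polling loop against /api/job-status might look like the sketch below. The response shape (`status`, `videoUrl`), the `pollJob` helper, and the interval and attempt limits are assumptions made for illustration; the status fetcher is injected so the loop does not depend on a particular HTTP client.

```typescript
// Assumed response shape; the real payload returned by /api/job-status may differ.
interface JobStatusResponse {
  status: "pending" | "completed" | "failed";
  videoUrl?: string;
}

// Polls until the job leaves the pending state or the attempt budget runs out.
async function pollJob(
  fetchStatus: () => Promise<JobStatusResponse>,
  intervalMs: number,
  maxAttempts: number,
): Promise<JobStatusResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchStatus();
    if (res.status !== "pending") {
      return res; // completed (with the S3 URL) or failed
    }
    // Wait before asking again so the API and database are not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Gave up polling after " + maxAttempts + " attempts");
}
```

In a browser, `fetchStatus` would typically wrap a `fetch` call to the status endpoint and parse the JSON body.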
The hardest part of making this pipeline robust was processing queued jobs concurrently while keeping the worker's state consistent, without crashing or timing out. These are the optimization measures taken for the MVP:
- Defensive Execution (240s Timeout): LLMs can hallucinate Manim syntax that causes the renderer to hang indefinitely. The BullMQ worker wraps the entire generation pipeline in a Promise.race with a strict 4-minute timeout. If the threshold is breached, the worker gracefully kills the job, updates the database, and prevents the queue from stalling.
- Fault-Tolerant Rendering: If a specific scene throws a syntax error during the Manim compilation step, the pipeline catches the error, skips the corrupted chunk, and continues rendering the remaining scenes. Partial success is prioritized over complete failure.
- Caching Layer: Before queuing a new heavy workload, the API checks the database for an exact (userId, prompt, status: "completed") match. If a cached video exists, it is served instantly, bypassing the queue entirely.
- Resource Limiting: Video generation is compute-heavy. We currently enforce a strict quota (maximum 3 videos per user) to control cloud costs and prevent abuse.
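The first two measures, the Promise.race timeout and the skip-on-failure rendering loop, can be sketched roughly as follows. The helper names `withTimeout` and `renderTolerant` are hypothetical, and the per-scene renderer is injected so the example runs without Manim. Note that Promise.race only stops waiting for the slow promise; it does not cancel the underlying work, so a real worker would still need to terminate any running child process itself.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Pipeline timed out after " + ms + " ms")), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a finished job does not keep the process alive.
    clearTimeout(timer);
  }
}

// Renders scenes one by one and keeps only the chunks that succeed.
async function renderTolerant(
  scenes: string[],
  renderScene: (scene: string) => Promise<string>,
): Promise<string[]> {
  const chunks: string[] = [];
  for (const scene of scenes) {
    try {
      chunks.push(await renderScene(scene));
    } catch {
      // A broken scene is skipped so the remaining scenes can still be rendered.
    }
  }
  return chunks;
}
```

Under these assumptions, the worker would call something like `withTimeout(runPipeline(job), 240_000)`, where `runPipeline` is a placeholder for the full generation pipeline.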
ManimStudioArchitecture.mp4
- Enhanced Cognitive Pipeline: Upgrading the prompt engineering and LLM orchestration to support multi-step reasoning. This will allow the system to generate significantly longer, more mathematically complex, and better-timed educational content.
- Tiered Architecture (Stripe Integration): Segmenting the user base into Free (capped limits, standard processing) and Pro (unlimited generation, priority queueing, 1080p60 rendering).
- Latency Optimization: Swapping the current client-side HTTP polling mechanism for WebSockets or Server-Sent Events (SSE) to reduce database load and provide real-time status updates.
You are welcome to contribute to this project! Whether you're fixing bugs, improving the Manim prompt engineering, or optimizing the FFmpeg pipeline, your help is appreciated.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please write test cases for new features. I would highly appreciate any suggestions for reducing video generation latency and improving video quality.