sanzaru

A stateless, lightweight MCP server that wraps OpenAI's Sora Video API, image generation, Whisper, GPT-4o Audio, and TTS APIs via the OpenAI Python SDK.

Features

Video Generation (Sora)

- Create videos with sora-2 or sora-2-pro models
- Use reference images to guide generation
- Remix and refine existing videos
- Download variants (video, thumbnail, spritesheet)

Image Generation

- Generate images with gpt-image-1.5 (recommended) or GPT-5
- Edit and compose images with up to 16 inputs
- Iterative refinement via the Responses API
- Automatic resizing for Sora compatibility

Audio Processing

- Transcription: Whisper and GPT-4o models
- Audio Chat: Interactive analysis with GPT-4o
- Text-to-Speech: Multi-voice TTS generation
- Processing: Format conversion, compression, file management

Podcast Generation

- Multi-voice podcasts with up to 4 speakers and 10 TTS voices
- Parallel segment generation with configurable pacing
- MP3/WAV output with loudness normalization
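Parallel segment generation means the TTS requests for different segments can be in flight at the same time, as long as the results are reassembled in script order. A minimal sketch, assuming an async TTS call per segment; `synthesize` is a hypothetical stand-in that returns fake bytes, not sanzaru's API, and pacing and stitching are left out.

```python
import asyncio


async def synthesize(speaker: str, text: str) -> bytes:
    """Hypothetical stand-in for an async TTS request; returns fake audio bytes."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{speaker}:{text}".encode()


async def render_segments(segments: list[dict], max_parallel: int = 4) -> list[bytes]:
    """Render all segments concurrently, with at most max_parallel in flight."""
    limit = asyncio.Semaphore(max_parallel)

    async def render(segment: dict) -> bytes:
        async with limit:  # bound the number of simultaneous TTS requests
            return await synthesize(segment["speaker"], segment["text"])

    # gather() returns results in input order, so chunks line up with the script
    return await asyncio.gather(*(render(s) for s in segments))


segments = [
    {"speaker": "host", "text": "Welcome to AI Weekly!"},
    {"speaker": "guest", "text": "Thanks for having me."},
]
chunks = asyncio.run(render_segments(segments))
```

The semaphore keeps a long script from firing every request at once, while `asyncio.gather` preserves ordering without any extra bookkeeping.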

Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.

Requirements

- Python 3.10+
- OPENAI_API_KEY environment variable

Media storage (choose one):

# Recommended: unified path (auto-creates videos/, images/, audio/ subdirs)
SANZARU_MEDIA_PATH="/path/to/media"

# Or individual paths (legacy, still supported)
VIDEO_PATH="/path/to/videos"
IMAGE_PATH="/path/to/images"
AUDIO_PATH="/path/to/audio"

Features are auto-detected based on configured paths. Set only what you need.
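The auto-detection can be pictured as a small lookup from environment variables to directories. A minimal sketch, assuming the unified path takes precedence over the legacy variables and that an unset legacy variable simply disables that feature; `resolve_media_paths` is a hypothetical helper, not sanzaru's code, and it only computes paths (directory creation is left out).

```python
from pathlib import Path
from typing import Mapping, Optional

# Subdirectory names come from the comment above; precedence is an assumption.
SUBDIRS = {"video": "videos", "image": "images", "audio": "audio"}
LEGACY_VARS = {"video": "VIDEO_PATH", "image": "IMAGE_PATH", "audio": "AUDIO_PATH"}


def resolve_media_paths(env: Mapping[str, str]) -> dict[str, Path]:
    """Map each enabled feature to its storage directory (hypothetical helper)."""
    unified: Optional[str] = env.get("SANZARU_MEDIA_PATH")
    if unified:
        # Unified layout: one root with a subdirectory per media type.
        root = Path(unified)
        return {feature: root / sub for feature, sub in SUBDIRS.items()}
    # Legacy layout: only features whose variable is set are enabled.
    return {
        feature: Path(env[var])
        for feature, var in LEGACY_VARS.items()
        if env.get(var)
    }


print(resolve_media_paths({"VIDEO_PATH": "/data/videos"}))
```

With only `VIDEO_PATH` set, the result contains just the `video` entry, which matches the "set only what you need" behaviour described above.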

Quick Start

Clone the repository:

git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru

Run the setup script:
```
./setup.sh
```
The script will:
- Prompt for your OpenAI API key
- Create directories and a .env configuration file
- Install dependencies with `uv sync --all-extras --dev`

Start Claude Code from the repository directory:
```
claude
```

That's it! Claude Code will automatically connect to the server, and you can start generating videos and images and processing audio.

Installation

Claude Code Plugin (Recommended)

Install as a plugin, which auto-configures the MCP server and includes prompting guidance:

/plugin marketplace add TJC-LP/sanzaru

Requires OPENAI_API_KEY and SANZARU_MEDIA_PATH environment variables to be set.

Quick Install

# All features
uv add "sanzaru[all]"

# Specific features
uv add "sanzaru[audio]"  # With audio support
uv add sanzaru           # Base (video + image only)

Alternative Installation Methods

From Source

git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uvx",
      "args": ["sanzaru[all]"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here",
        "SANZARU_MEDIA_PATH": "/absolute/path/to/media"
      }
    }
  }
}

Or from source:

{
  "mcpServers": {
    "sanzaru": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here",
        "SANZARU_MEDIA_PATH": "/absolute/path/to/media"
      }
    }
  }
}
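If `claude_desktop_config.json` already lists other MCP servers, the `sanzaru` entry has to be merged in rather than pasted over the file. A minimal standard-library sketch; `add_sanzaru_server` is a hypothetical helper, and locating the config file on disk (which varies by OS) is left out.

```python
import json


def add_sanzaru_server(config: dict, api_key: str, media_path: str) -> dict:
    """Return a copy of a Claude Desktop config with a `sanzaru` entry added."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    # Same entry as the uvx example above; other servers are left untouched.
    servers["sanzaru"] = {
        "command": "uvx",
        "args": ["sanzaru[all]"],
        "env": {
            "OPENAI_API_KEY": api_key,
            "SANZARU_MEDIA_PATH": media_path,
        },
    }
    return updated


existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_sanzaru_server(existing, "sk-...", "/absolute/path/to/media")
print(json.dumps(merged, indent=2))
```

Copying before modifying keeps the original dictionary intact, so the caller can diff old and new configs before writing the file back.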

Codex MCP

# Using uvx (from PyPI)
codex mcp add sanzaru \
  --env OPENAI_API_KEY="sk-..." \
  --env SANZARU_MEDIA_PATH="$HOME/sanzaru-media" \
  -- uvx "sanzaru[all]"

Manual Setup

uv venv
uv sync

# Set required environment variables
export OPENAI_API_KEY=sk-...
export SANZARU_MEDIA_PATH=~/sanzaru-media

# Run server (stdio for MCP clients)
uv run sanzaru

# Or HTTP mode (for remote access)
uv run sanzaru --transport http --port 8000

Available Tools

| Category | Tools | Description |
|---|---|---|
| Video | `create_video`, `get_video_status`, `download_video`, `list_videos`, `list_local_videos`, `delete_video`, `remix_video` | Generate and manage Sora videos with optional reference images |
| Image | `generate_image`, `edit_image`, `create_image`, `get_image_status`, `download_image` | Generate with gpt-image-1.5 (sync) or GPT-5 (polling) |
| Reference | `list_reference_images`, `prepare_reference_image` | Manage and resize images for Sora compatibility |
| Audio | `transcribe_audio`, `chat_with_audio`, `create_audio`, `convert_audio`, `compress_audio`, `list_audio_files`, `get_latest_audio`, `transcribe_with_enhancement` | Transcription, analysis, TTS, and file management |
| Podcast | `generate_podcast` | Multi-voice podcast generation with parallel TTS and audio stitching |
| Media | `view_media` | Interactive media player via MCP App protocol |

Full API documentation: See docs/api-reference.md

Basic Workflows

Generate a Video

# Create video from text
video = create_video(
    prompt="A serene mountain landscape at sunrise",
    model="sora-2",
    seconds="8",
    size="1280x720"
)

# Check progress; repeat until the status reports completion
status = get_video_status(video.id)

# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
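Because `get_video_status` reports the job's current state, waiting for a render means calling it repeatedly with a delay and a deadline. A minimal polling sketch; `wait_for_video` is hypothetical, `get_status` stands in for whatever wraps the status tool in your client, and the status strings are assumed from OpenAI's video job states.

```python
import time
from typing import Callable

# Assumed terminal states, modeled on OpenAI video jobs
# ("queued" and "in_progress" mean keep waiting).
TERMINAL = {"completed", "failed"}


def wait_for_video(
    get_status: Callable[[str], str],
    video_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Call get_status(video_id) until a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(video_id)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} still {status!r} after {timeout}s")
        time.sleep(interval)  # wait before asking again


# Fake status source for illustration: queued -> in_progress -> completed
fake = iter(["queued", "in_progress", "completed"])
print(wait_for_video(lambda _id: next(fake), "video_123", interval=0.01))  # prints "completed"
```

Returning `"failed"` instead of raising lets the caller decide whether to retry, remix, or report the error; only a missed deadline raises.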

Generate with Reference Image

# 1. Generate reference image (gpt-image-1.5, synchronous)
generate_image(
    prompt="futuristic pilot in mech cockpit",
    size="1536x1024",
    filename="pilot.png"
)

# 2. Prepare for video (resize to Sora dimensions)
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")

# 3. Animate
video = create_video(
    prompt="The pilot looks up and smiles",
    size="1280x720",
    input_reference_filename="pilot_1280x720.png"
)
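The `crop` resize mode presumably trims the image to the target aspect ratio before scaling, so nothing is stretched. The sketch below shows only that crop-box arithmetic and is an assumption about the behaviour, not sanzaru's implementation; `center_crop_box` is hypothetical. With Pillow, `img.crop(box).resize((1280, 720))` would finish the job.

```python
def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Largest centered box in the source that has the target aspect ratio.

    Returns (left, top, right, bottom), the box convention of Pillow's Image.crop.
    """
    # Compare aspect ratios by cross-multiplying integers to avoid float error.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = src_h * dst_w // dst_h, src_h
    else:
        # Source is taller or equal: keep full width, trim top and bottom.
        crop_w, crop_h = src_w, src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# The 1536x1024 (3:2) reference image cropped toward 1280x720 (16:9)
print(center_crop_box(1536, 1024, 1280, 720))  # (0, 80, 1536, 944)
```

For the example above, a 1536x864 band is kept from the middle of the image (80 px trimmed from top and bottom), which then scales down to exactly 1280x720.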

Audio Transcription

# List available audio files
files = list_audio_files(format="mp3")

# Transcribe
result = transcribe_audio("interview.mp3")

# Or analyze with GPT-4o
analysis = chat_with_audio(
    "meeting.mp3",
    user_prompt="Summarize key decisions and action items"
)
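OpenAI's transcription endpoint documents a 25 MB upload limit, so long recordings may need compressing (for example with `compress_audio`) before `transcribe_audio`. A back-of-the-envelope sketch for choosing a bitrate that fits; reading 25 MB as decimal megabytes and reserving 10% headroom are assumptions, and `max_bitrate_kbps` is a hypothetical helper.

```python
UPLOAD_LIMIT_BYTES = 25 * 1000 * 1000  # assumed: 25 MB read as decimal megabytes
HEADROOM_PERCENT = 90  # keep ~10% spare for container overhead and metadata


def max_bitrate_kbps(duration_seconds: int) -> int:
    """Highest constant bitrate (kbit/s) that keeps a recording under the limit."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    budget_bits = UPLOAD_LIMIT_BYTES * 8 * HEADROOM_PERCENT // 100
    # Integer division rounds down, so the estimate never overshoots the budget.
    return budget_bits // (duration_seconds * 1000)


# A one-hour recording must stay at or below about 50 kbit/s.
print(max_bitrate_kbps(3600))  # 50
```

Integer arithmetic keeps the estimate exact; actual file sizes also depend on the codec and on whether the encoder is truly constant-bitrate.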

Generate a Podcast

generate_podcast(script={
    "title": "AI Weekly",
    "speakers": [
        {"id": "host", "name": "Alex", "voice": "nova"},
        {"id": "guest", "name": "Sam", "voice": "echo"}
    ],
    "segments": [
        {"speaker": "host", "text": "Welcome to AI Weekly!"},
        {"speaker": "guest", "text": "Thanks for having me."}
    ]
})
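Because the podcast script is plain data, it can be checked before any TTS requests are spent. A minimal sketch; the rules encoded here (1 to 4 speakers, unique ids, every segment naming a declared speaker, non-empty text) follow from the example and the Features section, while sanzaru's own validation may differ. `validate_script` is hypothetical, and voice names are not checked because the list of 10 voices isn't given here.

```python
MAX_SPEAKERS = 4  # limit stated in the Features section


def validate_script(script: dict) -> list[str]:
    """Return the problems found in a podcast script (empty list if none)."""
    problems: list[str] = []
    speakers = script.get("speakers", [])
    if not 1 <= len(speakers) <= MAX_SPEAKERS:
        problems.append(f"expected 1-{MAX_SPEAKERS} speakers, got {len(speakers)}")
    ids = [s.get("id") for s in speakers]
    if len(ids) != len(set(ids)):
        problems.append("speaker ids must be unique")
    known = set(ids)
    for i, segment in enumerate(script.get("segments", [])):
        if segment.get("speaker") not in known:
            problems.append(f"segment {i} uses unknown speaker {segment.get('speaker')!r}")
        if not segment.get("text", "").strip():
            problems.append(f"segment {i} has empty text")
    return problems


script = {
    "title": "AI Weekly",
    "speakers": [
        {"id": "host", "name": "Alex", "voice": "nova"},
        {"id": "guest", "name": "Sam", "voice": "echo"},
    ],
    "segments": [
        {"speaker": "host", "text": "Welcome to AI Weekly!"},
        {"speaker": "guest", "text": "Thanks for having me."},
    ],
}
print(validate_script(script))  # []
```

Collecting every problem instead of stopping at the first makes it easier to fix a long script in one pass.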

Documentation

- API Reference - Complete tool documentation with parameters and examples
- Reference Images Guide - Working with reference images and resizing
- Image Generation Guide - Generating and editing reference images
- Sora Prompting Guide - Crafting effective video prompts
- Audio Features - Audio transcription, chat, and TTS
- Performance & Architecture - Technical details and benchmarks

Transport Modes

| Mode | Command | Use Case |
|---|---|---|
| stdio (default) | `uv run sanzaru` | Claude Desktop, Claude Code, local MCP clients |
| HTTP | `uv run sanzaru --transport http` | Remote access, Databricks Apps, web clients |

Storage Backends

| Backend | Config | Use Case |
|---|---|---|
| Local (default) | `SANZARU_MEDIA_PATH=/path/to/media` | Development, local deployments |
| Databricks | `STORAGE_BACKEND=databricks` | Databricks Apps with Unity Catalog Volumes |

The Databricks backend supports per-user storage isolation via the user_context module, enabling multi-tenant deployments where each user's media is stored under their own volume prefix.
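Per-user isolation comes down to deriving a storage path from the caller's identity without letting that identity escape its own prefix. A minimal sketch; the `<volume root>/<user>/<media kind>/<file>` layout and the `user_media_path` helper are assumptions for illustration, since the `user_context` module's actual scheme isn't documented here. Unity Catalog volumes are addressed as `/Volumes/<catalog>/<schema>/<volume>/...`.

```python
import re
from pathlib import PurePosixPath


def user_media_path(volume_root: str, user: str, kind: str, filename: str) -> PurePosixPath:
    """Build a per-user path under a Unity Catalog volume (hypothetical layout)."""
    # Reduce the user identifier (often an email) to one safe path segment.
    safe_user = re.sub(r"[^A-Za-z0-9._-]", "_", user)
    if safe_user in {"", ".", ".."}:
        raise ValueError(f"unusable user identifier: {user!r}")
    # Keep only the final filename component so "../x" cannot climb out.
    name = PurePosixPath(filename).name
    if name in {"", ".", ".."}:
        raise ValueError(f"unusable filename: {filename!r}")
    # `kind` is assumed to be server-controlled (e.g. "videos"), so it is not sanitized.
    return PurePosixPath(volume_root) / safe_user / kind / name


print(user_media_path("/Volumes/main/media/sanzaru", "alice@example.com", "videos", "clip.mp4"))
```

Character substitution is simple but can map two identities to the same folder (`alice@example.com` and `alice_example.com`); hashing the identifier is a common alternative when that matters.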

See CLAUDE.md for full configuration details.

Performance

Fully asynchronous architecture, tested under concurrent load:

✅ 32+ concurrent operations verified
✅ 8-10x speedup for parallel tasks versus running them sequentially
✅ Non-blocking I/O with aiofiles + anyio
✅ Python 3.14 free-threading ready

See docs/async-optimizations.md for technical details.
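Non-blocking I/O means slow disk writes never stall the event loop that is also driving API calls. sanzaru uses aiofiles and anyio for this; the sketch below swaps in the standard library's `asyncio.to_thread` to show the same idea, and `save_all` is a hypothetical helper rather than the project's code.

```python
import asyncio
import tempfile
from pathlib import Path


def write_blocking(path: Path, data: bytes) -> int:
    """Ordinary blocking write; returns the number of bytes written."""
    return path.write_bytes(data)


async def save_all(directory: Path, blobs: dict[str, bytes]) -> list[int]:
    """Write several files concurrently without blocking the event loop."""
    # Each blocking write runs in a worker thread, so other coroutines
    # (for example status polling) keep running in the meantime.
    tasks = [
        asyncio.to_thread(write_blocking, directory / name, data)
        for name, data in blobs.items()
    ]
    return await asyncio.gather(*tasks)


with tempfile.TemporaryDirectory() as tmp:
    sizes = asyncio.run(save_all(Path(tmp), {"a.bin": b"x" * 10, "b.bin": b"y" * 20}))
    written = sorted(p.name for p in Path(tmp).iterdir())
```

Offloading to threads is the same strategy aiofiles uses internally, which is why it pairs naturally with an otherwise single-threaded async server.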

License

MIT
