A terminal-first, 1-day MVP that converts podcasts into blog posts.
- 📁 Smart Organization: Podcasts organized by channel/episode in clean folders
- 📋 Batch Processing: Process multiple podcasts from a list
- 📡 RSS Monitoring: Auto-detect new episodes from RSS feeds
- 💬 Interactive Q&A: Ask questions about podcast content
- 🌐 HTML Output: Styled webpages auto-generated for each episode
- 📚 Web Dashboard: Browse all podcasts at outputs/index.html
See FEATURES.md and ORGANIZATION.md for detailed guides.
podcast_mvp/
├── outputs/ # Organized by channel and episode
│ ├── Huberman Lab/
│ │ ├── Episode Title 1/
│ │ │ ├── audio.mp3
│ │ │ ├── transcript_raw.txt
│ │ │ ├── transcript_clean.txt
│ │ │ ├── insights.json
│ │ │ ├── draft.md
│ │ │ ├── final.md
│ │ │ ├── blog.html
│ │ │ └── metadata.json
│ │ └── Episode Title 2/
│ │ └── ...
│ ├── Lex Fridman Podcast/
│ │ └── ...
│ └── ...
│
├── logs/ # Batch processing logs
│
├── setup.sh # Initial setup script
├── process_podcast.sh # Main workflow (audio → blog)
├── generate_from_transcript.sh # Transcript → blog (skips download/transcription)
├── extract_metadata.py # Extract channel/title metadata
│
├── batch_process.sh # 📋 Process multiple podcasts from list
├── monitor_rss.py # 📡 Monitor RSS feeds for new episodes
├── interactive_qa.py # 💬 Ask questions about podcasts
├── convert_to_html.py # 🌐 Convert markdown to HTML
│
├── clean_transcript.py # Core processing scripts
├── extract_insights.py
├── generate_blog.py
└── improve_blog.py
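The channel/episode layout above can be reproduced with a small path helper. This is a sketch under assumptions: safe_name, episode_dir, and the character-replacement rule are hypothetical and are not taken from extract_metadata.py or process_podcast.sh.

```python
import re
from pathlib import Path

# Characters that are unsafe in folder names on common filesystems
# (hypothetical rule; the project's scripts may sanitize differently).
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(name):
    """Replace unsafe characters with '_' and collapse runs of whitespace."""
    cleaned = _UNSAFE.sub("_", name)
    cleaned = " ".join(cleaned.split())
    return cleaned or "Untitled"


def episode_dir(root, channel, title):
    """Return <root>/<channel>/<episode>/ for one episode (hypothetical helper)."""
    return Path(root) / safe_name(channel) / safe_name(title)


if __name__ == "__main__":
    print(episode_dir("outputs", "Huberman Lab", "Sleep: Part 1/2").as_posix())
```

Sanitizing matters because episode titles often contain / or :, which would otherwise create nested folders or fail on some filesystems.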
Run the setup.sh script to create a virtual environment and install all dependencies:
./setup.sh
The setup.sh script will:
- Check for a Python 3 installation
- Create a Python virtual environment (venv/) if it doesn't exist
- Activate the virtual environment
- Install all required Python packages from requirements.txt
- Check for a yt-dlp installation
- Verify that GEMINI_API_KEY is set (or provide instructions)
- Make the workflow scripts executable (process_podcast.sh, generate_from_transcript.sh, etc.)
Note: You only need to run setup.sh once after cloning/downloading the project.
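The checks setup.sh performs can also be repeated from Python when something fails later. A minimal sketch, assuming the tools only need to be on PATH; check_setup is a hypothetical helper, not part of the project, and it also looks for the whisper CLI, which setup.sh may not check.

```python
import os
import shutil
import sys


def check_setup(env=None):
    """Return a list of setup problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    # Inside a venv, sys.prefix points at venv/ while sys.base_prefix does not.
    if sys.prefix == sys.base_prefix:
        problems.append("virtual environment is not active (source venv/bin/activate)")
    # shutil.which returns None when an executable is not on PATH.
    for tool in ("yt-dlp", "whisper"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("\n".join(f"- {issue}" for issue in issues) or "All checks passed.")
```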
If you prefer to set up manually:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate # On macOS/Linux
# or: venv\Scripts\activate # On Windows
# Install Python dependencies
pip install -r requirements.txt
# Install yt-dlp (for YouTube downloads)
# Option 1: System-wide
brew install yt-dlp
# Option 2: In virtual environment
pip install yt-dlp
Note: The scripts (process_podcast.sh, etc.) automatically activate the virtual environment if it exists. If you run Python scripts directly, make sure to activate the venv first.
Set your Gemini API key:
export GEMINI_API_KEY="your-api-key"
# Get your free API key from: https://aistudio.google.com/app/apikey
Or create a .env file containing GEMINI_API_KEY=your-api-key.
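A variable exported in the shell is visible to the scripts directly; a .env file has to be read by something first. A minimal stdlib-only sketch of that step, assuming plain KEY=value lines; parse_dotenv and load_dotenv_file are hypothetical names, and the project's scripts may load .env differently (for example with the python-dotenv package).

```python
import os
from pathlib import Path


def parse_dotenv(text):
    """Parse KEY=value lines into a dict, skipping blanks and '#' comments."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Accept shell-style "export KEY=value" lines as well.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Strip one layer of matching quotes around the value.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        pairs[key] = value
    return pairs


def load_dotenv_file(path=".env"):
    """Copy .env pairs into os.environ without overriding existing variables."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    pairs = parse_dotenv(env_path.read_text(encoding="utf-8"))
    for key, value in pairs.items():
        os.environ.setdefault(key, value)
    return pairs
```

Calling load_dotenv_file() before reading os.environ["GEMINI_API_KEY"] lets a value exported in the shell take precedence over the file.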
Use process_podcast.sh to process a podcast from audio to blog post:
./process_podcast.sh <youtube_url_or_mp3_path>
Examples:
# From YouTube URL
./process_podcast.sh "https://www.youtube.com/watch?v=..."
# From local MP3
./process_podcast.sh /path/to/podcast.mp3
If you already have a cleaned transcript, use generate_from_transcript.sh to skip the audio download and transcription steps:
./generate_from_transcript.sh <clean_transcript_file>
Examples:
# From a cleaned transcript
./generate_from_transcript.sh transcript_clean/episode1_clean.txt
# The script will:
# 1. Extract insights from the transcript
# 2. Generate a blog draft
# 3. Optionally run an improvement pass
Note: This is useful if you've already transcribed audio elsewhere or want to regenerate a blog post from an existing transcript.
- Audio Ingestion: Downloads from YouTube (via yt-dlp) or copies local MP3
- Transcription: Uses Whisper (base model) to generate text
- Cleaning: Removes timestamps, filler words, fixes formatting
- Insight Extraction: LLM extracts key ideas, quotes, company mentions
- Blog Generation: Creates 800-1,200 word blog post
- Optional Improvement: One critique + revision pass
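The ingestion step chooses between yt-dlp and a plain copy based on the argument passed to process_podcast.sh. A sketch of that decision, assuming a YouTube URL is recognized by its host name; is_youtube_url, plan_ingest, ingest, and the host list are hypothetical, while the yt-dlp flags are the ones shown in the troubleshooting section plus -o for the output path.

```python
import shutil
import subprocess
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical host list used to recognise YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(source):
    """True if source is an http(s) URL on a known YouTube host."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and parsed.netloc.lower() in YOUTUBE_HOSTS


def plan_ingest(source, dest_dir):
    """Return ("download", argv) for YouTube or ("copy", target_path) for a local MP3."""
    if is_youtube_url(source):
        # -o sets the output template; with --audio-format mp3 the
        # extracted file ends up as dest_dir/audio.mp3.
        template = str(Path(dest_dir) / "audio.%(ext)s")
        argv = ["yt-dlp", "-x", "--audio-format", "mp3", "-f", "bestaudio",
                "-o", template, source]
        return ("download", argv)
    if Path(source).suffix.lower() != ".mp3":
        raise ValueError(f"expected a YouTube URL or an .mp3 file, got {source!r}")
    return ("copy", str(Path(dest_dir) / "audio.mp3"))


def ingest(source, dest_dir):
    """Create dest_dir/audio.mp3 by downloading or copying, then return its path."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    kind, arg = plan_ingest(source, dest_dir)
    if kind == "download":
        subprocess.run(arg, check=True)  # raises CalledProcessError if yt-dlp fails
    else:
        shutil.copyfile(source, arg)
    return Path(dest_dir) / "audio.mp3"
```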
Batch Processing - Process multiple podcasts:
# Create podcasts.txt with URLs/paths, then:
./batch_process.sh podcasts.txt
RSS Monitoring - Auto-detect new episodes:
# Create feeds.txt with RSS URLs, then:
python3 monitor_rss.py feeds.txt
./batch_process.sh podcasts_new.txt # Process new episodes
Interactive Q&A - Ask questions about a podcast:
python3 interactive_qa.py "outputs/Channel Name/Episode Title/transcript_clean.txt" \
"outputs/Channel Name/Episode Title/insights.json"
HTML Output - Auto-generated during processing:
# HTML is automatically created at:
# outputs/Channel Name/Episode Title/blog.html
# Or convert manually:
python3 convert_to_html.py "outputs/Channel Name/Episode Title/final.md"
📖 See FEATURES.md for complete documentation and examples.
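convert_to_html.py turns final.md into blog.html. The core idea can be sketched with the standard library alone, assuming only '#' headings and plain paragraphs need handling; markdown_to_html is hypothetical, and the real script may use a full Markdown library and add styling.

```python
import html


def markdown_to_html(md_text, title="Podcast Blog"):
    """Convert '#' headings and blank-line-separated paragraphs to an HTML page.

    Lists, links, and inline emphasis are left as plain text.
    """
    body, para = [], []

    def flush():
        # Emit the pending paragraph, joining its wrapped lines with spaces.
        if para:
            body.append("<p>" + html.escape(" ".join(para)) + "</p>")
            para.clear()

    for raw in md_text.splitlines():
        line = raw.strip()
        hashes = len(line) - len(line.lstrip("#"))
        if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " ":
            flush()
            text = html.escape(line[hashes + 1:].strip())
            body.append(f"<h{hashes}>{text}</h{hashes}>")
        elif not line:
            flush()
        else:
            para.append(line)
    flush()
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body></html>\n"
    )
```

Escaping every piece of text with html.escape keeps quotes pulled from the transcript (which may contain <, >, or &) from breaking the page.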
You can run individual components:
# Activate virtual environment first (if running scripts directly)
source venv/bin/activate
# Clean a transcript
python3 clean_transcript.py transcript_raw/input.txt transcript_clean/output.txt
# Extract insights
python3 extract_insights.py transcript_clean/input.txt insights/output.json
# Generate blog
python3 generate_blog.py transcript_clean/input.txt insights/input.json outputs/blog.md
# Improve blog
python3 improve_blog.py outputs/blog.md outputs/blog_improved.md
Note: The main workflow script (process_podcast.sh) automatically activates the virtual environment, so you don't need to activate it manually when using that script.
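The individual commands can be chained in the same order the main workflow uses. A sketch, assuming it is run from the project root, uses the episode-folder file names from the outputs/ layout, and follows the argument order shown in these examples; pipeline_commands and run_pipeline are hypothetical helpers, and process_podcast.sh may differ in details such as how final.md is produced when the improvement pass is skipped.

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands(episode_dir, improve=True):
    """Build the per-script commands for one episode folder, in run order."""
    d = Path(episode_dir)
    py = sys.executable  # the active interpreter, e.g. the one inside venv/
    cmds = [
        [py, "clean_transcript.py", str(d / "transcript_raw.txt"), str(d / "transcript_clean.txt")],
        [py, "extract_insights.py", str(d / "transcript_clean.txt"), str(d / "insights.json")],
        [py, "generate_blog.py", str(d / "transcript_clean.txt"),
         str(d / "insights.json"), str(d / "draft.md")],
    ]
    if improve:
        cmds.append([py, "improve_blog.py", str(d / "draft.md"), str(d / "final.md")])
    return cmds


def run_pipeline(episode_dir, improve=True, dry_run=False):
    """Print each command and, unless dry_run, run it; stop at the first failure."""
    for cmd in pipeline_commands(episode_dir, improve):
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

run_pipeline(..., dry_run=True) only prints the commands, which is a cheap way to check paths before spending API calls.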
To change the Whisper model, edit line 28 of process_podcast.sh:
whisper "$AUDIO_FILE" --model base # Options: tiny, base, small, medium, large
The project uses gemini-2.0-flash by default. To change the Gemini model, edit the model argument in extract_insights.py, generate_blog.py, or improve_blog.py:
response = client.models.generate_content(
model="gemini-2.0-flash", # Change this line
contents=prompt,
config=types.GenerateContentConfig(...)
)
Available models:
- gemini-2.0-flash (default, fast and efficient)
- gemini-2.5-flash (newer version)
- gemini-2.5-pro (more capable, slower)
- gemini-1.5-flash (older version; check current availability)
- gemini-1.5-pro (older version; check current availability)
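Rather than editing the model name in three files, the scripts could read it from one place. A hypothetical sketch: the GEMINI_MODEL environment variable and resolve_model are not part of the project, and unknown names are allowed through with a warning because model availability changes over time.

```python
import os
import warnings

DEFAULT_MODEL = "gemini-2.0-flash"
# Model names listed in this README.
LISTED_MODELS = {
    "gemini-2.0-flash", "gemini-2.5-flash", "gemini-2.5-pro",
    "gemini-1.5-flash", "gemini-1.5-pro",
}


def resolve_model(env=None):
    """Pick the model from GEMINI_MODEL (hypothetical variable), else the default."""
    env = os.environ if env is None else env
    model = env.get("GEMINI_MODEL", "").strip() or DEFAULT_MODEL
    if model not in LISTED_MODELS:
        warnings.warn(f"{model!r} is not in the README's model list; using it anyway")
    return model

# In each script, the hard-coded name would then become:
#     response = client.models.generate_content(
#         model=resolve_model(), contents=prompt, config=...)
```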
Final blog posts are in markdown with:
- Title
- Section headers
- Key insights integrated
- Strategic quotes
- Company/product mentions
- 800-1,200 words
The cleaner (clean_transcript.py) does:
- Removes timestamps: [00:00:00], 00:00:00, (00:00:00)
- Removes fillers: um, uh, hmm, ah, er, like
- Fixes line breaks (joins partial sentences)
- Creates proper paragraphs
- Normalizes whitespace
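The cleaning steps can be sketched with a few regular expressions. This is not the actual clean_transcript.py: the patterns, the fixed sentences-per-paragraph rule, and clean_transcript are assumptions, and unlike the list above the sketch only removes "like" when it is set off by commas, so that the verb in "I like it" survives.

```python
import re

# The three timestamp forms listed above: [00:00:00], (00:00:00), 00:00:00.
TIMESTAMP = re.compile(r"[\[(]?\b\d{1,2}:\d{2}:\d{2}\b[\])]?")
# Standalone fillers, plus an optional trailing comma and whitespace.
FILLERS = re.compile(r"\b(?:um+|uh+|hmm+|ah+|er+)\b,?\s*", re.IGNORECASE)
# "like" only as a comma-delimited filler ("it was, like, huge").
LIKE_FILLER = re.compile(r",\s*like\s*,", re.IGNORECASE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_transcript(raw, sentences_per_paragraph=5):
    """Strip timestamps and fillers, rejoin broken lines, and re-paragraph."""
    text = TIMESTAMP.sub(" ", raw)
    text = LIKE_FILLER.sub("", text)
    text = FILLERS.sub("", text)
    # Joining all lines with single spaces merges sentences split across
    # lines and normalizes whitespace in one step.
    text = " ".join(text.split())
    sentences = [s for s in SENTENCE_END.split(text) if s]
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```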
The insight extractor (extract_insights.py) extracts:
- 5-10 key insights (main ideas and lessons)
- 5 strong quotes (memorable and impactful)
- Company/product mentions
Output format: JSON
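Because the insights come back from an LLM as JSON, it is worth validating them before the blog step. A sketch under a hypothetical schema: the key names insights, quotes, and companies, and the helpers parse_llm_json and validate_insights, are assumptions rather than the format extract_insights.py actually writes; only the counts come from the list above.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which models often wrap JSON in
# Hypothetical schema: key -> (minimum count, maximum count or None).
EXPECTED = {"insights": (5, 10), "quotes": (5, 5), "companies": (0, None)}


def parse_llm_json(text):
    """Parse JSON from a model reply, tolerating a surrounding code fence."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


def validate_insights(data):
    """Return a list of problems with the insights object (empty means valid)."""
    if not isinstance(data, dict):
        return ["top-level value should be a JSON object"]
    problems = []
    for key, (low, high) in EXPECTED.items():
        items = data.get(key)
        if not isinstance(items, list) or not all(isinstance(x, str) for x in items):
            problems.append(f"{key!r} should be a list of strings")
            continue
        if len(items) < low or (high is not None and len(items) > high):
            if high is None:
                expected = f"at least {low}"
            elif high == low:
                expected = str(low)
            else:
                expected = f"{low}-{high}"
            problems.append(f"{key!r} has {len(items)} items, expected {expected}")
    return problems
```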
The blog generator (generate_blog.py) creates:
- Clear, analytical blog post
- Well-structured with headers
- Strategic use of quotes
- Natural mention of companies/products
- Markdown formatting
The improvement pass (improve_blog.py) is a one-iteration critique that:
- Identifies weak, vague, or repetitive content
- Rewrites for specificity and engagement
- Maintains 800-1,200 word length
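The 800-1,200 word target is easy to check after the improvement pass. A minimal sketch; word_count and check_length are hypothetical, and the regex counts runs of letters and digits (keeping apostrophes and hyphens inside words) so Markdown symbols such as # and * are not counted.

```python
import re

# A word is letters/digits, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[A-Za-z0-9]+(?:['\u2019-][A-Za-z0-9]+)*")


def word_count(markdown_text):
    """Count words, ignoring Markdown symbols such as '#', '*', and '>'."""
    return len(WORD.findall(markdown_text))


def check_length(markdown_text, low=800, high=1200):
    """Return (count, within_target) for the 800-1,200 word range."""
    n = word_count(markdown_text)
    return n, low <= n <= high
```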
Minimal dependencies:
- yt-dlp: YouTube downloads
- whisper: Audio transcription
- google-genai: Gemini LLM API access
- Python 3.9+
- Bash
No:
- Databases
- Web frameworks
- SaaS tools (beyond Gemini API)
- Complex setup
- Fast setup: Install 3 tools, set API key, run
- Terminal-first: No GUI, no web server
- Simple workflow: One command processes entire podcast
- Modular: Each step is an independent Python script
- Clear output: Structured markdown blog posts
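- Batch processing (the original MVP handled a single podcast only; batch_process.sh now covers this)
- Web interface/dashboard (beyond the static outputs/index.html page)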
- Database/persistence layer
- Search functionality
- Style matching (uses generic analytical tone)
- Speaker diarization
- Multi-language support
- Cost optimization (uses straightforward API calls)
Batch Mode + Basic UI:
- Process multiple podcasts from a list
- Simple web dashboard to view all outputs
- JSON metadata file tracking processed podcasts
- RSS feed ingestion
- Style matching (analyze sample posts, match tone)
- Better cost tracking (log token usage)
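RSS feed ingestion, listed above and handled today by monitor_rss.py, amounts to diffing a feed against the episodes already seen. A sketch assuming an RSS 2.0 feed whose items carry an enclosure or link element; new_episode_urls and write_batch_file are hypothetical, and how monitor_rss.py stores the seen set is not described here.

```python
import xml.etree.ElementTree as ET


def new_episode_urls(rss_xml, seen):
    """Return episode URLs from an RSS 2.0 feed that are not in seen, oldest first."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Prefer the audio enclosure; fall back to the episode page link.
        url = enclosure.get("url") if enclosure is not None else item.findtext("link")
        if url and url.strip() not in seen:
            urls.append(url.strip())
    # Feeds usually list newest first; reverse so older episodes run first.
    return list(reversed(urls))


def write_batch_file(urls, path="podcasts_new.txt"):
    """Write one URL per line, the format batch_process.sh reads."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in urls)
```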
Technical additions:
- Add batch_process.sh that reads from podcasts.txt
- Create static/index.html to browse outputs folder
- Add metadata.json to track podcast→blog mappings
- Simple Python HTTP server to serve the dashboard
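The static dashboard described above can be sketched as a scan of the outputs/ tree. A minimal sketch, assuming each episode folder contains blog.html; build_index is hypothetical and produces a bare, unstyled page.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(outputs_dir="outputs"):
    """Write <outputs_dir>/index.html linking each <channel>/<episode>/blog.html."""
    root = Path(outputs_dir)
    parts = ["<!DOCTYPE html>",
             '<html><head><meta charset="utf-8"><title>Podcasts</title></head><body>']
    for channel in sorted(p for p in root.iterdir() if p.is_dir()):
        blogs = sorted(channel.glob("*/blog.html"))
        if not blogs:
            continue  # skip channels with no finished episodes
        parts.append(f"<h2>{html.escape(channel.name)}</h2>")
        parts.append("<ul>")
        for blog in blogs:
            # Relative, percent-encoded link so folder names with spaces work.
            href = quote(blog.relative_to(root).as_posix())
            parts.append(f'<li><a href="{href}">{html.escape(blog.parent.name)}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    index = root / "index.html"
    index.write_text("\n".join(parts) + "\n", encoding="utf-8")
    return index
```

For the HTTP server item, the standard library already suffices: python3 -m http.server 8000 --directory outputs serves the folder at http://localhost:8000/.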
Whisper fails:
# Check installation
whisper --help
# Try smaller model
whisper audio.mp3 --model tiny
Gemini API errors:
# Verify key is set
echo $GEMINI_API_KEY
# Get API key at: https://aistudio.google.com/app/apikey
# Check API usage at: https://aistudio.google.com
yt-dlp fails:
# Update yt-dlp
pip install -U yt-dlp
# Try different format
yt-dlp -x --audio-format mp3 -f bestaudio <url>
Per podcast (assuming a 1-hour episode):
- Whisper: Free (local)
- Gemini API: Free tier includes generous usage limits (gemini-2.0-flash)
- Total: ~$0.00 per podcast (within free tier limits)
Note: Check current Gemini API pricing and free tier limits at: https://aistudio.google.com
Public domain / MIT - use however you want.