Podcast to Podcast Intelligence

A terminal-first, 1-day MVP that converts podcasts into blog posts.

✨ Features

📁 Smart Organization: Podcasts organized by channel/episode in clean folders
📋 Batch Processing: Process multiple podcasts from a list
📡 RSS Monitoring: Auto-detect new episodes from RSS feeds
💬 Interactive Q&A: Ask questions about podcast content
🌐 HTML Output: Styled webpages auto-generated for each episode
📚 Web Dashboard: Browse all podcasts at outputs/index.html

See FEATURES.md and ORGANIZATION.md for detailed guides.

Folder Structure

podcast_mvp/
├── outputs/                  # Organized by channel and episode
│   ├── Huberman Lab/
│   │   ├── Episode Title 1/
│   │   │   ├── audio.mp3
│   │   │   ├── transcript_raw.txt
│   │   │   ├── transcript_clean.txt
│   │   │   ├── insights.json
│   │   │   ├── draft.md
│   │   │   ├── final.md
│   │   │   ├── blog.html
│   │   │   └── metadata.json
│   │   └── Episode Title 2/
│   │       └── ...
│   ├── Lex Fridman Podcast/
│   │   └── ...
│   └── ...
│
├── logs/                     # Batch processing logs
│
├── setup.sh                  # Initial setup script
├── process_podcast.sh        # Main workflow (audio → blog)
├── extract_metadata.py       # Extract channel/title metadata
│
├── batch_process.sh          # 📋 Process multiple podcasts from list
├── monitor_rss.py            # 📡 Monitor RSS feeds for new episodes
├── interactive_qa.py         # 💬 Ask questions about podcasts
├── convert_to_html.py        # 🌐 Convert markdown to HTML
│
├── clean_transcript.py       # Core processing scripts
├── extract_insights.py
├── generate_blog.py
└── improve_blog.py
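
The outputs/<channel>/<episode>/ layout above can be sketched in a few lines of Python. This is only an illustration: the helper name episode_dir and the character-replacement rule are assumptions, and extract_metadata.py may build paths differently.

```python
import re
from pathlib import Path


def episode_dir(root: str, channel: str, title: str) -> Path:
    """Return root/<channel>/<title> with path-unsafe characters replaced.

    Hypothetical helper: the sanitizing rule is an assumption, not the
    project's actual behavior.
    """
    def safe(name: str) -> str:
        # Replace characters that are invalid or awkward in folder names.
        cleaned = re.sub(r'[\\/:*?"<>|]+', " ", name)
        # Collapse runs of whitespace left behind by the replacement.
        return re.sub(r"\s+", " ", cleaned).strip() or "untitled"

    return Path(root) / safe(channel) / safe(title)


if __name__ == "__main__":
    print(episode_dir("outputs", "Huberman Lab", "Sleep: Part 1/2"))
```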

Prerequisites

Quick Setup (Recommended)

Run the setup.sh script to create a virtual environment and install all dependencies:

./setup.sh

The setup.sh script will:

Check for Python 3 installation
Create a Python virtual environment (venv/) if it doesn't exist
Activate the virtual environment
Install all required Python packages from requirements.txt
Check for yt-dlp installation
Verify GEMINI_API_KEY is set (or provide instructions)
Make workflow scripts executable (process_podcast.sh, generate_from_transcript.sh, etc.)

Note: You only need to run setup.sh once after cloning/downloading the project.

Manual Setup

If you prefer to set up manually:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate  # On macOS/Linux
# or: venv\Scripts\activate  # On Windows

# Install Python dependencies
pip install -r requirements.txt

# Install yt-dlp (for YouTube downloads)
# Option 1: System-wide (macOS with Homebrew)
brew install yt-dlp
# Option 2: In virtual environment
pip install yt-dlp

Note: The scripts (process_podcast.sh, etc.) automatically activate the virtual environment if it exists. If you run Python scripts directly, make sure to activate the venv first.

Set Your Gemini API Key

export GEMINI_API_KEY="your-api-key"
# Get your free API key from: https://aistudio.google.com/app/apikey

Or create a .env file containing the line GEMINI_API_KEY=your-api-key.

Usage

One-Command Workflow

Full Pipeline (Audio → Blog)

Use process_podcast.sh to process a podcast from audio to blog post:

./process_podcast.sh <youtube_url_or_mp3_path>

Examples:

# From YouTube URL
./process_podcast.sh "https://www.youtube.com/watch?v=..."

# From local MP3
./process_podcast.sh /path/to/podcast.mp3

Generate from Existing Transcript

If you already have a cleaned transcript, use generate_from_transcript.sh to skip the audio download and transcription steps:

./generate_from_transcript.sh <clean_transcript_file>

Examples:

# From a cleaned transcript
./generate_from_transcript.sh transcript_clean/episode1_clean.txt

# The script will:
# 1. Extract insights from the transcript
# 2. Generate a blog draft
# 3. Optionally run an improvement pass

Note: This is useful if you've already transcribed audio elsewhere or want to regenerate a blog post from an existing transcript.

What Happens

Audio Ingestion: Downloads from YouTube (via yt-dlp) or copies local MP3
Transcription: Uses Whisper (base model) to generate text
Cleaning: Removes timestamps, filler words, fixes formatting
Insight Extraction: LLM extracts key ideas, quotes, company mentions
Blog Generation: Creates an 800-1,200 word blog post
Optional Improvement: One critique + revision pass

Advanced Features Quick Reference

Batch Processing - Process multiple podcasts:

# Create podcasts.txt with URLs/paths, then:
./batch_process.sh podcasts.txt
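
For illustration, a list file like podcasts.txt could be parsed as follows. The format assumed here (one URL or path per line, blank lines and lines starting with # ignored) is a guess; check podcasts.txt.example for the format batch_process.sh actually expects.

```python
from typing import List


def read_podcast_list(text: str) -> List[str]:
    """Return the non-empty, non-comment lines of a podcast list.

    Assumption: one URL or file path per line, '#' starts a comment line.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


if __name__ == "__main__":
    sample = "# my queue\nhttps://www.youtube.com/watch?v=...\n\n/path/to/podcast.mp3\n"
    for entry in read_podcast_list(sample):
        print(entry)
```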

RSS Monitoring - Auto-detect new episodes:

# Create feeds.txt with RSS URLs, then:
python3 monitor_rss.py feeds.txt
./batch_process.sh podcasts_new.txt  # Process new episodes
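
A rough sketch of how new episodes can be detected in an RSS feed, using only the standard library. It is not the code in monitor_rss.py: the function name, the SAMPLE_FEED string, and the idea of tracking already-seen enclosure URLs in a set are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical two-episode feed used only for this example.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Ep 2</title><enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/></item>
<item><title>Ep 1</title><enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/></item>
</channel></rss>"""


def new_episode_urls(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return enclosure URLs from the feed that are not in `seen`."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no audio attachment
        url = enclosure.get("url")
        if url and url not in seen:
            urls.append(url)
    return urls


if __name__ == "__main__":
    already_processed = {"https://example.com/ep1.mp3"}
    print(new_episode_urls(SAMPLE_FEED, already_processed))
```

The resulting URLs could then be written one per line to a file such as podcasts_new.txt for batch processing.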

Interactive Q&A - Ask questions about a podcast:

python3 interactive_qa.py "outputs/Channel Name/Episode Title/transcript_clean.txt" \
                          "outputs/Channel Name/Episode Title/insights.json"

HTML Output - Auto-generated during processing:

# HTML is automatically created at:
# outputs/Channel Name/Episode Title/blog.html

# Or convert manually:
python3 convert_to_html.py "outputs/Channel Name/Episode Title/final.md"

📖 See FEATURES.md for complete documentation and examples.

Manual Steps

You can run individual components:

# Activate virtual environment first (if running scripts directly)
source venv/bin/activate

# Clean a transcript
python3 clean_transcript.py transcript_raw/input.txt transcript_clean/output.txt

# Extract insights
python3 extract_insights.py transcript_clean/input.txt insights/output.json

# Generate blog
python3 generate_blog.py transcript_clean/input.txt insights/input.json outputs/blog.md

# Improve blog
python3 improve_blog.py outputs/blog.md outputs/blog_improved.md

Note: The main workflow script (process_podcast.sh) automatically activates the virtual environment, so you don't need to activate it manually when using that script.

Configuration

Whisper Model

Edit the whisper call in process_podcast.sh (around line 28):

whisper "$AUDIO_FILE" --model base  # Options: tiny, base, small, medium, large
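
If you drive Whisper from Python instead of the shell script, the same model choice can be validated before the command runs. The sketch below only builds the argument list; the function name whisper_command is hypothetical, and the model names are the ones listed above.

```python
from typing import List

# Model sizes named in this README.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_command(audio_file: str, model: str = "base") -> List[str]:
    """Build the whisper CLI argument list, rejecting unknown model names."""
    if model not in VALID_MODELS:
        raise ValueError(f"unknown Whisper model: {model!r}")
    return ["whisper", audio_file, "--model", model]


if __name__ == "__main__":
    # Pass the list to subprocess.run(...) to actually transcribe.
    print(whisper_command("audio.mp3", "tiny"))
```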

Gemini Model

The project uses gemini-2.0-flash by default. To change the model, edit extract_insights.py, generate_blog.py, or improve_blog.py:

response = client.models.generate_content(
    model="gemini-2.0-flash",  # Change this line
    contents=prompt,
    config=types.GenerateContentConfig(...)
)

Available models:

gemini-2.0-flash (default, fast and efficient)
gemini-2.5-flash (newer version)
gemini-2.5-pro (more capable, slower)
gemini-1.5-flash (older version, still supported)
gemini-1.5-pro (older version, still supported)
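
To avoid editing three scripts whenever the model changes, the name could be read from an environment variable with a fallback to the default. This is a suggestion, not something the project does today: the variable name GEMINI_MODEL and the helper resolve_model are assumptions.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "gemini-2.0-flash"


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the model named in GEMINI_MODEL (hypothetical), else the default."""
    return env.get("GEMINI_MODEL", "").strip() or DEFAULT_MODEL


if __name__ == "__main__":
    # In the scripts, this value would replace the hard-coded string:
    #   client.models.generate_content(model=resolve_model(), ...)
    print(resolve_model())
```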

Output Format

Final blog posts are in markdown with:

Title
Section headers
Key insights integrated
Strategic quotes
Company/product mentions
800-1,200 words

Transcript Cleaning Details

The cleaner (clean_transcript.py):

Removes timestamps: [00:00:00], 00:00:00, (00:00:00)
Removes fillers: um, uh, hmm, ah, er, like
Fixes line breaks (joins partial sentences)
Creates proper paragraphs
Normalizes whitespace
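
The cleaning steps above can be approximated with a few regular expressions. This is a minimal sketch, not the contents of clean_transcript.py: the patterns and the function name clean are assumptions, and the filler "like" is deliberately left out because removing it blindly also removes legitimate uses of the word.

```python
import re

# Matches 00:00:00 with optional surrounding [] or ().
TIMESTAMP = re.compile(r"[\[(]?\b\d{1,2}:\d{2}:\d{2}\b[\])]?")
# Standalone filler words, plus a trailing comma or period and spaces.
FILLER = re.compile(r"\b(?:um|uh|hmm|ah|er)\b[,.]?[ \t]*", re.IGNORECASE)


def clean(text: str) -> str:
    """Strip timestamps and fillers, join broken lines, normalize whitespace."""
    text = TIMESTAMP.sub("", text)
    text = FILLER.sub("", text)
    # Blank lines separate paragraphs; single newlines are joined.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in joined if p)


if __name__ == "__main__":
    raw = "[00:00:05] So, um, welcome\nback to the show.\n\n(00:01:10) Uh today we talk sleep."
    print(clean(raw))
```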

Insight Extraction Prompt

Extracts:

5-10 key insights (main ideas and lessons)
5 strong quotes (memorable and impactful)
Company/product mentions

Output format: JSON
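
Because the insights come back from an LLM, it can help to validate the JSON before the blog step consumes it. The sketch below assumes top-level keys named insights, quotes, and companies; those key names are hypothetical, so adjust them to whatever extract_insights.py actually writes.

```python
import json

# Hypothetical field names; the real insights.json schema may differ.
REQUIRED_LIST_FIELDS = ("insights", "quotes", "companies")


def validate_insights(raw: str) -> dict:
    """Parse insights JSON and check the expected list fields are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key in REQUIRED_LIST_FIELDS:
        if not isinstance(data.get(key), list):
            raise ValueError(f"missing or non-list field: {key}")
    if not 5 <= len(data["insights"]) <= 10:
        raise ValueError("expected 5-10 key insights")
    return data


if __name__ == "__main__":
    sample = json.dumps({
        "insights": ["a", "b", "c", "d", "e"],
        "quotes": ["q1"],
        "companies": [],
    })
    print(validate_insights(sample)["insights"])
```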

Blog Generation Prompt

Creates:

Clear, analytical blog post
Well-structured with headers
Strategic use of quotes
Natural mention of companies/products
Markdown formatting

Improvement Pass

One-iteration critique that:

Identifies weak, vague, or repetitive content
Rewrites for specificity and engagement
Maintains 800-1,200 word length
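
A quick way to confirm that a draft stays inside the 800-1,200 word target is to count words in the markdown. The helpers below are an illustrative sketch (the names word_count and within_target are assumptions); tokens without any letters or digits, such as heading markers, are not counted.

```python
def word_count(markdown: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in markdown.split() if any(ch.isalnum() for ch in token))


def within_target(markdown: str, low: int = 800, high: int = 1200) -> bool:
    """Return True if the post length falls inside the target range."""
    return low <= word_count(markdown) <= high


if __name__ == "__main__":
    draft = "# Title\n\n" + " ".join(["word"] * 900)
    print(word_count(draft), within_target(draft))
```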

Dependencies

Minimal dependencies:

yt-dlp: YouTube downloads
whisper: Audio transcription
google-genai: Gemini LLM API access (the SDK that provides the client.models.generate_content call shown under Configuration)
Python 3.7+
Bash

No:

Databases
Web frameworks
SaaS tools (beyond Gemini API)
Complex setup

MVP Summary

What This Does Well

Fast setup: Install 3 tools, set API key, run
Terminal-first: No GUI, no web server
Simple workflow: One command processes entire podcast
Modular: Each step is an independent Python script
Clear output: Structured markdown blog posts

What's Intentionally Missing

Database/persistence layer
Search functionality
Style matching (uses generic analytical tone)
Speaker diarization
Multi-language support
Cost optimization (uses straightforward API calls)

Batch processing and the web dashboard were left out of the original 1-day MVP; both have since been added (see Features).

Next Upgrade Would Be

Style matching (analyze sample posts, match tone)
Better cost tracking (log token usage)
Simple Python HTTP server to serve the dashboard

Already delivered from the original upgrade list: batch_process.sh reading from podcasts.txt, the outputs/index.html dashboard, per-episode metadata.json, and RSS feed ingestion (monitor_rss.py).

Troubleshooting

Whisper fails:

# Check installation
whisper --help

# Try smaller model
whisper audio.mp3 --model tiny

Gemini API errors:

# Verify key is set
echo $GEMINI_API_KEY

# Get API key at: https://aistudio.google.com/app/apikey
# Check API usage at: https://aistudio.google.com

yt-dlp fails:

# Update yt-dlp
pip install -U yt-dlp

# Try different format
yt-dlp -x --audio-format mp3 -f bestaudio <url>

Cost Estimates

Per podcast (assuming 1-hour episode):

Whisper: Free (local)
Gemini API: Free tier includes generous usage limits (gemini-2.0-flash)
Total: ~$0.00 per podcast (within free tier limits)

Note: Check current Gemini API pricing and free tier limits at: https://aistudio.google.com

License

Public domain / MIT - use however you want.
