jbetala7/podcast-intelligence

Podcast to Podcast Intelligence

A terminal-first, 1-day MVP that converts podcasts into blog posts.

✨ Features

  • 📁 Smart Organization: Podcasts organized by channel/episode in clean folders
  • 📋 Batch Processing: Process multiple podcasts from a list
  • 📡 RSS Monitoring: Auto-detect new episodes from RSS feeds
  • 💬 Interactive Q&A: Ask questions about podcast content
  • 🌐 HTML Output: Styled webpages auto-generated for each episode
  • 📚 Web Dashboard: Browse all podcasts at outputs/index.html

See FEATURES.md and ORGANIZATION.md for detailed guides.

Folder Structure

podcast_mvp/
├── outputs/                  # Organized by channel and episode
│   ├── Huberman Lab/
│   │   ├── Episode Title 1/
│   │   │   ├── audio.mp3
│   │   │   ├── transcript_raw.txt
│   │   │   ├── transcript_clean.txt
│   │   │   ├── insights.json
│   │   │   ├── draft.md
│   │   │   ├── final.md
│   │   │   ├── blog.html
│   │   │   └── metadata.json
│   │   └── Episode Title 2/
│   │       └── ...
│   ├── Lex Fridman Podcast/
│   │   └── ...
│   └── ...
│
├── logs/                     # Batch processing logs
│
├── setup.sh                  # Initial setup script
├── process_podcast.sh        # Main workflow (audio → blog)
├── extract_metadata.py       # Extract channel/title metadata
│
├── batch_process.sh          # 📋 Process multiple podcasts from list
├── monitor_rss.py            # 📡 Monitor RSS feeds for new episodes
├── interactive_qa.py         # 💬 Ask questions about podcasts
├── convert_to_html.py        # 🌐 Convert markdown to HTML
│
├── clean_transcript.py       # Core processing scripts
├── extract_insights.py
├── generate_blog.py
└── improve_blog.py
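
To check which episodes are fully processed, the layout above can be walked with a short script. This is a minimal sketch, not part of the project: `list_episodes` is a hypothetical helper, and the expected file names are simply copied from the tree above.

```python
from pathlib import Path

# File names taken from the folder structure shown in this README.
EXPECTED_FILES = [
    "audio.mp3", "transcript_raw.txt", "transcript_clean.txt",
    "insights.json", "draft.md", "final.md", "blog.html", "metadata.json",
]


def list_episodes(outputs_dir):
    """Return (channel, episode, missing_files) for every episode folder."""
    results = []
    root = Path(outputs_dir)
    if not root.is_dir():
        return results
    for channel in sorted(p for p in root.iterdir() if p.is_dir()):
        for episode in sorted(p for p in channel.iterdir() if p.is_dir()):
            missing = [n for n in EXPECTED_FILES if not (episode / n).exists()]
            results.append((channel.name, episode.name, missing))
    return results
```

Calling `list_episodes("outputs")` returns one tuple per episode; an empty `missing_files` list means every expected file is present.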

Prerequisites

Quick Setup (Recommended)

Run the setup.sh script to create a virtual environment and install all dependencies:

./setup.sh

The setup.sh script will:

  • Check for Python 3 installation
  • Create a Python virtual environment (venv/) if it doesn't exist
  • Activate the virtual environment
  • Install all required Python packages from requirements.txt
  • Check for yt-dlp installation
  • Verify GEMINI_API_KEY is set (or provide instructions)
  • Make workflow scripts executable (process_podcast.sh, generate_from_transcript.sh, etc.)

Note: You only need to run setup.sh once after cloning/downloading the project.

Manual Setup

If you prefer to set up manually:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate  # On macOS/Linux
# or: venv\Scripts\activate  # On Windows

# Install Python dependencies
pip install -r requirements.txt

# Install yt-dlp (for YouTube downloads)
# Option 1: System-wide (macOS with Homebrew)
brew install yt-dlp
# Option 2: In virtual environment
pip install yt-dlp

Note: The scripts (process_podcast.sh, etc.) automatically activate the virtual environment if it exists. If you run Python scripts directly, make sure to activate the venv first.

Set Your Gemini API Key

export GEMINI_API_KEY="your-api-key"
# Get your free API key from: https://aistudio.google.com/app/apikey

Or create a .env file that sets the same variable (for example, GEMINI_API_KEY=your-api-key).
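
Before running the pipeline, it can help to confirm that the key is actually visible. The sketch below assumes a plain `KEY=value` format for `.env`; this README does not say how the project's scripts load that file, so `parse_env_file` and `find_api_key` are hypothetical helpers for illustration only.

```python
import os
from pathlib import Path


def parse_env_file(text):
    """Parse simple KEY=value lines; blank lines and # comments are ignored."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(environ=None, env_path=".env"):
    """Return the key from the environment, else from a .env file, else None."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if path.is_file():
        return parse_env_file(path.read_text()).get("GEMINI_API_KEY") or None
    return None
```

If `find_api_key()` returns `None`, neither the shell environment nor the `.env` file in the current directory provides the key.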

Usage

One-Command Workflow

Full Pipeline (Audio → Blog)

Use process_podcast.sh to process a podcast from audio to blog post:

./process_podcast.sh <youtube_url_or_mp3_path>

Examples:

# From YouTube URL
./process_podcast.sh "https://www.youtube.com/watch?v=..."

# From local MP3
./process_podcast.sh /path/to/podcast.mp3

Generate from Existing Transcript

If you already have a cleaned transcript, use generate_from_transcript.sh to skip the audio download and transcription steps:

./generate_from_transcript.sh <clean_transcript_file>

Examples:

# From a cleaned transcript
./generate_from_transcript.sh transcript_clean/episode1_clean.txt

# The script will:
# 1. Extract insights from the transcript
# 2. Generate a blog draft
# 3. Optionally run an improvement pass

Note: This is useful if you've already transcribed audio elsewhere or want to regenerate a blog post from an existing transcript.

What Happens

  1. Audio Ingestion: Downloads from YouTube (via yt-dlp) or copies local MP3
  2. Transcription: Uses Whisper (base model) to generate text
  3. Cleaning: Removes timestamps, filler words, fixes formatting
  4. Insight Extraction: LLM extracts key ideas, quotes, company mentions
  5. Blog Generation: Creates an 800-1,200 word blog post
  6. Optional Improvement: One critique + revision pass

Advanced Features Quick Reference

Batch Processing - Process multiple podcasts:

# Create podcasts.txt with URLs/paths, then:
./batch_process.sh podcasts.txt
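
The exact format of podcasts.txt is documented in FEATURES.md, not here. As a rough sketch, assuming one URL or file path per line with optional `#` comments (an assumption, not confirmed by this README), a list file could be validated like this before a batch run:

```python
def read_podcast_list(text):
    """Return non-empty, non-comment entries, de-duplicated in file order."""
    seen = set()
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            entries.append(line)
    return entries


def classify(entry):
    """Label an entry as a URL or a local path."""
    return "url" if entry.startswith(("http://", "https://")) else "path"
```

Both function names are hypothetical; batch_process.sh may handle comments and duplicates differently.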

RSS Monitoring - Auto-detect new episodes:

# Create feeds.txt with RSS URLs, then:
python3 monitor_rss.py feeds.txt
./batch_process.sh podcasts_new.txt  # Process new episodes
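
For readers curious how new-episode detection can work, here is a minimal standard-library sketch. It is not the code in monitor_rss.py; it assumes a standard RSS 2.0 feed in which each `<item>` carries an `<enclosure url="...">` for the audio, and `new_episodes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def new_episodes(rss_xml, seen_urls):
    """Return (title, audio_url) for items whose enclosure URL is not yet seen."""
    root = ET.fromstring(rss_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no audio attachment
        url = enclosure.get("url")
        if url and url not in seen_urls:
            title = (item.findtext("title") or "").strip()
            found.append((title, url))
    return found
```

The returned URLs could then be written one per line to a list file and passed to the batch script.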

Interactive Q&A - Ask questions about a podcast:

python3 interactive_qa.py "outputs/Channel Name/Episode Title/transcript_clean.txt" \
                          "outputs/Channel Name/Episode Title/insights.json"

HTML Output - Auto-generated during processing:

# HTML is automatically created at:
# outputs/Channel Name/Episode Title/blog.html

# Or convert manually:
python3 convert_to_html.py "outputs/Channel Name/Episode Title/final.md"
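
The conversion step can be pictured with the simplified sketch below. It handles only headings and paragraphs and is not the implementation in convert_to_html.py, which produces styled pages; `markdown_to_html` and its `title` parameter are hypothetical.

```python
import html


def markdown_to_html(markdown_text, title="Podcast Blog"):
    """Convert headings and paragraphs only; other markdown stays as plain text."""
    body = []
    paragraph = []

    def flush():
        # Emit the lines collected so far as a single paragraph.
        if paragraph:
            body.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()
        elif stripped.startswith("#"):
            flush()
            level = min(len(stripped) - len(stripped.lstrip("#")), 6)
            text = stripped.lstrip("#").strip()
            body.append(f"<h{level}>{html.escape(text)}</h{level}>")
        else:
            paragraph.append(stripped)
    flush()
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body></html>\n"
    )
```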

📖 See FEATURES.md for complete documentation and examples.

Manual Steps

You can run individual components:

# Activate virtual environment first (if running scripts directly)
source venv/bin/activate

# Clean a transcript
python3 clean_transcript.py transcript_raw/input.txt transcript_clean/output.txt

# Extract insights
python3 extract_insights.py transcript_clean/input.txt insights/output.json

# Generate blog
python3 generate_blog.py transcript_clean/input.txt insights/input.json outputs/blog.md

# Improve blog
python3 improve_blog.py outputs/blog.md outputs/blog_improved.md
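
The four manual steps can also be chained from Python. This is a sketch under assumptions: the argument order is copied from the commands above, the output file names follow the folder structure section, and `build_commands` and `run_pipeline` are hypothetical helpers rather than project code.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(raw_transcript, workdir):
    """Return the argv list for each step, in execution order."""
    work = Path(workdir)
    clean = work / "transcript_clean.txt"
    insights = work / "insights.json"
    draft = work / "draft.md"
    final = work / "final.md"
    py = sys.executable  # use the interpreter from the active venv
    return [
        [py, "clean_transcript.py", str(raw_transcript), str(clean)],
        [py, "extract_insights.py", str(clean), str(insights)],
        [py, "generate_blog.py", str(clean), str(insights), str(draft)],
        [py, "improve_blog.py", str(draft), str(final)],
    ]


def run_pipeline(raw_transcript, workdir, dry_run=True):
    """Print the commands, or run them and stop at the first failing step."""
    for cmd in build_commands(raw_transcript, workdir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to confirm the paths before spending API calls.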

Note: The main workflow script (process_podcast.sh) automatically activates the virtual environment, so you don't need to activate it manually when using that script.

Configuration

Whisper Model

Edit process_podcast.sh line 28:

whisper "$AUDIO_FILE" --model base  # Options: tiny, base, small, medium, large

Gemini Model

The project uses gemini-2.0-flash by default. To change the model, edit extract_insights.py, generate_blog.py, or improve_blog.py:

response = client.models.generate_content(
    model="gemini-2.0-flash",  # Change this line
    contents=prompt,
    config=types.GenerateContentConfig(...)
)
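
Editing three files to switch models is easy to get out of sync. One possible alternative, shown here only as a sketch, is to read the model name from an environment variable. `GEMINI_MODEL` and `resolve_model` are hypothetical; the project does not define them.

```python
import os

DEFAULT_MODEL = "gemini-2.0-flash"  # default named in this README


def resolve_model(environ=None):
    """Return the model from GEMINI_MODEL (hypothetical variable), else the default."""
    environ = os.environ if environ is None else environ
    name = (environ.get("GEMINI_MODEL") or "").strip()
    return name or DEFAULT_MODEL
```

Each script could then pass `model=resolve_model()` in its `generate_content` call.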

Available models:

  • gemini-2.0-flash (default, fast and efficient)
  • gemini-2.5-flash (newer version)
  • gemini-2.5-pro (more capable, slower)
  • gemini-1.5-flash (older version; confirm it is still available before use)
  • gemini-1.5-pro (older version; confirm it is still available before use)

Output Format

Final blog posts are in markdown with:

  • Title
  • Section headers
  • Key insights integrated
  • Strategic quotes
  • Company/product mentions
  • 800-1,200 words

Transcript Cleaning Details

The cleaner (clean_transcript.py):

  • Removes timestamps: [00:00:00], 00:00:00, (00:00:00)
  • Removes fillers: um, uh, hmm, ah, er, like
  • Fixes line breaks (joins partial sentences)
  • Creates proper paragraphs
  • Normalizes whitespace
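
The rules above can be approximated with a few regular expressions. This is a minimal sketch of the technique, not the contents of clean_transcript.py; the patterns and the `clean_transcript` function are assumptions. Note that removing "like" (as listed above) also removes legitimate uses of the word, and the timestamp pattern will also strip clock times such as 10:30.

```python
import re

FILLERS = ("um", "uh", "hmm", "ah", "er", "like")  # list from this README

# Matches [00:00:00], (00:00:00), and bare 00:00:00 (mm:ss also matches).
TIMESTAMP_RE = re.compile(r"[\[(]?\b\d{1,2}:\d{2}(?::\d{2})?\b[\])]?")


def clean_transcript(text, fillers=FILLERS):
    """Strip timestamps and fillers, join broken lines, keep paragraph breaks."""
    filler_re = re.compile(
        r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?", re.IGNORECASE
    )
    text = TIMESTAMP_RE.sub(" ", text)
    text = filler_re.sub(" ", text)
    paragraphs = []
    # Blank lines mark paragraph breaks; single line breaks are joined.
    for block in re.split(r"\n\s*\n", text):
        joined = " ".join(block.split())  # collapse whitespace and newlines
        joined = re.sub(r"\s+([,.!?;:])", r"\1", joined)  # no space before punctuation
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)
```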

Insight Extraction Prompt

Extracts:

  • 5-10 key insights (main ideas and lessons)
  • 5 strong quotes (memorable and impactful)
  • Company/product mentions

Output format: JSON
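
Model responses sometimes wrap JSON in a markdown code fence, so it is worth parsing defensively and checking the counts listed above. In this sketch the key names `key_insights`, `quotes`, and `companies` are hypothetical; the actual schema of insights.json is not documented in this README.

```python
import json

FENCE = "`" * 3  # markdown code fence marker


def parse_insights(response_text):
    """Parse a model response into a dict, tolerating a surrounding code fence."""
    text = response_text.strip()
    if text.startswith(FENCE):
        text = text[len(FENCE):]
        if text.lower().startswith("json"):
            text = text[4:]
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[:-len(FENCE)]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def check_insights(data):
    """Return a list of problems; key names are hypothetical."""
    problems = []
    insights = data.get("key_insights", [])
    quotes = data.get("quotes", [])
    if not 5 <= len(insights) <= 10:
        problems.append(f"expected 5-10 key insights, got {len(insights)}")
    if len(quotes) != 5:
        problems.append(f"expected 5 quotes, got {len(quotes)}")
    if "companies" not in data:
        problems.append("missing 'companies' list")
    return problems
```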

Blog Generation Prompt

Creates:

  • Clear, analytical blog post
  • Well-structured with headers
  • Strategic use of quotes
  • Natural mention of companies/products
  • Markdown formatting

Improvement Pass

One-iteration critique that:

  • Identifies weak, vague, or repetitive content
  • Rewrites for specificity and engagement
  • Maintains 800-1,200 word length
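
Because the length target is stated explicitly, it is easy to verify after the improvement pass. The following is a small sketch; `word_count` and `within_target` are hypothetical helpers, and the counting rule (whitespace-separated words after removing markdown markers) is an assumption.

```python
import re


def word_count(markdown_text):
    """Count words, ignoring heading, emphasis, quote, and code markers."""
    text = re.sub(r"[#*_>`]", " ", markdown_text)
    return len(text.split())


def within_target(markdown_text, low=800, high=1200):
    """True when the post falls inside the 800-1,200 word target."""
    return low <= word_count(markdown_text) <= high
```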

Dependencies

Minimal dependencies:

  • yt-dlp: YouTube downloads
  • whisper: Audio transcription
  • google-genai: Gemini LLM API access (the SDK that provides client.models.generate_content)
  • Python 3.7+
  • Bash

No:

  • Databases
  • Web frameworks
  • SaaS tools (beyond Gemini API)
  • Complex setup

MVP Summary

What This Does Well

  • Fast setup: Install 3 tools, set API key, run
  • Terminal-first: No GUI, no web server
  • Simple workflow: One command processes an entire podcast
  • Modular: Each step is an independent Python script
  • Clear output: Structured markdown blog posts

What's Intentionally Missing

  • Hosted web interface (the dashboard is a local file at outputs/index.html)
  • Database/persistence layer
  • Search functionality
  • Style matching (uses generic analytical tone)
  • Speaker diarization
  • Multi-language support
  • Cost optimization (uses straightforward API calls)

Next Upgrade Would Be

Style Matching + Cost Tracking:

  1. Style matching (analyze sample posts, match tone)
  2. Better cost tracking (log token usage)

Technical additions:

  • Simple Python HTTP server to serve the outputs/index.html dashboard

Batch processing, RSS feed ingestion, the web dashboard, and metadata.json tracking are already implemented (see Features).

Troubleshooting

Whisper fails:

# Check installation
whisper --help

# Try smaller model
whisper audio.mp3 --model tiny

Gemini API errors:

# Verify key is set
echo $GEMINI_API_KEY

# Get API key at: https://aistudio.google.com/app/apikey
# Check API usage at: https://aistudio.google.com

yt-dlp fails:

# Update yt-dlp
pip install -U yt-dlp

# Try different format
yt-dlp -x --audio-format mp3 -f bestaudio <url>

Cost Estimates

Per podcast (assuming 1-hour episode):

  • Whisper: Free (local)
  • Gemini API: Free tier includes generous usage limits (gemini-2.0-flash)
  • Total: ~$0.00 per podcast (within free tier limits)

Note: Check current Gemini API pricing and free tier limits at: https://aistudio.google.com

License

Public domain / MIT - use however you want.

About

Podcast processing and intelligence tools
