An intelligent tutoring system that extracts transcripts from YouTube videos and allows you to ask questions about the content using AI. Perfect for students, researchers, and anyone looking to better understand video lectures and educational content.
- YouTube Transcript Extraction: Automatically fetches English transcripts from YouTube videos
- AI-Powered Q&A: Ask questions about video content and get intelligent answers
- Vector Search: Uses semantic search to find relevant transcript segments
- User-Friendly Interface: Clean Streamlit web interface
- Error Handling: Comprehensive error handling for various transcript issues
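
To make the vector search idea concrete, here is a toy sketch that ranks transcript chunks against a question by cosine similarity. It deliberately swaps in plain word-count vectors for the sentence-transformers embeddings and FAISS index listed among the dependencies, so it runs on the standard library alone. The names `embed`, `cosine`, and `top_chunks` are hypothetical and are not taken from the project's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Gradient descent updates the weights step by step.",
    "The lecturer thanks the audience and sponsors.",
    "The learning rate controls the size of each gradient step.",
]
best = top_chunks("What does the learning rate control?", chunks, k=1)
print(best)  # the chunk about the learning rate ranks first
```

Word counts only match exact tokens, so this toy version misses paraphrases; a real embedding model is what lets semantic search match on meaning rather than wording.
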
- Enter a YouTube video URL
- The app extracts the transcript and processes it into searchable chunks
- Ask questions about the video content
- Get AI-powered answers based on the transcript
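
The "searchable chunks" step above can be sketched as follows. This is a minimal illustration that assumes fixed-size character windows with some overlap; the helper name `split_into_chunks` and the default sizes are hypothetical, and the app itself may rely on a different splitter (for example one provided by LangChain).

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.
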
- Python 3.13+
- uv package manager
- Groq API key (free tier available)
- Clone the repository:

      git clone <repository-url>
      cd yt-tutor

- Install dependencies using uv:

      uv sync

- Create a `.env` file in the project root and add your Groq API key:

      GROQ_API_KEY=your_groq_api_key_here

To get a free Groq API key:
- Visit Groq Console
- Sign up for a free account
- Navigate to API Keys and create a new key
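
A missing or placeholder key is easy to overlook, so a startup check along these lines may help. This is a sketch rather than the app's actual code: `get_groq_api_key` is a hypothetical helper, and it assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping

PLACEHOLDER = "your_groq_api_key_here"  # the example value from the .env template


def get_groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Groq API key, or raise a clear error if it is missing."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to the .env file in the project root."
        )
    return key
```

Passing the environment in as a mapping keeps the check testable without touching the real environment, e.g. `get_groq_api_key({"GROQ_API_KEY": "..."})`.
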
- Start the application:

      uv run streamlit run main.py

- Open your browser and navigate to the displayed URL (typically http://localhost:8501)
- Enter a YouTube video URL and click "Fetch Transcript"
- Once the transcript is processed, ask questions about the video content
- streamlit: Web interface framework
- pytube: YouTube video information extraction
- youtube-transcript-api: Transcript fetching from YouTube
- langchain: LLM orchestration framework
- langchain-groq: Groq LLM integration
- langchain-huggingface: HuggingFace embeddings integration
- faiss-cpu: Vector similarity search
- sentence-transformers: Text embeddings
- python-dotenv: Environment variable management
- Videos with English transcripts (auto-generated or manual)
- Public YouTube videos
- Videos with transcripts enabled
- Only supports English transcripts
- Requires videos to have transcripts available
- Depends on YouTube's transcript API availability
- LLM responses are limited by the quality of the transcript
The application handles various common issues:
- Transcripts disabled for video
- No English transcript available
- Video unavailable or private
- Network connectivity issues
- Invalid YouTube URLs
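
As an illustration of the last case, invalid URLs can be rejected before any network call is made. The sketch below is not the project's implementation: `extract_video_id` is a hypothetical name, the set of accepted URL shapes is an assumption, and so is the rule that a video ID is 11 characters drawn from letters, digits, `_`, and `-`.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube URL, or raise ValueError."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"Not a recognisable YouTube video URL: {url!r}")
    return candidate
```

Raising a single `ValueError` gives the interface one exception to catch and turn into a readable message.
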
The project includes unit tests for its core functionality.
Basic test run:

    uv run pytest tests/ -v

With coverage report:

    uv run pytest tests/ --cov=main --cov-report=term-missing

Using the test runner script:

    # Basic tests
    python test_runner.py
    # With coverage
    python test_runner.py --coverage

The test suite covers:
- URL parsing and video ID extraction
- Video title fetching (with fallback methods)
- Transcript saving and loading
- Error handling for various YouTube API issues
- Path injection protection
Current test coverage: ~67% of main functionality
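
For readers unfamiliar with the path injection item, the kind of check being tested could look roughly like this. It is a sketch under stated assumptions: `transcript_path`, the `.txt` extension, and the allowed character set are all hypothetical and may differ from what `main.py` does.

```python
import re
from pathlib import Path

# Assumption: a safe ID contains only letters, digits, "_" and "-".
SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def transcript_path(base_dir: Path, video_id: str) -> Path:
    """Build the file path for a transcript, refusing IDs that could escape base_dir."""
    if not SAFE_ID.fullmatch(video_id):
        raise ValueError(f"Unsafe video ID: {video_id!r}")

    base = base_dir.resolve()
    path = (base / f"{video_id}.txt").resolve()
    # Second line of defence: the resolved path must stay inside base_dir.
    if not path.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"Path escapes the transcript directory: {path}")
    return path
```

With this check, an ID such as `../secret` is rejected by the character filter before any file is opened.
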
When contributing new features:
- Add tests for new functions in `tests/test_main.py`
- Use pytest fixtures and mocking for external dependencies
- Test both success and failure scenarios
- Ensure tests are independent and can run in any order
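
The guidelines above might translate into tests like the following sketch. `get_title` is a hypothetical stand-in for a title-fetching function with a fallback, not the project's real function; the external fetchers are replaced with `unittest.mock.Mock` objects so no network access is needed, and the test functions follow pytest's `test_*` naming so pytest would collect them.

```python
from unittest.mock import Mock


def get_title(video_id, primary, fallback):
    """Hypothetical function under test: try each fetcher, then default to the ID."""
    for fetch in (primary, fallback):
        try:
            title = fetch(video_id)
        except Exception:
            continue  # this source failed, try the next one
        if title:
            return title
    return f"Video {video_id}"


def test_uses_fallback_when_primary_fails():
    # Failure scenario for the primary source, success for the fallback.
    primary = Mock(side_effect=ConnectionError("network down"))
    fallback = Mock(return_value="Intro to Linear Algebra")

    assert get_title("abc123", primary, fallback) == "Intro to Linear Algebra"
    primary.assert_called_once_with("abc123")
    fallback.assert_called_once_with("abc123")


def test_defaults_to_id_when_all_sources_fail():
    failing = Mock(side_effect=ConnectionError("network down"))

    assert get_title("abc123", failing, failing) == "Video abc123"
    assert failing.call_count == 2


def test_skips_fallback_when_primary_succeeds():
    primary = Mock(return_value="Lecture 1")
    fallback = Mock()

    assert get_title("abc123", primary, fallback) == "Lecture 1"
    fallback.assert_not_called()
```

Each test builds its own mocks, so the tests share no state and can run in any order.
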
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite:

      python test_runner.py --coverage

- Ensure all tests pass and maintain good coverage
- Submit a pull request
This project is open source and available under the MIT License.
If you encounter any issues or have questions:
- Check the error messages in the app interface
- Ensure your Groq API key is valid
- Verify the YouTube video has transcripts available
- Open an issue on GitHub for bugs or feature requests
- Built with Streamlit for the web interface
- Powered by Groq for fast LLM inference
- Uses LangChain for LLM orchestration
- Transcript extraction via youtube-transcript-api