A Retrieval-Augmented Generation (RAG) chatbot that provides financial guidance based on YouTube transcripts from the "Grow with Nav" channel. This tool helps immigrants understand and navigate the path to retirement in Canada through personalized, context-aware responses.
https://www.loom.com/share/d61eed2b253c4eac8f42884b1b09cb64
This project is a dedicated case study born from following the "Grow with Nav" YouTube channel closely. The channel is a premier resource for immigrants in Canada, focusing on personal finance, real estate, and mortgage strategies with a mission to help newcomers retire with a $5M portfolio. As a mortgage agent himself, Nav provides invaluable insights that this project aims to digitize and scale.
Mortgage agents often find themselves "on their toes" 24/7, constantly answering repetitive client queries about rates, policies, and strategies. This manual overhead limits their ability to focus on complex cases. This project addresses that problem by:
- Providing an automated AI assistant that leverages Nav's specific expertise.
- Drastically reducing the response time for common financial and mortgage-related questions.
- Transcribing and indexing massive video libraries into a searchable, interactive knowledge base.
- Automated Transcription: Uses `yt-dlp` to fetch audio and `openai-whisper` for high-quality local transcriptions.
- Efficient Chunking: Stream-based text processing to handle large transcripts without exceeding memory limits.
- Fast Similarity Search: Implements `FAISS` with normalized embeddings (cosine similarity) for near-instant retrieval of relevant video segments.
- RAG-Powered Chat: Uses OpenAI's `gpt-4o-mini` to generate conversational answers strictly based on the provided video context.
- Memory Optimized: Designed to run efficiently on a 16GB RAM machine using batching and disk caching.
The project follows a standard RAG pipeline optimized for local resource constraints:
1. Extraction: `yt-dlp` extracts video metadata and audio files from the YouTube channel.
2. Transcription: `openai-whisper` converts audio to text, which is saved as JSON.
3. Indexing:
   - `SentenceTransformer` (`all-MiniLM-L6-v2`) generates 384-dimensional embeddings.
   - `FAISS` stores these vectors in an `IndexFlatIP` structure for efficient similarity searching.
4. Retrieval: User queries are embedded and matched against the FAISS index to find the top $K$ relevant chunks.
5. Generation: The retrieved context, conversation history, and user query are sent to GPT-4o-mini to produce the final response.
- Core: `flask`, `python-dotenv`, `openai`
- Transcription: `yt-dlp`, `openai-whisper`, `torch`
- NLP & Search: `sentence-transformers`, `faiss-cpu`, `numpy`, `pandas`
- Utilities: `tqdm`
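The generation step wires the `openai` dependency to the retrieved context. The helper names and system prompt below are hypothetical; only the `gpt-4o-mini` model and the chat-completions call shape come from the project description:

```python
def build_messages(context_chunks, history, question):
    """Assemble the chat payload: retrieved context, prior turns, new question.
    Hypothetical helper; the project's actual prompt wording may differ."""
    context = "\n\n".join(context_chunks)
    system = (
        "Answer strictly based on the provided video context. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        *history,  # earlier user/assistant turns for conversational memory
        {"role": "user", "content": question},
    ]


def generate_answer(messages):
    # Requires a valid OpenAI API key in the environment; not executed here.
    from openai import OpenAI

    client = OpenAI()  # picks up the API key from the environment
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content
```

Keeping prompt assembly separate from the API call makes the grounding behavior (answering only from retrieved chunks) easy to inspect and test without network access.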
- Web UI: Enhance the interactive test script into a full-stack web application using React or Next.js.
- Vector Store Scaling: Consider migrating to a managed vector database (like Pinecone or Weaviate) if the video library grows significantly.
- Agentic Workflows: Implement tools for the chatbot to perform live mortgage calculations or link directly to external financial calculators.
- Metadata Enhancement: Add more granular metadata like timestamps to the FAISS index to allow linking users to the exact second in the video.
- Environment: An `.env` file containing a valid `OPENAI_KEY` must be present in the root directory.
- Hardware: Transcription (Whisper) assumes a CPU/GPU capable of running machine learning models; 16GB RAM is the target baseline.
- Data Source: Chunks and transcripts are expected to be in the `data/kb_data/` directory for the chatbot to function.
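These requirements can be verified before startup with a small stdlib-only check. The function below is a hypothetical helper mirroring the requirements above, not part of the project:

```python
from pathlib import Path


def check_setup(root: str = ".") -> list:
    """Return a list of setup problems; an empty list means the chatbot
    can start. Hypothetical helper illustrating the stated requirements."""
    root = Path(root)
    problems = []

    env = root / ".env"
    if not env.is_file():
        problems.append("missing .env file in project root")
    elif "OPENAI_KEY" not in env.read_text():
        problems.append(".env does not define OPENAI_KEY")

    if not (root / "data" / "kb_data").is_dir():
        problems.append("missing data/kb_data/ directory with chunks and transcripts")

    return problems
```

Running such a check at startup turns missing-file crashes deep in the pipeline into one actionable error message.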