╔══════════════════════════════════════════════════════════════════╗
║ ║
║ H Y B R I D R E C ║
║ ───────────────────────────────────────────────────────── ║
║ Hybrid Recommender System · Leona Goel ║
║ ║
╚══════════════════════════════════════════════════════════════════╝
Important
🟢 This is the active GSSoC project repo — open all issues and PRs here only.
A production-ready recommender fusing Content-Based Filtering (TF-IDF), Collaborative Filtering (SVD), and NLP Sentiment Analysis (VADER) with a tunable weighted scoring engine — backed by Supabase PostgreSQL, served via FastAPI, and built to be dataset-agnostic by design.
25,000+ products · Sub-50ms search · 3 ML models fused · ~60% faster integration
The core insight: blend three independent signals, each capturing something the others miss.
User Reviews (text) ──→ NLP Engine (VADER Sentiment) ──┐
Item Metadata (title/desc) ──→ Content Vectorization (TF-IDF) ──┼──→ Weighted Hybrid ──→ Ranked Results
User Purchases (clicks/buys) ──→ Matrix Factorization (SVD) ──┘ Engine
Hybrid Score = α · content_score [TF-IDF cosine similarity]
+ β · collab_score [Truncated SVD latent space]
+ γ · sentiment_score [VADER compound polarity]
// α, β, γ are live-tunable via API or UI sliders
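As a minimal sketch of the formula above, assuming each component score is already in [0, 1]: the function name `hybrid_score`, the default weights, and the renormalization step are all hypothetical, since the source does not say whether the project forces the weights to sum to 1.

```python
def hybrid_score(content, collab, sentiment, alpha=0.5, beta=0.3, gamma=0.2):
    """Blend three scores in [0, 1] using weights alpha, beta, gamma.

    The weights are renormalized to sum to 1 (an assumption, not confirmed
    by the project), so the blended score also stays in [0, 1].
    """
    if min(alpha, beta, gamma) < 0:
        raise ValueError("weights must be non-negative")
    total = alpha + beta + gamma
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return (alpha * content + beta * collab + gamma * sentiment) / total
```

Setting β to 0 in this sketch gives a blend of content and sentiment only, which is one way to picture what the sliders do for a user with no purchase history.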
α — Content Model · TF-IDF + Cosine Similarity
Item metadata (title + description + category) vectorized with TF-IDF (unigrams + bigrams, max 5,000 features). On-the-fly cosine similarity yields content_score ∈ [0, 1]. Fast, interpretable, and requires zero user history — ideal for cold-start.
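The content model described above can be sketched with scikit-learn as follows. The three-item catalog is made up for illustration, and `content_scores` is a hypothetical helper, not necessarily what `content_model.py` exposes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalog: title + description + category joined into one string.
items = [
    "wireless mouse ergonomic optical mouse electronics",
    "wireless keyboard slim bluetooth keyboard electronics",
    "cotton t-shirt soft crew neck shirt clothing",
]

# Unigrams + bigrams, capped at 5,000 features, as described above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
tfidf = vectorizer.fit_transform(items)


def content_scores(query_index):
    """Cosine similarity of one item against every item in the catalog."""
    return cosine_similarity(tfidf[query_index], tfidf).ravel()


scores = content_scores(0)  # similarity of the mouse to all three items
```

Because TF-IDF vectors are non-negative, the cosine similarity lands in [0, 1] without any extra scaling.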
β — Collaborative Model · Truncated SVD
User-item interaction matrix built from purchases + implicit feedback (views, clicks). SVD reduces to 50 latent factors; cosine similarity in latent space yields collab_score. Adaptive rank automatically reduces SVD components for sparse matrices.
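The source does not spell out the adaptive-rank rule. One plausible reading, sketched here purely as an assumption, is to cap the number of components below the smaller matrix dimension, since a truncated SVD cannot extract more factors than the matrix has rows or columns. `adaptive_rank` is a hypothetical name.

```python
def adaptive_rank(n_users, n_items, target=50):
    """Pick an SVD rank: the target, or fewer when the matrix is too small.

    Keeps the rank below min(n_users, n_items) where possible, and never
    returns less than 1.
    """
    if n_users < 1 or n_items < 1:
        raise ValueError("matrix must have at least one row and one column")
    upper = min(n_users, n_items) - 1
    return max(1, min(target, upper))
```

With 1,000 users and 5,000 items this keeps the full 50 factors; with only 20 users it drops to 19.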
γ — Sentiment Model · NLTK VADER
Review text analyzed for compound polarity ∈ [-1, 1]. Per-item aggregation → Min-Max normalization → sentiment_score ∈ [0, 1]. Surfaces genuinely loved products, not just popular ones.
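The aggregation and normalization steps can be sketched without NLTK by starting from compound scores that have already been computed. Averaging per item and returning 0.5 when every item has the same mean are assumptions; the source only says "per-item aggregation". `sentiment_scores` is a hypothetical name.

```python
from collections import defaultdict


def sentiment_scores(review_compounds):
    """Turn (item_id, compound) pairs into min-max normalized scores in [0, 1]."""
    per_item = defaultdict(list)
    for item_id, compound in review_compounds:
        per_item[item_id].append(compound)
    if not per_item:
        return {}

    # Per-item aggregation: mean compound polarity (assumed aggregation rule).
    means = {item: sum(vals) / len(vals) for item, vals in per_item.items()}

    lo, hi = min(means.values()), max(means.values())
    if hi == lo:
        # No spread to normalize over; a neutral 0.5 is an arbitrary choice.
        return {item: 0.5 for item in means}
    return {item: (mean - lo) / (hi - lo) for item, mean in means.items()}
```

Note that min-max scaling is relative to the current catalog: the best-reviewed item always gets 1.0 and the worst gets 0.0, whatever their absolute polarity.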
❄ Cold-Start Handling
- Bayesian average rating — prevents 1-review, 5-star bias
- Popularity-based fallback — ranks new items by review count and category similarity
- Mock user seeding — synthetic purchase history to bootstrap collaborative filtering
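To illustrate the first bullet, here is a minimal sketch of a Bayesian average. It uses the common form that pulls each item's mean toward the global mean; the `prior_weight` of 10 pseudo-reviews is an assumed value, not one taken from `hybrid_model.py`.

```python
def bayesian_average(ratings, global_mean, prior_weight=10):
    """Shrink an item's mean rating toward the global mean.

    prior_weight acts like that many pseudo-reviews at the global mean,
    so items with few reviews stay close to the catalog average.
    """
    n = len(ratings)
    return (prior_weight * global_mean + sum(ratings)) / (prior_weight + n)


one_review = bayesian_average([5], global_mean=3.5)
many_reviews = bayesian_average([5] * 200, global_mean=3.5)
```

A single 5-star review barely moves the item above the global mean, while 200 of them bring it close to 5.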
| Feature | Detail |
|---|---|
| PostgreSQL FTS | GIN-indexed full-text search, sub-50ms on 250k+ rows |
| Supabase Auth | Guest (anonymous) and email/password, Row-Level Security on all tables |
| Tunable Weights | Live α/β/γ sliders to adjust recommendation blend in real time |
| Dataset-Agnostic | Fuzzy column detection (product_name → title) cuts integration time by ~60% |
| Cold-Start Resilient | Bayesian avg rating + popularity fallback for new users and items |
| Type-to-Search | Global keyboard capture: start typing anywhere to search instantly |
| Responsive UI | Amazon-inspired dark header, 4→3→2→1 column card grid across breakpoints |
| Secure by Default | Pydantic validation, parameterized queries, CORS-restricted, no stack-trace leakage |
| Streamlit UI | Local CSV upload → build models → recommendations, no Supabase or server required |
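The fuzzy column detection in the Dataset-Agnostic row could work along these lines. This is a sketch using the standard library's `difflib`; the alias table, the 0.8 cutoff, and the function names are hypothetical and may differ from what `data_adapter.py` does.

```python
import difflib
import re

# Hypothetical alias table; the project's real mapping may differ.
ALIASES = {
    "title": ["title", "product_name", "name", "item_name"],
    "description": ["description", "desc", "details", "about"],
    "category": ["category", "product_category", "genre"],
    "rating": ["rating", "stars", "score"],
}


def normalize(name):
    """Lowercase a column name and collapse punctuation/spaces to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def detect_columns(columns, cutoff=0.8):
    """Map raw column names to canonical fields via fuzzy alias matching."""
    mapping = {}
    for raw in columns:
        key = normalize(raw)
        # Score the column against every alias and keep the best match.
        scored = [
            (difflib.SequenceMatcher(None, key, alias).ratio(), field)
            for field, aliases in ALIASES.items()
            for alias in aliases
        ]
        ratio, field = max(scored)
        # Accept only confident matches, and use each canonical field once.
        if ratio >= cutoff and field not in mapping.values():
            mapping[raw] = field
    return mapping
```

Columns that match nothing above the cutoff (a `price` column here, for instance) are left unmapped rather than guessed.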
┌─────────────────┬────────────────────────────────────────────────┐
│ Layer │ Technology │
├─────────────────┼────────────────────────────────────────────────┤
│ Backend │ Python 3.10+, FastAPI, Uvicorn │
│ Database │ Supabase (PostgreSQL), Row-Level Security │
│ Search │ PostgreSQL FTS (GIN indexes, ts_rank) │
│ Auth │ Supabase Auth (anonymous + email/password) │
│ ML — Content │ scikit-learn: TF-IDF Vectorizer, Cosine Sim │
│ ML — Collab │ scikit-learn: TruncatedSVD, SciPy sparse │
│ NLP │ NLTK VADER SentimentIntensityAnalyzer │
│ Data │ Pandas, NumPy │
│ Frontend │ HTML5, CSS3, Vanilla JS, Supabase JS v2 │
└─────────────────┴────────────────────────────────────────────────┘
hybrid-recommender/
│
├── backend/
│ └── main.py # FastAPI server — search, upload, build, recommend
│
├── frontend/
│ ├── index.html # Single-page UI (Amazon-like layout)
│ ├── styles.css # Design system (dark header, cards, animations)
│ └── app.js # Frontend logic (auth, search, rendering)
│
├── scripts/
│ ├── generate_sample_data.py # Synthetic test dataset generator
│ ├── import_to_supabase.py # Batch import CSV/JSON → PostgreSQL
│ └── seed_mock_data.py # Mock users + purchases for cold-start bootstrap
│
├── data_adapter.py # ⭐ Auto column detection + schema normalization
├── content_model.py # TF-IDF content-based recommender
├── collaborative_model.py # SVD collaborative recommender + implicit feedback
├── hybrid_model.py # Weighted hybrid engine (Bayesian avg, popularity)
├── nlp_engine.py # VADER sentiment analysis pipeline
├── evaluation.py # Precision@K, Recall@K, NDCG@K benchmarks
├── db.py # Supabase client singleton (anon + admin)
├── app.py # Streamlit UI — upload CSV, build models, get recommendations
├── requirements.txt
├── .env.example
└── SETUP.md
Prerequisites: Python 3.10+ · Supabase account (free tier works)
# 1. Clone & install
git clone https://github.com/leonagoel/hybrid-recommender.git
cd hybrid-recommender
pip install -r requirements.txt
# 2. Configure Supabase
cp .env.example .env
# Fill in from: Supabase Dashboard → Settings → API
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-role-key # Required for bulk import
# 3. Run SQL migrations
# See SETUP.md for full schema → paste into Supabase SQL Editor
# 4. Start the server
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000
Open http://localhost:8000, upload any CSV/JSON from datasets/, click Build Models, then start typing to search.
# After cloning and installing dependencies (step 1 above)
streamlit run app.py
Upload any CSV file, click Build Models, then enter an item name or User ID to get recommendations directly in your browser. No database or server setup is needed.
GET /api/config → Supabase public config
GET /api/status → System status + product count
GET /api/search?q=...&limit=20 → Full-text search (PostgreSQL FTS)
POST /api/upload → Upload CSV/JSON dataset
POST /api/build → Train TF-IDF, SVD, VADER models
GET /api/recommend/{title} → Hybrid recommendations for an item
GET /api/items?page=1&per_page=50 → Paginated product listing
GET /api/categories → All available categories
GET /api/weights → Current α, β, γ blend weights
PUT /api/weights → Update blend weights live
GET /api/purchases/{user_id} → User purchase history
POST /api/purchases → Record a purchase event
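A small client for a few of the endpoints above might look like this, using only the standard library. It is a sketch: the JSON field names sent to PUT /api/weights (`alpha`, `beta`, `gamma`) are an assumption, since the request schema is not documented here, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local server from the quick start


def search_request(query, limit=20):
    """Build GET /api/search with the query string URL-encoded."""
    params = urlencode({"q": query, "limit": limit})
    return Request(f"{BASE_URL}/api/search?{params}")


def recommend_request(title):
    """Build GET /api/recommend/{title} with the title percent-encoded."""
    return Request(f"{BASE_URL}/api/recommend/{quote(title, safe='')}")


def weights_request(alpha, beta, gamma):
    """Build PUT /api/weights. The JSON field names are assumed."""
    body = json.dumps({"alpha": alpha, "beta": beta, "gamma": gamma}).encode()
    return Request(
        f"{BASE_URL}/api/weights",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

With the server running, passing any of these to `urllib.request.urlopen` sends the request. Encoding the title matters for product names that contain spaces or slashes.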
python evaluation.py
Benchmarks Content-Only, Collab-Only, Sentiment-Only, and Hybrid across:
Precision@K — fraction of relevant items in top-K
Recall@K — fraction of all relevant items retrieved
NDCG@K — ranking quality (discounted cumulative gain)
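For reference, the three metrics can be written as below. This sketch assumes binary relevance (an item is either relevant or not) and a recommendation list without duplicates; `evaluation.py` may define relevance differently.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # Ideal ranking puts every relevant item (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Precision and recall ignore order within the top-K, whereas NDCG rewards placing relevant items nearer the top.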
✓ No hardcoded credentials — config served via /api/config
✓ .env excluded from git via .gitignore
✓ CORS restricted to configured origins
✓ Row-Level Security (RLS) on all Supabase tables
✓ Input validation via Pydantic models
✓ Generic error messages — no stack trace leakage
✓ SQL injection safe (Supabase SDK parameterized queries)
If you see:
ModuleNotFoundError: No module named 'xyz'
Run:
pip install -r requirements.txt
If port 8000 is busy:
python -m uvicorn backend.main:app --port 8001
If the VADER lexicon is missing, run in a Python shell:
import nltk
nltk.download('vader_lexicon')
If the streamlit command is not found, install Streamlit manually:
pip install streamlit
If Supabase connections fail, check your .env file:
SUPABASE_URL=your_url
SUPABASE_ANON_KEY=your_key
SUPABASE_SERVICE_KEY=your_service_key
Make sure:
- No extra spaces
- No quotes
- Correct project credentials
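The first two checks in the list above can be automated with a few lines of Python. This is a hypothetical helper, not part of the repo, and it does not handle trailing inline comments.

```python
def check_env_lines(lines):
    """Return (line_number, problem) pairs for common .env mistakes."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in stripped:
            problems.append((number, "missing '='"))
            continue
        key, value = stripped.split("=", 1)
        if key != key.strip() or value != value.strip():
            problems.append((number, "spaces around '='"))
        if value.strip()[:1] in ("'", '"'):
            problems.append((number, "value is quoted"))
        if not value.strip():
            problems.append((number, "empty value"))
    return problems
```

One way to run it is `check_env_lines(open(".env").read().splitlines())`; an empty list means none of these formatting problems were found. Whether the credentials themselves are correct still has to be checked against the Supabase dashboard.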
Run:
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000
Open:
http://localhost:8000/api/status
Expected response:
{
  "status": "ok"
}
Run:
streamlit run app.py
Expected:
- Browser opens automatically
- CSV upload interface visible
- Recommendation UI loads successfully
Upload any sample CSV and verify:
- Dataset loads without errors
- Models build successfully
- Recommendations appear
git remote add upstream https://github.com/leonagoel/hybrid-recommender.git
git fetch upstream
git merge upstream/main
If conflicts happen:
- Open conflicted files
- Remove conflict markers: <<<<<<<, =======, >>>>>>>
- Keep correct code
- Save file
- Commit again
Before submitting PR:
- Project runs successfully
- README formatting checked
- No unnecessary files added
- Branch name follows guidelines
- Commit message follows convention
- PR linked to issue
MIT — see LICENSE
Built by Leona Goel
B.Tech CSE · Vellore Institute of Technology
National Finalist · Smart India Hackathon 2025 · Top 8% of 950+ Teams


