gTanusri/hybrid-recommender

 
 


╔══════════════════════════════════════════════════════════════════╗
║                                                                  ║
║    H Y B R I D R E C                                             ║
║    ─────────────────────────────────────────────────────────     ║
║    Hybrid Recommender System · Leona Goel                        ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝



Important

🟢 This is the active GSSoC project repo — open all issues and PRs here only.


A production-ready recommender fusing Content-Based Filtering (TF-IDF), Collaborative Filtering (SVD), and NLP Sentiment Analysis (VADER) with a tunable weighted scoring engine — backed by Supabase PostgreSQL, served via FastAPI, and built to be dataset-agnostic by design.

25,000+ products  ·  Sub-50ms search  ·  3 ML models fused  ·  ~60% faster integration

01 — Architecture

The core insight: blend three independent signals, each capturing something the others miss.

User Reviews (text)           ──→  NLP Engine (VADER Sentiment)    ──┐
Item Metadata (title/desc)    ──→  Content Vectorization (TF-IDF)  ──┼──→  Weighted Hybrid  ──→  Ranked Results
User Purchases (clicks/buys)  ──→  Matrix Factorization (SVD)      ──┘         Engine

     Hybrid Score  =  α · content_score        [TF-IDF cosine similarity]
                    + β · collab_score          [Truncated SVD latent space]
                    + γ · sentiment_score       [VADER compound polarity]

     // α, β, γ are live-tunable via API or UI sliders
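As a minimal sketch of the formula above (not the repository's actual hybrid_model.py), the blend can be written as a plain weighted sum. The function name `hybrid_score`, the default weights, the sample items, and the choice to normalize the weights so they sum to 1 are all assumptions made for illustration.

```python
def hybrid_score(content_score, collab_score, sentiment_score,
                 alpha=0.5, beta=0.3, gamma=0.2):
    """Blend three scores in [0, 1] into one hybrid score.

    The default weights are illustrative, not the project's defaults.
    """
    total = alpha + beta + gamma
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Assumption: normalize so the weights sum to 1, which keeps the
    # result in [0, 1] whenever all three inputs are in [0, 1].
    return (alpha * content_score
            + beta * collab_score
            + gamma * sentiment_score) / total


# Hypothetical candidates: (content_score, collab_score, sentiment_score).
candidates = {
    "item_a": (0.9, 0.2, 0.5),
    "item_b": (0.4, 0.8, 0.9),
}
# Rank candidates by blended score, highest first.
ranked = sorted(candidates, key=lambda k: hybrid_score(*candidates[k]), reverse=True)
```

Raising β relative to α shifts the ranking toward items that similar users interacted with, which is the effect the sliders expose.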

α — Content Model  ·  TF-IDF + Cosine Similarity

Item metadata (title + description + category) vectorized with TF-IDF (unigrams + bigrams, max 5,000 features). On-the-fly cosine similarity yields content_score ∈ [0, 1]. Fast, interpretable, and requires zero user history — ideal for cold-start.
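The content model described above could look roughly like this with scikit-learn. The sample items and the helper name `content_scores` are hypothetical; only the vectorizer settings (unigrams + bigrams, 5,000 features) come from the description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical items: title + description + category joined into one string.
items = {
    "Wireless Mouse": "wireless mouse ergonomic optical mouse electronics",
    "Gaming Mouse": "gaming mouse rgb optical mouse electronics",
    "Yoga Mat": "yoga mat non-slip exercise mat fitness",
}
titles = list(items)

# Unigrams + bigrams, capped at 5,000 features, as described above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
matrix = vectorizer.fit_transform(list(items.values()))


def content_scores(title):
    """Cosine similarity of one item against all items, computed on demand."""
    idx = titles.index(title)
    sims = cosine_similarity(matrix[idx], matrix).ravel()
    return dict(zip(titles, sims))


scores = content_scores("Wireless Mouse")
```

Because TF-IDF weights are non-negative, the cosine similarity stays within [0, 1], and no user history is needed to compute it.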

β — Collaborative Model  ·  Truncated SVD

User-item interaction matrix built from purchases + implicit feedback (views, clicks). SVD reduces to 50 latent factors; cosine similarity in latent space yields collab_score. Adaptive rank automatically reduces SVD components for sparse matrices.
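A rough sketch of the collaborative step, assuming a small hand-made interaction matrix. The helper `item_factors` and its rank-capping rule are assumptions; the project's adaptive-rank logic may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item matrix: rows are users, columns are items.
# Values are interaction strengths (e.g. purchase = 1.0, view = 0.2).
interactions = csr_matrix(np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.8, 0.0, 0.2],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.2, 1.0, 0.8],
]))


def item_factors(matrix, max_components=50):
    """Project items into a latent space, shrinking the rank for small matrices."""
    # TruncatedSVD cannot extract more components than the matrix has
    # features, so cap the rank at min(dimension) - 1 (assumed rule).
    n_components = max(1, min(max_components, min(matrix.shape) - 1))
    svd = TruncatedSVD(n_components=n_components, random_state=42)
    # Transpose so each row of the result describes one item.
    return svd.fit_transform(matrix.T)


factors = item_factors(interactions)
collab_sim = cosine_similarity(factors)  # item-by-item similarity
```

Here items 0 and 1 share the same users, so their latent vectors end up close, while items 0 and 2 have no users in common.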

γ — Sentiment Model  ·  NLTK VADER

Review text analyzed for compound polarity ∈ [-1, 1]. Per-item aggregation → Min-Max normalization → sentiment_score ∈ [0, 1]. Surfaces genuinely loved products, not just popular ones.
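The aggregation and normalization steps can be sketched without NLTK by starting from compound scores that are already computed. Using the mean as the per-item aggregate and returning 0.5 when all items tie are assumptions; the source does not say how nlp_engine.py handles either.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (item_id, compound) pairs. In practice each compound value
# would come from VADER's polarity_scores(text)["compound"].
review_scores = [
    ("item_a", 0.8), ("item_a", 0.6),
    ("item_b", -0.4),
    ("item_c", 0.1), ("item_c", 0.3),
]


def sentiment_scores(pairs):
    """Average compound polarity per item, then min-max scale to [0, 1]."""
    by_item = defaultdict(list)
    for item_id, compound in pairs:
        by_item[item_id].append(compound)
    means = {item: mean(vals) for item, vals in by_item.items()}

    lo, hi = min(means.values()), max(means.values())
    if hi == lo:
        # Assumption: when every item has the same mean, use a neutral 0.5.
        return {item: 0.5 for item in means}
    return {item: (m - lo) / (hi - lo) for item, m in means.items()}


scores = sentiment_scores(review_scores)
```

Min-max scaling is relative to the current catalogue: the best-reviewed item always maps to 1 and the worst to 0.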

❄ Cold-Start Handling
  • Bayesian average rating — prevents 1-review, 5-star bias
  • Popularity-based fallback — ranks new items by review count and category similarity
  • Mock user seeding — synthetic purchase history to bootstrap collaborative filtering
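To illustrate the first bullet, here is one common form of the Bayesian average. The function name, the `prior_weight` value, and the sample numbers are hypothetical, and the project may use a different prior.

```python
def bayesian_average(item_mean, item_count, global_mean, prior_weight=10):
    """Shrink an item's mean rating toward the global mean.

    prior_weight acts like a number of "virtual" reviews at the global
    mean; the value 10 is illustrative, not the project's setting.
    """
    return ((prior_weight * global_mean + item_count * item_mean)
            / (prior_weight + item_count))


global_mean = 3.8  # hypothetical catalogue-wide mean rating
one_review = bayesian_average(5.0, 1, global_mean)      # single 5-star review
many_reviews = bayesian_average(4.6, 200, global_mean)  # well-reviewed item
```

With these numbers the single 5-star review is pulled down to about 3.9, while the item with 200 reviews keeps a score close to its own mean, so it ranks higher.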

02 — Features

Feature               Detail
PostgreSQL FTS        GIN-indexed full-text search, sub-50ms on 250k+ rows
Supabase Auth         Guest (anonymous) and email/password, Row-Level Security on all tables
Tunable Weights       Live α/β/γ sliders to adjust recommendation blend in real time
Dataset-Agnostic      Fuzzy column detection (product_name → title) cuts integration time by ~60%
Cold-Start Resilient  Bayesian avg rating + popularity fallback for new users and items
Type-to-Search        Global keyboard capture: start typing anywhere to search instantly
Responsive UI         Amazon-inspired dark header, 4→3→2→1 column card grid across breakpoints
Secure by Default     Pydantic validation, parameterized queries, CORS-restricted, no stack-trace leakage
Streamlit UI          Local CSV upload → build models → recommendations, no Supabase or server required
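The dataset-agnostic feature rests on mapping arbitrary column names onto a canonical schema. The sketch below shows one way to do that with the standard library's `difflib`; the alias table, the cutoff, and the function name `detect_columns` are assumptions and do not reflect the actual data_adapter.py.

```python
from difflib import get_close_matches

# Hypothetical canonical schema with a few known aliases per field.
ALIASES = {
    "title": ["title", "product_name", "name", "item_name"],
    "description": ["description", "desc", "about_product", "summary"],
    "category": ["category", "product_category", "genre"],
    "rating": ["rating", "stars", "score", "review_rating"],
}


def detect_columns(columns, cutoff=0.8):
    """Map raw column names to canonical field names via fuzzy matching."""
    mapping = {}
    for raw in columns:
        key = raw.strip().lower().replace(" ", "_")
        for field, aliases in ALIASES.items():
            if field in mapping.values():
                continue  # each canonical field is assigned at most once
            if get_close_matches(key, aliases, n=1, cutoff=cutoff):
                mapping[raw] = field
                break
    return mapping


# "Descripton" is misspelled on purpose; "price" has no canonical field.
mapping = detect_columns(["Product Name", "Descripton", "Category ", "price"])
```

Columns that match nothing, such as `price` here, are simply left out of the mapping, so the caller can decide whether to drop or keep them.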

03 — Tech Stack

┌─────────────────┬────────────────────────────────────────────────┐
│ Layer           │ Technology                                     │
├─────────────────┼────────────────────────────────────────────────┤
│ Backend         │ Python 3.10+, FastAPI, Uvicorn                 │
│ Database        │ Supabase (PostgreSQL), Row-Level Security      │
│ Search          │ PostgreSQL FTS (GIN indexes, ts_rank)          │
│ Auth            │ Supabase Auth (anonymous + email/password)     │
│ ML — Content    │ scikit-learn: TF-IDF Vectorizer, Cosine Sim    │
│ ML — Collab     │ scikit-learn: TruncatedSVD, SciPy sparse       │
│ NLP             │ NLTK VADER SentimentIntensityAnalyzer          │
│ Data            │ Pandas, NumPy                                  │
│ Frontend        │ HTML5, CSS3, Vanilla JS, Supabase JS v2        │
└─────────────────┴────────────────────────────────────────────────┘

04 — Project Structure

hybrid-recommender/
│
├── backend/
│   └── main.py                  # FastAPI server — search, upload, build, recommend
│
├── frontend/
│   ├── index.html               # Single-page UI (Amazon-like layout)
│   ├── styles.css               # Design system (dark header, cards, animations)
│   └── app.js                   # Frontend logic (auth, search, rendering)
│
├── scripts/
│   ├── generate_sample_data.py  # Synthetic test dataset generator
│   ├── import_to_supabase.py    # Batch import CSV/JSON → PostgreSQL
│   └── seed_mock_data.py        # Mock users + purchases for cold-start bootstrap
│
├── data_adapter.py              # ⭐ Auto column detection + schema normalization
├── content_model.py             # TF-IDF content-based recommender
├── collaborative_model.py       # SVD collaborative recommender + implicit feedback
├── hybrid_model.py              # Weighted hybrid engine (Bayesian avg, popularity)
├── nlp_engine.py                # VADER sentiment analysis pipeline
├── evaluation.py                # Precision@K, Recall@K, NDCG@K benchmarks
├── db.py                        # Supabase client singleton (anon + admin)
├── app.py                       # Streamlit UI — upload CSV, build models, get recommendations
├── requirements.txt
├── .env.example
└── SETUP.md

05 — Quick Start

Prerequisites: Python 3.10+ · Supabase account (free tier works)

# 1 — Clone & install
git clone https://github.com/leonagoel/hybrid-recommender.git
cd hybrid-recommender
pip install -r requirements.txt

# 2 — Configure Supabase
cp .env.example .env
# Then edit .env and fill in these values from: Supabase Dashboard → Settings → API
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-role-key   # Required for bulk import

# 3 — Run SQL migrations
# See SETUP.md for full schema → paste into Supabase SQL Editor

# 4 — Start the server
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000, upload any CSV/JSON from datasets/, click Build Models, then start typing to search.

Alternative — Streamlit UI (no Supabase required)

# After cloning and installing dependencies (step 1 above)
streamlit run app.py

Upload any CSV file, click Build Models, then enter an item name or User ID to get recommendations directly in your browser — no database or server setup needed.

06 — API Reference

GET    /api/config                   →  Supabase public config
GET    /api/status                   →  System status + product count
GET    /api/search?q=...&limit=20    →  Full-text search (PostgreSQL FTS)
POST   /api/upload                   →  Upload CSV/JSON dataset
POST   /api/build                    →  Train TF-IDF, SVD, VADER models
GET    /api/recommend/{title}        →  Hybrid recommendations for an item
GET    /api/items?page=1&per_page=50 →  Paginated product listing
GET    /api/categories               →  All available categories
GET    /api/weights                  →  Current α, β, γ blend weights
PUT    /api/weights                  →  Update blend weights live
GET    /api/purchases/{user_id}      →  User purchase history
POST   /api/purchases                →  Record a purchase event
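A small standard-library client for two of the endpoints listed above might look like this. The JSON field names in the weights payload (`alpha`, `beta`, `gamma`) and the helper names are assumptions; check backend/main.py for the actual request model.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # the Quick Start server address


def recommend_url(title):
    """Build the recommendation URL, percent-encoding the item title."""
    return f"{BASE_URL}/api/recommend/{quote(title, safe='')}"


def weights_request(alpha, beta, gamma):
    """Build a PUT /api/weights request.

    The JSON field names are an assumption, not confirmed by the source.
    """
    body = json.dumps({"alpha": alpha, "beta": beta, "gamma": gamma}).encode()
    return Request(
        f"{BASE_URL}/api/weights",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def fetch_json(request_or_url):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(request_or_url, timeout=10) as resp:
        return json.load(resp)
```

With the server running, `fetch_json(recommend_url("Wireless Mouse"))` would return the recommendations for that title, and `fetch_json(weights_request(0.5, 0.3, 0.2))` would update the blend.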

07 — Evaluation

python evaluation.py

Benchmarks Content-Only, Collab-Only, Sentiment-Only, and Hybrid across:

Precision@K  —  fraction of relevant items in top-K
Recall@K     —  fraction of all relevant items retrieved
NDCG@K       —  ranking quality (discounted cumulative gain)
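For reference, the three metrics can be sketched as follows under a binary-relevance assumption (an item is either relevant or not). This is an illustration of the definitions, not the code in evaluation.py, and the sample lists are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(item in relevant for item in recommended[:k]) / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


recommended = ["a", "b", "c", "d", "e"]  # hypothetical ranked output
relevant = {"a", "c", "f"}               # hypothetical ground truth
```

For these lists, Precision@3 and Recall@3 are both 2/3, while NDCG@3 is lower than 1 because the second relevant item sits at rank 3 and the third is missing.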

08 — Security

✓  No hardcoded credentials — config served via /api/config
✓  .env excluded from git via .gitignore
✓  CORS restricted to configured origins
✓  Row-Level Security (RLS) on all Supabase tables
✓  Input validation via Pydantic models
✓  Generic error messages — no stack trace leakage
✓  SQL injection safe (Supabase SDK parameterized queries)


09 — Troubleshooting

ModuleNotFoundError

If you see:

ModuleNotFoundError: No module named 'xyz'

Run:

pip install -r requirements.txt

Port Already In Use

If port 8000 is busy:

python -m uvicorn backend.main:app --port 8001

NLTK VADER Download Error

Run the following in a Python shell:

import nltk
nltk.download('vader_lexicon')

Streamlit Not Found

Install Streamlit manually:

pip install streamlit

Supabase Connection Error

Check your .env file:

SUPABASE_URL=your_url
SUPABASE_ANON_KEY=your_key
SUPABASE_SERVICE_KEY=your_service_key

Make sure:

  • No extra spaces
  • No quotes
  • Correct project credentials

10 — Setup Verification

Backend Verification

Run:

python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000

Open:

http://localhost:8000/api/status

Expected response:

{
  "status": "ok"
}

Streamlit Verification

Run:

streamlit run app.py

Expected:

  • Browser opens automatically
  • CSV upload interface visible
  • Recommendation UI loads successfully

Dataset Upload Verification

Upload any sample CSV and verify:

  • Dataset loads without errors
  • Models build successfully
  • Recommendations appear

11 — Beginner Contributor Tips

Sync Your Fork Before Starting

git remote add upstream https://github.com/leonagoel/hybrid-recommender.git
git fetch upstream
git merge upstream/main

Resolve Merge Conflicts

If conflicts happen:

  1. Open the conflicted files
  2. Decide which code to keep, then remove the conflict markers:
    <<<<<<<
    =======
    >>>>>>>

  3. Save the files
  4. Stage the resolved files with git add
  5. Commit again

Pull Request Checklist

Before submitting a PR:

  • Project runs successfully
  • README formatting checked
  • No unnecessary files added
  • Branch name follows guidelines
  • Commit message follows convention
  • PR linked to issue

License

MIT — see LICENSE


Built by Leona Goel
B.Tech CSE · Vellore Institute of Technology
National Finalist · Smart India Hackathon 2025 · Top 8% of 950+ Teams


12 — Screenshots

[Screenshot: Home Page]
[Screenshot: Recommendation Results]
[Screenshot: API Documentation (Swagger Docs)]
