GitHub - gTanusri/hybrid-recommender: A hybrid recommender system using content-based and collaborative filtering with a data adapter for dynamic datasets.

╔══════════════════════════════════════════════════════════════════╗
║                                                                  ║
║    H Y B R I D R E C                                             ║
║    ─────────────────────────────────────────────────────────     ║
║    Hybrid Recommender System · Leona Goel                        ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝

Important

🟢 This is the active GSSoC project repo — open all issues and PRs here only.

A production-ready recommender fusing Content-Based Filtering (TF-IDF), Collaborative Filtering (SVD), and NLP Sentiment Analysis (VADER) with a tunable weighted scoring engine — backed by Supabase PostgreSQL, served via FastAPI, and built to be dataset-agnostic by design.

25,000+ products  ·  Sub-50ms search  ·  3 ML models fused  ·  ~60% faster integration

01 — Architecture

The core insight: blend three independent signals, each capturing something the others miss.

User Reviews (text)           ──→  NLP Engine (VADER Sentiment)    ──┐
Item Metadata (title/desc)    ──→  Content Vectorization (TF-IDF)  ──┼──→  Weighted Hybrid  ──→  Ranked Results
User Purchases (clicks/buys)  ──→  Matrix Factorization (SVD)      ──┘         Engine

     Hybrid Score  =  α · content_score        [TF-IDF cosine similarity]
                    + β · collab_score          [Truncated SVD latent space]
                    + γ · sentiment_score       [VADER compound polarity]

     // α, β, γ are live-tunable via API or UI sliders
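
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the code in hybrid_model.py: the `BlendWeights` class, its default values, and the choice to rescale the weights so they sum to 1 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BlendWeights:
    """Hypothetical container for the α, β, γ blend weights."""
    alpha: float = 0.4  # content weight (illustrative default)
    beta: float = 0.4   # collaborative weight (illustrative default)
    gamma: float = 0.2  # sentiment weight (illustrative default)

    def normalized(self) -> "BlendWeights":
        """Rescale so the weights sum to 1 (an assumption of this sketch)."""
        total = self.alpha + self.beta + self.gamma
        if total <= 0:
            raise ValueError("at least one weight must be positive")
        return BlendWeights(self.alpha / total, self.beta / total, self.gamma / total)


def hybrid_score(content: float, collab: float, sentiment: float, w: BlendWeights) -> float:
    """Weighted sum of the three per-item signals."""
    w = w.normalized()
    return w.alpha * content + w.beta * collab + w.gamma * sentiment


def rank(candidates: dict[str, tuple[float, float, float]], w: BlendWeights) -> list[str]:
    """Order item ids by descending hybrid score."""
    return sorted(candidates, key=lambda item: hybrid_score(*candidates[item], w), reverse=True)


# Each tuple is (content_score, collab_score, sentiment_score).
candidates = {"a": (0.9, 0.1, 0.5), "b": (0.2, 0.8, 0.9)}
content_first = rank(candidates, BlendWeights(1, 0, 0))
collab_first = rank(candidates, BlendWeights(0, 1, 0))
```

Shifting weight from α to β flips the order of the two toy items, which is the effect the sliders expose.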

α — Content Model · TF-IDF + Cosine Similarity

Item metadata (title + description + category) vectorized with TF-IDF (unigrams + bigrams, max 5,000 features). On-the-fly cosine similarity yields content_score ∈ [0, 1]. Fast, interpretable, and requires zero user history — ideal for cold-start.
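
As a rough sketch of this step, the snippet below vectorizes a toy catalog with scikit-learn using the settings quoted above (unigrams plus bigrams, at most 5,000 features) and computes cosine similarity on demand. The toy items and the `content_scores` helper are illustrative assumptions, not the contents of content_model.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog; the items are made up for illustration.
items = [
    {"title": "Wireless Mouse", "description": "Ergonomic wireless mouse", "category": "Electronics"},
    {"title": "Wireless Keyboard", "description": "Compact wireless keyboard", "category": "Electronics"},
    {"title": "Yoga Mat", "description": "Non-slip exercise mat", "category": "Sports"},
]

# One text document per item: title + description + category.
corpus = [f"{i['title']} {i['description']} {i['category']}" for i in items]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
matrix = vectorizer.fit_transform(corpus)


def content_scores(query_index: int) -> list[float]:
    """Cosine similarity of one item against every item, computed on demand."""
    sims = cosine_similarity(matrix[query_index], matrix).ravel()
    return [float(s) for s in sims]


scores = content_scores(0)  # similarity of "Wireless Mouse" to each item
```

Because TF-IDF weights are non-negative, the cosine similarities fall in [0, 1], and no user history is involved at any point.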

β — Collaborative Model · Truncated SVD

User-item interaction matrix built from purchases plus implicit feedback (views, clicks). Truncated SVD reduces it to 50 latent factors; cosine similarity in latent space yields collab_score. The rank adapts automatically, using fewer SVD components when the matrix is too small or sparse to support 50.
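
One way to picture the collaborative step is the sketch below, which factorizes a small users x items matrix with scikit-learn's TruncatedSVD and compares items in latent space. The `item_latent_factors` helper, the toy matrix, and the specific capping rule for the adaptive rank are assumptions for illustration; collaborative_model.py may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity


def item_latent_factors(interactions: csr_matrix, max_components: int = 50) -> np.ndarray:
    """Factorize a users x items matrix and return one latent vector per item."""
    n_users, n_items = interactions.shape
    # Adaptive rank (assumed rule): never request more components than the
    # smaller matrix dimension can support.
    n_components = max(1, min(max_components, min(n_users, n_items) - 1))
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    # Fit on the transpose so that each row of the result is an item.
    return svd.fit_transform(interactions.T)


# Rows are users, columns are items; values are implicit-feedback weights.
interactions = csr_matrix(np.array([
    [3, 2, 0, 0, 0],
    [2, 3, 0, 0, 1],
    [0, 0, 3, 2, 0],
    [0, 0, 2, 3, 0],
], dtype=float))

factors = item_latent_factors(interactions)
collab_sim = cosine_similarity(factors)  # item-to-item similarity in latent space
```

Items 0 and 1 are consumed by the same users and end up close together, while items 0 and 2 share no users and come out unrelated.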

γ — Sentiment Model · NLTK VADER

Review text analyzed for compound polarity ∈ [-1, 1]. Per-item aggregation → Min-Max normalization → sentiment_score ∈ [0, 1]. Surfaces genuinely loved products, not just popular ones.
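
The aggregation and normalization steps can be sketched without NLTK by starting from compound scores that VADER has already produced. The `sentiment_scores` helper, the use of a plain mean for per-item aggregation, and the neutral 0.5 fallback are assumptions of this example rather than documented behavior of nlp_engine.py.

```python
from collections import defaultdict
from statistics import mean


def sentiment_scores(reviews: list[tuple[str, float]]) -> dict[str, float]:
    """Map (item_id, compound) pairs to a per-item score in [0, 1]."""
    by_item: dict[str, list[float]] = defaultdict(list)
    for item_id, compound in reviews:
        by_item[item_id].append(compound)
    if not by_item:
        return {}

    # Assumption: aggregate each item's reviews with a plain mean.
    item_mean = {item: mean(vals) for item, vals in by_item.items()}

    lo, hi = min(item_mean.values()), max(item_mean.values())
    if hi == lo:
        # Degenerate case: all items share one polarity, so return a neutral 0.5.
        return {item: 0.5 for item in item_mean}
    # Min-Max normalization from [lo, hi] to [0, 1].
    return {item: (m - lo) / (hi - lo) for item, m in item_mean.items()}


# Compound polarity values in [-1, 1], as VADER would report them.
reviews = [("mouse", 0.8), ("mouse", 0.6), ("mat", -0.4), ("desk", 0.1)]
scores = sentiment_scores(reviews)
```

Note that Min-Max normalization is relative to the current catalog: the best-reviewed item always maps to 1 and the worst to 0.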

❄ Cold-Start Handling

Bayesian average rating — prevents 1-review, 5-star bias
Popularity-based fallback — ranks new items by review count and category similarity
Mock user seeding — synthetic purchase history to bootstrap collaborative filtering
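
To illustrate the first item, here is a common form of the Bayesian average, which pulls an item's rating toward the global mean until it has enough reviews. The `prior_weight` parameter and its value of 10 are assumptions for the sketch; the formula used in hybrid_model.py may differ.

```python
def bayesian_average(ratings: list[float], global_mean: float, prior_weight: float = 10.0) -> float:
    """Shrink an item's mean rating toward the global mean.

    prior_weight acts like a number of "virtual" reviews at the global mean.
    """
    n = len(ratings)
    return (prior_weight * global_mean + sum(ratings)) / (prior_weight + n)


global_mean = 3.5
one_review = bayesian_average([5.0], global_mean)           # a single 5-star review
many_reviews = bayesian_average([4.5] * 200, global_mean)   # 200 reviews averaging 4.5
```

With these numbers the single 5-star item scores about 3.64 while the well-reviewed item scores about 4.45, so the latter ranks higher despite its lower raw mean.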

02 — Features

Feature	Detail
`PostgreSQL FTS`	GIN-indexed full-text search — sub-50ms on 250k+ rows
`Supabase Auth`	Guest (anonymous) and email/password, Row-Level Security on all tables
`Tunable Weights`	Live α/β/γ sliders to adjust recommendation blend in real time
`Dataset-Agnostic`	Fuzzy column detection (`product_name` → `title`) cuts integration time by ~60%
`Cold-Start Resilient`	Bayesian avg rating + popularity fallback for new users and items
`Type-to-Search`	Global keyboard capture — start typing anywhere to search instantly
`Responsive UI`	Amazon-inspired dark header, 4→3→2→1 column card grid across breakpoints
`Secure by Default`	Pydantic validation, parameterized queries, CORS-restricted, no stack-trace leakage
`Streamlit UI`	Local CSV upload → build models → recommendations, no Supabase or server required
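
The fuzzy column detection mentioned in the Dataset-Agnostic row could look roughly like the sketch below, which uses the standard library's difflib to map incoming column names onto canonical fields. The `ALIASES` table, the `detect_columns` function, and the 0.8 cutoff are hypothetical; data_adapter.py may use a different strategy.

```python
import difflib
import re

# Hypothetical alias table: canonical field -> column names seen in datasets.
ALIASES = {
    "title": ["title", "product_name", "name", "item_name"],
    "description": ["description", "desc", "product_description", "about"],
    "category": ["category", "product_category", "genre"],
    "rating": ["rating", "stars", "score"],
}


def _normalize(name: str) -> str:
    """Lowercase and collapse punctuation/spaces to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def detect_columns(columns: list[str], cutoff: float = 0.8) -> dict[str, str]:
    """Return {original_column: canonical_field} for columns that can be matched."""
    lookup = {alias: field for field, aliases in ALIASES.items() for alias in aliases}
    mapping = {}
    for col in columns:
        match = difflib.get_close_matches(_normalize(col), list(lookup), n=1, cutoff=cutoff)
        if match:
            mapping[col] = lookup[match[0]]
    return mapping


mapping = detect_columns(["Product Name", "Descriptions", "product_category", "sku"])
```

The resulting dictionary can be passed to pandas as `df.rename(columns=mapping)`; unmatched columns such as `sku` are simply left alone.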

03 — Tech Stack

┌─────────────────┬────────────────────────────────────────────────┐
│ Layer           │ Technology                                      │
├─────────────────┼────────────────────────────────────────────────┤
│ Backend         │ Python 3.10+, FastAPI, Uvicorn                 │
│ Database        │ Supabase (PostgreSQL), Row-Level Security       │
│ Search          │ PostgreSQL FTS (GIN indexes, ts_rank)          │
│ Auth            │ Supabase Auth (anonymous + email/password)      │
│ ML — Content    │ scikit-learn: TF-IDF Vectorizer, Cosine Sim    │
│ ML — Collab     │ scikit-learn: TruncatedSVD, SciPy sparse       │
│ NLP             │ NLTK VADER SentimentIntensityAnalyzer           │
│ Data            │ Pandas, NumPy                                   │
│ Frontend        │ HTML5, CSS3, Vanilla JS, Supabase JS v2        │
└─────────────────┴────────────────────────────────────────────────┘

04 — Project Structure

hybrid-recommender/
│
├── backend/
│   └── main.py                  # FastAPI server — search, upload, build, recommend
│
├── frontend/
│   ├── index.html               # Single-page UI (Amazon-like layout)
│   ├── styles.css               # Design system (dark header, cards, animations)
│   └── app.js                   # Frontend logic (auth, search, rendering)
│
├── scripts/
│   ├── generate_sample_data.py  # Synthetic test dataset generator
│   ├── import_to_supabase.py    # Batch import CSV/JSON → PostgreSQL
│   └── seed_mock_data.py        # Mock users + purchases for cold-start bootstrap
│
├── data_adapter.py              # ⭐ Auto column detection + schema normalization
├── content_model.py             # TF-IDF content-based recommender
├── collaborative_model.py       # SVD collaborative recommender + implicit feedback
├── hybrid_model.py              # Weighted hybrid engine (Bayesian avg, popularity)
├── nlp_engine.py                # VADER sentiment analysis pipeline
├── evaluation.py                # Precision@K, Recall@K, NDCG@K benchmarks
├── db.py                        # Supabase client singleton (anon + admin)
├── app.py                       # Streamlit UI — upload CSV, build models, get recommendations
├── requirements.txt
├── .env.example
└── SETUP.md

05 — Quick Start

Prerequisites: Python 3.10+ · Supabase account (free tier works)

# 1 — Clone & install
git clone https://github.com/leonagoel/hybrid-recommender.git 
cd hybrid-recommender
pip install -r requirements.txt

# 2 — Configure Supabase
cp .env.example .env
# Fill in from: Supabase Dashboard → Settings → API

SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-role-key   # Required for bulk import

# 3 — Run SQL migrations
# See SETUP.md for full schema → paste into Supabase SQL Editor

# 4 — Start the server
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000, upload any CSV/JSON from datasets/, click Build Models, then start typing to search.

Alternative — Streamlit UI (no Supabase required)

# After cloning and installing dependencies (step 1 above)
streamlit run app.py

Upload any CSV file, click Build Models, then enter an item name or User ID to get recommendations directly in your browser — no database or server setup needed.

06 — API Reference

GET    /api/config                   →  Supabase public config
GET    /api/status                   →  System status + product count
GET    /api/search?q=...&limit=20    →  Full-text search (PostgreSQL FTS)
POST   /api/upload                   →  Upload CSV/JSON dataset
POST   /api/build                    →  Train TF-IDF, SVD, VADER models
GET    /api/recommend/{title}        →  Hybrid recommendations for an item
GET    /api/items?page=1&per_page=50 →  Paginated product listing
GET    /api/categories               →  All available categories
GET    /api/weights                  →  Current α, β, γ blend weights
PUT    /api/weights                  →  Update blend weights live
GET    /api/purchases/{user_id}      →  User purchase history
POST   /api/purchases                →  Record a purchase event
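
A small client-side sketch for the search and recommend endpoints follows. It only builds URLs and fetches JSON with the standard library; the helper names are hypothetical, the base URL assumes the Quick Start uvicorn command, and the response shapes are not documented here, so the parsed JSON is returned as-is.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes the server from Quick Start


def search_url(query: str, limit: int = 20) -> str:
    """Build the GET /api/search URL with an encoded query string."""
    return f"{BASE_URL}/api/search?{urlencode({'q': query, 'limit': limit})}"


def recommend_url(title: str) -> str:
    """Build the GET /api/recommend/{title} URL."""
    # The title goes in the path, so spaces and reserved characters are percent-encoded.
    return f"{BASE_URL}/api/recommend/{quote(title, safe='')}"


def fetch_json(url: str, timeout: float = 5.0):
    """Fetch a URL and parse the body as JSON (requires a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


search = search_url("wireless mouse")
recommend = recommend_url("USB-C Hub & Dock")
```

With the server running, `fetch_json(search)` would return whatever JSON the search endpoint emits.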

07 — Evaluation

python evaluation.py

Benchmarks Content-Only, Collab-Only, Sentiment-Only, and Hybrid across:

Precision@K  —  fraction of relevant items in top-K
Recall@K     —  fraction of all relevant items retrieved
NDCG@K       —  ranking quality (discounted cumulative gain)
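
For reference, the three metrics can be written compactly as below. This sketch assumes binary relevance (an item is either relevant or not) and is not taken from evaluation.py, which may use graded relevance or different conventions.

```python
import math


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-K recommendations that are relevant."""
    if k <= 0:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-K."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / len(relevant)


def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """DCG of the top-K divided by the DCG of an ideal ordering (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p3 = precision_at_k(recommended, relevant, 3)
r3 = recall_at_k(recommended, relevant, 3)
n3 = ndcg_at_k(recommended, relevant, 3)
```

In this toy case two of the top three items are relevant, so Precision@3 and Recall@3 are both 2/3, while NDCG@3 is lower than 1 because the second hit sits at rank 3 instead of rank 2.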

08 — Security

✓  No hardcoded credentials — config served via /api/config
✓  .env excluded from git via .gitignore
✓  CORS restricted to configured origins
✓  Row-Level Security (RLS) on all Supabase tables
✓  Input validation via Pydantic models
✓  Generic error messages — no stack trace leakage
✓  SQL injection safe (Supabase SDK parameterized queries)

09 — Troubleshooting

ModuleNotFoundError

If you see:

ModuleNotFoundError: No module named 'xyz'

Run:

pip install -r requirements.txt

Port Already In Use

If port 8000 is busy:

python -m uvicorn backend.main:app --port 8001

NLTK VADER Download Error

Run the following in a Python shell:

import nltk
nltk.download('vader_lexicon')

Streamlit Not Found

Install Streamlit manually:

pip install streamlit

Supabase Connection Error

Check your .env file:

SUPABASE_URL=your_url
SUPABASE_ANON_KEY=your_key
SUPABASE_SERVICE_KEY=your_service_key

Make sure:

No extra spaces
No quotes
Correct project credentials
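
The first two checks can be automated with a small script. The `lint_env` function below is a hypothetical helper, not part of the repository; it only inspects the file's text and cannot tell whether the credentials themselves are correct.

```python
REQUIRED_KEYS = ("SUPABASE_URL", "SUPABASE_ANON_KEY", "SUPABASE_SERVICE_KEY")


def lint_env(text: str) -> list[str]:
    """Return human-readable problems found in the contents of a .env file."""
    problems, seen = [], set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and full-line comments
        line = raw
        if " #" in raw:
            line = raw.split(" #", 1)[0].rstrip()  # drop a trailing inline comment
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {lineno}: expected KEY=value")
            continue
        if key != key.strip() or value != value.strip():
            problems.append(f"line {lineno}: extra spaces around '{key.strip()}'")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            problems.append(f"line {lineno}: remove quotes from '{key.strip()}'")
        seen.add(key.strip())
    problems += [f"missing {k}" for k in REQUIRED_KEYS if k not in seen]
    return problems


bad = 'SUPABASE_URL = https://your-project-ref.supabase.co\nSUPABASE_ANON_KEY="your-anon-key"\n'
bad_problems = lint_env(bad)
```

To check a real file, pass its contents: `lint_env(open(".env").read())`. An empty list means none of these formatting problems were found.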

10 — Setup Verification

Backend Verification

Run:

python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000

Open:

http://localhost:8000/api/status

Expected response:

{
  "status": "ok"
}

Streamlit Verification

Run:

streamlit run app.py

Expected:

Browser opens automatically
CSV upload interface visible
Recommendation UI loads successfully

Dataset Upload Verification

Upload any sample CSV and verify:

Dataset loads without errors
Models build successfully
Recommendations appear

11 — Beginner Contributor Tips

Sync Your Fork Before Starting

git remote add upstream https://github.com/leonagoel/hybrid-recommender.git
git fetch upstream
git merge upstream/main

Resolve Merge Conflicts

If conflicts happen:

Open conflicted files
Remove conflict markers:
```
<<<<<<<
=======
>>>>>>>
```
Keep correct code
Save file
Commit again

Pull Request Checklist

Before submitting PR:

License

MIT — see LICENSE

Built by Leona Goel
B.Tech CSE · Vellore Institute of Technology
National Finalist · Smart India Hackathon 2025 · Top 8% of 950+ Teams

12 — Screenshots

[Screenshots: Home Page, Recommendation Results, API Documentation]
