kckDeepak/book-recommendation-system

📚 Book Recommendation System

Your Go-To Library for Discovering Great Books


An intelligent book recommendation system that learns your taste and surrounds you with books you'll love.


🎯 Project Vision

BookRec is an optimized recommendation engine designed to be your personalized library companion. The system allows you to:

  • 🔍 Search for books by title, author, or genre
  • ❤️ Add favorites and build your reading list
  • 🤖 Get personalized recommendations based on your taste
  • 🎨 Discover similar books to ones you've enjoyed
  • 📊 Explore a curated library tailored to your preferences

Unlike generic recommendation systems, BookRec creates a personalized reading environment where every suggestion aligns with your unique literary taste, making book discovery an enjoyable journey rather than an overwhelming search.


🏗️ Project Architecture

book-recommendation-system/
├── notebooks/               # Jupyter notebooks for ML pipeline
│   ├── 01_data_eda.ipynb           # Exploratory Data Analysis
│   ├── 02_preprocessing.ipynb       # Data cleaning & feature engineering
│   ├── 03_base_model.ipynb          # Baseline collaborative filtering
│   ├── 04_model_optimization.ipynb  # Hyperparameter tuning & optimization
│   └── 04_model_optimization_for_colab.ipynb  # GPU-accelerated training
│
├── data/                    # Dataset files
│   ├── books.csv                    # Book metadata (title, author, year)
│   ├── ratings.csv                  # User-book ratings
│   ├── tags.csv                     # User-generated tags
│   ├── processed_books.csv          # Cleaned book data
│   ├── processed_ratings.csv        # Filtered ratings
│   └── *.pkl                        # Pre-computed matrices (TF-IDF, cosine similarity)
│
├── models/                  # Trained ML models
│   ├── base_svd.pkl                 # Baseline SVD model
│   ├── tuned_svd.pkl                # Hyperparameter-tuned SVD
│   ├── base_mf.keras                # TensorFlow Matrix Factorization
│   └── quantized_mf.tflite          # Optimized deployment model (91% smaller!)
│
├── src/                     # Python source code
│   ├── models/                      # Model training scripts
│   └── utils/                       # Helper functions
│
├── backend/                 # API server (🚧 In Development)
├── frontend/                # Web UI (🚧 In Development)
├── requirements.txt         # Python dependencies
└── README.md               # You are here!

📊 Dataset

The system uses the Goodbooks-10k dataset:

  • Books: 10,000 popular books with rich metadata
  • Ratings: 6M+ ratings from 53,000+ users
  • Tags: User-generated tags for content-based filtering
  • Rating Scale: 1-5 stars

Key Statistics:

  • Average ratings per book: ~600
  • Average ratings per user: ~113
  • Data sparsity: ~98.9% for the counts above (6M ratings across 53,000 × 10,000 user-book pairs); high sparsity is typical for recommendation systems
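
The key statistics follow from three counts. A minimal sketch using the rounded figures quoted in this section (the function name `interaction_stats` is illustrative and not part of the repository). Sparsity is sensitive to the user and book counts, so it is worth recomputing on whichever table (raw or filtered) is being reported:

```python
def interaction_stats(n_ratings: int, n_users: int, n_books: int) -> dict:
    """Summarize a user-book rating matrix from its three basic counts."""
    density = n_ratings / (n_users * n_books)
    return {
        "ratings_per_book": n_ratings / n_books,
        "ratings_per_user": n_ratings / n_users,
        "sparsity": 1.0 - density,  # share of user-book pairs with no rating
    }

# Rounded counts from the Dataset section, not exact row counts
stats = interaction_stats(n_ratings=6_000_000, n_users=53_000, n_books=10_000)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```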

🔬 Machine Learning Pipeline

1️⃣ Exploratory Data Analysis (01_data_eda.ipynb)

What it does:

  • Analyzes rating distribution and user behavior
  • Identifies popular books and active users
  • Visualizes data sparsity and patterns
  • Detects data quality issues

Key Findings:

  • Most ratings are 4-5 stars (positive bias)
  • Power users contribute disproportionately to ratings
  • Long-tail distribution: few books are extremely popular
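
A check like the positive-bias finding can be reproduced with a simple frequency count. The sketch below uses toy ratings rather than values from ratings.csv, and `rating_shares` is a hypothetical helper, not a function from the notebook:

```python
from collections import Counter

def rating_shares(ratings):
    """Return the fraction of ratings at each star level from 1 to 5."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {star: counts.get(star, 0) / total for star in range(1, 6)}

sample = [5, 4, 4, 3, 5, 4, 2, 5, 4, 1]  # toy data, not Goodbooks-10k values
shares = rating_shares(sample)
positive_share = shares[4] + shares[5]   # share of 4-5 star ratings
print(f"4-5 star share: {positive_share:.0%}")
```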

2️⃣ Data Preprocessing (02_preprocessing.ipynb)

What it does:

  • Cleans missing values and duplicates
  • Filters low-activity users and obscure books
  • Creates TF-IDF vectors from book tags
  • Computes cosine similarity matrix for content-based filtering
  • Generates processed datasets for modeling
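
The filtering step can be sketched as a count-then-keep pass. The thresholds and the name `filter_interactions` are assumptions for illustration; the notebook's actual cutoffs are not stated here:

```python
from collections import Counter

def filter_interactions(ratings, min_user_ratings=2, min_book_ratings=2):
    """Keep (user, book, rating) rows whose user and book both meet a minimum count.

    This is a single pass: counts are taken before filtering, so a stricter
    pipeline might repeat it until no more rows are dropped.
    """
    user_counts = Counter(user for user, _, _ in ratings)
    book_counts = Counter(book for _, book, _ in ratings)
    return [
        (user, book, rating)
        for user, book, rating in ratings
        if user_counts[user] >= min_user_ratings
        and book_counts[book] >= min_book_ratings
    ]

rows = [(1, 10, 5), (1, 11, 4), (2, 10, 3), (2, 11, 4), (3, 12, 2)]
kept = filter_interactions(rows)
print(kept)  # user 3 and book 12 each appear once, so that row is dropped
```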

Outputs:

  • processed_books.csv - Clean book metadata
  • processed_ratings.csv - Filtered user-item interactions
  • tfidf_matrix.pkl - Term frequency vectors
  • cosine_sim.pkl - Pre-computed book similarities
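
To make the TF-IDF and cosine-similarity outputs concrete, here is a dependency-free sketch on toy tags. The notebook presumably uses a library vectorizer; the functions below are illustrative, and the IDF uses a smoothed form similar to scikit-learn's default:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy tag lists, not taken from tags.csv
book_tags = {
    "Book A": ["fantasy", "magic", "young-adult"],
    "Book B": ["fantasy", "magic", "dragons"],
    "Book C": ["history", "biography"],
}
titles = list(book_tags)
vecs = tfidf_vectors([book_tags[title] for title in titles])
sim_ab = cosine(vecs[0], vecs[1])  # shares two tags
sim_ac = cosine(vecs[0], vecs[2])  # shares no tags
print(f"A~B: {sim_ab:.3f}, A~C: {sim_ac:.3f}")
```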

3️⃣ Base Model Development (03_base_model.ipynb)

What it does:

  • Implements Collaborative Filtering using SVD (Singular Value Decomposition)
  • Builds Content-Based Filtering using TF-IDF + cosine similarity
  • Creates Hybrid Recommendation System combining both approaches
  • Evaluates baseline performance

Models:

  • SVD: Factorizes user-item matrix into latent factors
  • Content-Based: Recommends books with similar tags/genres
  • Hybrid: Weighted combination for robust recommendations
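
One way to read "weighted combination" is a linear blend of the two scores after putting them on the same scale. This is a sketch under that assumption; `hybrid_scores` and the weight `alpha=0.7` are hypothetical and may differ from the notebook's hybrid function:

```python
def hybrid_scores(cf_ratings, content_sims, alpha=0.7):
    """Blend collaborative and content-based scores per book.

    cf_ratings:   predicted ratings on the 1-5 scale
    content_sims: similarity to the user's liked books, in [0, 1]
    alpha:        weight of the collaborative part (assumed value)
    """
    blended = {}
    for book, rating in cf_ratings.items():
        cf = (rating - 1.0) / 4.0  # rescale 1-5 stars to 0-1
        blended[book] = alpha * cf + (1 - alpha) * content_sims.get(book, 0.0)
    return blended

cf = {"Book A": 4.6, "Book B": 3.0, "Book C": 4.2}
sims = {"Book A": 0.10, "Book B": 0.90, "Book C": 0.80}
scores = hybrid_scores(cf, sims)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Book C ranks first here because it scores well on both signals, which is the behavior a hybrid is meant to reward.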

Baseline Performance:

  • RMSE: 1.37 (Root Mean Squared Error)
  • Model Size: 87.8 MB
  • Inference Time: 3.0s (for 368K predictions)
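
For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. A minimal sketch on toy values (not the notebook's evaluation code):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared) / len(actual))

score = rmse([4, 5, 3, 2], [3.5, 4.0, 3.0, 4.0])  # toy ratings
print(f"RMSE: {score:.3f}")
```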

4️⃣ Model Optimization (04_model_optimization.ipynb)

What it does:

  • Hyperparameter Tuning with Optuna (20 trials)
  • Converts SVD to TensorFlow Matrix Factorization for deployment
  • Applies Quantization (float32 → int8) for model compression
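
The idea behind the matrix factorization model can be shown without TensorFlow: each user and book gets a small vector, and the predicted rating is the global mean plus their dot product. The sketch below trains that with plain SGD on toy data. All names and hyperparameters are assumptions, and it omits the user/book bias terms that Surprise's SVD also learns:

```python
import math
import random

def train_mf(ratings, n_users, n_books, n_factors=2, lr=0.02, reg=0.02,
             epochs=300, seed=0):
    """Fit user/book latent factors with SGD and return a predict(u, b) function."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_books)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    def predict(u, b):
        return mu + sum(pu * qb for pu, qb in zip(P[u], Q[b]))

    for _ in range(epochs):
        for u, b, r in ratings:
            err = r - predict(u, b)
            for k in range(n_factors):
                pu, qb = P[u][k], Q[b][k]
                P[u][k] += lr * (err * qb - reg * pu)
                Q[b][k] += lr * (err * pu - reg * qb)
    return predict

def rmse_of(predict_fn, rows):
    return math.sqrt(sum((r - predict_fn(u, b)) ** 2 for u, b, r in rows) / len(rows))

# Toy (user, book, rating) triples with zero-based indices
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 1), (2, 0, 2), (0, 2, 2)]
mean_rating = sum(r for _, _, r in toy) / len(toy)
baseline_rmse = rmse_of(lambda u, b: mean_rating, toy)
trained_rmse = rmse_of(train_mf(toy, n_users=3, n_books=3), toy)
print(f"mean-only RMSE: {baseline_rmse:.3f}, factorized RMSE: {trained_rmse:.3f}")
```

This measures training error only; a real comparison needs a held-out test split as in the notebooks.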

Optimization Techniques:

  1. Optuna-based tuning: Automated search for optimal hyperparameters
  2. Model conversion: SVD → TensorFlow for production scalability
  3. Post-training quantization: Reduces model size with minimal accuracy loss
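
Post-training quantization stores each float32 weight (4 bytes) as an int8 code (1 byte) plus a shared scale and zero point, which matches the roughly 4x drop from 30.33 MB to 7.59 MB reported below. A simplified per-tensor sketch follows; TFLite's converter handles calibration and other details differently, so treat this as an illustration of the arithmetic only:

```python
def quantize_int8(values):
    """Affine-quantize floats to int8 codes; returns (codes, scale, zero_point)."""
    lo = min(min(values), 0.0)  # widen the range so 0.0 is representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(code - zero_point) * scale for code in codes]

weights = [-0.52, -0.1, 0.0, 0.27, 0.75]  # toy weights
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error: {max_error:.4f}")
```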

Final Results:

| Model | RMSE | Size (MB) | Inference Time (s) | Size Reduction |
|---|---|---|---|---|
| Base SVD | 1.374 | 87.77 | 2.96 | - |
| Tuned SVD | 1.320 | 101.0 | N/A | -15.1% (larger) |
| Base MF (TensorFlow) | 1.418 | 30.33 | 20.66 | 65.4% |
| Quantized MF | 1.417 | 7.59 | 3.61 | 91.4% |
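
The Size Reduction column is relative to the Base SVD size. A short sketch that recomputes it from the sizes in the table (`size_reduction` is an illustrative helper):

```python
def size_reduction(base_mb: float, model_mb: float) -> float:
    """Percent size reduction relative to the baseline (negative means larger)."""
    return (1 - model_mb / base_mb) * 100

base_svd_mb = 87.77
sizes_mb = {
    "Tuned SVD": 101.0,
    "Base MF (TensorFlow)": 30.33,
    "Quantized MF": 7.59,
}
reductions = {name: size_reduction(base_svd_mb, mb) for name, mb in sizes_mb.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}%")
```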

🎉 Key Achievements:

  • Best accuracy: 1.32 RMSE (3.9% improvement over baseline)
  • 91.4% smaller model: 88 MB → 7.6 MB
  • Deployment-ready: TFLite format works on edge devices
  • Minimal accuracy loss: Quantization changes RMSE by about 0.001 (1.418 → 1.417); the quantized model trails the tuned SVD by about 0.1 RMSE

The Quantized MF model is recommended for production deployment: it gives up about 0.1 RMSE against the tuned SVD in exchange for a 7.59 MB model, with inference time (3.61 s) close to the SVD baseline (2.96 s).


🚀 GPU-Accelerated Training (04_model_optimization_for_colab.ipynb)

What it does:

  • Google Colab-compatible version with GPU support
  • Skips expensive Optuna tuning (uses pre-computed hyperparameters)
  • Faster training with T4/V100 GPUs

Why use this:

  • Local training takes hours; Colab reduces it to ~20 minutes
  • Free GPU access for model training
  • Easy Google Drive integration for data/model storage

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • pip or conda
  • (Optional) Google Colab account for GPU training

Local Setup

  1. Clone the repository
git clone https://github.com/kckDeepak/book-recommendation-system.git
cd book-recommendation-system
  2. Create virtual environment
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download dataset
Place the Goodbooks-10k files (books.csv, ratings.csv, tags.csv) in the data/ folder.

📖 Running the Notebooks

Step-by-Step Execution

1. Data Exploration

jupyter notebook notebooks/01_data_eda.ipynb

Run all cells to understand the dataset.

2. Data Preprocessing

jupyter notebook notebooks/02_preprocessing.ipynb

Generates cleaned datasets in data/ folder.

3. Train Base Model

jupyter notebook notebooks/03_base_model.ipynb

Creates base_svd.pkl in models/ folder.

4. Optimize Model

Option A: Local (slower)

jupyter notebook notebooks/04_model_optimization.ipynb

Option B: Google Colab (faster)

  1. Upload 04_model_optimization_for_colab.ipynb to Colab
  2. Create folder: MyDrive/book-recommendation-system/
  3. Upload data/ and models/ folders to Drive
  4. Run notebook with GPU runtime
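
To let the same notebook code run locally and on Colab, the project root can be resolved at startup. This is a sketch: `project_root` is a hypothetical helper, and the Drive folder name is the one from step 2 above. `drive.mount` is the standard Colab call and prompts for authorization on first use:

```python
from pathlib import Path

def project_root(local_root: Path = Path(".")) -> Path:
    """Return the Drive project folder on Colab, or local_root elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return local_root
    drive.mount("/content/drive")
    return Path("/content/drive/MyDrive/book-recommendation-system")

root = project_root()
data_dir = root / "data"
models_dir = root / "models"
print(data_dir, models_dir)
```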

5. Results

  • Check model performance in final comparison table
  • Models saved in models/ folder

🎯 Model Usage

Making Predictions

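import numpy as np
import tensorflow as tf

# Load the quantized model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path="models/quantized_mf.tflite")
interpreter.allocate_tensors()

# Tensor metadata: each entry has 'name', 'index', 'shape', and 'dtype'
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The order of the inputs depends on how the model was exported, so check the
# 'name' fields before assuming entry 0 is the user and entry 1 is the book.
for detail in input_details:
    print(detail['name'], detail['shape'], detail['dtype'])

# Predict rating for user 123, book 456.
# If IDs were re-indexed during training, pass the encoded indices here.
user_id = np.array([[123]], dtype=input_details[0]['dtype'])
book_id = np.array([[456]], dtype=input_details[1]['dtype'])

interpreter.set_tensor(input_details[0]['index'], user_id)
interpreter.set_tensor(input_details[1]['index'], book_id)
interpreter.invoke()

# The output has shape (1, 1); take the single scalar
predicted_rating = interpreter.get_tensor(output_details[0]['index'])[0][0]
print(f"Predicted rating: {predicted_rating:.2f}")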
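
Single predictions become recommendations once candidates are scored and ranked. The sketch below assumes a `predict(user_id, book_id)` callable wrapping whichever model is loaded; `top_n` and the stand-in predictor are hypothetical and only show the ranking step:

```python
import heapq

def top_n(predict, user_id, candidate_books, already_rated=(), n=5):
    """Return the n highest-scoring books the user has not rated yet."""
    seen = set(already_rated)
    scored = (
        (predict(user_id, book), book)
        for book in candidate_books
        if book not in seen
    )
    return [book for _, book in heapq.nlargest(n, scored)]

# Stand-in predictor with fixed scores; replace with a real model call
fake_scores = {10: 4.5, 11: 3.0, 12: 4.9, 13: 2.0}
recommended = top_n(
    lambda user, book: fake_scores[book],
    user_id=123,
    candidate_books=[10, 11, 12, 13],
    already_rated=[12],
    n=2,
)
print(recommended)
```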

Hybrid Recommendations

import pickle

# Load SVD model
with open('models/tuned_svd.pkl', 'rb') as f:
    svd_model = pickle.load(f)

# Load content similarity
with open('data/cosine_sim.pkl', 'rb') as f:
    cosine_sim = pickle.load(f)

# Get recommendations (combine collaborative + content-based)
# See 03_base_model.ipynb for hybrid_recommend() function
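
For the content-based half, a similar-books lookup only needs one row of the similarity matrix. The sketch assumes `cosine_sim` is a square matrix indexed by book position, which is an assumption about the pickle's layout; `most_similar` is a hypothetical helper and not the notebook's `hybrid_recommend()`:

```python
def most_similar(sim_matrix, book_index, k=3):
    """Return the positions of the k books most similar to book_index."""
    row = sim_matrix[book_index]
    neighbors = [(score, j) for j, score in enumerate(row) if j != book_index]
    neighbors.sort(reverse=True)  # highest similarity first
    return [j for _, j in neighbors[:k]]

# Toy 3x3 similarity matrix standing in for the loaded cosine_sim
toy_sim = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.4],
    [0.9, 0.4, 1.0],
]
similar_to_first = most_similar(toy_sim, book_index=0, k=2)
print(similar_to_first)
```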

🚧 Development Roadmap

✅ Completed

  • Data exploration and cleaning
  • Collaborative filtering (SVD)
  • Content-based filtering (TF-IDF)
  • Hybrid recommendation system
  • Hyperparameter optimization
  • Model quantization and compression

🔄 In Progress

  • Backend API (FastAPI/Flask)
    • REST endpoints for recommendations
    • User authentication
    • Model serving with TFLite
  • Frontend Web App (React/Next.js)
    • Book search and browsing
    • User profiles and favorites
    • Personalized recommendation dashboard

📋 Planned

  • Real-time model updates with new ratings
  • A/B testing framework
  • Cold-start problem handling (new users/books)
  • Explainable recommendations
  • Mobile app deployment
  • Docker containerization
  • CI/CD pipeline

📈 Performance Metrics

Model Comparison

Accuracy: Tuned SVD wins

  • Winner: Tuned SVD (1.32 RMSE)
  • Runner-up: Base SVD (1.37 RMSE); the MF models follow at about 1.42 RMSE

Speed: Base SVD is fastest, Quantized MF is close behind

  • Winner: Base SVD (3.0s)
  • Quantized MF: 3.6s, about 5.7x faster than the unquantized MF model (20.7s)

Size: Quantized MF is about 11.6x smaller

  • Winner: Quantized MF (7.6 MB)
  • Original baseline: 87.8 MB

Recommended for Production: Quantized MF

  • Good size/accuracy tradeoff, with speed close to the SVD baseline
  • Tiny model size (mobile-friendly)
  • Easy deployment with TFLite

🤝 Contributing

Contributions are welcome! Areas of interest:

  • Cold-start problem solutions
  • Deep learning models (NCF, autoencoders)
  • Frontend/backend development
  • Performance optimizations
  • Documentation improvements

📜 License

This project is licensed under the MIT License.


🙏 Acknowledgments

  • Dataset: Goodbooks-10k by Zygmunt Zając
  • Libraries: Scikit-surprise, TensorFlow, Scikit-learn, Pandas
  • Inspiration: Building a personalized reading experience for book lovers

📧 Contact

For questions or collaboration:


⭐ Star this repo if you find it useful!

Happy Reading! 📚
