This project implements the BFR (Bradley, Fayyad, and Reina) Algorithm from scratch to perform scalable clustering on the Amazon Books Reviews dataset. It processes data in chunks to handle memory constraints and identifies distinct market segments based on price, ratings, and review counts.
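
As a rough illustration of the chunked approach, the sketch below shows the core BFR idea: each cluster is kept only as its sufficient statistics (N, SUM, SUMSQ), and incoming points are folded into the nearest summary when their Mahalanobis distance is under a threshold. This is a minimal sketch under assumptions, not the notebook's actual code: the names `ClusterSummary` and `process_chunk` and the `threshold` parameter are hypothetical, and it covers only the discard-set step, leaving out the compressed and retained sets of the full algorithm.

```python
import numpy as np


class ClusterSummary:
    """Sufficient statistics BFR keeps for one cluster: N, SUM, SUMSQ."""

    def __init__(self, dim):
        self.n = 0
        self.sum = np.zeros(dim)
        self.sumsq = np.zeros(dim)

    def add(self, point):
        # Update the statistics; the point itself can then be discarded.
        self.n += 1
        self.sum += point
        self.sumsq += point ** 2

    @property
    def centroid(self):
        return self.sum / self.n

    @property
    def std(self):
        # Per-dimension variance is E[x^2] - E[x]^2; clip to avoid dividing by zero.
        var = self.sumsq / self.n - self.centroid ** 2
        return np.sqrt(np.maximum(var, 1e-12))

    def mahalanobis(self, point):
        # Normalized distance, assuming axis-aligned (diagonal covariance) clusters.
        return float(np.sqrt(np.sum(((point - self.centroid) / self.std) ** 2)))


def process_chunk(chunk, summaries, threshold):
    """Fold points close to a cluster into its summary; return the leftover points."""
    leftovers = []
    for point in chunk:
        distances = [s.mahalanobis(point) for s in summaries]
        best = int(np.argmin(distances))
        if distances[best] < threshold:
            summaries[best].add(point)
        else:
            leftovers.append(point)
    return leftovers
```

Here each row of `chunk` would be one book's feature vector (for example price, rating, and review count), and the summaries would first be seeded from an initial in-memory clustering of a sample. Because only three arrays per cluster stay in memory, the cost per chunk does not grow with the number of points already processed.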
- Click the "Open in Colab" badge above.
- Run the first cell.
- When prompted, upload your personal `kaggle.json` API token (downloadable from your Kaggle account settings). The notebook will automatically handle the authentication and dataset download.
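
For reference, the authentication step usually amounts to placing the uploaded token where the Kaggle client looks for it. The sketch below is an assumption about what such a step looks like, not a copy of the notebook's cell; the helper name `install_kaggle_token` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy kaggle.json into the Kaggle config directory with owner-only permissions."""
    # The Kaggle client reads ~/.kaggle/kaggle.json by default.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # Restrict access: the client warns when the token is readable by other users.
    os.chmod(target, 0o600)
    return target
```

In Colab, the file would typically arrive through `google.colab.files.upload()` and then be passed to a helper like this one, for example `install_kaggle_token("kaggle.json")`.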

Repository contents:

- `Clustering - Amazon Books Analysis.ipynb`: The main notebook containing the BFR implementation and analysis.
- `Project_Report.pdf`: The final academic report summarizing methodology and findings.
- `README.md`: Project documentation.