🏒 PuckInsights: NHL Data Analysis and Cloud Deployment PuckInsights is an end-to-end data science project that explores historical NHL and ice hockey statistics through practical exploratory and statistical analysis. The project starts in an interactive Google Colab notebook and evolves into a cloud-deployed, containerized analytics service hosted on Azure.
🚀 Project Roadmap This project is documented in a Medium article series covering:
📊 Descriptive Statistics — Central tendencies and dispersion metrics
🔗 Correlation and Regression — Linear vs monotonic trends, residual diagnostics, and modeling
📈 Distributions and Patterns — Fitting and evaluating probabilistic models
🐳 Dockerization & Cloud Deployment — Building and deploying a full pipeline on Azure Container Apps
📦 Dataset Overview The dataset was collected and cleaned from public NHL sources.
Property Value Rows 12,250 Columns 23 Memory Usage 5.67 MB Bytes per Row ~485.3 Year Range 1963 to 2022
The dataset includes aggregated and per-season metrics for players and goalies, allowing for rich EDA, correlation analysis, and modeling exercises.
🧪 Current Focus: Exploratory & Statistical Analysis We're currently deep-diving into:
Pre-correlation diagnostics: linear vs monotonic detection using Pearson and Spearman
Residual analysis: to check for patterns, nonlinearity, or heteroscedasticity
Quadratic model comparison: to identify nonlinear trends not captured by linear models
Visual diagnostics: heatmaps, scatterplots, residual plots, and QQ-plots
☁️ Coming Soon: Deployment After the notebook analysis is complete:
The codebase will be modularized and refactored
Containerized using Docker
Deployed as a stateless app on Azure Container Apps
Exposed via HTTP API and integrated with basic visualization tools