Featured Projects β Resume
- Sneaker Demand Intelligence Platform
- H&M Personalized Fashion Recommender
- FUSION RAG: Multi-Agent GenAI System
- H&M Customer Churn Prediction
Welcome to my GitHub!
I'm Andrew Leem, a Master's student in Machine Learning & Data Science at Northwestern University with prior experience as a Data Scientist and AI Intern. My work spans demand forecasting, NLP/social listening, recommender systems, and GenAI pipelines β with a focus on consumer and retail domains. I'm passionate about building ML systems that connect behavioral signals to business decisions.
At Breakthru Beverage Group, I developed forecasting models to improve inventory planning, orchestrated Azure-hosted AI agents in C#/.NET Core, and led segmentation of 1,000+ retail accounts to guide sales strategy. Previously at Concentrix Catalyst, I enhanced global data accuracy, automated GDPR-compliant data retention pipelines in BigQuery, and standardized 20,000+ e-commerce tags across 100+ sites. Earlier at GLG, I built automated Python pipelines and Selenium scrapers to enrich company datasets and improve efficiency.
I bring a strong foundation in Python, SQL, PyTorch, and TensorFlow, along with experience in cloud platforms like Azure, AWS, GCP, and Snowflake. My focus is on building interpretable, scalable AI systems that drive measurable business impact.
- Programming: Python, R, SQL, C#, Bash
- ML & AI: Scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, Prophet, ARIMA, FAISS, LoRA
- NLP: VADER, HuggingFace Transformers (RoBERTa), TF-IDF, spaCy
- Data Platforms: Snowflake, BigQuery, Hadoop
- DevOps & Infra: Azure DevOps, Docker, .NET Core, ASP.NET Core
- Analytics & Visualization: Plotly, Streamlit, Tableau, Power BI, Google Analytics
- Collaboration Tools: GCP, Jira, Confluence, Git
A two-part end-to-end system that combines quantitative aftermarket signals with real-time consumer sentiment to support footwear demand planning decisions.
| Component | Repo | Description |
|---|---|---|
| sneaker-intel | hyunstar11/sneaker-intel | Tabular ML pipeline on 99K+ StockX transactions |
| reddit-sentiment | hyunstar11/reddit-sentiment Β· Live Demo | NLP sentiment pipeline on 2,400+ Reddit posts across 9 subreddits |
sneaker-intel β ML demand forecasting:
- XGBoost / LightGBM / Random Forest ensemble; 35-feature pipeline (brand, colorway, size, release recency, regional demand)
- Outputs: demand tier classification, price premium prediction, sell-through risk score
- 4 analysis notebooks: StockX EDA, market dynamics, model benchmarking, release strategy optimization
- FastAPI inference server + Streamlit dashboard; 28 tests
reddit-sentiment β NLP social listening:
- Hybrid scoring: VADER (fast baseline) +
cardiffnlp/twitter-roberta-base-sentiment-lateston brand-context windows; blended 0.6 / 0.4 - Detects brand mentions, 40+ retail channels, 7 purchase intent types, and 31 shoe models
- 3 analysis notebooks covering data collection, sentiment analysis, and brand insights
- CLI pipeline (
collect β analyze β report); HTML report with Plotly charts; 101 tests
Combined insight example:
sneaker-intel places Nike in the High demand tier at 1.3β1.8Γ retail. reddit-sentiment shows Nike at +0.21 avg sentiment with 188 active purchase-seekers and eBay/GOAT as the dominant secondary channels. β Demand exceeds supply; prioritize Nike Direct allocation and hold restock for 60+ days.
- Built end-to-end recommender using collaborative filtering (FAISS), content-based similarity, and hybrid logic with association rules
- Achieved MAP@12 = 0.0470 across 1.3M customers and 31M transactions
- Constructed 4,000+ RFM features, applied Incremental PCA + MiniBatchKMeans, and uncovered high-value customer clusters
- Built multi-agent Retrieval-Augmented Generation system with GPT-3.5, Pinecone, and ChromaDB
- Designed LoRA-tuned agents with specialized tasks (retrieval, summarization, synthesis) orchestrated via prompt chaining
- Improved modularity and context relevance in climate change Q&A pipelines
- Built an end-to-end churn prediction pipeline with automated data processing, feature engineering, and reproducible experiment management
- Developed a Random Forest model to identify high-risk customers, enabling targeted retention strategies with interpretable churn scores
- Deployed an interactive Streamlit dashboard on AWS to visualize churn risk, feature importance, and model performance in real time
- Email: andrew.leem@gmail.com
