Skip to content
View hyunstar11's full-sized avatar
πŸ‘‹
πŸ‘‹

Block or report hyunstar11

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don't include any personal information such as legal names or email addresses. Markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
hyunstar11/README.md

Andrew Leem

Featured Projects – Resume


Welcome to my GitHub!

I'm Andrew Leem, a Master's student in Machine Learning & Data Science at Northwestern University with prior experience as a Data Scientist and AI Intern. My work spans demand forecasting, NLP/social listening, recommender systems, and GenAI pipelines β€” with a focus on consumer and retail domains. I'm passionate about building ML systems that connect behavioral signals to business decisions.


About Me

At Breakthru Beverage Group, I developed forecasting models to improve inventory planning, orchestrated Azure-hosted AI agents in C#/.NET Core, and led segmentation of 1,000+ retail accounts to guide sales strategy. Previously at Concentrix Catalyst, I enhanced global data accuracy, automated GDPR-compliant data retention pipelines in BigQuery, and standardized 20,000+ e-commerce tags across 100+ sites. Earlier at GLG, I built automated Python pipelines and Selenium scrapers to enrich company datasets and improve efficiency.

I bring a strong foundation in Python, SQL, PyTorch, and TensorFlow, along with experience in cloud platforms like Azure, AWS, GCP, and Snowflake. My focus is on building interpretable, scalable AI systems that drive measurable business impact.


Skills & Technologies

  • Programming: Python, R, SQL, C#, Bash
  • ML & AI: Scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, Prophet, ARIMA, FAISS, LoRA
  • NLP: VADER, HuggingFace Transformers (RoBERTa), TF-IDF, spaCy
  • Data Platforms: Snowflake, BigQuery, Hadoop
  • DevOps & Infra: Azure DevOps, Docker, .NET Core, ASP.NET Core
  • Analytics & Visualization: Plotly, Streamlit, Tableau, Power BI, Google Analytics
  • Collaboration Tools: GCP, Jira, Confluence, Git

Projects

Sneaker Demand Intelligence Platform (Feb 2026)

A two-part end-to-end system that combines quantitative aftermarket signals with real-time consumer sentiment to support footwear demand planning decisions.

Component Repo Description
sneaker-intel hyunstar11/sneaker-intel Tabular ML pipeline on 99K+ StockX transactions
reddit-sentiment hyunstar11/reddit-sentiment Β· Live Demo NLP sentiment pipeline on 2,400+ Reddit posts across 9 subreddits

sneaker-intel β€” ML demand forecasting:

  • XGBoost / LightGBM / Random Forest ensemble; 35-feature pipeline (brand, colorway, size, release recency, regional demand)
  • Outputs: demand tier classification, price premium prediction, sell-through risk score
  • 4 analysis notebooks: StockX EDA, market dynamics, model benchmarking, release strategy optimization
  • FastAPI inference server + Streamlit dashboard; 28 tests

reddit-sentiment β€” NLP social listening:

  • Hybrid scoring: VADER (fast baseline) + cardiffnlp/twitter-roberta-base-sentiment-latest on brand-context windows; blended 0.6 / 0.4
  • Detects brand mentions, 40+ retail channels, 7 purchase intent types, and 31 shoe models
  • 3 analysis notebooks covering data collection, sentiment analysis, and brand insights
  • CLI pipeline (collect β†’ analyze β†’ report); HTML report with Plotly charts; 101 tests

Combined insight example:

sneaker-intel places Nike in the High demand tier at 1.3–1.8Γ— retail. reddit-sentiment shows Nike at +0.21 avg sentiment with 188 active purchase-seekers and eBay/GOAT as the dominant secondary channels. β†’ Demand exceeds supply; prioritize Nike Direct allocation and hold restock for 60+ days.


H&M Personalized Fashion Recommendation System (Mar 2025)

  • Built end-to-end recommender using collaborative filtering (FAISS), content-based similarity, and hybrid logic with association rules
  • Achieved MAP@12 = 0.0470 across 1.3M customers and 31M transactions
  • Constructed 4,000+ RFM features, applied Incremental PCA + MiniBatchKMeans, and uncovered high-value customer clusters

FUSION RAG: Multi-Agent GenAI System (Mar 2025)

  • Built multi-agent Retrieval-Augmented Generation system with GPT-3.5, Pinecone, and ChromaDB
  • Designed LoRA-tuned agents with specialized tasks (retrieval, summarization, synthesis) orchestrated via prompt chaining
  • Improved modularity and context relevance in climate change Q&A pipelines

H&M Customer Churn Prediction System (May 2025)

  • Built an end-to-end churn prediction pipeline with automated data processing, feature engineering, and reproducible experiment management
  • Developed a Random Forest model to identify high-risk customers, enabling targeted retention strategies with interpretable churn scores
  • Deployed an interactive Streamlit dashboard on AWS to visualize churn risk, feature importance, and model performance in real time

Get in Touch

Popular repositories Loading

  1. LeetHub LeetHub Public

    Forked from QasimWani/LeetHub

    Automatically sync your leetcode solutions to your github account - top 5 trending GitHub repository

    JavaScript

  2. LeetCode LeetCode Public

    Collection of LeetCode questions to ace the coding interview! - Created using [LeetHub](https://github.com/QasimWani/LeetHub)

    Python

  3. leap-day leap-day Public

    Forked from pages-themes/leap-day

    Leap day is a Jekyll theme for GitHub Pages

    SCSS

  4. slate slate Public

    Forked from pages-themes/slate

    Slate is a Jekyll theme for GitHub Pages

    SCSS

  5. bootcamp-2024 bootcamp-2024 Public

    Forked from NUMLDS/bootcamp-2024

    HTML

  6. hyunstar11 hyunstar11 Public