NBA Play-by-Play Summarization with T5-Gemma

Overview

This repository builds a custom dataset of NBA play-by-play (pbp) logs and fine-tunes the T5-Gemma small model to generate concise but engaging summaries of each quarter of an NBA game. The pipeline covers scraping, preprocessing, and supervised fine-tuning.

Project structure

NBA_pbp_scraper.ipynb
Scrapes official NBA play-by-play logs and stores actions (shots, fouls, turnovers, rebounds, etc.) in structured form.
pbp_preprocessing.ipynb
Cleans and formats the scraped data, splits games by quarter, and pairs each quarter's pbp text with a ground-truth summary in a Kaggle dataset.
finetuning_nba.ipynb
Fine-tunes google/t5-gemma-small using Hugging Face training utilities, with experiment tracking via Weights & Biases.

Dataset

Source: NBA Quarter Play Summaries (custom-built)
Granularity: Per quarter
Link: https://www.kaggle.com/datasets/shrishtiroy6/nba-quarter-play-summaries
Format example:
- Input: Time TIMBERWOLVES Score Lead Warriors 12:00 Start of Period (9:16 PM) 12:00 Possession: Timberwolves 30-23 +7 11:44 MISS N.Alexander-Walker 25' 3PT Jump Shot 11:41 Q.Post REBOUND 11:34 30-25 +5 J.ButlerIII Driving Layup 11:34 N.Reid S.FOUL (P1, T1) (S.Wright) 11:34 30-26 +4 J.ButlerIII Free Throw 1 of 1
- Output: The first quarter was tightly contested, with the Warriors leading thanks to strong three-point shooting from Curry and Hield.
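The input above is a single flattened string, so it can help to see how individual events might be recovered from it. The following is a minimal sketch, not the logic in pbp_preprocessing.ipynb: the function name `split_events` and the regular expression are assumptions made for illustration. It treats each game-clock token (for example 11:44) as the start of a new event and skips wall-clock times such as "(9:16 PM)".

```python
import re

# Game-clock tokens such as "11:44". Wall-clock times like "(9:16 PM)" are
# excluded by the lookbehind for "(" and the lookahead for AM/PM.
CLOCK = re.compile(r"(?<![\d(:])\b\d{1,2}:\d{2}\b(?!\s*[AP]M)")

def split_events(pbp_text):
    """Split a flattened quarter log into (clock, description) pairs.

    Hypothetical helper: text before the first clock token (the column
    header row) is dropped.
    """
    matches = list(CLOCK.finditer(pbp_text))
    events = []
    for i, match in enumerate(matches):
        # An event runs until the next clock token or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(pbp_text)
        events.append((match.group(), pbp_text[match.end():end].strip()))
    return events

sample = (
    "Time TIMBERWOLVES Score Lead Warriors 12:00 Start of Period (9:16 PM) "
    "12:00 Possession: Timberwolves 30-23 +7 11:44 MISS N.Alexander-Walker 25' "
    "3PT Jump Shot 11:41 Q.Post REBOUND"
)
events = split_events(sample)
for clock, description in events:
    print(clock, "|", description)
```

Splitting on the clock keeps the score and lead columns attached to the event they follow, which is usually enough for inspecting or filtering a quarter before it is passed to the model.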

Model training

Base model: google/t5-gemma-small (60M params)
Uses LoRA fine-tuning, with 4-bit quantization to compress the model weights
Typical settings used:
- Per-device batch size: 1 (with gradient accumulation)
- Optimizer: adamw_torch_8bit
- Mixed precision: bf16
- Logging: Weights & Biases
- GPU: Tesla P100
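The settings listed above can be collected as keyword arguments, which also makes the effective batch size explicit. This is a sketch under stated assumptions: the output path and the number of gradient accumulation steps are hypothetical, since they are not given here, and `effective_batch_size` is an illustrative helper rather than part of any library.

```python
# Keyword arguments mirroring the settings listed above.
training_kwargs = {
    "output_dir": "t5gemma-nba-summaries",  # hypothetical path
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,      # assumed value, not stated above
    "optim": "adamw_torch_8bit",
    "bf16": True,
    "report_to": "wandb",
}

def effective_batch_size(kwargs, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )

print(effective_batch_size(training_kwargs))  # prints 16
```

With the Hugging Face library installed, a dictionary like this could be passed as `transformers.TrainingArguments(**training_kwargs)`, provided the installed version accepts the chosen `optim` string.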
Development goal: overfit a few small examples to validate the training loop, then scale to the full dataset and evaluate with ROUGE/BLEU.
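To make the evaluation step concrete, here is a simplified sketch of a ROUGE-1 F1 score based on unigram overlap. It assumes lowercasing and whitespace tokenization only, so its numbers will differ from standard ROUGE packages that apply their own tokenization and optional stemming. The function name `rouge1_f1` and the example sentences are illustrative.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a generated summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "the warriors led after one quarter",  # illustrative reference
    "the warriors led early",              # illustrative model output
)
print(round(score, 3))  # prints 0.6
```

During the overfitting check, a score close to 1.0 on the handful of training examples would indicate that the training loop can reproduce its targets.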
