A small ELT pipeline built with Snowflake + dbt + Streamlit using a curated dataset of Radiohead tracks.
Data was collected using the iTunes Search API and saved as: data/itunes_radiohead_raw.csv.
Columns include track name, artist, album, genre, duration (ms), and release date.
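The collection step can be sketched as follows. The repo doesn't include the script itself, so this is a minimal reconstruction: the endpoint and field names (`trackName`, `collectionName`, `trackTimeMillis`, etc.) are the iTunes Search API's own, and the output path matches the CSV above.

```python
import csv
import json
import urllib.request

# Fields kept from each iTunes result; names are the API's own.
FIELDS = ["trackName", "artistName", "collectionName",
          "primaryGenreName", "trackTimeMillis", "releaseDate"]

def to_rows(payload: dict) -> list[dict]:
    """Flatten an iTunes Search API payload, keeping only FIELDS."""
    return [{f: r.get(f) for f in FIELDS} for r in payload.get("results", [])]

def fetch_tracks(term: str = "radiohead", limit: int = 200) -> list[dict]:
    """Query the iTunes Search API for songs matching `term`."""
    url = f"https://itunes.apple.com/search?term={term}&entity=song&limit={limit}"
    with urllib.request.urlopen(url) as resp:
        return to_rows(json.load(resp))

if __name__ == "__main__":
    rows = fetch_tracks()
    with open("data/itunes_radiohead_raw.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```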
The CSV was uploaded into: MALLARD_PROJECT.PUBLIC.ITUNES_RADIOHEAD_RAW.
This is the raw layer.
The staging model cleans and standardizes fields and derives release_year.
Output: PUBLIC.STG_RADIOHEAD
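A staging model along these lines would produce that output. This is a sketch, not the project's actual model: the raw column names are assumed to match the iTunes fields in the CSV, and the source name `itunes` is hypothetical.

```sql
-- models/staging/stg_radiohead.sql (sketch; raw column names assumed
-- to match the iTunes fields in the CSV, source name is hypothetical)
select
    trim(trackname)                         as track_name,
    trim(artistname)                        as artist,
    trim(collectionname)                    as album,
    primarygenrename                        as genre,
    tracktimemillis                         as duration_ms,
    to_timestamp(releasedate)::date         as release_date,
    year(to_timestamp(releasedate))         as release_year
from {{ source('itunes', 'itunes_radiohead_raw') }}
```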
- album_stats: track count + avg/min/max duration per album
- year_stats: tracks released + avg duration per year
Outputs:
- PUBLIC.ALBUM_STATS
- PUBLIC.YEAR_STATS
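The per-album aggregation might look like the following sketch (column names `album` and `duration_ms` are assumed to come from the staging model; the year_stats model would follow the same shape grouped by release_year):

```sql
-- models/analytics/album_stats.sql (sketch; assumes staging columns
-- album and duration_ms)
select
    album,
    count(*)         as track_count,
    avg(duration_ms) as avg_duration_ms,
    min(duration_ms) as min_duration_ms,
    max(duration_ms) as max_duration_ms
from {{ ref('stg_radiohead') }}
group by album
```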
An interactive Streamlit app was built using Snowpark.
Features:
- Filters (year range, album)
- KPIs (track count, album count, avg length)
- Charts:
  - Tracks released per year
  - Avg track length per album
- Track explorer (longest/shortest tracks)
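The filter and KPI logic above can be sketched as pure helpers, kept separate from the UI so they're testable without a Snowflake session. The column names (`album`, `release_year`, `duration_ms`) are assumptions matching the staging model; the Streamlit/Snowpark wiring is only outlined in comments.

```python
from statistics import mean

def apply_filters(rows, year_range=None, albums=None):
    """Keep rows inside the (lo, hi) year range and album selection."""
    out = []
    for r in rows:
        if year_range and not (year_range[0] <= r["release_year"] <= year_range[1]):
            continue
        if albums and r["album"] not in albums:
            continue
        out.append(r)
    return out

def kpis(rows):
    """Track count, distinct album count, and average length in minutes."""
    return {
        "tracks": len(rows),
        "albums": len({r["album"] for r in rows}),
        "avg_minutes": round(mean(r["duration_ms"] for r in rows) / 60000, 2)
                       if rows else 0.0,
    }

if __name__ == "__main__":
    # In the app, rows come from a Snowpark query such as
    # session.table("PUBLIC.STG_RADIOHEAD").to_pandas(), and the
    # helpers above feed st.slider, st.metric, and st.bar_chart.
    pass
```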
```
dw_radiohead/
├── data/
└── radiohead_project/      # dbt project
    └── models/
        ├── staging/
        └── analytics/
```
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
```sql
-- Run this SQL in Snowflake:
CREATE OR REPLACE TABLE ITUNES_RADIOHEAD_RAW AS
SELECT * FROM @~/itunes_radiohead_raw.csv;
```
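If the CTAS over the staged file doesn't parse the CSV as expected, the conventional route is an explicit table definition plus COPY INTO. Column names and types here are assumptions based on the columns listed above, not the project's actual schema:

```sql
-- Alternative load path (sketch; column names/types are assumptions)
CREATE OR REPLACE TABLE ITUNES_RADIOHEAD_RAW (
    TRACK_NAME   STRING,
    ARTIST       STRING,
    ALBUM        STRING,
    GENRE        STRING,
    DURATION_MS  NUMBER,
    RELEASE_DATE STRING
);

COPY INTO ITUNES_RADIOHEAD_RAW
FROM @~/itunes_radiohead_raw.csv
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```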
```bash
cd radiohead_project
dbt run
dbt test
```
```bash
streamlit run streamlit_app.py
```