Cloud bills are complex. Raw exports from Azure Cost Management are often messy, spanning millions of rows with resource tags packed into nested JSON.
This project is a FinOps data pipeline built on a Lakehouse architecture. It ingests raw billing exports, cleans and normalizes them into Delta tables that culminate in a Gold table of daily aggregates, and runs heuristic checks to detect spending anomalies (e.g., a resource group whose daily cost is 50% higher than its 7-day moving average).
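The moving-average heuristic can be sketched in plain Python to make the rule concrete. This is a minimal illustration, not the pipeline's actual PySpark/SQL implementation. It assumes the baseline is the mean of the *previous* seven days (the current day excluded), that a day is only scored once a full seven-day window exists, and that each resource group's series has no missing days. The names `detect_anomalies`, `WINDOW_DAYS`, and `THRESHOLD` are hypothetical. In symbols, day $t$ is flagged when $c_t > 1.5 \cdot \frac{1}{7}\sum_{i=1}^{7} c_{t-i}$.

```python
from collections import deque
from datetime import date

WINDOW_DAYS = 7   # trailing window length (hypothetical constant)
THRESHOLD = 1.5   # flag days costing more than 50% above the baseline

def detect_anomalies(daily_costs, window=WINDOW_DAYS, threshold=THRESHOLD):
    """Return (day, cost, baseline) tuples for anomalous days.

    daily_costs: list of (date, cost) for ONE resource group, sorted by date,
    with no missing days (assumption). The baseline is the mean of the
    previous `window` days and never includes the day being scored.
    """
    history = deque(maxlen=window)  # keeps only the last `window` costs
    anomalies = []
    for day, cost in daily_costs:
        if len(history) == window:
            baseline = sum(history) / window
            if baseline > 0 and cost > threshold * baseline:
                anomalies.append((day, cost, baseline))
        history.append(cost)  # add today only after it has been scored
    return anomalies

# Usage: seven flat days at 100.0, then a spike to 180.0 (80% above baseline).
costs = [(date(2024, 1, d), 100.0) for d in range(1, 8)]
costs.append((date(2024, 1, 8), 180.0))
print(detect_anomalies(costs))
# → [(datetime.date(2024, 1, 8), 180.0, 100.0)]
```

Excluding the current day from its own baseline keeps a spike from inflating the average it is compared against, which would otherwise mute the alert.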
We follow the Medallion Architecture (Bronze -> Silver -> Gold).
```mermaid
graph LR
A["Azure Cost Export<br/>(CSV/Parquet)"] -->|Auto Loader| B[("Bronze Table<br/>(Raw Billing)")]
B -->|PySpark / Cleaning| C[("Silver Table<br/>(Cleaned Costs)")]
C -->|Aggregation & Tag Parsing| D[("Gold Table<br/>(Daily Aggregates)")]
D -->|JDBC Read| E["Metabase / Power BI<br/>(Dashboards)"]
D -->|SQL Alerting| F["Anomaly Alert<br/>(Teams/Slack)"]
```
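The Silver to Gold step (tag parsing plus daily aggregation) can be sketched in plain Python to show its logic. This is an illustration under assumptions, not the pipeline's PySpark code. The column names `Date`, `ResourceGroup`, `CostInBillingCurrency`, and `Tags` are assumed to resemble an Azure cost export schema (check your export's actual headers). The `team` tag key and the helpers `parse_tags` and `aggregate_daily` are hypothetical. The parser also assumes some exports emit the tag column without its enclosing braces and wraps the string in `{}` when they are missing.

```python
import json
from collections import defaultdict

def parse_tags(raw):
    """Parse a tag string into a dict with lower-cased keys; {} if unparseable."""
    if not raw or not raw.strip():
        return {}
    text = raw.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"  # assumption: some exports omit the outer braces
    try:
        tags = json.loads(text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(tags, dict):
        return {}
    # Lower-case keys so "Team" and "team" collapse into one dimension.
    return {str(k).lower(): v for k, v in tags.items()}

def aggregate_daily(rows, tag_key="team"):
    """Sum cost per (date, resource group, tag value): a toy Gold table."""
    totals = defaultdict(float)
    for row in rows:
        tags = parse_tags(row.get("Tags", ""))
        key = (
            row["Date"],
            row["ResourceGroup"].lower(),  # Azure resource group names are case-insensitive
            tags.get(tag_key, "untagged"),
        )
        totals[key] += float(row["CostInBillingCurrency"])
    return dict(totals)

# Usage: two differently formatted rows for the same group, one untagged row.
rows = [
    {"Date": "2024-01-08", "ResourceGroup": "RG-Data",
     "CostInBillingCurrency": "12.50", "Tags": '"Team": "analytics","env": "prod"'},
    {"Date": "2024-01-08", "ResourceGroup": "rg-data",
     "CostInBillingCurrency": "7.50", "Tags": '{"team": "analytics"}'},
    {"Date": "2024-01-08", "ResourceGroup": "rg-web",
     "CostInBillingCurrency": "3.00", "Tags": ""},
]
print(aggregate_daily(rows))
# → {('2024-01-08', 'rg-data', 'analytics'): 20.0, ('2024-01-08', 'rg-web', 'untagged'): 3.0}
```

Normalizing case on both tag keys and resource group names before grouping keeps the Gold table from splitting one group's spend across spelling variants, which would also weaken the moving-average baseline used for anomaly detection.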