
mihir-robotics/aws-yahoo-finance-etl-pipeline

ELT (Extract, Load, Transform) pipeline that fetches stock data from Yahoo Finance, stores it in an S3 bucket, and then loads it into a Redshift Serverless table.


End-to-End AWS Data (ETL/ELT) Pipeline for Yahoo Finance Stock Data

This project is a complete ELT (Extract, Load, Transform) pipeline that fetches stock data from Yahoo Finance, stores it in an S3 bucket, and then loads it into a Redshift Serverless table. AWS QuickSight is then used to build BI dashboards on top of this data. The pipeline uses AWS Lambda, AWS Glue, and Amazon Redshift Serverless.

Architecture

[Figure: AWS Stock Pipeline Architecture]

Components

1. AWS Lambda Function

  • An AWS Lambda function uses the yfinance library to fetch the past 30 days of price data for a selected set of stocks.
  • The per-stock data is combined into a single CSV file and uploaded to an S3 bucket.
  • File: src/lambda-functions/load_data_s3.py
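A minimal sketch of what such a handler could look like, assuming yfinance is supplied through a Lambda layer and boto3 comes from the Lambda Python runtime. The ticker list, bucket name, S3 key layout, and CSV column names are hypothetical placeholders, not taken from load_data_s3.py. The CSV serialization is kept in a separate pure function so it can be checked without network access.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical placeholders: the project's actual tickers and bucket are not stated.
TICKERS = ["AAPL", "MSFT", "AMZN"]
BUCKET = "my-stock-data-bucket"

# Assumed column layout for the combined CSV.
FIELDS = ["ticker", "date", "open", "high", "low", "close", "volume"]


def rows_to_csv(rows):
    """Serialize a list of dicts (one per ticker per day) into one CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def fetch_rows(tickers, days=30):
    """Fetch daily bars for the last `days` calendar days for each ticker."""
    import yfinance as yf  # assumed to be provided by a Lambda layer

    end = datetime.now(timezone.utc).date()
    start = end - timedelta(days=days)
    rows = []
    for symbol in tickers:
        # yfinance treats `end` as exclusive, so today's partial bar is skipped.
        hist = yf.Ticker(symbol).history(
            start=start.isoformat(), end=end.isoformat(), interval="1d"
        )
        for ts, bar in hist.iterrows():
            rows.append({
                "ticker": symbol,
                "date": ts.strftime("%Y-%m-%d"),
                "open": float(bar["Open"]),
                "high": float(bar["High"]),
                "low": float(bar["Low"]),
                "close": float(bar["Close"]),
                "volume": int(bar["Volume"]),
            })
    return rows


def lambda_handler(event, context):
    """Fetch all tickers, combine them into one CSV, and upload it to S3."""
    import boto3  # bundled with the AWS Lambda Python runtime

    body = rows_to_csv(fetch_rows(TICKERS))
    key = f"raw/stock_data_{datetime.now(timezone.utc):%Y%m%d}.csv"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
    return {"statusCode": 200, "body": f"s3://{BUCKET}/{key}"}
```

Writing one dated object per run keeps earlier extracts in the bucket, which lets the Glue job be re-run against a known file.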

2. AWS Glue Job

  • The Glue job loads the stock data from the S3 bucket into the Amazon Redshift table.
  • File: src/glue/s3_to_redshift_job.py
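A minimal sketch of a Glue PySpark job for this step, assuming the CSV columns from the Lambda sketch, a Glue catalog connection to the Redshift Serverless workgroup, and a job configured with a temporary directory (exposed as `--TempDir`). The S3 path, connection name, database, and table name are hypothetical; the real s3_to_redshift_job.py may be structured differently (for example, reading through the Glue Data Catalog).

```python
import sys

# Hypothetical placeholders: not taken from the project's job script.
S3_PATH = "s3://my-stock-data-bucket/raw/"
GLUE_CONNECTION = "redshift-serverless-conn"
REDSHIFT_DB = "dev"
REDSHIFT_TABLE = "public.stock_data_raw"

# CSV columns arrive as strings; cast each to its assumed Redshift type.
COLUMN_TYPES = {
    "ticker": "string",
    "date": "date",
    "open": "double",
    "high": "double",
    "low": "double",
    "close": "double",
    "volume": "long",
}


def build_mappings(column_types):
    """Build ApplyMapping tuples: (source, source_type, target, target_type)."""
    return [(name, "string", name, target) for name, target in column_types.items()]


def main():
    # These modules exist only inside the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read every CSV under the prefix, using the header row for column names.
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [S3_PATH]},
        format="csv",
        format_options={"withHeader": True},
    )
    typed = ApplyMapping.apply(frame=raw, mappings=build_mappings(COLUMN_TYPES))

    # Glue stages the data in TempDir and loads it into Redshift.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=typed,
        catalog_connection=GLUE_CONNECTION,
        connection_options={"dbtable": REDSHIFT_TABLE, "database": REDSHIFT_DB},
        redshift_tmp_dir=args["TempDir"],
    )
    job.commit()


# Only run when launched by Glue, which passes --JOB_NAME on the command line.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main()
```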

3. AWS Redshift Table & Views

  • The raw stock data is stored in a Redshift table.

  • Several views are defined on top of this base table for specific use cases:

      • Intraday Percentage Difference View: Calculates the average intraday percentage change for each stock, i.e. how much the stock moves within a single trading session.

      • Daily Volatility View: Calculates each stock's daily price range as a percentage, a measure of how much the price fluctuates within a single day.

      • Daily Gainers and Losers View: Classifies each stock as a gainer, loser, or neutral for the day by comparing its closing price with the previous day's close.

  • Table/View DDLs:

      • src/redshift-sql/tables/stock_data_raw.sql
      • src/redshift-sql/views/stock_intraday_diff.sql
      • src/redshift-sql/views/stock_daily_volatility.sql
      • src/redshift-sql/views/stock_daily_gainers_losers.sql
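The view DDLs themselves are not reproduced in this README. As a rough illustration of the three calculations, the plain-Python sketch below computes them over rows shaped like the raw table. The column names and the exact formulas are assumptions, in particular using the day's open as the denominator for the volatility percentage; the SQL files are authoritative.

```python
from collections import defaultdict


def intraday_pct(bar):
    """Percentage move from open to close within one session."""
    return (bar["close"] - bar["open"]) / bar["open"] * 100


def daily_range_pct(bar):
    """High-low range as a percentage of the open (denominator is an assumption)."""
    return (bar["high"] - bar["low"]) / bar["open"] * 100


def avg_intraday_pct(bars):
    """Average intraday percentage change per ticker (intraday difference view)."""
    per_ticker = defaultdict(list)
    for bar in bars:
        per_ticker[bar["ticker"]].append(intraday_pct(bar))
    return {ticker: sum(v) / len(v) for ticker, v in per_ticker.items()}


def classify_days(bars):
    """Label each day versus the previous close, similar to LAG() in SQL.

    The first day of each ticker has no previous close and is skipped.
    """
    by_ticker = defaultdict(list)
    for bar in bars:
        by_ticker[bar["ticker"]].append(bar)
    labels = []
    for ticker, rows in sorted(by_ticker.items()):
        rows.sort(key=lambda r: r["date"])
        for prev, cur in zip(rows, rows[1:]):
            if cur["close"] > prev["close"]:
                label = "gainer"
            elif cur["close"] < prev["close"]:
                label = "loser"
            else:
                label = "neutral"
            labels.append((ticker, cur["date"], label))
    return labels
```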

4. AWS QuickSight

  • The data is visualized in a dashboard on AWS QuickSight.

QuickSight Dashboard Example

[Figure: AWS QuickSight Dashboard]

Setup and Deployment Overview

  1. Deploy the Lambda function to fetch and upload stock data to S3.
  2. Configure the Glue job to load data from S3 to Redshift.
  3. Create the necessary Redshift tables and views.
  4. Visualize the data using AWS QuickSight.
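Step 3 can also be scripted. A minimal sketch, assuming the Redshift Data API (`boto3.client("redshift-data")`) is used against the Serverless workgroup; the workgroup and database names passed in are placeholders, and the project may instead create these objects in the Redshift query editor. The table DDL runs first because the views select from it.

```python
import time
from pathlib import Path

# DDL files listed in this README, ordered so the base table precedes its views.
DDL_FILES = [
    "src/redshift-sql/tables/stock_data_raw.sql",
    "src/redshift-sql/views/stock_intraday_diff.sql",
    "src/redshift-sql/views/stock_daily_volatility.sql",
    "src/redshift-sql/views/stock_daily_gainers_losers.sql",
]


def load_statements(repo_root="."):
    """Read each DDL file relative to the repository root."""
    return [(Path(repo_root) / p).read_text() for p in DDL_FILES]


def apply_ddl(client, workgroup, database, statements, poll_seconds=2.0, sleep=time.sleep):
    """Run each statement via the Redshift Data API and wait for it to finish."""
    for sql in statements:
        stmt_id = client.execute_statement(
            WorkgroupName=workgroup, Database=database, Sql=sql
        )["Id"]
        while True:
            desc = client.describe_statement(Id=stmt_id)
            if desc["Status"] == "FINISHED":
                break
            if desc["Status"] in ("FAILED", "ABORTED"):
                raise RuntimeError(desc.get("Error", "statement failed"))
            sleep(poll_seconds)
```

Run from the repository root, a call might look like `apply_ddl(boto3.client("redshift-data"), "my-workgroup", "dev", load_statements())`, where both names are hypothetical.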

Pipeline Usage

  1. Trigger the Lambda function to fetch the stock data and upload it to S3 as a CSV file.
  2. Run the Glue job to load the data from S3 into Redshift.
  3. Configure the Redshift table as a data source in AWS QuickSight through a VPC connection.
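Steps 1 and 2 can be chained from a script. A minimal sketch using boto3's `lambda` and `glue` clients; the function and job names are hypothetical placeholders borrowed from the source file names, and the clients are passed in so the flow can be exercised with stubs. Step 3 is a one-time QuickSight configuration and is not covered here.

```python
import time

# Hypothetical resource names; the deployed names are not stated in this README.
LAMBDA_NAME = "load_data_s3"
GLUE_JOB_NAME = "s3_to_redshift_job"

# Glue job run states after which no further progress will happen.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_pipeline(lambda_client, glue_client, poll_seconds=30.0, sleep=time.sleep):
    """Invoke the extract Lambda, then run the Glue load job and return its final state."""
    # Synchronous invoke, so the CSV is in S3 before the load starts.
    resp = lambda_client.invoke(FunctionName=LAMBDA_NAME, InvocationType="RequestResponse")
    if resp.get("FunctionError"):
        raise RuntimeError(f"Lambda failed: {resp['Payload'].read()!r}")

    # Start the Glue job and poll until it reaches a terminal state.
    run_id = glue_client.start_job_run(JobName=GLUE_JOB_NAME)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=GLUE_JOB_NAME, RunId=run_id)["JobRun"]
        if run["JobRunState"] in TERMINAL_STATES:
            return run["JobRunState"]
        sleep(poll_seconds)
```

With real credentials this would be called as `run_pipeline(boto3.client("lambda"), boto3.client("glue"))`; a scheduler such as EventBridge or Step Functions is the more usual choice for recurring runs.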

Reference Links:

  1. Configuring AWS QuickSight with Redshift
  2. Connecting AWS Glue to Redshift and S3
  3. Adding layers to an AWS Lambda function for Python dependencies
