Big-Data-ETL

Introduction

This project applies several ETL techniques to Amazon customer product reviews. A single dataset alone contained over 1.5 million rows, along with 40 metadata links. My first goal was to perform the ETL processing in the cloud and upload structured DataFrames to an RDS instance. My second goal was to use PySpark and SQL to perform a statistical analysis on the selected data.

This mini-project is split into two sections:

Extract two Amazon customer review datasets, transform them into four final DataFrames, and load them onto the AWS cloud.
Extract two Amazon review datasets and use SQL and PySpark to analyze whether reviews from Amazon's Vine program are trustworthy.

Part 1

Extract the Data:

Read in each dataset with the correct header and parameters
Calculated the shape of each dataset
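
The extract step above can be sketched in plain Python. This is only an illustration using the standard library in place of PySpark, and it assumes the review files are tab-separated with a header row; the sample rows, the column subset, and the helper names `read_reviews` and `shape` are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-in for one tab-separated review file (hypothetical rows).
SAMPLE_TSV = (
    "customer_id\treview_id\tproduct_id\tstar_rating\tvine\n"
    "101\tR1\tP1\t5\tN\n"
    "102\tR2\tP1\t4\tY\n"
    "101\tR3\tP2\t2\tN\n"
)


def read_reviews(handle):
    """Read a tab-separated file whose first line is the header."""
    # QUOTE_NONE keeps stray quote characters in review text from merging rows.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    return reader.fieldnames, rows


def shape(columns, rows):
    """Return (row_count, column_count), like a DataFrame shape."""
    return len(rows), len(columns)


columns, rows = read_reviews(io.StringIO(SAMPLE_TSV))
print(shape(columns, rows))
```

For a real file, the same function would take an open file handle instead of the in-memory string.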

Transform the Data:

Created a review_df with the appropriate columns and data types
Created a product_df with duplicates dropped across both columns
Created a customer_df that groups the data on 'customer_id' and counts the number of times each customer reviewed a product
Created a vine_df with the necessary aggregated columns
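
The four transforms listed above can be sketched as follows. This is a minimal plain-Python stand-in for the PySpark DataFrame operations, not the project's actual code. The column names (`product_title`, `helpful_votes`, `total_votes`, and so on), the choice of columns for each table, and the helper `build_tables` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical rows, as they might look right after the extract step (all strings).
rows = [
    {"review_id": "R1", "customer_id": "101", "product_id": "P1",
     "product_title": "Desk Lamp", "star_rating": "5",
     "helpful_votes": "3", "total_votes": "4", "vine": "N"},
    {"review_id": "R2", "customer_id": "102", "product_id": "P1",
     "product_title": "Desk Lamp", "star_rating": "4",
     "helpful_votes": "10", "total_votes": "12", "vine": "Y"},
    {"review_id": "R3", "customer_id": "101", "product_id": "P2",
     "product_title": "USB Hub", "star_rating": "2",
     "helpful_votes": "0", "total_votes": "1", "vine": "N"},
]


def build_tables(rows):
    """Split raw review rows into review, product, customer and vine tables."""
    # review table: selected columns, cast to proper types
    review = [
        (r["review_id"], int(r["customer_id"]), r["product_id"], int(r["star_rating"]))
        for r in rows
    ]
    # product table: drop duplicate (product_id, product_title) pairs, keep first-seen order
    product = list(dict.fromkeys((r["product_id"], r["product_title"]) for r in rows))
    # customer table: number of reviews written per customer_id
    customer = sorted(Counter(int(r["customer_id"]) for r in rows).items())
    # vine table: rating and vote columns plus the Vine flag
    vine = [
        (r["review_id"], int(r["star_rating"]), int(r["helpful_votes"]),
         int(r["total_votes"]), r["vine"])
        for r in rows
    ]
    return {"review": review, "product": product, "customer": customer, "vine": vine}


tables = build_tables(rows)
for name, table in tables.items():
    print(name, table)
```

In PySpark the same steps would typically map to column selection with casts, duplicate dropping, and a group-by count, but the exact calls used in the project are not shown here.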

Load the Data into an RDS Instance:

Exported the DataFrames to a PostgreSQL server with the tables needed for the analysis

Part 2

I explored various methods to investigate whether Vine reviews were free of bias, using PySpark and SQL to analyze the data.

1. For the analysis, I considered steps to reduce noisy data by filtering reviews on certain criteria
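
As an illustration of this kind of filtering, the sketch below uses SQL through Python's built-in `sqlite3` module rather than the project's PostgreSQL instance. The filter (a minimum number of total votes), the threshold value, the table layout, and the sample rows are all hypothetical; the README does not state which criteria were actually used.

```python
import sqlite3

# In-memory database standing in for the RDS instance (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vine_table ("
    "review_id TEXT PRIMARY KEY, star_rating INTEGER, "
    "helpful_votes INTEGER, total_votes INTEGER, vine TEXT)"
)
conn.executemany(
    "INSERT INTO vine_table VALUES (?, ?, ?, ?, ?)",
    [
        ("R1", 5, 18, 20, "Y"),
        ("R2", 3, 12, 25, "Y"),
        ("R3", 5, 30, 40, "N"),
        ("R4", 5, 22, 30, "N"),
        ("R5", 1, 15, 20, "N"),
        ("R6", 5, 0, 1, "N"),  # dropped by the filter: too few votes
    ],
)

MIN_TOTAL_VOTES = 20  # hypothetical noise threshold

# Compare the share of 5-star reviews between Vine ('Y') and non-Vine ('N') reviews.
QUERY = """
SELECT vine,
       COUNT(*) AS reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star,
       ROUND(100.0 * SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) / COUNT(*), 1)
           AS five_star_pct
FROM vine_table
WHERE total_votes >= ?
GROUP BY vine
ORDER BY vine
"""

summary = conn.execute(QUERY, (MIN_TOTAL_VOTES,)).fetchall()
for row in summary:
    print(row)
```

A large gap in the 5-star share between the two groups would be one signal worth investigating, though a comparison this simple cannot establish bias on its own.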
