This repository supports a five-part video series about streaming analytics on AWS. In the series, we build a real-time gaming leaderboard application, based on a real use case, to learn all parts of a streaming architecture, including:
- Data Ingestion
- Real-time enrichment using database Change data capture (CDC)
- Data Processing
- Computing results, storing them and
- Visualisation
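As a rough illustration of the CDC enrichment idea listed above, here is a minimal plain-Python sketch. It is not the notebooks' implementation: the message shapes (`kind`, `op`, `player_id`, `row`) and the `enrich` function are hypothetical names chosen for this example. Change records keep an in-memory copy of the player table current, and each game event is joined against the latest known row.

```python
from typing import Dict, Iterable, Iterator


def enrich(stream: Iterable[dict]) -> Iterator[dict]:
    """Join game events against player rows kept current by CDC records."""
    players: Dict[int, dict] = {}  # latest known row per player id
    for msg in stream:
        if msg["kind"] == "cdc":
            # Assumed convention: op is "c" (create), "u" (update) or "d" (delete)
            if msg["op"] == "d":
                players.pop(msg["player_id"], None)
            else:
                players[msg["player_id"]] = msg["row"]
        else:
            # A game event: attach whatever the table says right now
            row = players.get(msg["player_id"], {})
            yield {**msg, "player_name": row.get("name")}


# Hypothetical interleaving of database changes and game events
stream = [
    {"kind": "cdc", "op": "c", "player_id": 1, "row": {"name": "ada"}},
    {"kind": "event", "player_id": 1, "score": 10},
    {"kind": "cdc", "op": "u", "player_id": 1, "row": {"name": "ada_l"}},
    {"kind": "event", "player_id": 1, "score": 5},
]
enriched = list(enrich(stream))
```

The second event picks up the renamed player without any restart, which is the point of feeding database changes into the stream rather than loading the table once.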
In this series you will also learn advanced analytics techniques such as:
- Control channel technique for A/B testing, feature switching and parameter updates with zero downtime
- Handling late data arrival
- Exactly-once processing (data-duplication avoidance) and
- Storage of historical data with the ability to replay it on demand.
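To make the control channel technique from the list above more concrete, here is a minimal plain-Python sketch, assuming data events and control messages arrive merged into one ordered stream. The field names (`channel`, `updates`) and the parameters (`multiplier`, `bonus_enabled`) are hypothetical and are not taken from the series.

```python
from typing import Iterable, Iterator


def process(stream: Iterable[dict]) -> Iterator[dict]:
    """Score data events using parameters that control messages can change."""
    # Hypothetical parameters; a real job would define its own
    config = {"multiplier": 1, "bonus_enabled": False}
    for msg in stream:
        if msg["channel"] == "control":
            # Apply the update in place; processing never stops
            config.update(msg["updates"])
            continue
        score = msg["score"] * config["multiplier"]
        if config["bonus_enabled"]:
            score += 10
        yield {"player": msg["player"], "score": score}


merged = [
    {"channel": "data", "player": "ada", "score": 5},
    {"channel": "control", "updates": {"multiplier": 2}},
    {"channel": "data", "player": "ada", "score": 5},
    {"channel": "control", "updates": {"bonus_enabled": True}},
    {"channel": "data", "player": "bob", "score": 1},
]
results = list(process(merged))
```

Because the parameters travel through the same pipeline as the data, a parameter update or feature switch takes effect on the next event without redeploying the job.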
This repository contains two folders:
- infra, which contains the aws-cdk source code for the AWS infrastructure
- notebooks, which contains two Amazon Managed Service for Apache Flink Studio (Zeppelin) notebooks:
  - challenges.zpln: contains progressive challenges for specific parts.
  - answers.zpln: contains progressive answers to the challenges.
The repository has one main infrastructure file containing the infrastructure code of the final solution, with comments marking specific parts. You can either deploy the full solution or implement it progressively as you watch the videos, commenting out the parts that are not yet required.
Prerequisites:
- Latest Node.js and npm
- Latest cdk
  npm install -g aws-cdk
- Python 3 with pip3
Deployment steps:
- Check out the main branch.
- Switch to the folder infra/functions/players and run
  pip3 install -r requirements.txt -t .
- Switch to the folder infra/functions/redis-sync and run
  pip3 install -r requirements.txt -t .
- Go to the infra folder and run
  npm install
- If you are using CDK for the first time in the given AWS account and region, go to the infra folder and run
  cdk bootstrap
  Otherwise, skip this step.
- Go to the infra folder and run
  cdk deploy
You can implement the solution progressively by commenting out different parts of the infrastructure in the file infra/lib/gaming-leaderboard-stack.ts:
- 1-ingestion-setup: Sets up the Kinesis data stream, the notebook role, and a data generator that automatically publishes gaming events to the source stream.
- 1-ingestion-answer: Sets up the Amazon Managed Service for Apache Flink Studio (Flink Zeppelin notebook) application and adds the challenge answer to the answers.zpln notebook.
- 2-cdc-enrichment-setup: Adds the MySQL database, a data generator for MySQL, and connectivity between the Studio notebook and MySQL.
- 2-cdc-enrichment-answer: Adds the challenge answer to the answers.zpln notebook.
- 3-process-store-visualize-setup: Adds a new Kinesis data stream to receive Redis queries, a Lambda function to execute them against Amazon MemoryDB, and Grafana installed on EC2 for visualization.
- 4-dynamic-config-setup: Adds a new Kinesis data stream and a Lambda function to publish config updates to Amazon Managed Service for Apache Flink.
- 5-archive-and-replay-setup: Adds two new Amazon Managed Service for Apache Flink applications: one to store data in S3 and one to replay it.
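For the Lambda function mentioned in 3-process-store-visualize-setup, the general shape of a Kinesis-triggered handler can be sketched as follows. This is not the code in infra/functions/redis-sync. It assumes, purely for illustration, that each record carries one Redis command as a JSON array such as `["ZADD", key, score, member]`, and it takes the executor as a parameter so the sketch runs without a Redis connection. The base64-encoded `kinesis.data` field is how Lambda delivers Kinesis payloads.

```python
import base64
import json
from typing import Callable, List


def make_handler(execute: Callable[..., object]):
    """Build a Lambda-style handler; `execute` stands in for a Redis client call."""

    def handler(event: dict, context: object = None) -> int:
        count = 0
        for record in event.get("Records", []):
            # Lambda delivers Kinesis payloads base64-encoded under kinesis.data
            payload = base64.b64decode(record["kinesis"]["data"])
            # Assumed payload shape: a JSON array holding one Redis command
            command = json.loads(payload)
            execute(*command)
            count += 1
        return count

    return handler


# Local usage with a fake executor that only records what it was asked to run
executed: List[tuple] = []
handler = make_handler(lambda *args: executed.append(args))

raw = json.dumps(["ZADD", "leaderboard", 42, "ada"]).encode()
event = {"Records": [{"kinesis": {"data": base64.b64encode(raw).decode()}}]}
processed = handler(event)
```

If the function uses redis-py (an assumption), the bound method `client.execute_command` could be passed as `execute`. Injecting the executor keeps the decoding logic testable on its own.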