This repository contains the code and configurations required to deploy a machine learning model that classifies types of clouds. It's set up to use Docker to ensure that the environment is reproducible and that the model can be run consistently across different machines.
- `pipeline.py`: The main script that runs the entire machine learning pipeline, from data acquisition through model training and evaluation to uploading the results to AWS S3.
- `app.py`: The main script that runs the cloud classifier web application. The app can be run locally with `streamlit run app.py`.
- `src/`: Python modules used by the main pipeline script, each responsible for a different stage of the pipeline:
  - `acquire_data.py`: Functions for getting data from a URL and constructing/saving the dataset.
  - `eda.py`: Functions to generate data visualizations and save them locally.
  - `generate_features.py`: Functions to generate the features used to train the machine learning model.
  - `train_model.py`: Functions to train the machine learning model, evaluate its performance, and save model artifacts.
  - `aws_utils.py`: Utilities for uploading artifacts to AWS S3.
- `config/`: YAML and logging configuration files that control various aspects of the pipeline, such as data sources, model parameters, and AWS settings.
- `artifacts/`: Local directory where artifacts from the pipeline, including models, are saved.
- `data/`: Local directory where data needed for model training and feature generation is stored.
- `dockerfiles/`: Dockerfiles for both the pipeline and unit testing:
  - `Dockerfile.dockerfile`: Defines the Docker environment for running the pipeline.
  - `Dockerfile_testing.dockerfile`: Defines the Docker environment specifically for running tests.
- `tests/`: Unit tests for the pipeline.
  - `generate_features_test.py`: Unit tests for the `generate_features` module.
- `requirements.txt`: Lists the dependencies and packages installed into the Docker images.
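As a rough illustration of how these modules fit together, `pipeline.py` chains the stages in order. The sketch below is hypothetical — the function names, signatures, and return values are stand-ins, not the repo's actual API:

```python
# Hypothetical sketch of how pipeline.py might chain the src/ modules.
# All names and return values below are illustrative only.

def acquire_data():
    # stands in for src/acquire_data.py: download and save the raw dataset
    return [0.1, 0.5, 0.9]

def generate_features(records):
    # stands in for src/generate_features.py: derive model inputs
    return [(x, x * x) for x in records]

def train_model(features):
    # stands in for src/train_model.py: fit and evaluate a classifier
    return {"n_samples": len(features)}

def upload_artifacts(model):
    # stands in for src/aws_utils.py: push artifacts to S3
    return f"uploaded model trained on {model['n_samples']} samples"

def run_pipeline():
    data = acquire_data()
    features = generate_features(data)
    model = train_model(features)
    return upload_artifacts(model)
```

Each real module does substantially more (logging, configuration, error handling), but the data flow follows this shape.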
These instructions were developed using Windows 11 Pro and PowerShell. Attempts to provide the corresponding Mac commands have been included where appropriate.
To clone this repository and start working with it, run the following commands:

```shell
git clone https://github.com/omarshatrat/cloud-classifier-web-app.git
cd cloud-classifier-web-app
```

The YAML configuration file in the `config/` directory controls various parameters of the pipeline. Make changes to the YAML file to match your project requirements and AWS configuration.
Make sure to change the bucket name in the config file to an existing bucket in your S3 account.
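As a hedged illustration, the AWS portion of the config might look something like the fragment below. The key names here are assumptions for the sake of example — check the actual file in `config/` for the real schema:

```yaml
# Illustrative only -- key names are assumptions, not the repo's actual schema
aws:
  bucket_name: my-existing-s3-bucket   # must already exist in your account
  region: us-east-1
```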
Since everything runs in Docker, you don't need to install anything except Docker itself. Before building and running the containers, make sure Docker is installed on your system; visit Docker's official website for installation instructions tailored to your operating system.
To authenticate and configure your AWS credentials, run:

```shell
aws configure sso --profile default
```

Once you have set up your desired configuration, log in to AWS using:

```shell
aws sso login
```

This command takes you to an external website where you can complete the authentication process.
Ensure that you have an IAM EC2 instance role set up with rights to access S3, ECR, and ECS.
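A minimal policy sketch for such a role might resemble the following. This is illustrative only — in practice you should scope the `Resource` fields to your own bucket and repositories rather than using wildcards:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ecr:*", "ecs:*"],
      "Resource": "*"
    }
  ]
}
```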
To build the Docker image for the main pipeline, run:

```shell
# Windows (PowerShell)
docker build -f .\dockerfiles\Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .

# Mac/Linux
docker build -f ./dockerfiles/Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .
```

If you encounter any errors building the image, consider switching to a different shell such as PowerShell.
To run the pipeline:
```shell
docker run -v ${HOME}/.aws/:/root/.aws/:ro -v ${PWD}/artifacts:/app/artifacts -v ${PWD}/data:/app/data --name <DESIRED CONTAINER NAME> <DESIRED IMAGE NAME>
```

Note that you must configure and log in to your AWS profile locally before running this command. The container will then run the pipeline, build the model, and upload the model artifact to the S3 bucket specified in the YAML file.
To build the Docker image for running tests:

```shell
# Windows (PowerShell)
docker build -f .\dockerfiles\Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .

# Mac/Linux
docker build -f ./dockerfiles/Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .
```

To build the Streamlit image, I ran these commands step by step:
```shell
docker build -t app_image .
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/f7r1g2f1
docker tag app_image:latest public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
```

Finally, I pushed the change to the existing ECR repository:
```shell
docker push public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
```

To begin formal deployment of your Streamlit app to the web, create a new cluster in ECS with the name of your choice.
- Make sure you create a new Security Group that allows the Northwestern VPN IP range `165.124.160.0/21`.
- Also ensure you expose HTTP port 80 within the Dockerfile.
- Finally, ensure that you grant access to FARGATE.
After this, create a new Task Definition with the name of your choice.
- If you have your own SSH key pair, I recommend selecting it.
- Create a Task Execution Role that grants access to S3, ECS, and ECR.
- Name the container whatever you please.
- In Port Mappings, designate the port as 80, as noted in the Dockerfile.
- If you like, add a key-value pair of `BUCKET_NAME` and your S3 bucket name under Environment Variables.
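An environment variable set this way is typically read inside the app with `os.environ`. The sketch below is illustrative — the fallback value is hypothetical, and the actual app may read the bucket name from the YAML config instead:

```python
import os

# Read the bucket name passed in through the ECS task definition's
# environment variables; fall back to a placeholder when it is unset.
bucket_name = os.environ.get("BUCKET_NAME", "my-default-bucket")
print(bucket_name)
```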
Finally, create a service for the app.
- For ECS, I made sure to use `FARGATE` with the Streamlit application, instead of EC2.
If all checks out, you will be able to access the Streamlit application via the open address shown in the Networking tab.
To run the tests, make sure your terminal is at the root of the repository. Then enter the following command:
```shell
pytest tests/app_test.py
```

All 4 tests should pass.
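For reference, tests in this style are plain functions whose names start with `test_`, which pytest discovers and runs automatically. The sketch below is hypothetical — the feature function is a stand-in, not the repo's actual `generate_features` API:

```python
import math

# Hypothetical stand-in for a function from src/generate_features.py
def log_transform(values):
    """Return the natural log of each value (a common feature transform)."""
    return [math.log(v) for v in values]

# pytest collects functions named test_* and reports their assertions
def test_log_transform_of_ones_is_zero():
    assert log_transform([1.0, 1.0]) == [0.0, 0.0]

def test_log_transform_of_e_is_one():
    assert abs(log_transform([math.e])[0] - 1.0) < 1e-12
```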