Cloud Model Deployment Project

This repository contains the code and configurations required to deploy a machine learning model that classifies types of clouds. It's set up to use Docker to ensure that the environment is reproducible and that the model can be run consistently across different machines.

Repository Structure

  • pipeline.py: This is the main script that runs the entire machine learning pipeline, from data acquisition to model training and evaluation, and finally uploading the results to AWS S3.
  • app.py: This is the main script that runs the cloud classifier web application. The app can be run locally with the following command: streamlit run app.py
  • src/: This directory contains the Python modules that the main pipeline script uses. Each module is responsible for different stages of the pipeline:
    • acquire_data.py: Functions for getting data from a URL and constructing/saving the dataset.
    • eda.py: Functions to generate data visualizations and save them locally.
    • generate_features.py: Functions to generate the features used to train the machine learning model.
    • train_model.py: Functions to train the machine learning model, evaluate its performance, and save model artifacts.
    • aws_utils.py: Utilities for uploading artifacts to AWS S3.
  • config/: Contains YAML/logging configuration files that control various aspects of the pipeline, such as data sources, model parameters, and AWS settings.
  • artifacts/: Local directory where artifacts from the pipeline, including models, are saved.
  • data/: Local directory where data needed for model training and feature generation will be stored.
  • dockerfiles/: Contains the Dockerfiles for both the pipeline and unit testing:
    • Dockerfile.dockerfile: Defines Docker environment for running the pipeline.
    • Dockerfile_testing.dockerfile: Defines the Docker environment specifically for running tests.
  • tests/: Contains the pytest suite for the feature-generation module.
    • generate_features_test.py: Contains unit testing script for generate_features module.
  • requirements.txt: Lists the dependencies and packages installed inside the Docker images.
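If you prefer to try the pipeline outside Docker, the dependencies in requirements.txt should be enough. This sketch assumes pipeline.py takes no required command-line arguments, which the repository does not confirm:

```shell
# Install the dependencies and run the full pipeline locally
# (assumes pipeline.py needs no extra CLI flags).
pip install -r requirements.txt
python pipeline.py
```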

Instructions to Run

Note about OS

These instructions were developed on Windows 11 Pro with PowerShell. Corresponding macOS commands are included where appropriate.

Cloning the Repository

To clone this repository and start working with it, run the following command:

git clone https://github.com/omarshatrat/cloud-classifier-web-app.git
cd cloud-classifier-web-app

Customizing the YAML Configuration File

The YAML configuration file in the config/ directory controls various parameters of the pipeline. Make changes to the YAML file to match your project requirements and AWS configuration.

Make sure to change the bucket name within the config file to an existing bucket name within your S3 account.
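The exact schema depends on the project's YAML file, but the settings described above suggest a shape roughly like the following. The key names here are illustrative, not the actual schema; consult the file in config/ for the real keys:

```yaml
# Illustrative shape only -- check config/ for the real key names.
data:
  url: https://example.com/clouds.data   # placeholder data source
model:
  random_state: 42
aws:
  bucket_name: my-existing-s3-bucket     # must be a bucket in your own S3 account
```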

Building and Running the Docker Container

Since everything runs in Docker, users don't need to install anything except Docker itself. Before building and running the containers, make sure Docker is installed on your system; visit Docker's official website for installation instructions for your operating system.

To authenticate and configure your AWS credentials, run:

aws configure sso --profile default

Once you have set up your desired configurations, log into AWS using:

aws sso login

The command should take you to an external website where you can complete the authentication process.
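To confirm the session is active before building anything, you can check your identity with a standard AWS CLI call:

```shell
# Prints the account ID and role ARN of the currently active session.
aws sts get-caller-identity
```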

AWS IAM Instance Role

Ensure that you have an IAM EC2 instance role set up with permissions to access S3, ECR, and ECS.

To build the Docker container for the main pipeline, run:

Windows command:

docker build -f .\dockerfiles\Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .

Mac command:

docker build -f ./dockerfiles/Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .

If you encounter errors while building the image, consider switching to a different shell, such as PowerShell.

To run the pipeline:

docker run -v ${HOME}/.aws/:/root/.aws/:ro -v ${PWD}/artifacts:/app/artifacts -v ${PWD}/data:/app/data --name <DESIRED CONTAINER NAME> <DESIRED IMAGE NAME>

Note that you have to configure and log in to your AWS profile locally before running this command. The container will then build the model and upload the model artifact to the S3 bucket specified in the YAML file.
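Putting the build and run steps together with concrete names (the image and container names here are placeholders; substitute your own):

```shell
# Build the pipeline image (placeholder name: cloud-classifier).
docker build -f ./dockerfiles/Dockerfile.dockerfile -t cloud-classifier .

# Run it with read-only AWS credentials and local mounts for outputs.
docker run \
  -v "${HOME}/.aws/:/root/.aws/:ro" \
  -v "${PWD}/artifacts:/app/artifacts" \
  -v "${PWD}/data:/app/data" \
  --name cloud-classifier-run cloud-classifier
```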

Building and Running Tests with Docker

To build the Docker container for running tests:

Windows command:

docker build -f .\dockerfiles\Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .

Mac command:

docker build -f ./dockerfiles/Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .
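Once built, the test image can be run directly; presumably the testing Dockerfile invokes pytest as its entrypoint. The image name here is a placeholder:

```shell
# Run the test container; --rm cleans it up after the tests finish.
docker run --rm cloud-classifier-tests
```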

ECR

To build the Streamlit image, I ran these commands step by step:

docker build -t app_image .
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/f7r1g2f1
docker tag app_image:latest public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest

Finally, I pushed the image to the existing ECR repository.

docker push public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
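To sanity-check the pushed image, you can pull it back and run it locally, assuming the app listens on port 80 as the Dockerfile configures:

```shell
# Pull the image from the public ECR repository and run it on port 80.
docker pull public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
docker run --rm -p 80:80 public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
```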

ECS

To begin formal deployment of your Streamlit app to the web, create a new cluster in ECS with a name of your choice.

  • Make sure you create a new Security Group that allows the Northwestern VPN IP range 165.124.160.0/21.
    • Also ensure that inbound HTTP access on port 80 is allowed, matching the port exposed in the Dockerfile.
    • Finally, ensure that the FARGATE launch type is enabled.

After this, create a new Task Definition with a name of your choice.

  • If you have your own SSH key pair, I recommend using it here.
  • Create a Task Execution Role that grants access to S3, ECS, and ECR.
  • Name the container whatever you please.
  • In Port Mappings, designate the port as 80, as noted in the Dockerfile.
  • If you like, add a key-value pair of BUCKET_NAME and your S3 bucket name under Environment Variables.

Finally, create a service for the app.

  • For ECS, I made sure to use FARGATE for the Streamlit application, instead of EC2.

If everything checks out, you will be able to access the Streamlit application via the public address shown in the Networking tab.
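The console steps above can also be scripted with the AWS CLI; a rough sketch follows. All names are placeholders, the task definition JSON is assumed to exist, and the subnet and security-group IDs must come from your own VPC setup:

```shell
# Placeholder names and IDs throughout; adapt to your own account and VPC.
aws ecs create-cluster --cluster-name cloud-app-cluster
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs create-service \
  --cluster cloud-app-cluster \
  --service-name cloud-app-service \
  --task-definition cloud-app-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxx],securityGroups=[sg-xxxx],assignPublicIp=ENABLED}"
```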

Unit Testing

To run the tests, make sure your terminal is at the root of the repository. Then enter the following command:

pytest tests/generate_features_test.py

All 4 tests should pass.
