This repository contains the code and configurations required to deploy a machine learning model that classifies types of clouds. It's set up to use Docker to ensure that the environment is reproducible and that the model can be run consistently across different machines.
- `pipeline.py`: The main script that runs the entire machine learning pipeline, from data acquisition through model training and evaluation to uploading the results to AWS S3.
- `app.py`: The main script that runs the cloud classifier web application. The app can be run locally with `streamlit run app.py`.
- `src/`: Python modules used by the main pipeline script, each responsible for a different stage of the pipeline:
  - `acquire_data.py`: Functions for getting data from a URL and constructing/saving the dataset.
  - `eda.py`: Functions to generate data visualizations and save them locally.
  - `generate_features.py`: Functions to generate the features used to train the machine learning model.
  - `train_model.py`: Functions to train the machine learning model, evaluate its performance, and save model artifacts.
  - `aws_utils.py`: Utilities for uploading artifacts to AWS S3.
- `config/`: YAML and logging configuration files that control various aspects of the pipeline, such as data sources, model parameters, and AWS settings.
- `artifacts/`: Local directory where artifacts from the pipeline, including models, are saved.
- `data/`: Local directory where data needed for model training and feature generation is stored.
- `dockerfiles/`: Dockerfiles for both the pipeline and unit testing:
  - `Dockerfile.dockerfile`: Defines the Docker environment for running the pipeline.
  - `Dockerfile_testing.dockerfile`: Defines the Docker environment specifically for running tests.
- `tests/`: Unit tests for the pipeline.
  - `generate_features_test.py`: Unit tests for the `generate_features` module.
- `requirements.txt`: Lists the dependencies and packages installed into the Docker images.
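As a rough illustration of how these modules fit together, `pipeline.py` chains the stages in order. The sketch below is hypothetical — the function names, signatures, and return values are stand-ins, not the repo's actual API:

```python
# Hypothetical sketch of how pipeline.py might chain the src/ modules.
# All names and return values below are illustrative only.

def acquire_data():
    # stands in for src/acquire_data.py: download and save the raw dataset
    return [0.1, 0.5, 0.9]

def generate_features(records):
    # stands in for src/generate_features.py: derive model inputs
    return [(x, x * x) for x in records]

def train_model(features):
    # stands in for src/train_model.py: fit and evaluate a classifier
    return {"n_samples": len(features)}

def upload_artifacts(model):
    # stands in for src/aws_utils.py: push artifacts to S3
    return f"uploaded model trained on {model['n_samples']} samples"

def run_pipeline():
    data = acquire_data()
    features = generate_features(data)
    model = train_model(features)
    return upload_artifacts(model)
```

Each real module does substantially more (logging, configuration, error handling), but the data flow follows this shape.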
These instructions were developed using Windows 11 Pro and PowerShell. Attempts to provide the corresponding Mac commands have been included where appropriate.
To clone this repository and start working with it, run the following commands:

```shell
git clone https://github.com/omarshatrat/cloud-classifier-web-app.git
cd cloud-classifier-web-app
```

The YAML configuration file in the `config/` directory controls various parameters of the pipeline. Make changes to the YAML file to match your project requirements and AWS configuration.
Make sure to change the bucket name in the config file to an existing bucket in your S3 account.
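As a hedged illustration, the AWS portion of the config might look something like the fragment below. The key names here are assumptions for the sake of example — check the actual file in `config/` for the real schema:

```yaml
# Illustrative only -- key names are assumptions, not the repo's actual schema
aws:
  bucket_name: my-existing-s3-bucket   # must already exist in your account
  region: us-east-1
```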
Since everything runs in Docker, you don't need to install anything except Docker itself. Before building and running the containers, make sure Docker is installed on your system; visit Docker's official website for installation instructions tailored to your operating system.
To authenticate and configure your AWS credentials, run:

```shell
aws configure sso --profile default
```

Once you have set up your desired configuration, log in to AWS using:

```shell
aws sso login
```

This command takes you to an external website where you can complete the authentication process.
Ensure that you have an IAM EC2 instance role set up with rights to access S3, ECR, and ECS.
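A minimal policy sketch for such a role might resemble the following. This is illustrative only — in practice you should scope the `Resource` fields to your own bucket and repositories rather than using wildcards:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ecr:*", "ecs:*"],
      "Resource": "*"
    }
  ]
}
```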
To build the Docker image for the main pipeline, run:

```shell
# Windows (PowerShell)
docker build -f .\dockerfiles\Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .

# Mac/Linux
docker build -f ./dockerfiles/Dockerfile.dockerfile -t <DESIRED IMAGE NAME> .
```

If you encounter any errors building the image, consider switching to a different shell such as PowerShell.
To run the pipeline:
```shell
docker run -v ${HOME}/.aws/:/root/.aws/:ro -v ${PWD}/artifacts:/app/artifacts -v ${PWD}/data:/app/data --name <DESIRED CONTAINER NAME> <DESIRED IMAGE NAME>
```

Note that you must configure and log in to your AWS profile locally before running this command. The container will then run the pipeline, build the model, and upload the model artifact to the S3 bucket specified in the YAML file.
To build the Docker image for running tests:

```shell
# Windows (PowerShell)
docker build -f .\dockerfiles\Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .

# Mac/Linux
docker build -f ./dockerfiles/Dockerfile_testing.dockerfile -t <DESIRED IMAGE NAME - DIFFERENT FROM ABOVE> .
```

To build the Streamlit image, I ran these commands step by step:
```shell
docker build -t app_image .
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/f7r1g2f1
docker tag app_image:latest public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
```

Finally, I pushed the change to the existing ECR repository:
```shell
docker push public.ecr.aws/f7r1g2f1/cloud_app_rpi0559:latest
```

To begin formal deployment of your Streamlit app to the web, create a new cluster in ECS with the name of your choice.
- Make sure you create a new Security Group that allows the Northwestern VPN IP range `165.124.160.0/21`.
- Also ensure you expose HTTP port 80 within the Dockerfile.
- Finally, ensure that you grant access to FARGATE.
After this, create a new Task Definition with the name of your choice.
- If you have your own SSH key pair, I recommend selecting it.
- Create a Task Execution Role that grants access to S3, ECS, and ECR.
- Name the container whatever you please.
- In Port Mappings, designate the port as 80, as noted in the Dockerfile.
- If you like, add a key-value pair of `BUCKET_NAME` and your S3 bucket name under Environment Variables.
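An environment variable set this way is typically read inside the app with `os.environ`. The sketch below is illustrative — the fallback value is hypothetical, and the actual app may read the bucket name from the YAML config instead:

```python
import os

# Read the bucket name passed in through the ECS task definition's
# environment variables; fall back to a placeholder when it is unset.
bucket_name = os.environ.get("BUCKET_NAME", "my-default-bucket")
print(bucket_name)
```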
Finally, create a service for the app.
- For ECS, I made sure to use `FARGATE` with the Streamlit application, instead of EC2.
If all checks out, you will be able to access the Streamlit application via the open address shown in the Networking tab.
To run the tests, make sure your terminal is at the root of the repository. Then enter the following command:
```shell
pytest tests/app_test.py
```

All 4 tests should pass.
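For reference, tests in this style are plain functions whose names start with `test_`, which pytest discovers and runs automatically. The sketch below is hypothetical — the feature function is a stand-in, not the repo's actual `generate_features` API:

```python
import math

# Hypothetical stand-in for a function from src/generate_features.py
def log_transform(values):
    """Return the natural log of each value (a common feature transform)."""
    return [math.log(v) for v in values]

# pytest collects functions named test_* and reports their assertions
def test_log_transform_of_ones_is_zero():
    assert log_transform([1.0, 1.0]) == [0.0, 0.0]

def test_log_transform_of_e_is_one():
    assert abs(log_transform([math.e])[0] - 1.0) < 1e-12
```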