For detailed information about the system architecture, please refer to the overview report.
Additional information about each service is given below:
This API performs CRUD operations for images using MongoDB and pushes changes to Kafka topics.
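The exact endpoints, collection names, and topic names are not documented here, so the following is only a minimal sketch of the general write path (store an image document in MongoDB, then publish a change event to Kafka), assuming pymongo and kafka-python; the 'images' collection and 'image-events' topic are hypothetical placeholders.

```python
# Minimal sketch of the image API's write path: persist an image document
# in MongoDB and publish a change event to Kafka. The collection name
# ("images") and topic name ("image-events") are hypothetical placeholders.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer
from pymongo import MongoClient

mongo = MongoClient("mongodb://127.0.0.1:27017")  # producer_db port in the staging setup
images = mongo["plant_db"]["images"]              # hypothetical collection name

producer = KafkaProducer(
    bootstrap_servers="127.0.0.1:9093",           # kafka port in the staging setup
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def create_image(doc: dict) -> None:
    """Insert an image document and emit a 'created' event."""
    result = images.insert_one(doc)
    event = {
        "action": "created",
        "image_id": str(result.inserted_id),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("image-events", event)          # hypothetical topic name
    producer.flush()
```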
This API contains ML models that analyze leaf diseases based on different image representations.
basic_services/api/image_analyzer_api
This job automatically generates and uploads new pepper, potato, or tomato photos every 30 seconds.
basic_services/jobs/inner_jobs/camera
This job identifies diseases for one image every 100 seconds.
basic_services/jobs/inner_jobs/leaf_disease_recognizer
This service continuously updates image information and can also activate and delete images. It comprises three distinct jobs (a minimal scheduling sketch follows the path below):
- Job to activate or deactivate images, which runs every 300 seconds.
- Job to update image metadata, which runs every 300 seconds.
- Job to delete images, which runs every 700 seconds.
basic_services/jobs/inner_jobs/users
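The intervals above suggest simple time-driven loops. Below is a minimal sketch of such a periodic job, assuming the work itself is just a placeholder function; the real jobs presumably call the image API or the database directly.

```python
# Minimal sketch of a periodic job that runs a task at a fixed interval,
# as the activate/update/delete jobs above do. The task body is a placeholder.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("users-job")

def update_image_metadata() -> None:
    # Placeholder for the real work (e.g. calling the image API).
    log.info("updating image metadata")

def run_periodically(task, interval_seconds: int) -> None:
    while True:
        started = time.monotonic()
        try:
            task()
        except Exception:
            log.exception("job iteration failed")
        # Sleep for whatever is left of the interval.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval_seconds - elapsed))

if __name__ == "__main__":
    run_periodically(update_image_metadata, 300)  # 300 s, as for the metadata job
```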
This job synchronizes the producer and consumer databases using a batch approach.
basic_services/jobs/outer_jobs/db_synchronizer
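The synchronization logic itself is not shown here; a minimal sketch of one batch pass, assuming both MongoDB instances are reachable on the staging ports and that documents share a unique _id, could look like this (the 'images' collection name is a hypothetical placeholder):

```python
# Minimal sketch of one batch synchronization pass from the producer to the
# consumer MongoDB instance. Ports match the staging setup; the collection
# name ("images") is a hypothetical placeholder.
from pymongo import MongoClient, ReplaceOne

producer = MongoClient("mongodb://127.0.0.1:27017")["plant_db"]["images"]
consumer = MongoClient("mongodb://127.0.0.1:27018")["plant_db"]["images"]

BATCH_SIZE = 500

def sync_once() -> None:
    """Copy documents from the producer to the consumer in fixed-size batches."""
    batch = []
    for doc in producer.find({}):
        batch.append(ReplaceOne({"_id": doc["_id"]}, doc, upsert=True))
        if len(batch) >= BATCH_SIZE:
            consumer.bulk_write(batch)
            batch = []
    if batch:
        consumer.bulk_write(batch)

if __name__ == "__main__":
    sync_once()
```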
The databases store images along with their related information.
This database is part of the Leaf Image Management System and stores produced images along with their related information.
basic_services/mongodb/producer_db/plant_db
This database stores consumed images along with their related information.
basic_services/mongodb/consumer_db/plant_db
Metrics allow us to monitor important aspects of the system related to performance, availability, and reliability.
This service collects metrics from the system. In our case, we collect metrics from image-api for stream processing and from db-synchronizer for batch processing. The following metrics are exposed (an exposure sketch follows the list):
- image_api_image - number of images according to their plant, id and disease
- image_api_image_size - size of images in bytes with metadata
- db_synchronizer_job_image - number of images according to their plant, id and disease
- db_synchronizer_job_image_size - size of images in bytes with metadata
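All four metrics are labelled by plant, id, and disease. A minimal sketch of how such metrics could be exposed from a Python service with prometheus_client is shown below; the Gauge metric type and the sample label values are assumptions, not taken from the actual implementation.

```python
# Minimal sketch of exposing the image metrics with prometheus_client.
# Metric and label names follow the list above; the metric type (Gauge)
# and the sample values are illustrative assumptions.
import time

from prometheus_client import Gauge, start_http_server

image_count = Gauge(
    "image_api_image",
    "Number of images by plant, id and disease",
    ["plant", "id", "disease"],
)
image_size = Gauge(
    "image_api_image_size",
    "Size of images in bytes with metadata",
    ["plant", "id", "disease"],
)

if __name__ == "__main__":
    start_http_server(8050)  # image_api_prometheus port in the staging setup
    # Illustrative values; a real service would set these from its own data.
    image_count.labels(plant="tomato", id="42", disease="early_blight").set(1)
    image_size.labels(plant="tomato", id="42", disease="early_blight").set(204800)
    while True:
        time.sleep(60)
```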
Prometheus is deployed using a Helm chart. Remember to install Helm before deploying Prometheus.
This service visualizes metrics from Prometheus. To log in to Grafana, use the following credentials: admin (login) / admin (password).
Here is how to import the dashboard into Grafana (a scripted alternative is sketched after these steps):
- Open Grafana and log in. Then click the '+' button and select the 'Import dashboard' option.
- Add the .json file from the metrics/grafana/dashboard folder and click the 'Load' button.
- Skip the 'Options' section and click the 'Import' button.
- You should now see the imported dashboard.
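If you prefer to script the import instead of clicking through the UI, the Grafana HTTP API can create the dashboard from the same .json file. The sketch below assumes Grafana on localhost:3000 with the default admin/admin credentials and a hypothetical file name inside metrics/grafana/dashboard.

```python
# Minimal sketch of importing a dashboard JSON through the Grafana HTTP API
# instead of the UI. Assumes Grafana at localhost:3000 with admin/admin;
# the exact file name inside metrics/grafana/dashboard is a placeholder.
import json

import requests

GRAFANA_URL = "http://localhost:3000"
DASHBOARD_FILE = "metrics/grafana/dashboard/dashboard.json"  # placeholder file name

with open(DASHBOARD_FILE) as f:
    dashboard = json.load(f)

payload = {
    "dashboard": {**dashboard, "id": None},  # let Grafana assign a new id
    "overwrite": True,
}

resp = requests.post(
    f"{GRAFANA_URL}/api/dashboards/db",
    json=payload,
    auth=("admin", "admin"),
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```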
For this project we use .stg files for staging and .prod files for production.
- Install minikube
- Start minikube
minikube start
You can change the context to minikube using the following commands:
kubectl config view
kubectl config current-context
kubectl config use-context minikube
- Run the following script:
sh start.stg.sh
All the services will be deployed locally in 'leaf-image-management-system' namespace.
- Wait 2-3 minutes until all pods are running and all the data has been loaded into the databases
- You can expose the ports of all services using the following script: open.stg.sh
sh open.stg.sh
Each service has been deployed to 127.0.0.1 with the ClusterIP type (a connectivity check sketch follows the list):
- image_api: 8080
- image_api_prometheus: 8050
- image_analizer_api: 8081
- camera: 5050
- leaf_disease_recognizer: 5051
- users: 5052
- db_synchronizer: 5053
- db_synchronizer_prometheus: 8051
- producer_db: 27017
- consumer_db: 27018
- kafka: 9093
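A quick way to verify that the port forwarding worked is to probe the ports above over TCP; the sketch below only checks reachability and does not depend on any service-specific endpoints.

```python
# Minimal sketch: probe the locally exposed staging ports over TCP to check
# that the port forwarding worked. Ports are taken from the list above.
import socket

SERVICES = {
    "image_api": 8080,
    "image_api_prometheus": 8050,
    "image_analizer_api": 8081,
    "camera": 5050,
    "leaf_disease_recognizer": 5051,
    "users": 5052,
    "db_synchronizer": 5053,
    "db_synchronizer_prometheus": 8051,
    "producer_db": 27017,
    "consumer_db": 27018,
    "kafka": 9093,
}

for name, port in SERVICES.items():
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            status = "reachable"
    except OSError:
        status = "unreachable"
    print(f"{name:28s} 127.0.0.1:{port:<6d} {status}")
```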
- You can close the ports using the following script:
sh close.stg.sh
- After you have implemented your consumer, you can deploy the Prometheus cluster with Grafana by running the following script:
sh start-metrics.stg.sh
All the services from this script will be deployed locally in the 'metrics' namespace. Prometheus will be deployed using a Helm chart. Remember to install Helm before deploying Prometheus.
If you have already installed Helm, update the Helm repositories by running the following command before starting the script:
helm repo update
- Wait 7-9 minutes until all pods are running and all the data has been loaded into the databases.
- You can expose the ports of all services using the following script: open-metrics.stg.sh
sh open-metrics.stg.sh
Each service has been deployed to 127.0.0.1 with the ClusterIP type:
- prometheus: 9090
- grafana: 3000
- You can close the ports using the following script:
sh close-metrics.stg.sh
- In Grafana you can see the dashboard showing the batch and stream processing metrics.
- Create a GKE cluster. You can read the description file here.
- Run the following script:
sh start.prod.sh your_project_id
All the services will be deployed to 'leaf-image-management-system' namespace.
- Wait 7-9 minutes until all pods are running and all the data has been loaded into the databases.
- GKE deploys the services with the NodePort type. The node ports of the services are:
- image_api: 30080
- image_api_prometheus: 30051
- image_analizer_api: - (not deployed)
- camera: 30550
- leaf_disease_recognizer: - (not deployed)
- users: 30551
- db_synchronizer: 30552
- db_synchronizer_prometheus: 30051
- producer_db: 30017
- consumer_db: 30018
As you can see, image_analizer_api and leaf_disease_recognizer are not deployed to GKE. This is because we do not use a GPU for this project.
You can connect to the services using the following steps:
4.1 Find out the external IP address of a node: kubectl get nodes -o wide
4.2 Connect to the service using the address your_external_ip:service_port
- For port exposure, this setup uses an Ingress.
- After you have implemented your consumer, you can deploy the Prometheus cluster with Grafana by running the following script:
sh start-metrics.prod.sh
All the services from this script will be deployed to 'metrics' namespace. Prometheus will be deployed using Helm chart.
- Wait 2-3 minutes until all pods are running and all the data has been loaded into the databases.
- GKE deploys Grafana with LoadBalancer type.
- In Grafana you can add the dashboard showing the batch and stream processing metrics.
- For the entire system:
sh stop.stg.sh
- For metrics:
sh stop-metrics.stg.sh
- For the entire system:
sh stop.prod.sh
- For metrics:
sh stop-metrics.prod.sh
Based on the overview report and the deployment instructions, you will develop a service that interacts with this system using an event-driven approach.
This assignment is divided into three parts:
- Task 1 [55%]: Develop a service for consuming and logging data from a Kafka cluster. Test and run the application locally. For this purpose you should use .stg files to deploy the system (a minimal consumer sketch follows this list).
- Task 2 [30%]: Test and run the existing application, the Kafka cluster, and your service on Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). For this purpose you should use .prod files to deploy the system on GCP.
- Task 3 [15%]: Compare the bandwidth between the pre-implemented batch processing application and the stream processing service you are tasked to implement. For this task you can see the difference between the two approaches by opening the Grafana dashboards for batch and stream processing.
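As a starting point for Task 1, here is a minimal sketch of a Kafka consumer that logs incoming messages, assuming kafka-python and the staging Kafka address from the port list above; the topic name 'image-events' is a hypothetical placeholder, so check the image API configuration for the real one.

```python
# Minimal sketch of a Kafka consumer that logs incoming image events (Task 1).
# The bootstrap address matches the staging port list; the topic name
# "image-events" is a hypothetical placeholder.
import json
import logging

from kafka import KafkaConsumer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("image-consumer")

consumer = KafkaConsumer(
    "image-events",                      # hypothetical topic name
    bootstrap_servers="127.0.0.1:9093",
    group_id="image-consumer-group",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    log.info(
        "topic=%s partition=%d offset=%d value=%s",
        message.topic, message.partition, message.offset, message.value,
    )
```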