In this lab you will:
- Leverage Kubernetes to enable continuous model training and inference
- Distribute traffic across multiple replicas of the backend inference service
- Experiment with container lifecycle hooks in Kubernetes
- Show the TA that you are continuously training a model using Kubernetes CronJobs
- Show the TA that inference requests are being distributed across multiple backend replicas. Explain to the TA how Kubernetes routes requests across multiple replicas in a service.
- Demonstrate an effective use of the `preStop` container lifecycle hook to the TA. Explain the graceful shutdown sequence to the TA and describe one practical use case for lifecycle hooks.
- Sign up for Docker: If you don't already have one, create an account at docker.com so you can push images to the Docker registry.
- Start Docker: Ensure Docker is running on your machine.
- Fork Repo: Fork the repo here
- Install MiniKube: Follow instructions here.
- Start MiniKube: Start MiniKube on your local setup by running `minikube start`.
- Verify MiniKube: Confirm that MiniKube is running by listing all pods:
kubectl get po -A
- In `model_trainer.py`, implement the code to train the model given training data `X` and labels `Y`.
- Push your model training image. For details on how to do that, see Build and Push the Docker Image.
- In `trainer-deployment.yaml`, configure the cron schedule so that model training runs at a periodic interval (for demonstration purposes, keep it fairly frequent: every 1-2 minutes), then apply the manifest:
kubectl apply -f trainer-deployment.yaml
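The `schedule` field of a Kubernetes CronJob uses standard five-field cron syntax, so "every 2 minutes" is written `*/2 * * * *` and "every minute" is `*/1 * * * *` (or `* * * * *`). The sketch below is only an illustration of how the minute field is interpreted. It is not a full cron parser: it handles just the `*`, `*/N`, and plain-number forms, and the function names are hypothetical.

```python
from typing import List


def minute_field_matches(field: str, minute: int) -> bool:
    """Return True if a simplified cron minute field matches the given minute.

    Supports only '*', '*/N', and a single plain number (an assumption made
    to keep the sketch short; real cron syntax also allows ranges and lists).
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        return minute % step == 0
    return int(field) == minute


def firing_minutes(schedule: str) -> List[int]:
    """List the minutes of an hour at which a five-field schedule would fire,
    looking only at the first (minute) field."""
    minute_field = schedule.split()[0]
    return [m for m in range(60) if minute_field_matches(minute_field, m)]


if __name__ == "__main__":
    print(firing_minutes("*/2 * * * *")[:5])  # [0, 2, 4, 6, 8]
```

With `*/2 * * * *`, a new training Job is created at every even minute, which is frequent enough to watch several runs appear during a demo.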
At this point, you should be able to see the `model-trainer-job` CronJob running continuously in the Minikube dashboard. For instructions on opening the dashboard, see Troubleshooting.
Alternatively, you can verify this using the `kubectl` CLI:
kubectl get jobs # this should output a list of your most recent jobs
kubectl logs -f job/<trainer-job-id>
- In `backend.py`, implement the code changes to predict based on
- Push your backend image to the Docker registry. For details on how to do that, see Build and Push the Docker Image.
- At this point, you should be able to reach the backend and see inference being served by the distributed backend replicas. For this step, comment out the `lifecycle` hook in `backend-deployment.yaml` (or skip this step and do the full implementation of `backend-deployment.yaml` in the next step).
Then apply the manifest for the backend service:
kubectl apply -f backend-deployment.yaml
You should see something like this printed out to the terminal:
deployment.apps/flask-backend-deployment created
service/flask-backend-service configured
Verify using this Postman collection and/or the following cURL commands. To obtain the correct port, see Accessing the Backend Service. You should see the `host` field in the response body vary across requests:
curl --location --request GET '127.0.0.1:<some-port-here>/model-info'
curl --location --request POST '127.0.0.1:<some-port-here>/predict' \
--header 'Content-Type: application/json' \
--data-raw '{
"avg_session_duration": 30,
"visits_per_week": 14,
"response_rate": 4,
"feature_usage_depth": 6,
"user_id": 34
}'
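Eyeballing the `host` value over a handful of cURL calls works, but a tally over many requests makes the distribution easier to see. The following is a minimal sketch, assuming the `/model-info` response body is JSON with a top-level `host` key (the README only says the body contains a host parameter) and that you pass the base URL obtained from Accessing the Backend Service. The function names are hypothetical.

```python
import json
import sys
from collections import Counter
from typing import Iterable, List
from urllib.request import urlopen


def tally_hosts(bodies: Iterable[str]) -> Counter:
    """Count how many responses reported each `host` value.

    Assumes each body is a JSON object with a top-level "host" key.
    """
    return Counter(json.loads(body)["host"] for body in bodies)


def sample_model_info(base_url: str, n: int = 20) -> List[str]:
    """Send n GET requests to <base_url>/model-info and return the raw bodies.

    Each call to urlopen opens a fresh connection.
    """
    bodies = []
    for _ in range(n):
        with urlopen(f"{base_url}/model-info") as resp:
            bodies.append(resp.read().decode("utf-8"))
    return bodies


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_hosts.py http://127.0.0.1:<some-port-here>
    for host, count in tally_hosts(sample_model_info(sys.argv[1])).items():
        print(host, count)
```

If more than one distinct `host` value shows up in the tally, requests are reaching more than one replica.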
- Looking at the responses from either the GET or the POST requests, how can you verify that Kubernetes is load-balancing the requests across the replicas of the backend inference service? (Hint: the `host` field might be helpful here.) Discuss with the TA how Kubernetes routes traffic to the replicas of a service. Consider reading the following source to better understand how Kubernetes routes traffic in a service:
- Configure the lifecycle `preStop` hook in `backend-deployment.yaml` to signal to the `backend.py` process that the
- Re-deploy the backend:
kubectl apply -f backend-deployment.yaml
- Now that this task is complete, you should be able to demonstrate the behavior of the lifecycle hook:
# for verifying shutdown hooks
kubectl rollout restart deployment/flask-backend-deployment
# then in a separate process
kubectl logs -l app=flask-backend -f
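Kubernetes sends SIGTERM to a container as part of terminating its pod, so it can help to log when that signal arrives and compare it with the hook's output in the logs above. The sketch below shows one way a Python process might react to SIGTERM. It is an assumption for illustration, not the lab's `backend.py`: the names `shutting_down`, `handle_sigterm`, `install_handler`, and `is_ready` are hypothetical, and how the actual `preStop` hook signals the backend is up to your implementation.

```python
import signal
import threading

# Set once shutdown has begun; request handlers could check this flag.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Mark the process as draining when SIGTERM is received.

    A real server would also stop accepting new work and finish
    in-flight requests before exiting.
    """
    print(f"received signal {signum}, starting graceful shutdown")
    shutting_down.set()


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def is_ready() -> bool:
    """Hypothetical readiness check: not ready once shutdown has begun."""
    return not shutting_down.is_set()


if __name__ == "__main__":
    install_handler()
    print("ready before signal:", is_ready())
    signal.raise_signal(signal.SIGTERM)  # simulate the SIGTERM locally
    shutting_down.wait(timeout=5)
    print("ready after signal:", is_ready())
```

Printing a timestamped line in both the hook and the signal handler makes the ordering visible in the `kubectl logs` output during the rollout restart.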
- Based on the logs you observed during the rollout restart, when does the `preStop` container lifecycle hook run relative to Kubernetes sending SIGTERM to the container? Explain the graceful shutdown sequence to the TA and describe one practical use case for lifecycle hooks.
Consider reading the following two sources to better understand how Kubernetes handles shutdowns:
Before you do the following steps, ensure you have logged in to Docker. You will need to do the following for each type of image (trainer and backend):
docker build -t <your-dockerhub-username>/<backend-image-name>:1.0.0 -f Dockerfile.backend .
docker push <your-dockerhub-username>/<backend-image-name>:1.0.0
Access the Backend Service via the NodePort
- Access via NodePort:
  - Get the MiniKube IP: `minikube ip`
  - Access the backend service using `curl` (replace `<minikube-ip>` with the output from `minikube ip`): `curl "http://<minikube-ip>:30080/?user_id=Alice"`
- Use `minikube service` (if NodePort does not work):
  - Create a tunnel to the backend service: `minikube service flask-backend-service`
  - This command will provide a URL, typically in the format `http://127.0.0.1:<some-port>`, which you can use to test the backend service.
- Launch MiniKube Dashboard: Open the MiniKube dashboard to monitor the status of Pods, deployments, and services: `minikube dashboard`
- Minikube IP Issues: Use `minikube ip` to verify the correct IP.
- Service Not Accessible: If NodePort does not work, use `minikube service` to create a tunnel.
- Backend Logs: Use `kubectl logs -l app=flask-backend -f` to monitor requests going to the backend.
- Image Pull Issues: Ensure your Docker images are pushed to Docker Hub with the correct tags. More Details