Open
Labels
- audience/technical: Issue primarily for technical review and service.
- kind/cicd: CI/CD, dev ops, platform ops, etc.
- kind/performance
- kind/production-and-commercialization: Tasks for commercial distribution PaaS / SaaS and scale.
- kind/text-generative-ai: Development of generative AI capabilities.
- status/ready-pending-tests: Ready to make pull request once tests pass.
- triage/high-priority
Description
TL;DR: Package the settings from the best run in Docker.
Issue
We have found the optimal model in the HPO study done on branch 307. We need to package this run in a suitable GPU-enabled container so we can scale it up and control the dependencies.
Task
Make this run in the tensorflow/tensorflow:2.20.0-gpu image:
- Dockerize the script
- Set up a volume mount for artifacts
- Clean up dependencies
- Parameterize the script (dataset, number of samples, sample expansion, ...)
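A minimal sketch of how the parameterization could look, assuming the script is Python and reads its settings from CLI flags with environment-variable fallbacks (handy for `docker run -e`). The flag names, environment variable names, defaults, and the /artifacts mount point are all hypothetical, not taken from branch 307.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; each falls back to an environment variable so the
    # container can be configured with `docker run -e NAME=value`.
    # argparse applies `type` to string defaults, so env values are converted too.
    parser = argparse.ArgumentParser(description="Run the best HPO configuration.")
    parser.add_argument("--dataset", default=os.environ.get("DATASET", "default-dataset"),
                        help="Dataset identifier (hypothetical default).")
    parser.add_argument("--num-samples", type=int,
                        default=os.environ.get("NUM_SAMPLES", "1000"),
                        help="Number of samples to draw from the dataset.")
    parser.add_argument("--sample-expansion", type=int,
                        default=os.environ.get("SAMPLE_EXPANSION", "1"),
                        help="Expansion factor applied to each sample.")
    parser.add_argument("--artifacts-dir", type=Path,
                        default=os.environ.get("ARTIFACTS_DIR", "/artifacts"),
                        help="Mounted volume for checkpoints and MLflow data.")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject nonsensical sizes early; parser.error prints usage and exits.
    if args.num_samples <= 0 or args.sample_expansion <= 0:
        parser.error("--num-samples and --sample-expansion must be positive")
    return args
```

With the volume mounted at the assumed path, a run might look like `docker run --gpus all -v "$(pwd)/artifacts:/artifacts" <image> python train.py --num-samples 500`, where the image and script names are placeholders.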
To Do:
- Replace the prompt samples with ones from the dataset.
- Set the stage 1-a and 1-b model checkpoints to save in the artifacts folder.
- Make sure model serialization still works on the mounted volume.
- Add conditional MLflow logging of params (log if MLFLOW_PORT != 0).
- Set up MLflow with a SQLite backend on the mounted volume.
- Add MLflow system metrics.
- Make sure the experiment name is unique by generating it automatically, to avoid naming collisions.
- Rebuild and push the latest container.
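A minimal sketch of the MLflow items above, assuming Python and MLflow >= 2.8 (where `mlflow.start_run` accepts `log_system_metrics=True`; system metrics also need `psutil`, plus `pynvml` for GPU stats). It treats an unset or non-numeric MLFLOW_PORT as 0, points the tracking URI at a SQLite file on the mounted volume, and appends a UTC timestamp and a random suffix to the experiment name. The helper names, the `mlflow.db` file name, and the `/artifacts` path are hypothetical.

```python
import contextlib
import os
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, ContextManager, Mapping, Union

PathLike = Union[str, Path]


def mlflow_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True when MLFLOW_PORT is a non-zero integer; unset or invalid counts as 0."""
    try:
        return int(env.get("MLFLOW_PORT", "0")) != 0
    except ValueError:
        return False


def sqlite_tracking_uri(artifacts_dir: PathLike, db_name: str = "mlflow.db") -> str:
    """Build a SQLite tracking URI; an absolute path yields four slashes."""
    db_path = os.path.abspath(os.path.join(artifacts_dir, db_name))
    return f"sqlite:///{db_path}"


def unique_experiment_name(base: str) -> str:
    """Append a UTC timestamp and a random suffix to avoid name collisions."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{base}-{stamp}-{uuid.uuid4().hex[:8]}"


def tracking_run(params: Mapping[str, Any], artifacts_dir: PathLike,
                 experiment_base: str,
                 env: Mapping[str, str] = os.environ) -> ContextManager[Any]:
    """Return an active MLflow run when enabled, otherwise a no-op context."""
    if not mlflow_enabled(env):
        return contextlib.nullcontext()
    import mlflow  # lazy import: MLflow is only needed when logging is on

    mlflow.set_tracking_uri(sqlite_tracking_uri(artifacts_dir))
    mlflow.set_experiment(unique_experiment_name(experiment_base))
    # log_system_metrics requires MLflow >= 2.8 and psutil (pynvml for GPUs).
    run = mlflow.start_run(log_system_metrics=True)
    mlflow.log_params(dict(params))
    return run  # ActiveRun ends the run when its `with` block exits
```

Training would run inside `with tracking_run(...):`, so params and system metrics attach to one run, while the no-op context keeps MLflow optional when logging is off. Assuming the same mount, the UI could be served with `mlflow ui --backend-store-uri sqlite:////artifacts/mlflow.db --port "$MLFLOW_PORT"`.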