Added an example Notebook to fine-tune Llama3 model using PyTorchJob #2418
…beflow#2370) Signed-off-by: Andrey Velichkevich <[email protected]> Co-authored-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
…2377) Signed-off-by: Andrey Velichkevich <[email protected]> Co-authored-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
* Add MNIST example with SPMD for JAX. Illustrate how to use JAX's `pmap` to express and execute single-program multiple-data (SPMD) programs for data parallelism along a batch dimension. Signed-off-by: Sandipan Panda <[email protected]>
* Update CONTRIBUTING.md. Use `--server-side` to install the latest local changes of the Training Operator control plane. Signed-off-by: Sandipan Panda <[email protected]>
* Add JAXJob output. Signed-off-by: Sandipan Panda <[email protected]>
* Update JAXJob CI images. Signed-off-by: Sandipan Panda <[email protected]>
* Adjust JAXJob SPMD example batch size. Signed-off-by: Sandipan Panda <[email protected]>
* Add JAX example Docker image build in CI. Signed-off-by: sailesh duddupudi <[email protected]>
* Fix script name typo. Signed-off-by: sailesh duddupudi <[email protected]>
* Update script permissions. Signed-off-by: sailesh duddupudi <[email protected]>
* Add KIND_CLUSTER env var. Signed-off-by: sailesh duddupudi <[email protected]>
* Increase timeouts. Signed-off-by: sailesh duddupudi <[email protected]>
* Test higher resources. Signed-off-by: sailesh duddupudi <[email protected]>
* Increase timeout. Signed-off-by: sailesh duddupudi <[email protected]>
* Remove resource requests. Signed-off-by: sailesh duddupudi <[email protected]>
* Test low batch size. Signed-off-by: sailesh duddupudi <[email protected]>
* Test small batch size. Signed-off-by: sailesh duddupudi <[email protected]>
* Hardcode number of batches. Signed-off-by: sailesh duddupudi <[email protected]>

--------- Signed-off-by: Sandipan Panda <[email protected]> Signed-off-by: sailesh duddupudi <[email protected]> Co-authored-by: Sandipan Panda <[email protected]> Co-authored-by: sailesh duddupudi <[email protected]>
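For readers unfamiliar with the SPMD pattern that the JAX example commit describes, here is a minimal sketch of data parallelism along a batch dimension with `jax.pmap`. It is not the MNIST example from that commit. It assumes a hypothetical toy linear-regression problem (`W_TRUE`, `loss_fn`, `train_step`, and `train` are illustrative names). Every device runs the same `train_step` on its own shard of the batch, and `jax.lax.pmean` averages gradients across the `"batch"` device axis so that the replicated parameters stay identical.

```python
from functools import partial

import jax
import jax.numpy as jnp

# Hypothetical toy problem (not from the PR): linear regression y = x @ W_TRUE
W_TRUE = jnp.array([1.0, 2.0, 3.0, 4.0])
LEARNING_RATE = 0.1


def loss_fn(w, x, y):
    """Mean squared error of a linear model on one device's shard."""
    return jnp.mean((x @ w - y) ** 2)


@partial(jax.pmap, axis_name="batch")
def train_step(w, x, y):
    """One SPMD step: every device runs this same program on its own shard."""
    loss, grads = jax.value_and_grad(loss_fn)(w, x, y)
    # All-reduce: average gradients and loss across the "batch" device axis
    grads = jax.lax.pmean(grads, axis_name="batch")
    loss = jax.lax.pmean(loss, axis_name="batch")
    return w - LEARNING_RATE * grads, loss


def train(num_steps=50, per_device_batch=32):
    n_devices = jax.local_device_count()
    key = jax.random.PRNGKey(0)
    # Shard the global batch along a leading device axis: (devices, batch, features)
    x = jax.random.normal(key, (n_devices, per_device_batch, 4))
    y = x @ W_TRUE
    # Replicate parameters: one identical copy per device
    w = jnp.zeros((n_devices, 4))
    losses = []
    for _ in range(num_steps):
        w, loss = train_step(w, x, y)
        losses.append(float(loss[0]))  # identical on every device after pmean
    return w, losses


params, losses = train()
print(losses[0], losses[-1])
```

On a single CPU, `jax.local_device_count()` is 1, so the leading axis has size 1 and the same code still runs. On a multi-device host, the batch is split across devices without any change to `train_step`.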
) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.30.0 to 0.33.0. - [Commits](golang/net@v0.30.0...v0.33.0) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: ChristianZaccaria <[email protected]> Co-authored-by: ChristianZaccaria <[email protected]>
…beflow#2417) This commit adds jaxjobs to the aggregation ClusterRole for Kubeflow, which allows Kubeflow Profiles to have edit and admin rights over this CR. Fixes kubeflow#2416 Signed-off-by: Daniela Plascencia <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged): Fixes #
Checklist: