Adding Secret support to Lepton Executor #382

Closed
zoeyz101 wants to merge 64 commits into NVIDIA-NeMo:main from zoeyz101:zozhang/lepton-secrets
Conversation

@zoeyz101
Contributor

No description provided.

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
roclark previously approved these changes Nov 12, 2025
Contributor

@roclark left a comment

LGTM!

hemildesai and others added 22 commits November 12, 2025 12:12
* Add logs dir to container mount for ray slurm

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix tests

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
…DIA-NeMo#286)

* finetune on dgxcloud with nemo-run and deploy on bedrock example

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>

* removing trailing slash

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>

* reformatting notebook

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>

* adding EOF

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>

---------

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Roee Landesman <roeeland@cisco.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
…#285)

* fix docs tutorial links and add intro to guides/index.md

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Adding project.json/versions1.json, and update conf.py

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* fixes

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Co-authored-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
…es (NVIDIA-NeMo#295)

* Add thread pool to get status of jobs inside experiment

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Add thread pools to experiment run

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fixes

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
…NeMo#251)

- do prepare stage only from single process or rank
- for --node-rank, also look for SLURM_NODEID

Signed-off-by: Pramod Kumbhar <prkumbhar@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* Upgrade skypilot to v0.10.0, introduce network_tier

Signed-off-by: Roee Landesman <roeeland@cisco.com>

* add unit tests

Signed-off-by: Roee Landesman <roeeland@cisco.com>

---------

Signed-off-by: Roee Landesman <roeeland@cisco.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* ci(fix): Use GITHUB_TOKEN for community bot

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* remove breaking torchrun config for single-node runs

Signed-off-by: Roee Landesman <roeeland@cisco.com>

* fix lint

Signed-off-by: Roee Landesman <roeeland@cisco.com>

---------

Signed-off-by: Roee Landesman <roeeland@cisco.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* update lepton executor to include custom prelaunch commands section

Signed-off-by: ansjindal <ansjindal@nvidia.com>

* add test for prelaunch section

Signed-off-by: ansjindal <ansjindal@nvidia.com>

* add more tests for checking the pre-launch-commands section

Signed-off-by: ansjindal <ansjindal@nvidia.com>

* update lepton executor tests

Signed-off-by: ansjindal <ansjindal@nvidia.com>

---------

Signed-off-by: ansjindal <ansjindal@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* Create CHANGELOG.md

Signed-off-by: Pablo Garay <palenq@gmail.com>

* Add entries to CHANGELOG.md

Signed-off-by: Pablo Garay <palenq@gmail.com>

* Update CHANGELOG.md

Signed-off-by: Pablo Garay <palenq@gmail.com>

* Update CHANGELOG.md

Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>

* add links

---------

Signed-off-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* Correctly append tar files for packaging

Signed-off-by: Sahil Modi <samodi@nvidia.com>

* tests

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Sahil Modi <samodi@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
…VIDIA-NeMo#320)

* Specify nodes for gpu metrics collection and split data to each rank

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>

* Fix unit test

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
aschilling-nv and others added 27 commits November 12, 2025 12:14
* Fixing documentation layout

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* documentation.md

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* Removing live-server

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

* Correcting .vscode

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

---------

Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* fix: Emit exit-code of docker runs

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fix test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* fixes

Signed-off-by: oliver könig <okoenig@nvidia.com>

* refactor

Signed-off-by: oliver könig <okoenig@nvidia.com>

* cleanup

Signed-off-by: oliver könig <okoenig@nvidia.com>

* add scheduler test

Signed-off-by: oliver könig <okoenig@nvidia.com>

* more scheduler tests

Signed-off-by: oliver könig <okoenig@nvidia.com>

* test executor

Signed-off-by: oliver könig <okoenig@nvidia.com>

* formatting

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* [🤖]: Howdy folks, let's bump NeMo Run to `0.8.0rc0.dev0` !

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* lintfix double quotes

Signed-off-by: Pablo Garay <pagaray@nvidia.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* feat: add copyright check

* feat: add copyright check

* feat: add copyright check

* feat: add copyright check

* feat: add copyright check

Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* Add port parameter to SSHTunnel

Signed-off-by: Igor Gitman <igitman@nvidia.com>

* Fix ci

Signed-off-by: Igor Gitman <igitman@nvidia.com>

* Fix copyright

Signed-off-by: Igor Gitman <igitman@nvidia.com>

* Fix copyright

Signed-off-by: Igor Gitman <igitman@nvidia.com>

* Fix copyright

Signed-off-by: Igor Gitman <igitman@nvidia.com>

---------

Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
* Update ray template

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* add ray enroot exec template

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
Signed-off-by: Zoey Zhang <zozhang@nvidia.com>
@zoeyz101 force-pushed the zozhang/lepton-secrets branch from 90478e0 to 9cf1d77 on November 12, 2025 20:16
@zoeyz101 closed this on Nov 12, 2025
@zoeyz101 deleted the zozhang/lepton-secrets branch on November 12, 2025 20:24