
Add error handling for executor deserialization in dgxcloud scheduler… #167

Closed
pablo-garay wants to merge 7 commits into hemil/fix-dgxc-serde from main

@pablo-garay
Contributor

… (#166)

hemildesai and others added 3 commits March 6, 2025 11:13
* refactor: Improve packaging job handling in SlurmExecutor

- Add `get_packaging_job_key()` to generate unique packaging job keys using experiment ID
- Update packaging job tracking to use experiment-specific keys
- Modify packaging job deserialization to handle legacy dictionary format
- Enhance experiment initialization with optional component defaults
- Improve tunnel scheduler initialization with optional experiment context

This refactoring scopes packaging job tracking to each experiment, so jobs from different experiments get distinct keys, and keeps state saved in the older dictionary format loadable.
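A minimal sketch of the two packaging-job changes above (experiment-scoped keys and tolerant deserialization of the legacy dictionary format). It assumes a key format of `<experiment_id>:<job_name>` and uses a hypothetical `PackagingJob` dataclass and hypothetical `deserialize_packaging_job` helper. Only the name `get_packaging_job_key()` comes from the commit message; its signature here and everything else are illustrative, not the actual `SlurmExecutor` code.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


# Hypothetical stand-in for a packaging job record; the real types may differ.
@dataclass
class PackagingJob:
    symlink: bool = False
    dst_path: str = ""


def get_packaging_job_key(experiment_id: str, job_name: str) -> str:
    """Build a packaging job key scoped to one experiment (assumed format)."""
    return f"{experiment_id}:{job_name}"


def deserialize_packaging_job(data: Union[Dict[str, Any], PackagingJob]) -> PackagingJob:
    """Accept an already-built job or a legacy plain-dict payload."""
    if isinstance(data, PackagingJob):
        return data
    if isinstance(data, dict):
        # Legacy format: a plain dict of field values; unknown keys are dropped.
        names = {f.name for f in fields(PackagingJob)}
        return PackagingJob(**{k: v for k, v in data.items() if k in names})
    raise TypeError(f"Unsupported packaging job payload: {type(data).__name__}")


# Jobs are tracked per experiment, so two experiments never collide on a key.
packaging_jobs: Dict[str, PackagingJob] = {}
key = get_packaging_job_key("exp_123", "nemo_run")
packaging_jobs[key] = deserialize_packaging_job({"symlink": True, "dst_path": "/tmp/pkg"})
```

Keying by experiment ID means a packaging job created for one experiment is not mistaken for another experiment's job with the same name, while the dict branch lets older saved state load without a migration step.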

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* ruff format fix

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

* ci(feat): Onboard codecov

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

* f

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>