
refactor: clean up multi-node config and use num_instances #805

Closed
AdamRajfer wants to merge 1 commit into main from arajfer/multi-node-multi-instance-cleanup

Conversation

@AdamRajfer
Contributor

Replace boolean multi-instance flag with num_instances count in slurm execution config. Set up node IP discovery and per-instance routing for multi-node deployments. Clean up multi-node examples and docs. Use OmegaConf.select for safe num_instances access across all executor types. Update tests.
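
To make the description above concrete, here is a minimal sketch of reading an instance count with a safe default and splitting discovered node IPs across instances. It is an illustration under stated assumptions, not the code in this PR: the key path `execution.num_instances`, the helper names `get_num_instances` and `route_nodes`, and the equal contiguous split are all hypothetical. The PR reads the value with `OmegaConf.select`; the sketch uses a plain mapping lookup to stay dependency-free.

```python
from typing import Any, Mapping


def get_num_instances(cfg: Mapping[str, Any], default: int = 1) -> int:
    """Read execution.num_instances, falling back to `default` when absent.

    Hypothetical helper: the key path is an assumption, and the PR itself
    uses OmegaConf.select rather than dict lookups.
    """
    execution = cfg.get("execution") or {}
    value = execution.get("num_instances", default)
    if value is None:
        return default
    num = int(value)
    if num < 1:
        raise ValueError(f"num_instances must be >= 1, got {num}")
    return num


def route_nodes(node_ips: list[str], num_instances: int) -> list[list[str]]:
    """Split node IPs into equal contiguous groups, one group per instance.

    Assumed policy for illustration only; the PR does not state how nodes
    are assigned to instances.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be >= 1")
    if len(node_ips) % num_instances != 0:
        raise ValueError("node count must be divisible by num_instances")
    per_instance = len(node_ips) // num_instances
    return [
        node_ips[i * per_instance:(i + 1) * per_instance]
        for i in range(num_instances)
    ]


if __name__ == "__main__":
    cfg = {"execution": {"num_instances": 2}}
    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    for idx, group in enumerate(route_nodes(nodes, get_num_instances(cfg))):
        print(f"instance {idx}: {group}")
```

A count with a default of 1 lets single-instance configs omit the key entirely, which a boolean flag cannot express once more than two instances are needed.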

Replace boolean multi-instance flag with num_instances count.
Set up node IP discovery and per-instance routing for future
multi-node runs. Update examples, docs, and tests.

Signed-off-by: Adam Rajfer <arajfer@nvidia.com>
@AdamRajfer AdamRajfer requested review from a team as code owners March 5, 2026 19:06
@copy-pr-bot

copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the documentation (Improvements or additions to documentation), nemo-evaluator-launcher, and tests labels Mar 5, 2026
@AdamRajfer AdamRajfer self-assigned this Mar 5, 2026
@AdamRajfer
Contributor Author

/ok to test bf5f330

@marta-sd
Contributor

marta-sd commented Mar 9, 2026

Closing, as we've merged a more extensive version in #757.

@marta-sd marta-sd closed this Mar 9, 2026

Labels

documentation (Improvements or additions to documentation), nemo-evaluator-launcher, tests

2 participants