Minimize io operations in experiment, avoid persisting experiment to disk before run/dryrun#150

Merged
hemildesai merged 3 commits into main from hemil/minimize-io on Feb 14, 2025

@hemildesai (Contributor)

This PR solves two problems:

  1. Previously, experiment and job configs were persisted to disk even if exp.run was never called. This created unnecessary directories and made it harder to find the experiments that actually ran. Now information is persisted only when exp.run is called, which eliminates that noise.

  2. Previously, _save_jobs was called after every exp.add and after every job launch. This is unnecessary and increases I/O time for experiments that have thousands of jobs. Now _save_jobs is called only twice: once at the start of exp.run and once at the end. A follow-up PR will cancel all submitted jobs if exp.run fails midway.
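
For readers who want a concrete picture of the two changes, here is a minimal sketch of the pattern, not the code in this PR. `SketchExperiment`, its `jobs.json` layout, the `save_calls` counter, and `demo` are all hypothetical names made up for illustration; only the ideas (no disk writes in `add`, two `_save_jobs` calls in `run`) come from the description above.

```python
import json
import tempfile
from pathlib import Path


class SketchExperiment:
    """Toy stand-in for an experiment object (hypothetical, not the real class)."""

    def __init__(self, base_dir, title):
        # The path is only computed here; nothing is created on disk yet.
        self._exp_dir = Path(base_dir) / title
        self._jobs = []
        self.save_calls = 0  # illustration only: counts writes to disk

    def add(self, name):
        # In-memory bookkeeping only: no mkdir and no file write per job.
        self._jobs.append({"id": name, "status": "pending"})

    def _save_jobs(self):
        # The experiment directory is created lazily, on the first save.
        self._exp_dir.mkdir(parents=True, exist_ok=True)
        (self._exp_dir / "jobs.json").write_text(json.dumps(self._jobs))
        self.save_calls += 1

    def run(self):
        self._save_jobs()  # first write: record the jobs about to be launched
        for job in self._jobs:
            job["status"] = "submitted"  # placeholder for a real job launch
        self._save_jobs()  # second write: record the state after launching


def demo(n_jobs=1000):
    """Add n_jobs jobs, run, and report what was written and when."""
    with tempfile.TemporaryDirectory() as tmp:
        exp = SketchExperiment(tmp, "demo")
        for i in range(n_jobs):
            exp.add(f"job-{i}")
        existed_before_run = exp._exp_dir.exists()
        exp.run()
        saved = json.loads((exp._exp_dir / "jobs.json").read_text())
        return existed_before_run, exp.save_calls, len(saved), saved[0]["status"]
```

In this sketch the number of writes stays at two regardless of how many jobs are added, and an experiment that is built but never run leaves no directory behind.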

…disk before run/dryrun

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
titu1994 previously approved these changes Feb 13, 2025

@titu1994 (Contributor) left a comment:

Thanks!

Signed-off-by: Hemil Desai <hemild@nvidia.com>
@Kipok (Contributor) left a comment:

I didn't test the speed, but job submission seems to work correctly

@hemildesai hemildesai merged commit 03db88e into main Feb 14, 2025
7 checks passed
3 participants