Skip to content

Fix usage of fract_base_dir and val_split#63

Merged
michaelmckinsey1 merged 14 commits intoLBANN:mainfrom
michaelmckinsey1:fractal_base
May 6, 2026
Merged

Fix usage of fract_base_dir and val_split#63
michaelmckinsey1 merged 14 commits intoLBANN:mainfrom
michaelmckinsey1:fractal_base

Conversation

@michaelmckinsey1
Copy link
Copy Markdown
Collaborator

@michaelmckinsey1 michaelmckinsey1 commented Apr 30, 2026

  • Fixes setting fractal_dir. This was being ignored before
  • Fixes setting val_split. This was not creating a new dataset because it wasn't in the hash
  • Undoes val_batch_size. We should use the same for train and val

@michaelmckinsey1 michaelmckinsey1 self-assigned this Apr 30, 2026
@PatrickRMiles
Copy link
Copy Markdown
Collaborator

Why do we want to be able to change the val_split from 75/25?

@michaelmckinsey1
Copy link
Copy Markdown
Collaborator Author

Why do we want to be able to change the val_split from 75/25?

I changed it to 30 because the default number of samples in the validation dataset with our current config is 60 at 25% split. At a 30% split it is 72. The target is 64, since that allows for a global batch size up to 64 instead of 32 (powers of 2 because of sharding).

We could also make it 27% to satisfy this constraint, but I just rounded up.

@michaelmckinsey1 michaelmckinsey1 marked this pull request as ready for review May 6, 2026 15:34
Copy link
Copy Markdown
Collaborator

@PatrickRMiles PatrickRMiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one thing to discuss. Looks like github is having issues currently. Let me add that review comment back.

@PatrickRMiles
Copy link
Copy Markdown
Collaborator

In get_dataset.py, why include fract_base_dir in the dataset hash?

@michaelmckinsey1
Copy link
Copy Markdown
Collaborator Author

michaelmckinsey1 commented May 6, 2026

In get_dataset.py, why include fract_base_dir in the dataset hash?

I think this was a mistake, so I removed it. val_split is intentional so it generates a different dataset based on val split

@michaelmckinsey1 michaelmckinsey1 merged commit cd4f79b into LBANN:main May 6, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants