Add multi dataset option for GB24 #221
I think it would be better to create a standalone example that pulls multiple datasets together.
assert not (args.shmem and args.ddstore), "Cannot use both ddstore and shmem"
if args.ddstore:
    opt = {"ddstore_width": args.ddstore_width, "local": True}
Why do we use local=True?
Is it because we want different ranks to import different datasets?
It indicates that the dataset set up for DDStore is already a "locally" owned subset, so there is no need to split it internally. Previously, we provided a global view to DDStore, and DDStore split it.
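To make the distinction concrete, here is a minimal sketch in plain Python of the two ownership modes described above. It does not use the DDStore API: `split_global_view` and `samples_owned_by_rank` are hypothetical helpers, and the contiguous near-even split is an assumption about how a global view might be divided, not a description of DDStore's actual logic.

```python
def split_global_view(dataset, rank, world_size):
    """Hypothetical helper: contiguous, near-even split of a global dataset."""
    base, extra = divmod(len(dataset), world_size)
    # The first `extra` ranks each take one additional sample.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return dataset[start:stop]


def samples_owned_by_rank(dataset, rank, world_size, local):
    """Return the samples a rank would own under each mode (illustrative only)."""
    if local:
        # local=True: the caller already passed only this rank's samples,
        # so nothing is split internally.
        return list(dataset)
    # local=False: every rank passed the same global view, so it is split here.
    return list(split_global_view(dataset, rank, world_size))
```

With `local=True`, each rank can load a different dataset and hand it over unchanged. With the global view, every rank must first see the same full list.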
Would this affect global shuffling?
Some clarifications needed :)
One more question.
* add multi dataset option
* create a separate directory and update
* black
* black
* replace OC2020 with GFM
Two updates: