Conversation

@jordgedu

No description provided.

@jordgedu (Author)

When I tested it, I found that this abnormal value caused a huge amount of GPU memory to be allocated.

@tgale96 (Contributor) commented Jan 2, 2024

Ah yes, a while back we were specifying the capacity factor in terms of tokens rather than as a multiple of the expected number of tokens per expert. We must have missed updating this when we changed it :)

Would you mind updating the other moe scripts as well? Thanks!
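For context, the semantic change described above can be sketched as follows. This is an illustrative function, not MegaBlocks' actual API; the name and signature are assumptions. Under the new convention, the capacity factor scales each expert's expected share of the tokens, so a value like 1.25 means "25% headroom over a perfectly balanced assignment", whereas the old token-count convention would interpret the same number as an absolute capacity:

```python
def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Tokens each expert is sized to hold, with capacity_factor given as a
    multiple of the expected tokens per expert (not a raw token count)."""
    expected_tokens_per_expert = num_tokens / num_experts
    return int(capacity_factor * expected_tokens_per_expert)

# With 4096 tokens routed across 8 experts and capacity_factor=1.25,
# each expert is sized for 1.25x its expected share: 640 tokens.
print(expert_capacity(4096, 8, 1.25))
```

This also suggests why a leftover token-count-style value could blow up GPU memory: a number that was reasonable as an absolute capacity (e.g. hundreds or thousands) becomes a huge multiplier under the new convention.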

@tgale96 (Contributor) commented Jan 2, 2024

Also, out of curiosity - why are you using MoE, as opposed to dMoE?
