-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The plot got from muP coord_check seems not horizontal, which may indicates there exits a bug in the muP implementation? #956
Comments
Thanks for raising this issue. It looks like you’re correct and we broke the implementation at some point. One thing we really need to start doing (but haven’t been able to do due to manpower limitations) is build out a robust testing suite that verifies new major changes don’t break old features :S |
Thanks for your reply! I checkout to some other commits, such as the |
I was looking into the muP implementation in I am thinking, could LR schedule be the cause of the problem? By design it overrides the LR values per group (hence overwrites muP changes), and so the way muP scaling was introduced in Also, However, neither I found whether |
I think you are right @ofivite, i don't think the implementation was ever correct since the learning rate wasn't correctly setup from the beginning. After a long and thorough debugging I managed to pass at least the 2 basic sanity checks for mup:
Issues I have found are:
|
Found another bug, |
@marcobellagente93 Oh yes, now it's indeed nicely flat curves, great ! :) |
I'll make a PR as soon as I can |
Edit: email formatting did not work properly.
Hi, I’m interested to see the PR as well. When I originally made my PR, the curves looked as expected—flat for mup and blown up for SP.
The use_mup parameter is set to false so the weights never get initialized
Are you referring to it being false by default? When calculating the base shapes using mup, two models have to be instantiated, one with use_mup set to false and the other set to true. If I remember correctly, I set use_mup to false by default everywhere and enabled it only when mup was set to true in the config file. Then, when calculating base shapes it’s forced to false.
The learning rate is not scaled by width_mult
I saw some code being linked to in this thread that’s from the original mup repo from Microsoft. I had to bring in and modify a lot of it. I believe width_mult for the learning rate may have been set there.
|
What I mean is that the main training loops with mup enabled does the following:
but at step 1 all parameters get initialized with self.use_mup = neox_args.use_mup (which is false) and causes everything else to be wrong (multipliers not used, 1/d attention not used, ...) |
This behavior is expected--the weights are reinitialized using Line 446 in 43ea51c
Or do you mean this function does not get called? |
Bug Discription & To Reproduce
The source code is from current main branch, and follow the instructions in the
README-MUP.md
until this step:I encounter an error like this:
This is caused by passing a Bool Tensor into the
get_stat
ofmup
(maybe the attention mask), but themup
library cannot handle it. In addition, we will also encount an error which is caused by passingNone
to theget_stat
.In order to solve this problem temporarily, I modify the source code in the file
coord_check.py
inmup
like this:This time,
coord_check
ran successfully, it outputs many jpgs, one for each GPU, jpg from different GPUs looks very similar, so I just show one jpg for each paramerization.Standard Parameterization:
muP Parameterization:
The result looks weird, the SP is more horizontal than muP, which is not expected.
Expected behavior
https://github.com/microsoft/mup#coord-check
An expected behavior should looks like the plots in the above link, in which muP is very horizontal, while SP blows up.
Proposed solution
I check the code related to
mup
, but don't have a proposal yet, I will try to keep checking it. Maybe contributors in the issue(#679) can give some comments? @nsarka @Quentin-Anthony @StellaAthenaThanks a lot!
Environment (please complete the following information):
mup
related configs into theconfigs/local_setup.yml
, and keep completely identical to the instructions inREADME-MUP.md
.The text was updated successfully, but these errors were encountered: