[ENH] Implements DoRA #790
base: main
Conversation
@calpt I've written out the compose function, using https://github.com/NVlabs/DoRA/blob/7ee989695252cb0f4f7579182c81581aa75139a7/commonsense_reasoning/peft/src/peft/tuners/dora.py#L411 as a reference point. Using the first notebook as a testing setup, it does train; however, the performance is quite poor (it does not outperform LoRA by a decent amount). Maybe there is something wrong with my implementation (I'm also curious as to why they do norm_scale - 1). It also does not seem straightforward/simple to write out the … An initial review is appreciated :)
This is in a very promising state; the changes look good to me from a functional perspective. What could still be improved is the readability of the implementation, to ensure it is as easily understandable as possible for developers. I left a few comments mainly in this regard.
src/adapters/methods/lora.py
Outdated
norm_scale = m.weight.view(-1) / norm
scaled_weights = (norm_scale - 1) * weights
scaled_lora = norm_scale * added
result = scaled_weights + scaled_lora
return result
I believe this computation is due to us calling the com method afterwards, which re-adds the layer output? While this technically works, I think we should refactor this to make the code more readable overall (even if that makes it less concise). The formulation from the paper should ideally be clearly readable in our code to make it as understandable as possible for developers.
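To make the relationship to the paper explicit: DoRA computes the layer output as (m / ||W_0 + BA||) * (W_0 x + BAx). Because com() afterwards adds the returned delta back onto the frozen layer output W_0 x, the delta has to subtract that output out once, which is where the "- 1" comes from. A minimal sketch of this algebra, with illustrative argument names that are not the PR's actual API:

import torch

def dora_delta(weights: torch.Tensor, added: torch.Tensor,
               magnitude: torch.Tensor, norm: torch.Tensor) -> torch.Tensor:
    # weights: frozen layer output W_0 x; added: LoRA output BAx
    # magnitude: DoRA's learned magnitude vector m
    # norm: ||W_0 + BA||, one value per output feature
    norm_scale = magnitude.view(-1) / norm
    # Full DoRA output: norm_scale * (W_0 x + BAx).
    # com() later computes W_0 x + delta, so W_0 x is subtracted once:
    # delta = (norm_scale - 1) * W_0 x + norm_scale * BAx
    return (norm_scale - 1) * weights + norm_scale * added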
That's right. The idea behind the compute_dora_deltaw method was to formulate the 'BAx' term that goes into the Vera or LoRA layers. I tried to formulate it this way so that we could still utilize the original com method and keep as much of the original code intact as possible. The method also isn't very extensible, since it is designed specifically for the "W_0x + BAx" com method.
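For comparison, a more direct formulation that follows the paper's equation W' = m * (W_0 + BA) / ||W_0 + BA||_c could look roughly like the sketch below. This is only an illustration of the paper's formulation, not the PR's code; the function shape and argument names are assumptions.

import torch
import torch.nn.functional as F

def dora_linear_forward(x, base_weight, lora_A, lora_B, magnitude, scaling=1.0):
    # Directional component: frozen weight plus the low-rank update B @ A.
    directional = base_weight + scaling * (lora_B @ lora_A)
    # Column-wise norm in the paper's notation: one value per output feature.
    column_norm = directional.norm(p=2, dim=1, keepdim=True)
    # Rescale each output direction by the learned magnitude m.
    dora_weight = magnitude.view(-1, 1) * directional / column_norm
    return F.linear(x, dora_weight)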
src/adapters/methods/lora.py
Outdated
"""This function returns the required weights necessary | ||
to compute the inverse composition where `composition_mode` == add. | ||
""" | ||
result = weights - weights * norm.unsqueeze(1) / m.weight - added |
Similar to the above, I think we should try to make this more readable.
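One way to make the intent clearer could be to spell out the algebra being inverted. Assuming the merged weight was computed as W_merged = (m / ||W_0 + BA||) * (W_0 + BA), undoing the merge would solve for W_0. The sketch below is only one possible reading under that assumption, with illustrative names, and is not the PR's code:

import torch

def dora_unmerge(merged_weight: torch.Tensor, added: torch.Tensor,
                 magnitude: torch.Tensor, norm: torch.Tensor) -> torch.Tensor:
    # If W_merged = (m / ||W_0 + BA||) * (W_0 + BA), then
    # W_0 = (||W_0 + BA|| / m) * W_merged - BA.
    norm_scale = magnitude.view(-1) / norm  # one scale per output feature
    return merged_weight / norm_scale.unsqueeze(1) - added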
Paper: https://arxiv.org/pdf/2402.09353
Github: https://github.com/NVlabs/DoRA/