[ENH] Implements DoRA #790


Open · wants to merge 60 commits into main

Conversation

julian-fong (Contributor) commented Feb 4, 2025

julian-fong (Contributor, Author):

@calpt I've written out the compose function, using https://github.com/NVlabs/DoRA/blob/7ee989695252cb0f4f7579182c81581aa75139a7/commonsense_reasoning/peft/src/peft/tuners/dora.py#L411 as a reference point.

Using the first notebook as a test bed, it does train; however, the performance is quite poor (it does not outperform LoRA by a meaningful margin). Maybe there is something wrong with my implementation (I'm also curious why they compute norm_scale - 1).

It also does not seem straightforward to write out the delta_w methods and the inv_com method, as I believe they would require the pretrained weights W_o as an input for the calculation.
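For reference, the weight-level formulation from the DoRA paper is roughly W' = m * (W_0 + BA) / ||W_0 + BA||, where m is the learned magnitude vector and the norm is taken over each output unit's weight vector (the column-wise norm in the paper's notation; dim=1 on a PyTorch nn.Linear weight). A minimal sketch of that formula, with placeholder names and not the code in this PR:

import torch

def dora_merged_weight(W_0, lora_A, lora_B, magnitude):
    # W' = m * (W_0 + B A) / ||W_0 + B A||, one norm per output row
    directional = W_0 + lora_B @ lora_A                 # frozen weight plus low-rank update
    norm = directional.norm(p=2, dim=1, keepdim=True)   # per-output-row norm, shape (out, 1)
    return magnitude.view(-1, 1) * directional / norm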

An initial review is appreciated :)

calpt (Member) left a comment:


This is in a very promising state; the changes look good to me from a functional perspective. What I think could still be improved is the readability of the implementation, to make sure it is as easily understandable as possible for developers. I left a few comments, mainly in this regard.

Comment on lines 469 to 473
norm_scale = m.weight.view(-1) / norm  # magnitude vector m divided by the norm of (W_0 + BA)
scaled_weights = (norm_scale - 1) * weights  # "- 1" because `com` re-adds the frozen layer output afterwards
scaled_lora = norm_scale * added
result = scaled_weights + scaled_lora
return result
calpt (Member):

I believe this computation is due to us calling the com method afterwards, which re-adds the layer output? While this technically works, I think we should refactor it to make the code overall more readable (even if that makes it less concise). The formulation from the paper should ideally be clearly readable in our code to make it as understandable as possible for developers.
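One possible shape for that refactor (just a sketch, reusing the names from the snippet above and still assuming that com re-adds the frozen layer output, weights, afterwards):

norm_scale = m.weight.view(-1) / norm         # m / ||W_0 + BA||
dora_output = norm_scale * (weights + added)  # the paper's m * (W_0 x + BAx) / ||W_0 + BA||
result = dora_output - weights                # subtract W_0 x once, since `com` adds it back
return result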

julian-fong (Contributor, Author):

That's right. The idea here for the method compute_dora_deltaw was to formulate the 'BAx' that goes into the Vera or LoRA layers. I tried to formulate it this way so that we could still utilize the original com method and keep as much of the original code intact as possible. The method also isn't particularly extensible, since it's designed specifically for the "W_oX + BAx" com method.

"""This function returns the required weights necessary
to compute the inverse composition where `composition_mode` == add.
"""
result = weights - weights * norm.unsqueeze(1) / m.weight - added
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above, I think we should try to make this more readable.
