Weight LoRA #2406
base: main
Conversation
Thanks for this PR that proposes to add Weight LoRA to PEFT. Do you have a link to the full paper? I only skimmed the implementation, but from what I saw, this is basically LoRA with the only difference being that the scaling parameter is trainable? Just from the abstract you pasted, it appears that there should be additional constraints on the weights $w_i$.
Yes, you are right. However, the constraints on the parameters $w_i$ should not be enforced inside the WeightLoRA method itself, but in the implementation of the optimizer step (e.g. SGD with projection). In our paper, we provide the WeightAdam optimizer, which projects the weights $w$ after each update.
Unfortunately, we submitted this paper to the ACL 2025 conference and it is under double-blind review, so I cannot send the full text, but I can share, for example, the results of the experiments in the form of a table.
Note that this optimizer can be included in the PR, too.
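The WeightAdam optimizer itself is not included in this excerpt, so the snippet below is only a minimal sketch of what such a projected update could look like, assuming the per-adapter weights are collected in a single 1-D tensor; `project_top_k` and `project_simplex` are hypothetical helper names corresponding to the two constraint types mentioned in the description (at most $K$ non-zero coordinates, or membership in the simplex $\Delta_{n-1}$), not functions from the PR.

```python
import torch


def project_top_k(w: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries of w and zero out the rest
    (Euclidean projection onto the set of k-sparse vectors)."""
    out = torch.zeros_like(w)
    idx = torch.topk(w.abs(), k).indices
    out[idx] = w[idx]
    return out


def project_simplex(w: torch.Tensor) -> torch.Tensor:
    """Euclidean projection onto the probability simplex
    (sort-based algorithm, cf. Duchi et al., 2008)."""
    u, _ = torch.sort(w, descending=True)
    css = torch.cumsum(u, dim=0)
    ks = torch.arange(1, w.numel() + 1, device=w.device, dtype=w.dtype)
    rho = int(torch.nonzero(u * ks > css - 1).max())
    theta = (css[rho] - 1.0) / (rho + 1)
    return torch.clamp(w - theta, min=0.0)


# toy loop: ordinary Adam step on the adapter weights, followed by a projection
w = torch.nn.Parameter(torch.randn(8))
optimizer = torch.optim.Adam([w], lr=1e-2)

loss = (w ** 2).sum()                          # placeholder loss
loss.backward()
optimizer.step()
with torch.no_grad():
    w.copy_(project_top_k(w.detach(), k=2))    # or project_simplex(w.detach())
```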
I would suggest waiting with this PR until the paper is accepted; otherwise it's hard for us to review the PR. Moreover, there could be useful changes during the review process that should be reflected in the integration.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
not stale
This PR adds a new method: Weight LoRA.
WeightLoRA
Weight LoRA is a less complex, but important, PEFT method that adds a weight $w_i$ to each LoRA adapter (here $i$ is the adapter index). This is done in order to perform, in addition to the classical optimization over all LoRAs $A_1, B_1, ..., A_n, B_n$, an alternative optimization over a vector of weights $w := (w_1, ..., w_n)^T \in \mathbb{R}^n$ with a wide variety of constraints. In our research paper, we consider two approaches: 1) the vector $w$ must lie in the simplex $\Delta_{n-1}$, and 2) the vector $w$ has only $K$ non-zero coordinates. Both of these methods solve the problem of finding the most important LoRA adapters in the model and concentrating training on them while disabling the rest.
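For illustration, here is a minimal sketch of the core idea in PyTorch: a frozen linear layer plus a LoRA adapter whose output is multiplied by a trainable scalar weight. The class and attribute names (`WeightedLoRALinear`, `weight_w`) are illustrative assumptions and not taken from the actual PR code.

```python
import torch
import torch.nn as nn


class WeightedLoRALinear(nn.Module):
    """Illustrative sketch (not the PR's actual code): a frozen base linear
    layer plus a LoRA adapter whose contribution is scaled by a trainable
    scalar weight w_i."""

    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # base weights stay frozen
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = lora_alpha / r
        # trainable adapter weight w_i; driving it to 0 disables this adapter
        self.weight_w = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lora_out = (x @ self.lora_A.T) @ self.lora_B.T   # B_i A_i x
        return self.base(x) + self.weight_w * self.scaling * lora_out


# usage: wrap an existing linear layer
layer = WeightedLoRALinear(nn.Linear(128, 64), r=4)
y = layer(torch.randn(2, 128))
```

Optimizing over the vector $w = (w_1, ..., w_n)^T$ then amounts to training one such scalar per wrapped layer, subject to the constraints described above.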
The abstract from the paper is:
The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation (LoRA), which adds trainable adapters to selected layers. Although LoRA may obtain accurate solutions, it requires significant memory to train large models and intuition on which layers to add adapters. In this paper, we propose a novel method, WeightLoRA, which overcomes this issue by adaptive selection of the most critical LoRA heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. Finally, we conduct experiments for the series of competitive benchmarks and DeBERTa and BART models, comparing our approach with the most popular LoRA modifications. The experimental results demonstrate the efficacy of WeightLoRA and the superior performance of WeightLoRA+ in comparison to the baselines in nearly all cases.
Original code