ENH: Orthogonal LoRA layer initialization (2) #2498

BenjaminBossan · 2025-04-15T13:20:32Z

Continuation of, and supersedes #2389

Check discussion there for further info.

HuggingFaceDocBuilderDev · 2025-04-15T13:24:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan · 2025-04-15T13:24:53Z

I compared orthogonal init against the default and Gaussian LoRA inits, with all other parameters kept equal, using the MetaMathQA method comparison suite. Test accuracy on GSM8K is 47.8% for default, 49.3% for Gaussian, and 48.9% for orthogonal. As expected, memory usage and runtime are practically identical.

I also plotted the train loss:

There seems to be a slight advantage for orthogonal initialization there as well, though for the first half of the run, it lags behind the other methods.
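For readers unfamiliar with the idea, here is a minimal NumPy sketch of one way an orthogonal LoRA initialization can be built. It is an illustration under stated assumptions, not the code from this PR: the function name `orthogonal_lora_init`, the `scale` factor, and the scheme of splitting the rows of a random orthogonal matrix between the two adapter matrices are all hypothetical choices made for the example. The point it shows is that both `lora_A` and `lora_B` can start non-zero while their product, the initial weight update, is still zero.

```python
import numpy as np


def orthogonal_lora_init(in_features, out_features, rank, scale=0.1, seed=0):
    """Return (lora_A, lora_B) such that lora_B @ lora_A is zero up to rounding.

    Illustrative only: the name, the `scale` factor, and the row-splitting
    scheme are assumptions for this sketch, not the implementation in the PR.
    """
    if rank % 2 != 0:
        raise ValueError(f"rank must be even, got {rank}")

    rng = np.random.default_rng(seed)

    # QR decomposition of a random square matrix gives an orthogonal Q (rank x rank).
    q, _ = np.linalg.qr(rng.standard_normal((rank, rank)))

    # Distinct rows of Q are mutually orthogonal, so q_b @ q_a.T is zero.
    q_a = q[0::2, :]  # shape (rank // 2, rank)
    q_b = q[1::2, :]  # shape (rank // 2, rank)

    # Mix the orthogonal rows with random projections to reach the layer shapes.
    lora_A = (rng.standard_normal((in_features, rank // 2)) @ q_a).T * scale  # (rank, in_features)
    lora_B = (rng.standard_normal((out_features, rank // 2)) @ q_b) * scale   # (out_features, rank)
    return lora_A, lora_B


lora_A, lora_B = orthogonal_lora_init(in_features=64, out_features=32, rank=8)
delta_w = lora_B @ lora_A  # weight update that would be added to the frozen base weight
```

In the usual LoRA setup, `lora_B` starts at zero so that the adapter is a no-op at the beginning of training. The sketch keeps that no-op property through orthogonality instead, which means neither matrix has to be zero at the start.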

ENH Orthogonal LoRA layer initialization (2)

4a59a80

BenjaminBossan mentioned this pull request Apr 15, 2025

orthogonal lora layer init #2389

Open
