[peft] Define PEFT base class and LoRA transform by ananthsub · Pull Request #71 · NVIDIA-NeMo/Megatron-Bridge

ananthsub · 2025-06-12T02:35:06Z

The base class implementation leans heavily on the PEFT implementation in NeMo: https://github.com/NVIDIA/NeMo/blob/ec9c486557f1b5fb211a903e299d4cb5be1fd3b9/nemo/lightning/pytorch/callbacks/peft.py#L44

PEFT base class differences:

PEFT is an abstract dataclass so that implementations neatly plug into the ConfigContainer (params_to_save is now set in the post init, which subclasses need to explicitly initialize)
We explicitly pass a training flag to __call__ and freeze_model as we cannot rely on the presence of a lightning trainer object to indicate what stage is being requested
The base class does not yet concern itself with checkpointing beyond setting the params to save and the adapter key filter. The save/restore paths will be sent out in follow-up PRs as part of integrating PEFT into the main training loop flow
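The three points above can be illustrated with a minimal sketch. It is not the code in this PR: `Param`, `ToyModel`, `PEFTSketch` and `ToyAdapter` are hypothetical stand-ins written with the standard library only, and the exact order of operations inside `__call__` is an assumption. It only shows the shape described here: an abstract dataclass, `params_to_save` initialized in `__post_init__`, and an explicit `training` flag passed to `__call__` and `freeze_model`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Param:
    """Hypothetical stand-in for a framework parameter."""

    def __init__(self, name):
        self.name = name
        self.requires_grad = True


class ToyModel:
    """Hypothetical stand-in for a model: a flat dict of named parameters."""

    def __init__(self, names):
        self.params = {n: Param(n) for n in names}
        self.training = False


@dataclass
class PEFTSketch(ABC):
    # Not an __init__ argument; populated in __post_init__.
    params_to_save: set = field(init=False, repr=False)

    def __post_init__(self):
        self.params_to_save = set()

    @abstractmethod
    def transform(self, model):
        """Attach adapter parameters to the model."""

    def freeze_model(self, model, training=True):
        # Freeze every base parameter; the stage comes from the explicit flag,
        # not from a trainer object.
        for p in model.params.values():
            p.requires_grad = False
        model.training = training

    def __call__(self, model, training=True):
        self.freeze_model(model, training=training)
        self.transform(model)
        # Assumption: only parameters left trainable are worth saving.
        self.params_to_save = {
            n for n, p in model.params.items() if p.requires_grad
        }
        return model

    def adapter_key_filter(self, key):
        return key in self.params_to_save


@dataclass
class ToyAdapter(PEFTSketch):
    target: str = "linear"  # hypothetical config field

    def __post_init__(self):
        # A subclass that overrides __post_init__ must call the base one.
        super().__post_init__()

    def transform(self, model):
        for name in list(model.params):
            if self.target in name and not name.endswith(".adapter"):
                model.params[name + ".adapter"] = Param(name + ".adapter")


model = ToyModel(["linear.weight", "norm.weight"])
peft = ToyAdapter()
peft(model, training=True)
```

Because the sketch is a plain dataclass, an instance such as `ToyAdapter(target="linear")` can be stored as a field of a larger config object, which is the motivation given above for making PEFT a dataclass.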

LoRA is otherwise nearly identical to what's in NeMo minus the following:

dropout_recompute is not enabled, since it adds a dependency on thunder (see the comment on [peft] Port base adapter wrapper and lora utils #44)
We don't use this attribute (https://github.com/NVIDIA/NeMo/blob/ec9c486557f1b5fb211a903e299d4cb5be1fd3b9/nemo/collections/llm/peft/lora.py#L442), which is set in the base class (https://github.com/NVIDIA/NeMo/blob/0af8b7df793ad7538f149ad9bcb8c2cae5134c1a/nemo/lightning/pytorch/callbacks/peft.py#L150)

LoRAMerge is identical to what's in NeMo
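For readers unfamiliar with what a LoRA merge does, here is a small sketch of the standard LoRA arithmetic, W' = W + (alpha / r) * B A. It is not the LoRAMerge code from NeMo or this PR; `matmul` and `merge_lora` are hypothetical helpers that use nested Python lists so the example runs without any dependencies.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a low-rank update into a base weight.

    weight: out x in, lora_a: rank x in, lora_b: out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out x in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Tiny worked example: 2x2 identity base weight, rank-1 adapter.
weight = [[1, 0], [0, 1]]
lora_a = [[1, 2]]    # rank x in
lora_b = [[1], [0]]  # out x rank
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
```

After merging, a single matrix multiply with `merged` gives the same result as applying the base weight and the scaled adapter path separately, so the adapter modules are no longer needed at inference time.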

This PR includes fixes to walk_utils found from new unit tests

See #68 for the ModuleMatcher, which should get merged first as it's independently testable

copy-pr-bot · 2025-06-12T02:35:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ananthsub · 2025-06-12T02:36:01Z

/ok to test 46f3623

marcromeyn · 2025-06-12T07:30:35Z

What's the timeline for adding PEFT to the hub? In the work I was previously doing I have re-implemented PEFT and was thinking to contribute that. But I first need to finish the bridge, so I can likely start with that in a few weeks. The new implementation would also leverage the bridge to support 2-way binding with HF.

The new implementation I worked on doesn't really change the external API; it's mostly internal. So it could make sense to first merge a NeMo-inspired PEFT and then iterate on it.

ananthsub · 2025-06-12T16:05:09Z

@marcromeyn I'd like to have some basic PEFT support in for 25.07, so as soon as possible. For now I am following the existing NeMo pattern as closely as possible to simplify the migration.

ananthsub · 2025-06-12T16:32:25Z

/ok to test 09c7d73

ananthsub · 2025-06-13T00:26:22Z

/ok to test 4256a0b

ananthsub · 2025-06-13T16:19:52Z

/ok to test c9c5b74

ananthsub requested review from cuichenx, hemildesai and maanug-nv June 12, 2025 02:35

ananthsub mentioned this pull request Jun 12, 2025

PEFT integration #27

Closed

ananthsub requested a review from marcromeyn June 12, 2025 02:36

copy-pr-bot bot temporarily deployed to test June 12, 2025 02:36 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 12, 2025 02:36 Inactive

ananthsub changed the title ~~Define PEFT base class and LoRA transform~~ [peft] Define PEFT base class and LoRA transform Jun 12, 2025

copy-pr-bot bot had a problem deploying to nemo-ci June 12, 2025 02:52 Failure

copy-pr-bot bot temporarily deployed to nemo-ci June 12, 2025 02:52 Inactive

copy-pr-bot bot temporarily deployed to test June 12, 2025 16:32 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 12, 2025 16:32 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 12, 2025 16:34 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 12, 2025 16:52 Inactive

hemildesai reviewed Jun 12, 2025

src/megatron/hub/peft/module_matcher.py (outdated, resolved)

ananthsub added 2 commits June 12, 2025 17:10

Define PEFT base class and LoRA transform

5b90f8b

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

run integration on gpu only

b0f815d

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

ananthsub force-pushed the peft-base branch from 09c7d73 to b0f815d Compare June 13, 2025 00:12

ananthsub added 2 commits June 12, 2025 17:23

rebase

fd014ec

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

rebase

4256a0b

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

copy-pr-bot bot temporarily deployed to test June 13, 2025 00:26 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 00:26 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 00:36 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 01:35 Inactive

update import

c9c5b74

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

copy-pr-bot bot temporarily deployed to test June 13, 2025 16:20 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 16:20 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 16:23 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci June 13, 2025 16:39 Inactive

hemildesai approved these changes Jun 13, 2025

ananthsub merged commit 9a83e5e into NVIDIA-NeMo:main Jun 13, 2025
10 checks passed

yaoyu-33 pushed a commit that referenced this pull request Jul 10, 2025

fix: Do not initialize reference model for sft (#71)

640297c

Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

ananthsub merged 5 commits into NVIDIA-NeMo:main from ananthsub:peft-base

3 participants
