feat: MFU logging in train recipes #1413

Open
SwekeR-463 wants to merge 4 commits into NVIDIA-NeMo:main from SwekeR-463:feat/mfu-logging

@SwekeR-463
Contributor

What does this PR do?

  • Adds AutoMFU to compute and log Model FLOPs Utilization (MFU) in LLM training recipes, with registry-style model/device handling under _transformers.

Changelog

  • Added new AutoMFU class in _transformers.
  • Added registry-style device mapping for MFU reference TFLOPs.
  • Exported AutoMFU from _transformers/__init__.py.
  • Integrated MFU reporting into LLM train recipes in train_ft.py and train_seq_cls.py.
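
For readers unfamiliar with the metric, MFU is the ratio of achieved training FLOPs per second to the hardware's theoretical peak. The sketch below is not the `AutoMFU` implementation from this PR; it is a minimal illustration that assumes the common `6 * num_params` FLOPs-per-token approximation for a dense transformer (forward plus backward, attention FLOPs ignored). The function names and parameters are hypothetical.

```python
def approx_train_flops_per_token(num_params: int) -> float:
    """Approximate forward+backward FLOPs per token as 6 * N.

    Assumption: dense transformer, attention FLOPs ignored.
    """
    return 6.0 * num_params


def compute_mfu(
    num_params: int,
    tokens_per_step: int,
    step_time_s: float,
    num_devices: int,
    peak_tflops_per_device: float,
) -> float:
    """Return MFU as a fraction in [0, 1] under the 6 * N approximation."""
    if step_time_s <= 0 or num_devices <= 0 or peak_tflops_per_device <= 0:
        raise ValueError("step time, device count, and peak TFLOPs must be positive")
    # FLOPs actually performed per second across the whole job.
    achieved_flops_per_s = (
        approx_train_flops_per_token(num_params) * tokens_per_step / step_time_s
    )
    # Theoretical peak FLOPs per second across all devices.
    peak_flops_per_s = num_devices * peak_tflops_per_device * 1e12
    return achieved_flops_per_s / peak_flops_per_s
```

A recipe would typically call something like this once per logging interval, passing the measured step time and the global token count for that step.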

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>
@SwekeR-463 changed the title from "(feat): MFU logging in train recipes" to "feat: MFU logging in train recipes" on Feb 28, 2026
@hemildesai
Contributor

Can you add something like https://github.com/verl-project/verl/blob/main/verl/utils/flops_counter.py#L22-L85 for the device registry?

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>
@SwekeR-463
Contributor Author

> Can you add something like https://github.com/verl-project/verl/blob/main/verl/utils/flops_counter.py#L22-L85 for the device registry?

I have made the requested change.
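
For context, a device registry of the kind requested above could take roughly the following shape. This is a hedged sketch, not the code added in this PR or the linked verl file: the table name, function name, and substring-matching strategy are assumptions, and the two entries are the published dense BF16 peak figures for the A100 and H100 SXM, included only as examples.

```python
from typing import Optional

# Hypothetical registry: device-name substring -> peak dense BF16 TFLOPs.
# Values are vendor datasheet figures, used here for illustration only.
_DEVICE_PEAK_TFLOPS = {
    "H100": 989.0,
    "A100": 312.0,
}


def lookup_peak_tflops(device_name: str, default: Optional[float] = None) -> Optional[float]:
    """Return the reference peak TFLOPs for a device name, or `default` if unknown."""
    name = device_name.upper()
    for key, tflops in _DEVICE_PEAK_TFLOPS.items():
        if key in name:
            return tflops
    return default
```

In a training recipe the name would usually come from `torch.cuda.get_device_name()`. Returning `default` for unknown hardware lets the caller skip MFU logging rather than report a misleading number.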

@akoumpa
Contributor

akoumpa commented Mar 3, 2026

/ok to test 7e43c4b

@akoumpa
Contributor

akoumpa commented Mar 4, 2026

/ok to test 3bc3ccc

@akoumpa
Contributor

akoumpa commented Mar 7, 2026

/ok to test 32bd853
