MFU tracking for inference by tdene · Pull Request #3856 · NVIDIA/Megatron-LM

tdene · 2026-03-13T12:53:18Z

What does this PR do?

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code (see Typing guidelines)
I have added relevant documentation
I have run autoformatter.sh on my PR

Code review

Feel free to message or tag @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

When your PR is ready, click Ready for Review.
An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
- Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge conflicts are resolved and CI is passing.
Final Review might be declined if these requirements are not met.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

copy-pr-bot · 2026-03-13T12:53:22Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

tdene · 2026-03-16T13:40:38Z

/claude review

megatron/training/training.py

megatron/core/inference/inference_flops.py

claude

Two inline comments:

- Potential `ZeroDivisionError` in `training.py`: `total_tps` and `e2e_tps` don't guard against `elapsed_time_per_iteration == 0`, unlike the neighboring `train_tps` / `inf_tps` lines.
- Missing `d_conv` in `from_args()`: `InferenceFLOPsConfig.d_conv` is used in the conv1d FLOPs term, but `from_args` never reads it from `args`, so it silently falls back to the default of 4.
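A minimal sketch of both fixes, assuming a simplified config. `safe_rate`, `InferenceFLOPsConfigSketch`, and the `mamba_d_conv` argument name are hypothetical illustrations, not code from this PR; the real `InferenceFLOPsConfig` has more fields, and the actual argument name for the conv width may differ. The guard adds the kind of protection the review says `train_tps` / `inf_tps` already have.

```python
from argparse import Namespace
from dataclasses import dataclass


def safe_rate(count: float, elapsed_s: float) -> float:
    """Return count / elapsed_s, or 0.0 for a non-positive interval (no ZeroDivisionError)."""
    return count / elapsed_s if elapsed_s > 0 else 0.0


@dataclass
class InferenceFLOPsConfigSketch:
    # Hypothetical subset of fields; the real InferenceFLOPsConfig has more.
    hidden_size: int
    d_conv: int = 4  # default conv1d kernel width, used only if args lacks one

    @classmethod
    def from_args(cls, args: Namespace) -> "InferenceFLOPsConfigSketch":
        # Read d_conv from args instead of silently using the default.
        # The attribute name `mamba_d_conv` is a hypothetical placeholder.
        return cls(
            hidden_size=args.hidden_size,
            d_conv=getattr(args, "mamba_d_conv", cls.d_conv),
        )


# Example: an iteration that reports zero elapsed time no longer raises.
total_tps = safe_rate(1024, 0.0)  # 0.0
```

Whether a zero-length interval should log 0.0, NaN, or be skipped entirely is a judgment call; returning 0.0 keeps the logged metric numeric.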

Also worth noting: none of the new modules (`inference_flops.py`, `gpu_peak_flops.py`, `mfu_tracker.py`) have unit tests. The pure-computation logic in `InferenceFLOPsCalculator` and `MFUTracker` would be straightforward to test and would catch regressions in the FLOPs formulas.
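A pytest-style sketch of what such tests could look like. It uses hypothetical stand-in functions (`compute_mfu`, `dense_forward_flops`) rather than the actual `InferenceFLOPsCalculator` / `MFUTracker` APIs, which are not shown here. MFU is taken as achieved FLOP/s divided by the aggregate peak FLOP/s of all GPUs, and the 2 × params × tokens forward-pass estimate is a common approximation, not necessarily the formula this PR implements.

```python
import math


def compute_mfu(total_flops: float, elapsed_s: float,
                peak_flops_per_gpu: float, num_gpus: int) -> float:
    """Model FLOPs utilization: achieved FLOP/s over aggregate peak FLOP/s."""
    if elapsed_s <= 0 or peak_flops_per_gpu <= 0 or num_gpus <= 0:
        return 0.0  # nothing meaningful to report
    achieved = total_flops / elapsed_s
    return achieved / (peak_flops_per_gpu * num_gpus)


def dense_forward_flops(num_params: float, num_tokens: int) -> float:
    """Approximation: ~2 FLOPs per parameter per token for a forward pass
    (one multiply and one add per weight), ignoring attention-score terms."""
    return 2.0 * num_params * num_tokens


def test_mfu_half_utilization():
    # 8 GPUs at 1e15 peak FLOP/s each, achieving 4e15 FLOP/s in total -> 50% MFU.
    assert math.isclose(compute_mfu(4e15, 1.0, 1e15, 8), 0.5)


def test_mfu_zero_elapsed_is_zero():
    assert compute_mfu(1e12, 0.0, 1e15, 1) == 0.0


def test_dense_forward_flops():
    # 1B parameters, 10 tokens -> 2e10 FLOPs.
    assert dense_forward_flops(1e9, 10) == 2e10
```

Tests of this shape pin each formula to a hand-computed value, so a change to the FLOPs terms (for example, a dropped conv1d term) shows up as a failing assertion.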

Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>

tdene force-pushed the tde/mfu_tracking branch from 53d02d1 to a24c196 on March 13, 2026 12:58

tdene marked this pull request as ready for review March 16, 2026 03:19

tdene requested review from a team as code owners March 16, 2026 03:19

svcnvidia-nemo-ci requested a review from a team March 16, 2026 03:19

svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 16, 2026

svcnvidia-nemo-ci added the complexity: medium label Mar 16, 2026

copy-pr-bot bot temporarily deployed to test March 16, 2026 06:08 Inactive

claude bot reviewed Mar 16, 2026

megatron/training/training.py (resolved)

claude bot reviewed Mar 16, 2026

megatron/core/inference/inference_flops.py (resolved)

claude bot reviewed Mar 16, 2026

tdene force-pushed the tde/mfu_tracking branch from 3c7908f to 8398d81 on March 16, 2026 14:51

copy-pr-bot bot temporarily deployed to test March 16, 2026 14:52 Inactive

copy-pr-bot bot temporarily deployed to test March 16, 2026 19:47 Inactive

tdene and others added 2 commits March 16, 2026 14:51

MFU tracking for inference

5178546

Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>

Address reviewer comments

c619fba

tdene force-pushed the tde/mfu_tracking branch from 6b795bb to c619fba on March 16, 2026 19:57

copy-pr-bot bot had a problem deploying to test March 16, 2026 19:59 Error

Fix world size inconsistency

fec22fe

copy-pr-bot bot temporarily deployed to test March 16, 2026 20:10 Inactive

tdene added 2 commits March 16, 2026 17:53

Add unit tests

ce0439e

lint

ae9a814

tdene requested review from a team as code owners March 17, 2026 09:45

copy-pr-bot bot temporarily deployed to test March 17, 2026 09:46 Inactive

Address reviewer comments

ae8b697

copy-pr-bot bot temporarily deployed to test March 18, 2026 04:21 Inactive

tdene wants to merge 6 commits into NVIDIA:main from tdene:tde/mfu_tracking
