Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/claude review |
There was a problem hiding this comment.
Two inline comments:
-
Potential ZeroDivisionError in
training.py:total_tpsande2e_tpsdon't guard againstelapsed_time_per_iteration == 0, unlike the neighboringtrain_tps/inf_tpslines. -
Missing
d_convinfrom_args():InferenceFLOPsConfig.d_convis used in the conv1d FLOPs term butfrom_argsnever reads it from args, silently using the default of 4.
Also noting: none of the new modules (inference_flops.py, gpu_peak_flops.py, mfu_tracker.py) have unit tests. The pure-computation logic in InferenceFLOPsCalculator and MFUTracker would be straightforward to test and would catch regressions in the FLOPs formulas.
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
What does this PR do ?
Contribution process
Pre-checks
Code review
Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS.Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change
megatron/core, once all expert reviewers have approved, theFinal Reviewlabel is applied automatically and final reviewers are assigned.For PRs outside
megatron/core, this step is skipped.Step 3: Approved
Once all required reviewers have approved, the
Approvedlabel is applied automatically.Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion.MRs are mergable after one approval by either
eharper@nvidia.comorzijiey@nvidia.com.