Commit 74f6303
Llmb nemo r2.4.0 (#14634)
* Set attention backend to "auto" for Nemotron-H (#14042)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Adding TFLOPS per GPU Support for Finetuning (#14048)
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Enable optimizations for Nemotron-H (#13915)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Disable checkpointing for Nemotron-H (#14001)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Cherry pick ea4b47f (#13896)
* perf scripts updates (#13456)
* gb200 recommended cfgs csv fix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 495b h100 fix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* gb200 79b bf16 20 layers recompute
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* csv format fix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* csv format fix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 70b, 340b no fsdp
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* dsv3 perf mode
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dsv3 perf mode peft
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dsv3 callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dsv3 callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* cudagraphs
Signed-off-by: Malay Nagda <malayn@nvidia.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* import missing callbacks in deepseek recipe
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix (#13926)
* Onboard LLAMA4 Maverick Finetuning(SFT) with SQUAD Dataset Download Fix
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Tweaks for llama4_e128
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Adding flags for skipping the separate SLURM jobs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Arg parse changes and tweaks to remove squad dataset check
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Reverting the args for separate SLURM jobs as there is a dependency (run.Partial) with the finetune job
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Removing NullTokenizer due to compatibility
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Separate args to have control over the 3 SLURM jobs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Enabling TokenDropCallback and tp_comm_overlap
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Changes to introduce flags for enabling/disabling the 3 SLURM jobs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Changing the exp_name based on the SLURM job being run
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Argparse Changes
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Fix for standalone checkpoint and dataload jobs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Removing NUMA Factor error for dataset and checkpoint download job
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Fix for CUDA Graph error in this version
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Changes to peft_scheme
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Changes to set peft_scheme to None
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Replacing the file name from finetune_ to sft_
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Updates to exp_name format
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Reverting defaults for --finetuning arg to also include lora
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Tweak comment(s) in the finetune llama4 e128 script
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Reverting the recommended config order change for b200
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
---------
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>
* Add profiling changes (#13484)
* add profiling changes
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
* More model changes
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
---------
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
* Port nemotron 25.04 patch to r3.2.0 based llmb-nemo (#13533)
* port nemotron patch to r3.2.0 based llmb-nemo
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
* update template for experiment names
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
* review based updates
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
---------
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
* Port run-ai patch to llmb-nemo branch (#13573)
* Port run-ai patch to llmb-nemo branch
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
* Apply isort and black reformatting
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
---------
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>
* Add grok recipe (#13586)
* Add grok recipe
Signed-off-by: mollys <mollys@mollys.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
* Add copyright header
Signed-off-by: mollys <mollys@mollys.nvidia.com>
---------
Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>
* transformers_offline=0 and profile changes to llama3.1 405b (#13655)
* transformers_offline=0 and profile changes to llama3.1 405b
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* nccl added
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Add perf recipe script for Nemotron-H-56B (#13691)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Pretraining Deepseek changes for LLMB (#13752)
* working changes
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* cleanup
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* make profiling steps overridable
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* add nccl trace ability, cleanup
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>
---------
Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
* Adding FP8 Default Configs for LLAMA4 Maverick (#13698)
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 LLM Recipe (#13664)
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Tweaking LLama4 Maverick PreTrain file to adapt to the user configs parameter format (#13690)
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Grok Nvbug 5311566 (#13765)
* remove unnecessary nemo root check
* remove comment and unused packages
---------
Co-authored-by: mollys <mollys@mollys.nvidia.com>
* Grok nccl trace fix (#13769)
* remove unnecessary nemo root check
* remove comment and unused packages
* transformers online
* fix env vars
* setting transformers offline here doesn't work
---------
Co-authored-by: mollys <mollys@mollys.nvidia.com>
* Fix for config params in pretrain llama4 e128 (#13764)
* Fix for config params in pretrain llama4 e128
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Ignoring unrelated configs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Cleanup of configs
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Adding all the params in get_user_configs func
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
---------
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Nsys Tweaks to llama4 pretrain (#13778)
* Removing hardcoding of nsys profiling ranges
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Adding NCCL Trace support for pretrain recipe (llama4)
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
---------
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>
* Disable checkpointing for Nemotron-H (#13786)
* Disable checkpointing for Nemotron-H
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Nemotron-H NCCL trace support
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
---------
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
* Llmb nemo r2.3.0 (#13806)
* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* add all environment variables to container environment (#13808)
Co-authored-by: mollys <mollys@mollys.nvidia.com>
* fix numactl (#13809)
Co-authored-by: mollys <mollys@mollys.nvidia.com>
* Llmb nemo r2.3.0 (#13807)
* set NCCL_NET_GDR_LEVEL=PHB for deepseekv3, grok1_314b, llama31_405b, llama4_e128, nemotron4_15b+340b, nemotronh_56b
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* made experiment naming match standard
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* standardized exp_name for relevant workloads
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* fixing QA checkpoint bug for nemotron4 (#13843)
* fixing QA checkpoint bug for nemotron4
* Apply isort and black reformatting
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
* arg name change
* Apply isort and black reformatting
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
---------
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>
* Add gpu metrics option (#13882)
* gpu metrics option
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
* specify nemo run commit
Signed-off-by: ashbhandare <abhandare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Fix linting error
Signed-off-by: ashbhandare <abhandare@nvidia.com>
---------
Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
* Revert "LLAMA4 Maverick SFT Recipe + SQUAD Dataset Download Error Fix"
This reverts commit 755fd36.
* fix nemo/collections/llm/recipes/__init__.py
* fix nemo/collections/llm/recipes/deepseek_v3.py
* new line
* fix nemo/collections/llm/recipes/llama4_e128.py
* fix scripts/performance/llm/finetune_llama4_e128.py
* small updates for grok
* modified: scripts/performance/llm/pretrain_grok1_314b.py
modified: scripts/performance/llm/pretrain_nemotron4_340b.py
* manually add util changes to helpers.py and executors.py
* Fix in Nemotron-H script (#14251)
* Fix in Nemotron-H script
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Fix in Nemotron-H perf script
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
---------
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* updated with some things from NeMo main (double_buffer) (#14305)
* updated with some things from NeMo main (double_buffer)
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Took out cuDNN lines because of a regression with the cuDNN normalization kernel (#14360)
* added conditional cudnn to align with nemo main (#14324)
* added conditional cudnn to align with nemo main
Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
* fixed num optimizer instances bug
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
* Adds pyxis container writable and no mount home flags (#14386)
* Add pyxis flags for writable and no-mount home.
Signed-off-by: Alex Filby <afilby@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
---------
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>
* Update DeepSeek-V3 perf scripts (#14377)
* Fix callbacks in DSV3 script (#14350)
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Changes to grok to alleviate error: TypeError: '>' not supported betw… (#14326)
* Changes to grok to alleviate error: TypeError: '>' not supported between instances of 'str' and 'int'
* Apply isort and black reformatting
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* Removed the hard-coded default values; users can now change them through the CLI
* Apply isort and black reformatting
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* made suggested changes. Verified successful.
* Apply isort and black reformatting
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* Made suggested change.
---------
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* Make VBoost activation conditional (#14453)
* Refactor performance scripts to use build_perf_env_plugin function
* Replaced direct instantiation of PerfEnvPlugin with build_perf_env_plugin in multiple LLM finetuning and pretraining scripts for consistency and maintainability.
* Added build_perf_env_plugin function to helpers.py to streamline performance environment setup based on GPU and pipeline parallelism settings.
This change enhances code readability and reduces redundancy across scripts.
* control vboost enablement via cli
* Update finetune_llama4_e128.py to import build_perf_env_plugin function
* Added the build_perf_env_plugin import to enhance performance environment setup consistency across scripts.
This change aligns with recent refactoring efforts to streamline performance script management.
---------
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
* turned off tp overlap comms for >128 gpus on gb200 so jobs are functi… (#14460)
* turned off tp overlap comms for >128 gpus on gb200 so jobs are functional
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* Remove NCCL tracing option and clean up imports in performance scripts (#14467)
* Remove NCCL tracing option and clean up imports in performance scripts. Updated multiple LLM finetuning and pretraining scripts to eliminate the use of PerfEnvPlugin, enhancing consistency and maintainability.
* Apply isort and black reformatting
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
---------
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>
* Disable tp_comm_overlap for 512 gpus on GB200 (#14474)
...to fix functionality issue
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Workaround for MXFP8 functionality issue (#14426)
* Workaround for MXFP8 functionality issue
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
---------
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
* previous commit was buggy (#14477)
* previous was buggy
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
---------
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
* checkpoint save/load functionality with HF token (#14538)
* checkpoint save/load functionality with HF token
* Apply isort and black reformatting
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* using use_hf_tokenizer
* reverting back to hf_token
---------
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
* added hf import for 15b/340b pretrain (#14565)
* Llmb nemo r2.4.0 (#14607)
* Update mixed_precision.py
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
* Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
---------
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
---------
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <malayn@nvidia.com>
Signed-off-by: rhmukundan <rhmukundan@users.noreply.github.com>
Signed-off-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: ashbhandare <ashbhandare@users.noreply.github.com>
Signed-off-by: Barys Dubauski <bdubauski@nvdia.com>
Signed-off-by: bdubauski <bdubauski@users.noreply.github.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
Signed-off-by: nv-mollys <nv-mollys@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@salberdi-mlt.client.nvidia.com>
Signed-off-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Signed-off-by: ashbhandare <abhandare@nvidia.com>
Signed-off-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: sshiddib <sshiddib@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: sudostock <sudostock@users.noreply.github.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Signed-off-by: Sebastian Alberdi <salberdi@nvidia.com>
Signed-off-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: nv-mollys <149841089+nv-mollys@users.noreply.github.com>
Co-authored-by: scsudhakaran <scsudhakaran@nvidia.com>
Co-authored-by: rhmukundan <102543536+rhmukundan@users.noreply.github.com>
Co-authored-by: malay-nagda <malayn@nvidia.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: mollys <mollys@mollys.nvidia.com>
Co-authored-by: rhmukundan <rhmukundan@users.noreply.github.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: ashbhandare <ashbhandare@users.noreply.github.com>
Co-authored-by: bdubauski <80418713+bdubauski@users.noreply.github.com>
Co-authored-by: Barys Dubauski <bdubauski@nvdia.com>
Co-authored-by: bdubauski <bdubauski@users.noreply.github.com>
Co-authored-by: nv-mollys <nv-mollys@users.noreply.github.com>
Co-authored-by: salberdi-nvidia <salberdi@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-preos02.a51.clusters.nvidia.com>
Co-authored-by: salberdi-nvidia <salberdi-nvidia@users.noreply.github.com>
Co-authored-by: Sebastian Alberdi <salberdi@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: ashbhandare <abhandare@nvidia.com>
Co-authored-by: Aishwarya Bhandare <abhandare@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: scsudhakaran <scsudhakaran@users.noreply.github.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@nvidia.com>
Co-authored-by: Sharada Shiddibhavi <sshiddibhavi@cw-dfw-cs-001-vscode-02.cm.cluster>
Co-authored-by: sshiddib <sshiddib@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Co-authored-by: rsalagame-nvidia <rsalagame@nvidia.com>
Co-authored-by: Sebastian Alberdi <salberdi@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Alex Filby <alexfilby@gmail.com>
Co-authored-by: sudostock <sudostock@users.noreply.github.com>
Co-authored-by: rsalagame-nvidia <rsalagame-nvidia@users.noreply.github.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
1 parent 6489229 commit 74f6303
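The Grok fix (#14326) quotes the error `TypeError: '>' not supported between instances of 'str' and 'int'`, and its follow-up notes say the values became changeable from the CLI. The commit message does not show the actual change. One common cause of this error is a command-line value that reaches a numeric comparison while still a string. The sketch below illustrates that failure mode and a typed-argument fix; the flag name `--num_layers` and the helper `resolve_num_layers` are hypothetical and are not taken from the NeMo scripts.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a perf script (names are illustrative only)."""
    parser = argparse.ArgumentParser(description="hypothetical perf-script CLI")
    # type=int is the important part: without it argparse stores the value
    # as a str, and a later `value > 0` comparison raises TypeError.
    parser.add_argument("--num_layers", type=int, default=None)
    return parser


def resolve_num_layers(cli_value: Optional[int], recipe_default: int) -> int:
    """Prefer a positive CLI override, otherwise keep the recipe default."""
    if cli_value is not None and cli_value > 0:
        return cli_value
    return recipe_default


if __name__ == "__main__":
    args = build_parser().parse_args(["--num_layers", "20"])
    print(resolve_num_layers(args.num_layers, recipe_default=64))
```

Leaving the default as `None` keeps the recipe value in charge unless the user passes an override explicitly.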
40 files changed
Lines changed: 1672 additions & 337 deletions
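Two of the squashed changes (#13806, #13807) set `NCCL_NET_GDR_LEVEL=PHB` for a named list of workloads. As a rough illustration of how such a per-workload environment setting could be assembled before it is handed to a launcher, here is a minimal sketch. The helper `perf_env_vars` and the exact workload keys are assumptions made for this example (in particular, `nemotron4_15b+340b` is expanded into two keys); the real scripts may wire this differently.

```python
from typing import Dict, Optional

# Workloads named in the commit message; the key spellings are assumptions.
GDR_PHB_WORKLOADS = frozenset(
    {
        "deepseekv3",
        "grok1_314b",
        "llama31_405b",
        "llama4_e128",
        "nemotron4_15b",
        "nemotron4_340b",
        "nemotronh_56b",
    }
)


def perf_env_vars(workload: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base_env with NCCL_NET_GDR_LEVEL added for listed workloads."""
    env = dict(base_env or {})
    if workload in GDR_PHB_WORKLOADS:
        # setdefault keeps any value the caller already chose.
        env.setdefault("NCCL_NET_GDR_LEVEL", "PHB")
    return env


if __name__ == "__main__":
    print(perf_env_vars("grok1_314b"))
    print(perf_env_vars("grok1_314b", {"NCCL_NET_GDR_LEVEL": "SYS"}))
```

Using `setdefault` rather than plain assignment is a design choice in this sketch: it lets an explicit user setting win over the per-workload default.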
File tree
- nemo
- collections/llm/recipes
- precision
- lightning
- fabric
- pytorch/plugins
- run
- scripts/performance
- llm
- recommended_model_configs
- tests/collections/llm/recipes
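The commit message describes a `build_perf_env_plugin` helper in `helpers.py` that centralizes performance environment setup "based on GPU and pipeline parallelism settings", makes VBoost controllable from the CLI (#14453), and turns off TP communication overlap for more than 128 GPUs on GB200 (#14460). The diff content is not available here, so the following is only a sketch of that kind of decision logic under stated assumptions. It uses a stand-in dataclass instead of the real `PerfEnvPlugin`, and every name and parameter in it (`PerfEnvSettings`, `build_perf_env_settings`, and their arguments) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfEnvSettings:
    """Stand-in for a perf environment plugin configuration (hypothetical)."""

    enable_vboost: bool
    tp_comm_overlap: bool
    pp_size: int


def build_perf_env_settings(
    gpu: str,
    num_gpus: int,
    pp_size: int = 1,
    enable_vboost: bool = False,
    tp_comm_overlap: bool = True,
) -> PerfEnvSettings:
    """Build settings in one place so every script applies the same rules."""
    # Per the commit message, TP comm overlap was turned off for >128 GPUs
    # on GB200 to keep jobs functional; the threshold check is a sketch.
    if gpu.lower() == "gb200" and num_gpus > 128:
        tp_comm_overlap = False
    # VBoost is only enabled when the caller (for example a CLI flag) asks for it.
    return PerfEnvSettings(
        enable_vboost=enable_vboost,
        tp_comm_overlap=tp_comm_overlap,
        pp_size=pp_size,
    )


if __name__ == "__main__":
    print(build_perf_env_settings("gb200", num_gpus=512, pp_size=4))
    print(build_perf_env_settings("h100", num_gpus=512, enable_vboost=True))
```

Routing every script through one builder function, rather than instantiating the plugin in each script, is the consistency benefit the commit message itself points to.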