Mdp #1849
base: update-megatron-lm-5247a1f
Conversation
Signed-off-by: Anna Shors <ashors@nvidia.com>
terrykong left a comment
I know this is a draft and things are in flux, but I'm leaving some feedback from a first pass. This definitely needs more passes.

cc @ashors1 @ananthsub @yaoyu-33 for the design of mcore inference in the policy worker.

My opinion is that we may now need to move the inference code into a mixin so that the regular training policy methods are clearly separated from the generation ones, because there are now many of them. After separating, the MegatronInferenceMixin can be added as one of the parent classes (multiple inheritance); see the sketch below. Right now the Megatron policy worker class has ballooned quite significantly and is very intimidating.

As general feedback, I would love to have more of this boilerplate pushed into Megatron inference. The amount of code change needed seems like a lot, and we seem to need to set state that I would imagine the mcore inference APIs might handle (like the local/none thing).
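Something like this (class and method names below are illustrative, not from this PR):

```python
class MegatronInferenceMixin:
    """Generation-side methods split out of the policy worker (illustrative sketch)."""

    def setup_inference_engine(self) -> None:
        # build the dynamic inference engine, context, and wrapped model here
        ...

    def pause_inference_engine(self) -> None:
        ...

    def resume_inference_engine(self) -> None:
        ...


class MegatronPolicyWorkerBase:
    """Training-only methods: train, get_logprobs, checkpointing, etc. (illustrative only)."""


class MegatronPolicyWorker(MegatronPolicyWorkerBase, MegatronInferenceMixin):
    """Multiple inheritance keeps the training and generation code paths clearly separated."""
```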
from megatron.core.process_groups_config import ProcessGroupCollection
from megatron.core.rerun_state_machine import get_rerun_state_machine
from megatron.core.transformer import MegatronModule
# Note: delete_cuda_graphs is called internally by toggle_cuda_graphs when reset_cuda_graphs=True
nit: maybe move this comment to where toggle_cuda_graphs is called
why were these dropped?
self.dynamic_inference_engine = None
self.inference_client = None
self.inference_context = None
self.inference_wrapped_model = None
self._inference_engine_initialized = False
self._inference_engine_paused = True  # Start paused since we begin with training
self._inference_loop = None  # Event loop for inference operations
self._inference_thread = None  # Thread running the event loop
High-level question: why does mcore inference require so much bookkeeping by the application?
model_cfg = cfg_from_pretrained.model
cfg_from_pretrained.logger = LoggerConfig()

# Ensure make_vocab_size_divisible_by has a reasonable default (128 is standard)
@ananthsub @yaoyu-33 can you comment on this?
I feel this is potentially an unsafe thing to default, especially since we don't currently have a way to chop off the padded vocab during HF export.
It is good to have in general, though, because I know vocab parallelism will have issues without it, but it's probably not something to enable globally yet.
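For context, a rough sketch of the padding arithmetic this default controls (this mirrors the usual Megatron-LM rounding; illustrative only, not code from this PR):

```python
import math


def padded_vocab_size(vocab_size: int, make_vocab_size_divisible_by: int = 128, tp_size: int = 1) -> int:
    # Round the vocab up so the embedding table splits evenly across tensor-parallel ranks.
    multiple = make_vocab_size_divisible_by * tp_size
    return int(math.ceil(vocab_size / multiple)) * multiple


# A 32000-token vocab with the 128 default and tp_size=8 pads to 32768 rows;
# the extra 768 rows are exactly what an HF export would need to chop off again.
assert padded_vocab_size(32000, 128, 8) == 32768
```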
# Setting moe_router_dtype to higher precision (e.g. fp64) can improve numerical stability,
# especially when using many experts.
model_cfg.moe_router_dtype = self.cfg["megatron_cfg"]["moe_router_dtype"]
model_cfg.moe_token_dispatcher_type = "alltoall"
Why is this hard-coded? Can it be plumbed in?
RL/nemo_rl/models/policy/__init__.py (line 193 in dacac7e):
moe_token_dispatcher_type: str
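Something like this hedged sketch could plumb it through (the helper name is hypothetical, and the fallback simply mirrors the current hard-coded value):

```python
def resolve_moe_token_dispatcher_type(megatron_cfg: dict, default: str = "alltoall") -> str:
    # Fall back to the current hard-coded value only if the key is absent from the config.
    return megatron_cfg.get("moe_token_dispatcher_type", default)


# at the call site (sketch):
# model_cfg.moe_token_dispatcher_type = resolve_moe_token_dispatcher_type(self.cfg["megatron_cfg"])
```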
unified_memory_level = mcore_generation_config["unified_memory_level"]
model_config = self.model.config
# Enable CUDA graphs for inference
model_config.cuda_graph_impl = "local"
Another place where this is set to "local" (vs. "none"). Is this potentially error-prone, since there are many places where this needs to be set?
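One hedged option (the helper name is hypothetical, not from this PR): funnel every "local"/"none" flip through a single helper so the flag can't drift between call sites:

```python
def set_cuda_graph_mode(model_config, inference: bool) -> None:
    # Single place that decides the CUDA-graph implementation for training vs. inference.
    model_config.cuda_graph_impl = "local" if inference else "none"


# at the call sites (sketch):
# set_cuda_graph_mode(self.model.config, inference=True)   # entering generation
# set_cuda_graph_mode(self.model.config, inference=False)  # back to training
```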
self._inference_engine_paused = True
print(f"[Rank {self.rank}] paused inference engine")

async def pause_engine(self):
Should this be protected? It doesn't appear to be something the user needs to be aware of.
Suggested change:
- async def pause_engine(self):
+ async def _pause_engine(self):
self._inference_engine_paused = False
print(f"[Rank {self.rank}] Resumed inference engine")

async def resume_engine(self):
Same comment about making this protected.
self._inference_engine_paused = False

def pause_inference_engine(self):
Shall we use the sleep/wake nomenclature to match the other inference engines, at least in the nemo-rl API? I think it's okay that mcore inference calls it pause/resume, but that could be a little confusing if we ever do partial rollouts or in-flight weight updates, where an actual pause may be needed.
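A hedged sketch of the nemo-rl-facing naming (the class below is a minimal stand-in; only the pause/resume naming and the paused flag come from this PR):

```python
class MegatronGenerationInterface:
    """Minimal stand-in showing sleep/wake wrappers over mcore's pause/resume naming."""

    def __init__(self) -> None:
        self._inference_engine_paused = True  # start paused, mirroring the PR

    # mcore-flavored names, as in this PR
    def pause_inference_engine(self) -> None:
        self._inference_engine_paused = True

    def resume_inference_engine(self) -> None:
        self._inference_engine_paused = False

    # nemo-rl-facing names, matching the other inference engines
    def sleep(self) -> None:
        self.pause_inference_engine()

    def wake_up(self) -> None:
        self.resume_inference_engine()
```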
The dynamic inference engine for use during inference.
"""
# Get the language module (unwrap from precision wrappers if needed)
lang_module = (
_get_lang_module ?
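A hedged sketch of such a helper (the unwrap logic is an assumption about what the elided expression does; Float16Module-style wrappers usually expose the wrapped model as `.module`):

```python
def _get_lang_module(model):
    # Unwrap precision wrappers if present, otherwise return the model itself.
    return model.module if hasattr(model, "module") else model
```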
Force-pushed from 0c7d97e to e970643.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information