
Remove all nemo2 imports from old repo #628

Open
oyilmaz-nvidia wants to merge 18 commits into main from fix/ruff-linting

Conversation

@oyilmaz-nvidia
Contributor

No description provided.

oyilmaz-nvidia and others added 4 commits March 3, 2026 16:41
… dynamic inference

- Add nemo_deploy/llm/inference/nemo_utils.py which vendors standalone NeMo
  utilities (MCoreTokenizerWrappper, ckpt path helpers, constants) with no
  dependency on the nemo package, and re-exports the complex NeMo types
  (GPTConfig, T5Config, io, set_modelopt_spec_if_exists_in_ckpt) under a
  single HAVE_NEMO guard.
- Remove direct from nemo.* imports from inference_base.py and tron_utils.py;
  both files now import from the local nemo_utils module instead.
- Fix AttributeError in create_mcore_engine: GPTInferenceWrapper was called
  with (model, inference_context) but the deployed Megatron-LM API expects
  (model, inference_wrapper_config, inference_context). Add InferenceWrapperConfig
  built from model.config attributes; MCoreEngine then internally creates a
  DynamicInferenceContext and switches to DynamicInferenceEngine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
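The single HAVE_NEMO guard described in the commit above might look like this minimal sketch; the exact import paths inside the nemo package and the set of re-exported names are assumptions, not taken from the diff:

```python
# nemo_deploy/llm/inference/nemo_utils.py (sketch, not the actual diff)
# Standalone helpers live in this module with no nemo dependency; the
# heavyweight NeMo types are re-exported under a single guard so callers
# check one flag instead of guarding each import site.
try:
    # Hypothetical paths: the real locations inside nemo may differ.
    from nemo.collections.llm import GPTConfig, T5Config
    from nemo.lightning import io

    HAVE_NEMO = True
except ImportError:
    GPTConfig = T5Config = io = None
    HAVE_NEMO = False
```

Callers then branch on `if HAVE_NEMO:` instead of wrapping every `nemo.*` import in its own try/except.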
- Fix import ordering in test_inference_base.py (ruff I001)
- Remove direct nemo imports from inference_base.py, nemo_utils.py, tron_utils.py
- Add nemo_io.py with standalone load_context implementation
- Remove HAVE_NEMO guard checks now that nemo is no longer a static dependency
- Update tests to remove HAVE_NEMO patches and use types.SimpleNamespace
- Remove unused StaticInferenceContext import
- Use inner model config for hidden_size/params_dtype instead of outer model
- Add buffer_size_gb param to create_mcore_engine and MegatronLLMDeployable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
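The last three bullets (SimpleNamespace test doubles, reading the inner model config, and the new `buffer_size_gb` parameter) can be illustrated with a short sketch; attribute values and the stub function are illustrative, not the real API:

```python
import types

# Test stand-in for a wrapped Megatron model: the outer object carries an
# inner .config holding the values the engine setup needs.
inner_config = types.SimpleNamespace(hidden_size=4096, params_dtype="bfloat16")
model = types.SimpleNamespace(config=inner_config)

def create_mcore_engine_stub(model, buffer_size_gb=20.0):
    # Per the fix above: read hidden_size/params_dtype from the inner
    # model.config, not from attributes on the outer model object.
    # buffer_size_gb is now an explicit parameter (default is illustrative).
    return (model.config.hidden_size, model.config.params_dtype, buffer_size_gb)
```

This is the shape the updated tests take: no `HAVE_NEMO` patching, just a `types.SimpleNamespace` standing in for the model.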
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

oyilmaz-nvidia and others added 9 commits March 5, 2026 07:36
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Move the InferenceWrapperConfig import from module level into the body
of create_mcore_engine, so pytest can collect test_inference_base.py
in the nemo:26.02 container where that megatron-core module path does
not exist. GPU-only tests that call create_mcore_engine are skipped in
CPU CI, so the import never executes there.
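The deferred-import pattern described above can be sketched as follows; the megatron-core module path is inferred from the text and should be treated as an assumption, and the function body is elided:

```python
def create_mcore_engine(model_path, buffer_size_gb=20.0):
    # Import inside the function body, not at module level: in containers
    # where this megatron-core module path does not exist (e.g. nemo:26.02),
    # pytest can still collect test_inference_base.py, because the import
    # only runs when a GPU-only test actually calls this function.
    from megatron.core.inference.model_inference_wrappers.inference_wrapper_config import (
        InferenceWrapperConfig,
    )
    ...  # build the InferenceWrapperConfig and engine here (elided)
```

Defining the function never triggers the import, so module import (and test collection) succeeds even when megatron-core's inference wrappers are absent.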
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia oyilmaz-nvidia requested a review from a team as a code owner March 9, 2026 05:46
oyilmaz-nvidia and others added 3 commits March 9, 2026 11:31
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Contributor Author

/ok to test d8ac4f5

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Contributor Author

/ok to test c398fee

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Contributor Author

/ok to test f63d2c3

