Conversation

@zianglih zianglih commented Jan 28, 2026

@HumansAnd

This PR currently depends on #512.

Example:

python scripts/run_qwen3_30b_a3b.py --no-enable-eval --hardware GB200 --num-gpus-per-node 8 --rollout-fp8 --extra-args "--debug-first-weight-sync /root/models/debug-weight-sync"
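
For context, the flag saves the first checkpoint synced from Megatron to SGLang and compares it bitwise against a source checkpoint. A minimal sketch of that kind of comparison, assuming single safetensors files (compare_checkpoints and the byte-view trick are illustrative, not the PR's actual implementation):

import torch
from safetensors.torch import load_file

def compare_checkpoints(synced_path: str, source_path: str) -> list[str]:
    """Return a report of tensors that are not bitwise-identical."""
    synced = load_file(synced_path)
    source = load_file(source_path)
    mismatches = []
    for name, ref in source.items():
        got = synced.get(name)
        if got is None:
            mismatches.append(f"{name}: missing from synced checkpoint")
        elif got.shape != ref.shape or got.dtype != ref.dtype:
            mismatches.append(f"{name}: shape or dtype differs")
        # Compare raw bytes so values with identical bit patterns (including
        # NaNs) compare equal, unlike a plain floating-point comparison.
        elif not torch.equal(got.contiguous().reshape(-1).view(torch.uint8),
                             ref.contiguous().reshape(-1).view(torch.uint8)):
            mismatches.append(f"{name}: bitwise mismatch")
    return mismatches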

@gemini-code-assist
Contributor

Summary of Changes

Hello @zianglih, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the debugging capabilities for weight synchronization processes within the miles framework by introducing a dedicated feature to inspect the initial weight transfer from Megatron to SGLang. Concurrently, it integrates support for MXFP8 quantization, optimizing performance and memory usage, particularly for models running on Blackwell hardware. These changes aim to improve the reliability of weight handling and expand hardware compatibility for advanced quantization techniques.

Highlights

  • Debug Weight Sync Feature: Introduced a new command-line argument --debug-first-weight-sync that allows users to save the first Hugging Face checkpoint synced from Megatron to SGLang, compare it against a source checkpoint, and report any bitwise mismatches. This feature terminates the run after the comparison.
  • MXFP8 Quantization Support: Added comprehensive support for MXFP8 quantization, including a new quantizer_mxfp8.py module, integration into the quantize_params function, and exposure of mxfp8_group_quantize from SGLang utilities (a sketch of the group-quantization scheme follows this list).
  • Blackwell Hardware Optimization: Updated Qwen scripts (run_qwen3_30b_a3b.py, run_qwen3_4b.py) to leverage MXFP8 quantization specifically for Blackwell (GB200/GB300) hardware, including automatic conversion of Hugging Face models to MXFP8 format and adjusted SGLang backend configurations for FP8 rollout.
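
To make the new quantization path concrete, here is a minimal sketch of MXFP8 group quantization under the standard MX convention (groups of 32 values share one power-of-two E8M0 scale; payloads are stored as FP8 E4M3). This is an illustration only, not the PR's quantizer_mxfp8.py or SGLang's mxfp8_group_quantize; the function name is hypothetical:

import torch

def mxfp8_quantize_sketch(w: torch.Tensor, group_size: int = 32):
    """Quantize a 2-D weight into per-group E8M0 scales and an E4M3 payload."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by the group size"
    groups = w.float().reshape(rows, cols // group_size, group_size)
    # Pick a power-of-two scale so the group's max magnitude fits the
    # E4M3 range (largest finite value: 448).
    amax = groups.abs().amax(dim=-1, keepdim=True).clamp(min=2.0 ** -127)
    scale = torch.exp2(torch.ceil(torch.log2(amax / 448.0)))
    payload = (groups / scale).to(torch.float8_e4m3fn)  # requires PyTorch >= 2.1
    return payload.reshape(rows, cols), scale.squeeze(-1)

Dequantization multiplies each group's payload back by its scale; because the scale is a power of two, this only shifts exponents and introduces no extra rounding.
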
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
---------------------|---------------------|------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable debug feature for weight synchronization and adds support for mxfp8 quantization. However, critical security vulnerabilities were identified, primarily related to insecure deserialization and command injection. Specifically, the use of torch.load() without weights_only=True in the new debugging utility could lead to arbitrary code execution from malicious checkpoints. Furthermore, several training scripts are vulnerable to command injection due to direct interpolation of user-supplied arguments into shell commands. Beyond these security concerns, suggestions were made to enhance code quality by refining exception handling, reducing code duplication, and cleaning up module exports. Addressing these security issues is paramount.

    )
    return safe_open(path, framework="pt", device="cpu")
if self.fmt == "bin":
    obj = torch.load(path, map_location="cpu")

security-high

The use of torch.load() without weights_only=True is insecure as it relies on the pickle module, which can execute arbitrary code during deserialization. An attacker could provide a malicious checkpoint file that, when loaded for debugging or comparison, executes arbitrary commands on the system. It is highly recommended to use weights_only=True to restrict deserialization to safe types.

Suggested change:
-    obj = torch.load(path, map_location="cpu")
+    obj = torch.load(path, map_location="cpu", weights_only=True)
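
(Note: weights_only=True became the default for torch.load in PyTorch 2.6, but passing it explicitly keeps older PyTorch versions safe as well.)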

Comment on lines 41 to 43
U.exec_command(
    f"huggingface-cli download Qwen/{args.model_name}-FP8 --local-dir /root/models/{args.model_name}-FP8"
)

security-high

The model_name argument is directly interpolated into a shell command string without sanitization. This allows for command injection if an attacker can control the model_name parameter. For example, a model_name like ; touch /tmp/pwned would result in the execution of the injected command. Use shlex.quote() to sanitize any variables used in shell commands.
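
For illustration, a minimal sketch of the quoting fix as a standalone helper (the function name and use of subprocess here are hypothetical; the PR's U.exec_command wrapper is assumed to pass its argument to a shell):

import shlex
import subprocess

def download_fp8_checkpoint(model_name: str) -> None:
    # shlex.quote neutralizes shell metacharacters such as ';' and '&&',
    # so an input like "; touch /tmp/pwned" is treated as a literal string.
    name = shlex.quote(model_name)
    subprocess.run(
        f"huggingface-cli download Qwen/{name}-FP8 --local-dir /root/models/{name}-FP8",
        shell=True,
        check=True,
    )

An even safer option is to pass an argument list with shell=False, which avoids the shell entirely.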

Comment on lines +48 to +50
U.exec_command(
    f"python tools/convert_hf_to_mxfp8.py --model-dir /root/models/{args.model_name} --save-dir {mxfp8_path}"
)

security-high

Similar to the previous finding, args.model_name and mxfp8_path (which is derived from args.model_name) are used in a shell command without sanitization, leading to a potential command injection vulnerability.


-if args.rollout_fp8:
+if args.rollout_fp8 and not use_blackwell_fp8:
     U.exec_command(f"hf download Qwen/{args.model_name}-FP8 --local-dir /root/models/{args.model_name}-FP8")

security-high

The model_name argument is directly interpolated into a shell command string without sanitization. This allows for command injection if an attacker can control the model_name parameter. Use shlex.quote() to sanitize any variables used in shell commands.

Comment on lines +60 to +62
U.exec_command(
    f"python tools/convert_hf_to_mxfp8.py --model-dir /root/models/{args.model_name} --save-dir {mxfp8_path}"
)

security-high

The model_name and mxfp8_path variables are used in a shell command without sanitization, which can lead to command injection if the input is manipulated.

__all__ = ["remove_padding", "quantize_param", "quantize_params_fp8", "quantize_params_compressed_tensors"]
__all__ = [
"remove_padding",
"quantize_param",

medium

The __all__ list includes quantize_param, but this function is not defined or imported in this module. This appears to be a pre-existing issue, but since this block is being modified, it's a good opportunity to correct it. Removing this line will prevent potential NameError exceptions and improve code clarity.
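
For reference, a quick illustration of why the stale entry matters (module and file names here are hypothetical):

# quant_utils.py -- __all__ lists a name the module never defines:
#     __all__ = ["remove_padding", "quantize_param"]
# elsewhere:
#     from quant_utils import *
# CPython fails the star-import with:
#     AttributeError: module 'quant_utils' has no attribute 'quantize_param'
# Dropping the stale entry (or defining/importing the symbol) fixes it.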

Comment on lines +23 to +73
# experts
expert_pattern = r"mlp.experts\.(.+)\.weight(\d+)"
match = re.match(expert_pattern, rest)
if match:
    rest, expert_idx = match.groups()
    if rest in [
        "linear_fc1",
        "linear_fc2",
    ]:
        quantize_named_params = []
        for converted_name, param in converted_named_params:
            # skip bf16 weight_scale and input_scale
            # TODO: find a clearer way.
            if converted_name.endswith("_scale"):
                continue
            quantize_named_params.extend(_quantize_param(converted_name, param))

        return quantize_named_params

# shared expert
shared_expert_pattern = r"mlp.shared_experts\.(.+)"
match = re.match(shared_expert_pattern, rest)
if match:
    rest = match.groups()[0]
    if rest in [
        "linear_fc1.weight",
        "linear_fc2.weight",
    ]:
        quantize_named_params = []
        for converted_name, param in converted_named_params:
            quantize_named_params.extend(_quantize_param(converted_name, param))

        return quantize_named_params

if rest in [
    "self_attention.linear_proj.weight",
    "self_attention.linear_qkv.weight",
    "mlp.linear_fc1.weight",
    "mlp.linear_fc2.weight",
    # mla
    "self_attention.linear_q_proj.weight",
    "self_attention.linear_q_down_proj.weight",
    "self_attention.linear_q_up_proj.weight",
    "self_attention.linear_kv_down_proj.weight",
    "self_attention.linear_kv_up_proj.weight",
]:
    quantize_named_params = []
    for converted_name, param in converted_named_params:
        quantize_named_params.extend(_quantize_param(converted_name, param))

    return quantize_named_params

medium

There is significant code duplication in how quantization is applied for different layer types (experts, shared experts, and other linear layers). The logic to iterate over converted_named_params and call _quantize_param is repeated.

This could be refactored into a helper function to improve maintainability and readability. For example:

def _apply_quantization(converted_named_params, skip_scales=False):
    quantized_params = []
    for name, param in converted_named_params:
        if skip_scales and name.endswith("_scale"):
            continue
        quantized_params.extend(_quantize_param(name, param))
    return quantized_params

# ... inside quantize_params_mxfp8, you can then determine if quantization is needed
# and call the helper, e.g.:
# if should_quantize:
#     return _apply_quantization(converted_named_params, skip_scales=is_expert_layer)

Comment on lines +17 to +18
except Exception:  # pragma: no cover - optional dependency
    safe_open = None

medium

Catching a broad Exception for an optional import can hide other unexpected errors. It's better to catch the specific ImportError that occurs when the optional dependency is not installed.

Suggested change:
-except Exception:  # pragma: no cover - optional dependency
-    safe_open = None
+except ImportError:  # pragma: no cover - optional dependency
+    safe_open = None

Comment on lines +191 to +196
except Exception as exc:  # pragma: no cover - optional dependency
    logger.warning(
        "Cannot resolve HF repo id %s (huggingface_hub unavailable): %s",
        path_or_repo,
        exc,
    )

medium

Similar to the previous comment, catching a broad Exception for an optional import can mask other issues. It's more precise to catch ImportError here as well.

Suggested change:
-except Exception as exc:  # pragma: no cover - optional dependency
-    logger.warning(
-        "Cannot resolve HF repo id %s (huggingface_hub unavailable): %s",
-        path_or_repo,
-        exc,
-    )
+except ImportError as exc:  # pragma: no cover - optional dependency
+    logger.warning(
+        "Cannot resolve HF repo id %s (huggingface_hub unavailable): %s",
+        path_or_repo,
+        exc,
+    )
