
Add Qwen 3.5 recipes #2654

Open
cuichenx wants to merge 16 commits into main from chcui/qwen35_recipes

Conversation


@cuichenx cuichenx commented Mar 5, 2026

What does this PR do?

Add Qwen3.5 VL (dense + MoE) bridge, provider, fine-tuning recipes, and example scripts for the full model family.

Changelog

  • Add Qwen35VLBridge (dense) and Qwen35VLMoEBridge for bidirectional HF ↔ Megatron checkpoint conversion with GDN + Gated Attention hybrid architecture support
  • Add Qwen35VLModelProvider / Qwen35VLMoEModelProvider with selective attention spec patching (_patch_standard_attention_specs) for mRoPE on standard attention layers only
  • Add fine-tuning recipes for 8 model sizes: 800M, 2B, 4B, 9B, 27B (dense) and 35B-A3B, 122B-A10B, 397B-A17B (MoE), with recommended TP/PP/EP defaults for SFT and LoRA
  • Add recipes/qwen_vl/__init__.py to register the new recipe module
  • Add example scripts: conversion.sh, inference.sh, slurm_sft.sh, slurm_peft.sh
  • Add unit tests (test_qwen35_vl_recipes.py) and functional tests (test_qwen35_vl_recipes_finetune.py)
  • Fix test_qwen35_vl_bridge.py flaky assertions

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Requires transformers >= 5.2.0. Import guards are in place for both dense and MoE transformers classes.
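A sketch of the import-guard pattern referred to above; the class name is a placeholder assumption, not a verified transformers symbol:

```python
# Sketch of an optional-dependency import guard. The class name
# "Qwen35VLForConditionalGeneration" is a placeholder assumption,
# not a verified transformers 5.2.0 symbol.
try:
    from transformers import Qwen35VLForConditionalGeneration  # hypothetical name
    HAVE_QWEN35_VL = True
except ImportError:
    Qwen35VLForConditionalGeneration = None
    HAVE_QWEN35_VL = False


def get_qwen35_vl_class():
    """Return the guarded class, or fail with an actionable message."""
    if not HAVE_QWEN35_VL:
        raise ImportError(
            "Qwen3.5 VL support requires transformers >= 5.2.0; "
            "upgrade with: pip install 'transformers>=5.2.0'"
        )
    return Qwen35VLForConditionalGeneration
```

The guard keeps module import cheap and safe when the optional dependency is absent, while the accessor converts a silent `None` into an actionable error at first use.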

cuichenx added 2 commits March 2, 2026 10:06
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
@cuichenx cuichenx marked this pull request as draft March 5, 2026 00:05

copy-pr-bot bot commented Mar 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


Signed-off-by: Chen Cui <chcui@nvidia.com>

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

This PR adds comprehensive fine-tuning support for Qwen3.5 Vision-Language models. It introduces parameterized SLURM scripts for SFT and PEFT training, a new recipe module with eight factory functions for different model sizes, validation logic for parallelism settings, and extensive test coverage. Scripts are updated to dynamically handle multiple model variants with appropriate tensor/pipeline/expert parallelism configurations.
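A minimal sketch of how size-specific factory functions might delegate to a shared builder, based only on the summary above; all names, signatures, and TP/PP/EP values are illustrative placeholders, not the actual recipe API:

```python
from dataclasses import dataclass, field


@dataclass
class ConfigContainerSketch:
    """Illustrative stand-in for the real ConfigContainer."""
    model_size: str
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    expert_model_parallel_size: int = 1
    overrides: dict = field(default_factory=dict)


def _qwen35_vl_common(model_size, tp=1, pp=1, ep=1, **overrides):
    """Shared builder: one place for defaults; per-size callers pass TP/PP/EP."""
    return ConfigContainerSketch(
        model_size=model_size,
        tensor_model_parallel_size=tp,
        pipeline_model_parallel_size=pp,
        expert_model_parallel_size=ep,
        overrides=overrides,
    )


def qwen35_vl_4b_finetune_config(**overrides):
    # Dense 4B: modest parallelism (placeholder values).
    return _qwen35_vl_common("4B", tp=1, pp=1, **overrides)


def qwen35_vl_35b_a3b_finetune_config(**overrides):
    # MoE 35B-A3B: expert parallelism enabled (placeholder values).
    return _qwen35_vl_common("35B-A3B", tp=2, ep=4, **overrides)
```

Keeping per-size functions as thin wrappers over one common builder is what lets eight variants share training, optimizer, and dataset wiring while differing only in parallelism defaults.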

Changes

  • Conversion & Inference Scripts (examples/models/vlm/qwen35_vl/conversion.sh, examples/models/vlm/qwen35_vl/inference.sh): Added strict error handling and parameterized parallelism flags (EP, PP, TP). The conversion script supports multiple dense and MoE variants; the inference script dynamically sets EP based on model type (dense vs. MoE).
  • SLURM Training Scripts (examples/models/vlm/qwen35_vl/slurm_peft.sh, examples/models/vlm/qwen35_vl/slurm_sft.sh): New scripts for orchestrating PEFT (LoRA) and SFT training via SLURM. Both accept model size arguments, map them to HF model names and recipes, configure container execution, build CLI overrides, and integrate with Weights & Biases logging.
  • Qwen3.5 VL Recipe Module (src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py): New module with eight public factory functions for sizes 800M–397B-A17B (dense and MoE). A shared _qwen35_vl_common builds the ConfigContainer with model, training, optimizer, scheduler, PEFT, dataset provider selection, and validation. Supports dataset type selection (mock, preloaded, HF) with error handling.
  • Bridge & Provider Updates (src/megatron/bridge/models/qwen_vl/qwen35_vl_bridge.py, src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py): The bridge now uses the top-level HF config for embedding sharing instead of the text_config default. The provider adds a validate_parallelism() method to both dense and MoE variants, enforcing tensor_model_parallel_size ≤ num_query_groups.
  • Recipe Package Exports (src/megatron/bridge/recipes/qwen_vl/__init__.py): Added imports and __all__ exports for the eight new Qwen3.5-VL finetune config functions, grouped by density (dense and MoE).
  • Functional Tests (tests/functional_tests/recipes/test_qwen35_vl_recipes_finetune.py): New test class with four GPU-only SFT scenarios: nothing frozen, language frozen, vision+language frozen, and recompute. Includes an autouse fixture to reset the global microbatch calculator between tests.
  • Unit Tests (tests/unit_tests/recipes/qwen_vl/test_qwen35_vl_recipes.py, tests/unit_tests/models/qwen_vl/test_qwen35_vl_bridge.py): Comprehensive recipe unit tests with fake AutoBridge and model config doubles, parameterized coverage of 8 model sizes, SFT/PEFT scenarios, dataset type selection, and override propagation. Bridge tests updated with Mock(spec=[]) and torch_dtype attribute adjustments.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #2613: Directly related—identical mock adjustments in test_qwen35_vl_bridge.py (Mock(spec=[]) and torch_dtype additions).
  • PR #2614: Related refactoring of VLM finetune recipe surface and shared SFT/PEFT helpers.

Suggested labels

Run CICD

Suggested reviewers

  • yaoyu-33
  • yashaswikarnati
  • chtruong814
🚥 Pre-merge checks: 3 passed, 1 failed

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces major changes (691 new lines, 8 model variants, SLURM scripts) with comprehensive tests (955 lines), but the PR description lacks test results, coverage metrics, or validation information. Resolution: add a PR description section documenting executed tests, their results, coverage of new recipe functions and model variants, performance validation, and test run outputs.
✅ Passed checks (3 passed)
  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title check: Passed. The title "Add Qwen 3.5 recipes" is concise and directly reflects the primary objective of the PR: introducing fine-tuning recipes for the Qwen3.5 VL model variants (dense and MoE) across multiple sizes.
  • Docstring Coverage: Passed. Docstring coverage is 88.14%, above the required threshold of 80.00%.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (1)
tests/functional_tests/recipes/test_qwen35_vl_recipes_finetune.py (1)

138-199: Prefer subprocess isolation for these functional training runs.

These tests delegate to run_pretrain_vl_recipe_test (see tests/functional_tests/recipes/utils.py, Lines 199-298), which initializes distributed state and runs real pretraining in-process. Moving each scenario to a subprocess will reduce inter-test state coupling and flakes.

As per coding guidelines, "tests/functional_tests/**/*.py: Use subprocess for functional tests that require process isolation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional_tests/recipes/test_qwen35_vl_recipes_finetune.py` around
lines 138 - 199, The four functional tests (test_sft_nothing_frozen,
test_sft_language_model_frozen, test_sft_vision_and_language_frozen,
test_recompute) currently call run_pretrain_vl_recipe_test in-process and must
be converted to run each scenario in a separate subprocess to avoid shared
distributed state; replace the direct call to run_pretrain_vl_recipe_test with
logic that spawns a subprocess (e.g., subprocess.run or multiprocessing.Process
invoking a small runner module/function that imports run_pretrain_vl_recipe_test
and executes it with the same arguments), pass the same
config_func/recipe_name/parallelism_overrides/model_overrides/tmp_path inputs
(serialize as needed), wait for completion, and assert the subprocess exit code
is zero; ensure the new runner entrypoint and the test wrappers use unique
identifiers matching the existing test parameterization so behavior and
signatures remain unchanged.
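A minimal sketch of the subprocess-isolation pattern the comment asks for; the scenario payload here is a trivial stand-in for invoking run_pretrain_vl_recipe_test with serialized arguments:

```python
import subprocess
import sys


def run_scenario_in_subprocess(runner_code: str) -> None:
    """Run one training scenario in a fresh interpreter so distributed
    state cannot leak between tests, then assert on the exit code."""
    result = subprocess.run(
        [sys.executable, "-c", runner_code],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, (
        f"scenario failed (rc={result.returncode}):\n{result.stderr}"
    )


# In a real test, runner_code would import run_pretrain_vl_recipe_test and
# call it with the scenario's serialized arguments; a trivial stand-in here:
run_scenario_in_subprocess("print('scenario ok')")
```

Because each scenario gets a fresh interpreter, any torch.distributed or global-calculator state dies with the child process instead of bleeding into the next test.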
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/models/vlm/qwen35_vl/inference.sh`:
- Around line 26-33: The EP assignment in the MODEL_NAME case block currently
sets EP=4 for all MoE variants; update the case for Qwen3.5-35B-A3B,
Qwen3.5-122B-A10B, and Qwen3.5-397B-A17B to use per-model defaults (set EP to
the specific values used in the VL scripts instead of a single shared 4) by
modifying the MODEL_NAME case in inference.sh to assign the correct EP for each
model name (reference the MODEL_NAME case block and the EP variable).
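A shell sketch of per-model EP defaults in a MODEL_NAME case block; the EP numbers below are placeholders for illustration, not the authoritative values from the VL scripts:

```shell
# Illustrative only: the EP values below are placeholders, not the
# authoritative per-model defaults from the VL scripts.
MODEL_NAME="${1:-Qwen3.5-35B-A3B}"

case "$MODEL_NAME" in
  Qwen3.5-35B-A3B)   EP=4  ;;
  Qwen3.5-122B-A10B) EP=8  ;;
  Qwen3.5-397B-A17B) EP=16 ;;
  *)                 EP=1  ;;  # dense variants: no expert parallelism
esac

echo "EP=${EP}"
```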

In `@examples/models/vlm/qwen35_vl/slurm_peft.sh`:
- Around line 116-121: The script currently overwrites externally provided
CONTAINER_IMAGE and CONTAINER_MOUNTS with empty strings (symbols CONTAINER_IMAGE
and CONTAINER_MOUNTS), which breaks the later guard that checks for a container
image; change the assignment so you only set defaults when the variables are
unset or empty (e.g., use a conditional or shell parameter expansion to leave an
externally supplied value intact), and apply the same change to the mounts
handling and any subsequent logic that validates CONTAINER_IMAGE to ensure it
respects environment-supplied values.
- Around line 49-50: SBATCH output/error paths reference a logs/ directory that
may not exist when Slurm opens the files; either stop using a non-existent
subdirectory or ensure it is created before submission. Fix by changing the
SBATCH directives in the header (the lines with “#SBATCH
--output=logs/qwen35vl_lora_%j.out” and “#SBATCH
--error=logs/qwen35vl_lora_%j.err”) to write to the current directory (e.g.,
remove the "logs/" prefix or use "%x_%j.out" and "%x_%j.err") or ensure the logs
directory is created by the submitter before calling sbatch (add a pre-submit
mkdir -p logs in the wrapper that writes/submits this script rather than inside
the job script itself).
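The defaults-only-when-unset fix for CONTAINER_IMAGE suggested above can use shell parameter expansion, so an externally exported value survives; the default image path below is a placeholder:

```shell
# Assign defaults only when the variable is unset or empty, so values
# exported by the submitter are respected. The image path is a placeholder.
: "${CONTAINER_IMAGE:=nvcr.io/example/placeholder:latest}"
: "${CONTAINER_MOUNTS:=}"

if [ -z "$CONTAINER_IMAGE" ]; then
  echo "ERROR: CONTAINER_IMAGE must be set" >&2
  exit 1
fi
echo "Using container image: $CONTAINER_IMAGE"
```

`${var:=default}` assigns the default only when the variable is unset or null, which is exactly the behavior the review asks for.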

In `@examples/models/vlm/qwen35_vl/slurm_sft.sh`:
- Around line 1-15: Add strict shell options to the script to fail fast on
errors and unset variables: immediately after the shebang (#!/bin/bash) in
slurm_sft.sh enable "set -euo pipefail" and set a safe IFS (e.g., IFS=$'\n\t')
so command failures and unset variables cannot silently continue during SLURM
training runs.
- Around line 27-35: Commented parallelism recommendations in slurm_sft.sh are
inconsistent with the canonical recipe values in
src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py; update or remove the
mismatched lines (e.g., entries for 122B and 397B) so they reflect the recipe's
TP/PP/EP defaults. Open examples/models/vlm/qwen35_vl/slurm_sft.sh and either
correct the TP/PP/EP numbers to match the values defined in qwen35_vl.py or
replace the block with a short note directing users to the recipe file for
authoritative parallelism settings, ensuring the commented sizes (e.g., 27B,
35B, 122B, 397B) match the recipe's PP/EP/TP/EP entries. Make sure the text
explicitly references the recipe as the source of truth to avoid future drift.
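The strict-mode prologue requested in the first slurm_sft.sh comment would look like this at the top of the script:

```shell
#!/bin/bash
# Fail fast: -e exits on command failure, -u errors on unset variables,
# and pipefail surfaces a failure from any stage of a pipeline.
set -euo pipefail
IFS=$'\n\t'

echo "strict mode enabled"
```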

In `@src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py`:
- Around line 354-357: The documented default pipeline-parallel size for 122B
full-SFT (text saying "PP=4") is inconsistent with the implementation which sets
pipeline_model_parallel_size to 6; update one to match the other: either change
the documentation block in qwen35_vl.py to "PP=6" to reflect the code, or change
the implementation variable pipeline_model_parallel_size (and any related
FULL_SFT config assignments) to 4 so the code matches the docs; ensure both the
default config comment block and the pipeline_model_parallel_size assignment
(and any other FULL_SFT-specific parallelism constants) are updated together.
- Around line 593-595: The TokenizerConfig is being wired to hf_path while
preprocessing uses _processor_model (set from tokenizer_model or hf_path),
causing config drift; update the TokenizerConfig instantiation (and any other
places that currently pass hf_path, e.g., around the TokenizerConfig creation at
the block using hf_path) to use _processor_model (or tokenizer_model if present)
instead so the tokenizer config and dataset preprocessing use the same value
(ensure both the dataset preprocessing branch that relies on _processor_model
and the TokenizerConfig/tokenizer creation use the same symbol:
_processor_model).

In `@tests/functional_tests/recipes/test_qwen35_vl_recipes_finetune.py`:
- Around line 110-199: Add an explicit test category marker (e.g.,
pytest.mark.integration or pytest.mark.system) to this module/class so tests are
properly categorized; for example, set a module-level pytestmark or decorate the
TestQwen35VLFinetuneRecipes class with pytest.mark.integration. Update the
file-level or class-level marker near TestQwen35VLFinetuneRecipes so the GPU
gating remains (run_only_on stays on individual tests) but the module/class also
has the integration/system pytest.mark.
- Around line 113-127: The fixture _reset_microbatch_calculator only destroys
the leaked calculator after each test; change it to also clear any existing
global state before tests start by checking _GLOBAL_NUM_MICROBATCHES_CALCULATOR
and calling destroy_num_microbatches_calculator() prior to yield (in addition to
the existing post-yield cleanup). Update the fixture body that references
_GLOBAL_NUM_MICROBATCHES_CALCULATOR and destroy_num_microbatches_calculator so
both pre- and post-yield paths ensure the global is None.

In `@tests/unit_tests/recipes/qwen_vl/test_qwen35_vl_recipes.py`:
- Around line 98-102: Rename unused parameters to start with an underscore to
silence ruff ARG002/ARG004: in from_hf_pretrained change hf_path to _hf_path and
**kwargs to **_kwargs; in to_megatron_provider change self to _self (or keep
self and prefix unused params) and load_weights to _load_weights so the intent
is explicit and linters stop flagging _FakeAutoBridge and _FakeModelCfg factory
helpers.
- Around line 132-756: Add a pytest category marker so these are discoverable as
unit tests: declare a module-level marker variable `pytestmark =
pytest.mark.unit` near the top of the file (ensure `pytest` is imported), rather
than decorating every function; this will mark all functions such as
`test_each_qwen35_vl_recipe_builds_config`,
`test_qwen35_vl_dataset_type_selection`, `test_sft_nothing_frozen`, etc., as
unit tests for CI selection.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09eb4a86-1e7b-4867-8806-73d971b7578d

📥 Commits

Reviewing files that changed from the base of the PR and between 394037d and cfd9d90.

📒 Files selected for processing (11)
  • examples/models/vlm/qwen35_vl/conversion.sh
  • examples/models/vlm/qwen35_vl/inference.sh
  • examples/models/vlm/qwen35_vl/slurm_peft.sh
  • examples/models/vlm/qwen35_vl/slurm_sft.sh
  • src/megatron/bridge/models/qwen_vl/qwen35_vl_bridge.py
  • src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
  • src/megatron/bridge/recipes/qwen_vl/__init__.py
  • src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py
  • tests/functional_tests/recipes/test_qwen35_vl_recipes_finetune.py
  • tests/unit_tests/models/qwen_vl/test_qwen35_vl_bridge.py
  • tests/unit_tests/recipes/qwen_vl/test_qwen35_vl_recipes.py

Signed-off-by: Chen Cui <chcui@nvidia.com>
@cuichenx cuichenx marked this pull request as ready for review March 5, 2026 06:44
cuichenx added 2 commits March 4, 2026 22:51
- Add `set -euo pipefail` to slurm_sft.sh and slurm_peft.sh for fail-fast behavior
- Remove `logs/` prefix from SBATCH output/error paths (directory doesn't exist at job start)
- Remove now-unnecessary `mkdir -p logs` calls
- Fix 122B-A10B parallelism comment in slurm_sft.sh to match recipe (TP=2, PP=6, EP=8)
- Add `pytestmark = pytest.mark.integration` to functional test module
- Reset microbatch calculator both before and after each test in fixture

Signed-off-by: Chen Cui <chcui@nvidia.com>
Made-with: Cursor
cuichenx added 2 commits March 5, 2026 13:55
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
cuichenx added 2 commits March 5, 2026 16:25
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
cuichenx added 2 commits March 5, 2026 16:55
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx commented Mar 6, 2026

/ok to test 63395f9

