[OMNIML-2917] export layer config using actual prefix instead of hardcoded model.layers #470
This is change set 1 from working on OMNIML-2917.
When we export a quantized model to the HF unified format, we hard-code module names with a "model.layers" prefix. This is unnecessary and error-prone: the exported quant config can end up with entries whose prefixes are completely wrong, for example in exclude_modules. Qwen3-VL models, for instance, contain two transformer blocks, language_model and vision. Before this change, for language_model we would output:
model.layers.language_model.layers.0.xxx
model.layers.language_model.layers.1.xxx
These prefixes are completely wrong, so inference systems such as vLLM fail when they try to read the quant config.
Fix it by simply using the prefixes obtained by parsing the model itself; a minimal sketch of the idea follows.
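A minimal sketch of the approach, assuming a PyTorch model; the helper name and the `nn.Linear` filter are illustrative only and not the PR's actual code:

```python
# Hypothetical sketch (not the PR's implementation): derive module
# prefixes from the model itself instead of prepending "model.layers".
import torch.nn as nn


def collect_layer_names(model: nn.Module) -> list[str]:
    """Return the real names of quantizable submodules.

    named_modules() yields each module's true prefix, so a Qwen3-VL
    model produces names like "language_model.layers.0..." rather
    than "model.layers.language_model.layers.0...".
    """
    names = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            names.append(name)
    return names
```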
What does this PR do?
Type of change: Bug fix
Overview: Export per-layer quant config entries (such as exclude_modules) using each module's actual prefix instead of a hard-coded model.layers prefix; see the description above.
Usage
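A hedged usage sketch, assuming the standard ModelOpt quantize-then-export flow; the model id, calibration loop, and quantization config are placeholders rather than anything specific to this PR:

```python
# Sketch of the assumed quantize-then-export flow. The model id and
# calibration loop are placeholders; mtq.quantize, FP8_DEFAULT_CFG and
# export_hf_checkpoint are assumed ModelOpt entry points.
from transformers import AutoModelForCausalLM

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model = AutoModelForCausalLM.from_pretrained("<hf-model-id>")


def forward_loop(m):
    # Run a handful of calibration batches through the model here.
    ...


# Quantize, then export to the HF unified checkpoint format.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
export_hf_checkpoint(model, export_dir="exported_model")

# With this change, entries in the exported quant config (e.g.
# exclude_modules) carry each module's real prefix such as
# "language_model.layers.0..." instead of "model.layers.language_model...".
```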
Testing
Before your PR is "Ready for review"
Additional Information