fix: [Memory Estimator] Fix text config attribute fetching for new multimodal models (e.g., Qwen-3-VL)#67

Merged
ISEEKYAN merged 2 commits into ISEEKYAN:main from LZY2275:main
Jan 20, 2026

Conversation

LZY2275 (Contributor) commented Jan 20, 2026

Background
Most modern multimodal models built on top of text models (e.g., Qwen-3-VL) encapsulate core text-related attributes (vocab_size, max_position_embeddings, hidden_size, etc.) in hf_config.text_config. However, some older multimodal models (e.g., Qwen2-VL-7B-Instruct) keep these attributes at the top level of the config.

The original code fetched these attributes only from the top-level config, which worked for older models like Qwen2-VL but failed for newer ones like Qwen-3-VL, whose text attributes are nested in text_config.

Changes Made
Add support for attributes nested in text_config:

  • Fetch vocab_size and max_position_embeddings from either the top-level config or the text_config sub-object
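The fallback described above can be sketched as a small helper. This is an illustrative sketch, not the PR's actual code: the helper name, the lookup order (text_config first, then top level), and the sample attribute values are all assumptions for demonstration.

```python
from types import SimpleNamespace

def get_text_attr(hf_config, name, default=None):
    """Fetch a text-model attribute from an HF config.

    Prefers the nested text_config (newer multimodal models such as
    Qwen-3-VL), falling back to the top-level config (older models
    such as Qwen2-VL). Lookup order is an assumption for this sketch.
    """
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, name):
        return getattr(text_config, name)
    return getattr(hf_config, name, default)

# Mock configs with made-up values, standing in for real HF configs.
old_style = SimpleNamespace(vocab_size=152064, max_position_embeddings=32768)
new_style = SimpleNamespace(
    text_config=SimpleNamespace(vocab_size=151936, max_position_embeddings=262144)
)

print(get_text_attr(old_style, "vocab_size"))  # 152064 (top level)
print(get_text_attr(new_style, "vocab_size"))  # 151936 (from text_config)
```

With a helper like this, the memory estimator can treat both config layouts uniformly instead of special-casing each model family.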

@LZY2275 changed the title from "fix: Fix text config attribute fetching for new multimodal models (e.g., Qwen-3-VL)" to "fix: [Memory Estimator] Fix text config attribute fetching for new multimodal models (e.g., Qwen-3-VL)" on Jan 20, 2026
ISEEKYAN (Owner) commented:
Thank you @LZY2275 for your interest in the memory estimator. I have not tested it on Qwen-VL models, and no vision part has been developed; I think only the language part can currently be precisely estimated. Do you have any estimation results on Qwen-VL? Do you think it is necessary to precisely estimate the vision part? Feel free to discuss with me!

@ISEEKYAN ISEEKYAN merged commit 63bd59a into ISEEKYAN:main Jan 20, 2026
