Add Kimi-K2.5 support #19170
base: master
Conversation
Yep, seems something is not quite right yet.
```python
    **cfg["media_proc_cfg"],
}
# merge configs
self.preprocessor_config = {**self.preprocessor_config, **cfg}
```
self.preprocessor_config is empty at this point, so the merge isn't really necessary, but I'll allow it for a consistent look.
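For anyone skimming the diff, here is a minimal standalone sketch of what the flattening amounts to; the file handling and variable names are illustrative assumptions, not the PR's actual conversion code.

```python
import json

# Sketch only: Kimi-K2.5 nests the usual preprocessor fields under
# "media_proc_cfg", so they get hoisted to the top level before the rest
# of the converter reads them. Names here are illustrative assumptions.
with open("preprocessor_config.json") as f:
    raw = json.load(f)

cfg = {
    **{k: v for k, v in raw.items() if k != "media_proc_cfg"},
    **raw.get("media_proc_cfg", {}),
}

# self.preprocessor_config is empty at this point, so this merge is
# effectively an assignment, but it mirrors how other configs are handled.
preprocessor_config: dict = {}
preprocessor_config = {**preprocessor_config, **cfg}
print(sorted(preprocessor_config.keys()))
```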
- Add new kimi-k2.5 keys to mtmd convert
- Update V_MMPROJ tensor mapping for new mm_projector.proj keys
- Update V_MM_INP_NORM for new mm_projector.pre_norm key
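Roughly, the mapping changes above boil down to pointing the new HF projector keys at the converter's existing mmproj tensor names. The dict and helper below are an illustrative sketch only; the GGUF-side names and the exact shape of the new keys are assumptions, not copied from gguf-py's tensor_mapping.py.

```python
# Illustrative sketch only: the GGUF-side names and the exact HF key shapes
# are assumptions, not the actual gguf-py/gguf/tensor_mapping.py entries.
KIMI_K25_MMPROJ_MAP = {
    "mm_projector.pre_norm": "mm.input_norm",  # -> V_MM_INP_NORM
    "mm_projector.proj":     "mm.model.fc",    # -> V_MMPROJ (name assumed)
}

def map_projector_key(hf_name: str) -> str | None:
    """Return the GGUF tensor name for a Kimi-K2.5 projector key, if known."""
    base, _, suffix = hf_name.rpartition(".")  # split off ".weight" / ".bias"
    mapped = KIMI_K25_MMPROJ_MAP.get(base)
    return f"{mapped}.{suffix}" if mapped else None

print(map_projector_key("mm_projector.pre_norm.weight"))  # -> mm.input_norm.weight
print(map_projector_key("mm_projector.proj.weight"))      # -> mm.model.fc.weight
```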
Vision is working now for images; I've uploaded MMPROJ files to my repo. @ngxson I left comments about the places that confused me the most. I think the rest of the changes are pretty sane.
Great work AesSedai! I just downloaded the BF16 for mmproj. Is there any reason to get anything higher than Q8_0? And what about ctk/ctv: is there any good reason to run them in f16 instead of something lower, since the model is INT4?
@segmond Thanks. For the MMPROJ, some cards are more or less compatible with the different versions; the BF16s don't work very well on my 3090s IIRC. The Q8_0 should be fine to use quality-wise. Regarding ctk/ctv, you do not want to quantize the cache on this model at all. The model weight quantization is separate from the cache quantization, and MLA / GQA already applies some pretty severe compression to the cache, so quantizing it further will only degrade it more. Besides, the context is very lightweight anyway: something like 165k context in FP16 is ballpark 10GB or so.
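To put a rough number on that ballpark, here is a back-of-the-envelope sketch. The MLA hyperparameters are assumptions taken from the DeepSeek-V3-style architecture this model follows, not values read from the actual config.json, so treat the result as an estimate.

```python
# Back-of-the-envelope MLA KV-cache estimate. All hyperparameters below are
# assumptions (DeepSeek-V3-style defaults), not read from the real config.
n_ctx        = 165_000  # tokens of context
n_layers     = 61
kv_lora_rank = 512      # compressed KV latent per token per layer
rope_dim     = 64       # decoupled RoPE part stored alongside it
bytes_fp16   = 2

per_token_bytes = (kv_lora_rank + rope_dim) * n_layers * bytes_fp16
total_gib = n_ctx * per_token_bytes / 1024**3
print(f"~{total_gib:.1f} GiB")  # ~10.8 GiB, in line with the ~10GB ballpark above
```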





Adding support for https://huggingface.co/moonshotai/Kimi-K2.5
Since this model includes compressed-tensors (INT4 for the conditional experts), I moved the `dequant_model` to the `prepare_tensors` call at @compilade's suggestion. The model conversion fails otherwise because the `quantization_config` is nested under the `text_config` in the config.json.

Additionally, this model adds some new keys for the vision tower, prefixed as `vt_`, and the preprocessor_config.json has the expected fields nested in the `media_proc_cfg` key.

This PR does not include the "hacked" Q4_0 changes by @jukofyork, referred to in this comment.
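To make the nested-config point concrete, the snippet below shows why a naive top-level lookup misses it. This is a sketch of the config.json layout rather than the converter's actual code, and the `quant_method` field is an assumption about the compressed-tensors format.

```python
import json

# Sketch only: in Kimi-K2.5's config.json the compressed-tensors
# quantization_config sits under text_config rather than at the top level,
# so a top-level lookup returns None and the INT4 experts would never be
# dequantized during conversion.
with open("config.json") as f:
    hparams = json.load(f)

quant_cfg = hparams.get("quantization_config")
if quant_cfg is None:
    quant_cfg = hparams.get("text_config", {}).get("quantization_config")

# "quant_method" is assumed to be present (typical for HF quantization configs)
print(quant_cfg.get("quant_method") if quant_cfg else "no quantization_config found")
```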
I have added a first pass at vision support, heavily aided by LLM assistance. I entirely expect @ngxson to tear it to shreds or call me a dummy and show me an easier way to add that vision support :)