
Conversation

AesSedai commented Jan 29, 2026

Adding support for https://huggingface.co/moonshotai/Kimi-K2.5

Since this model includes compressed-tensors (INT4 for the conditional experts), I moved the dequant_model to the prepare_tensors call at @compilade's suggestion. The model conversion fails otherwise because the quantization_config is nested under the text_config in the config.json.
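
For anyone following along, the nested lookup can be sketched roughly like this (a minimal illustration; the field names and layout are assumptions based on common Hugging Face conventions, not the exact convert script logic):

```python
# Minimal sketch of finding a quantization_config that may be nested under
# text_config, as in Kimi-K2.5's config.json. Field names are assumptions
# based on Hugging Face conventions, not the real convert_hf_to_gguf.py code.
def find_quantization_config(config: dict):
    # check the usual top-level location first
    if "quantization_config" in config:
        return config["quantization_config"]
    # multimodal models may nest it under text_config
    return config.get("text_config", {}).get("quantization_config")

cfg = {
    "model_type": "kimi_vl",  # hypothetical value
    "text_config": {
        "quantization_config": {"format": "pack-quantized", "num_bits": 4},
    },
}
print(find_quantization_config(cfg))  # {'format': 'pack-quantized', 'num_bits': 4}
```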

Additionally, this model adds some new keys for the vision tower, prefixed as vt_, and the preprocessor_config.json has the expected fields nested in the media_proc_cfg key.
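
Hoisting those nested fields can be sketched like this (illustrative only; apart from media_proc_cfg, the key names here are made up):

```python
# Minimal sketch: hoist the fields nested under media_proc_cfg so the
# converter sees a flat preprocessor config. Keys other than media_proc_cfg
# are illustrative, not the real preprocessor_config.json contents.
def flatten_preprocessor_config(cfg: dict) -> dict:
    nested = cfg.get("media_proc_cfg", {})
    flat = {k: v for k, v in cfg.items() if k != "media_proc_cfg"}
    return {**flat, **nested}

raw = {
    "processor_class": "KimiVLProcessor",  # hypothetical value
    "media_proc_cfg": {"image_mean": [0.5, 0.5, 0.5], "image_std": [0.5, 0.5, 0.5]},
}
flat = flatten_preprocessor_config(raw)
# flat now exposes image_mean / image_std at the top level
```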

This PR does not include the "hacked" Q4_0 changes by @jukofyork, referred to in this comment.

I have added a first pass at vision support, heavily aided by LLM assistance. I entirely expect @ngxson to tear it to shreds or call me a dummy and show me an easier way to add that vision support :)

@AesSedai AesSedai requested a review from CISC as a code owner January 29, 2026 06:28
@github-actions github-actions bot added the python python script changes label Jan 29, 2026
AesSedai (Author) commented Jan 29, 2026

[image: a raccoon resting on a tree branch]

I was testing the VL locally with this image, and the thinking block shows the oddness:

This is an interesting image. The user is asking me to identify what's in the picture. Let me look carefully.

The image shows what appears to be a furry animal resting on a tree branch. It looks like it has grayish-brown fur, and there are distinctive white markings on its face/ears that look like little tufts or tassels. The animal appears to be sleeping or resting with its face tucked in.

Based on the appearance - specifically the white ear tufts and the overall body shape - this looks like a **Common Marmoset** (Callithrix jacchus), also known as the White-tufted-ear Marmoset. These are small New World monkeys native to Brazil. The white ear tufts are their most distinctive feature.

However, I should also consider if it could be something else. Could it be a raccoon? No, the body shape is wrong and raccoons don't have white ear tufts like that. Could it be a type of lemur? Some lemurs have ear tufts, but the body shape here looks more like a marmoset or tamarin. The Common Marmoset fits best with those prominent white ear tufts.

The animal is curled up on a branch, likely sleeping or resting, with its tail visible (appears to be a long, furry tail). The background is green foliage, suggesting a natural or semi-natural habitat.

So my answer should identify it as a Common Marmoset or White-tufted Marmoset, pointing out the distinctive ear tufts and the fact that it appears to be resting on a branch.

Wait, looking closer at the image... is it possible this is multiple animals? Let me check. The pattern seems to show overlapping forms. Actually, looking carefully, this might be TWO marmosets huddled together? Or is it just one curled up?

Actually, looking at the fur pattern and the way the bodies are arranged, this appears to be two common marmosets huddled together on the branch, sleeping. I can see what looks like two heads with white ear tufts close together. Yes, that makes sense - they often sleep huddled together for warmth and security.

So the answer is: Common Marmosets (White-tufted Marmosets), likely two of them, sleeping or resting on a branch.

I should mention:
- Species: Common Marmoset (Callithrix jacchus) or White-tufted-ear Marmoset
- Distinctive white ear tufts
- They appear to be sleeping/resting on a tree branch
- Possibly two individuals huddled together
- Native to Brazil
- Small New World monkeys

This seems accurate based on the visual evidence.

Specifically, this line led me to believe there's a problem:

The pattern seems to show overlapping forms

and I've seen that in other local generations with this image too, e.g.:

Actually, looking at the pattern again, this looks like it might be multiple images stacked or a glitch effect? No, looking closely, it appears to be a single image of a raccoon resting on a branch, but there might be some artifacting or the image is showing the raccoon in a specific pose. The repetition pattern suggests it might be a glitch or the image is tiled strangely? No, actually, looking carefully, it seems like the image might be corrupted or displaying incorrectly, or it's an artistic effect?

Wait, no, looking at the image again - it appears to be a raccoon lying on a tree branch, and the image quality or compression creates a somewhat repetitive pattern, but it's clearly a raccoon. The distinctive facial markings (black mask, white nose/forehead), the gray fur, the bushy tail possibly visible - this is definitely a Common Raccoon (Procyon lotor).

In comparison, this is a bit of the thinking from the OpenRouter API for Kimi-K2.5:

The user wants to know what's in the picture. Looking at the image, it's clearly a raccoon lying on a tree branch. The raccoon has the distinctive black mask around its eyes, gray fur, and is draped over the branch in a relaxed or tired pose. The background shows a forest or wooded area with green foliage.

This is a straightforward image description task. I should identify the animal correctly as a raccoon and describe what it's doing (resting on a branch). I don't need to overcomplicate this or add fictional elements since the user asked a direct question about the image content.

The confidence of the answer has a vastly different feel depending on what the VL component sees.

CISC (Collaborator) commented Jan 29, 2026

> While the mmproj conversion appears to work and the model loads and can decode images, I've got some weird output when using the vision component that leads me to believe there is a conversion issue somewhere or some other missing component. I think I need some review from @ngxson to help get it working correctly.

Yep, seems something is not quite right yet.

```python
**cfg["media_proc_cfg"],
}
# merge configs
self.preprocessor_config = {**self.preprocessor_config, **cfg}
```
CISC (Collaborator) commented on this diff:
self.preprocessor_config is empty at this point, so not really necessary to merge, but will allow it for consistent looks.

Commits:

- Add new kimi-k2.5 keys to mtmd convert
- Update V_MMPROJ tensor mapping for new mm_projector.proj keys
- Update V_M_IMP_NORM for new mm_projector.pre_norm key
AesSedai (Author) commented Feb 1, 2026

Vision is now working for images; I've uploaded MMPROJ files to my repo.

@ngxson I left comments about the places that confused me the most.

  1. resize_position_embeddings_3d might be combinable with clip_graph::resize_position_embeddings if the tensors are handled better?
  2. clip_graph::build_rope_2d_interleaved roughly makes sense to me from a 10,000-foot view, but I was thinking that maybe zipping or transposing the pos_w / pos_h tensors might make the square peg fit in the round hole with a somewhat different math approach?
  3. I have no idea why passing the learned position embeddings into build_vit and doing inp = ggml_add(ctx0, inp, learned_pos_embd); wasn't working.
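
For what it's worth, the zip idea in point 2 can be toy-modeled in NumPy: even rotary pairs take the row index and odd pairs take the column index, instead of splitting the dimensions into two contiguous halves. This is only a sketch of the interleaved layout, not the clip.cpp implementation:

```python
# Toy model of 2D interleaved RoPE position handling: even rotary pairs
# rotate by the patch's row index, odd pairs by its column index.
# Illustrative only; not the ggml/clip.cpp code.
import numpy as np

def rope_2d_interleaved_angles(h, w, dim, base=10000.0):
    """Per-patch rotary angles for an h x w patch grid with `dim` head dims."""
    assert dim % 4 == 0          # need an even number of (row, col) pair slots
    n_pairs = dim // 2           # one angle per 2-d rotation
    inv_freq = base ** (-np.arange(n_pairs // 2) / (n_pairs // 2))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pos = np.stack([ys.ravel(), xs.ravel()], axis=-1)  # (h*w, 2)
    angles = np.empty((h * w, n_pairs))
    angles[:, 0::2] = pos[:, :1] * inv_freq  # rows -> even pairs
    angles[:, 1::2] = pos[:, 1:] * inv_freq  # cols -> odd pairs
    return angles

a = rope_2d_interleaved_angles(2, 3, 8)
# patch 5 is at (row=1, col=2); its angles interleave row/col frequencies
print(a[5])  # [1.   2.   0.01 0.02]
```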

I think the rest of the changes are pretty sane.

AesSedai (Author) commented Feb 1, 2026

Some test samples that I ran locally. A very basic OCR test:
[screenshot: chat log of the basic OCR test]

A more complicated OCR test that includes transcription:
[screenshot: chat log of the OCR and transcription test]

And the interpretation of the raccoon photo from earlier:
[screenshot: chat log interpreting the raccoon photo]

The two things that still concern me are:

  • Image 1: the thinking mentions a "vertical black line/border on the left side", and the response mentions "Border: There is a thick vertical black line running along the left side of the image". The image padding is black, so perhaps it's related to that?
  • Image 3: in the thinking, item 6 mentions: "There's a visible seam or line in the image, suggesting it might be a composite or stitched image, or perhaps just an artifact". There isn't a seam like that, so I'm concerned.

@AesSedai AesSedai marked this pull request as ready for review February 1, 2026 11:27
@AesSedai AesSedai requested a review from ngxson as a code owner February 1, 2026 11:27
segmond commented Feb 1, 2026

Great work AesSedai! I just downloaded the BF16 mmproj. Is there any reason to use anything higher than Q8_0? And for ctk/ctv, is there any good reason to run them at f16 instead of lower, given the model is INT4?

segmond commented Feb 1, 2026

I'm happy to report that I have tested this branch and it works great. I ran it with the Q4_X quant, ctk/ctv at q8_0, and the BF16 mmproj.

[screenshot of the working vision output]

AesSedai (Author) commented Feb 1, 2026

@segmond Thanks! For the MMPROJ, some cards are more or less compatible with the different versions; the BF16s don't work very well on my 3090s, IIRC. The Q8_0 should be fine quality-wise.

Regarding ctk/ctv, you do not want to quantize the cache on this model at all. Weight quantization and cache quantization are separate concerns, and MLA/GQA already compresses the cache quite heavily, so quantizing it further degrades quality even more. Besides, the context is very lightweight anyway: something like 165k of context in FP16 is ballpark 10 GB.
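
As a back-of-the-envelope check of that ballpark, assuming DeepSeek-V3-style MLA cache dimensions (61 layers, kv_lora_rank 512, qk_rope_head_dim 64; these are assumptions, not confirmed Kimi-K2.5 values):

```python
# Rough MLA KV-cache size estimate. The layer count and latent dimensions
# below are DeepSeek-V3-style assumptions, not confirmed Kimi-K2.5 numbers.
def mla_kv_cache_bytes(n_ctx, n_layers=61, kv_lora_rank=512,
                       rope_head_dim=64, bytes_per_elem=2):  # 2 bytes = FP16
    # per token, each layer stores one compressed latent plus the RoPE part
    per_token = n_layers * (kv_lora_rank + rope_head_dim) * bytes_per_elem
    return n_ctx * per_token

gib = mla_kv_cache_bytes(165_000) / 2**30
print(f"{gib:.1f} GiB")  # prints "10.8 GiB", in line with the ~10 GB ballpark
```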

