Created ReplicateKVHeadTransform to integrate the KV-head replication module within the QEfficient library #625
Conversation
Force-pushed from 1dfdea6 to 3c90390
@ochougul @quic-amitraj please review
Force-pushed from 8c4a1fc to e502542
…dule within the QEfficient library. The Transform enables KV-head replication for both CausalLMs and VLMs. The feature is enabled by passing the n_kv_head_repeat parameter when initializing the QEff wrapper class for the corresponding model; n_kv_head_repeat acts as the multiplier on the original count of KV heads. This operation also updates the config and the hash params of the respective model: num_key_value_heads is set to the new count and a new parameter orig_kv_heads records the original count. This lets us export the same model with different numbers of KV heads without causing a hash conflict. Also added tests for both CausalLMs and VLMs with this functionality to compare outputs of the PyTorch HF model and the AIC model. Two new optional parameters, n_kv_head_repeat and test_kv_replicate, are added for testing purposes. Setting test_kv_replicate to True performs a KV-head replication on every model such that the number of KV heads becomes equal to the number of attention heads. This ensures tests don't fail due to misalignment issues when simply doubling num_key_value_heads would no longer divide num_heads evenly. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
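For context, the core replication step described above can be sketched as follows. This is a minimal illustration, assuming the standard HF layout where k_proj/v_proj weights have shape (num_kv_heads * head_dim, hidden_size); `replicate_kv_proj` is a hypothetical helper name, not the actual QEfficient code:

```python
import torch


def replicate_kv_proj(weight: torch.Tensor, orig_kv_heads: int, repeat: int) -> torch.Tensor:
    # Hypothetical sketch: repeat each KV head's block of projection rows
    # `repeat` times, so orig_kv_heads becomes orig_kv_heads * repeat.
    head_dim = weight.shape[0] // orig_kv_heads
    hidden_size = weight.shape[1]
    expanded = torch.repeat_interleave(
        weight.view(orig_kv_heads, head_dim, hidden_size), repeat, dim=0
    )
    return expanded.reshape(orig_kv_heads * repeat * head_dim, hidden_size)


# Example: 2 KV heads with head_dim 4, n_kv_head_repeat=3 -> 6 KV heads
w = torch.randn(2 * 4, 8)
w_rep = replicate_kv_proj(w, orig_kv_heads=2, repeat=3)
assert w_rep.shape == (6 * 4, 8)
```

Each original head block appears `repeat` times consecutively, which matches how grouped-query attention maps query heads onto shared KV heads.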
… Doing so would prevent any issues during Transforms when we don't wish to apply it. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
…orm. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Force-pushed from 870cc8d to 08032e1
…changes to repeat Bias factor appropriately on quantized layers. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
ochougul left a comment
Write a test that makes sure the ONNX hash is different when different numbers of KV heads are passed.
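A hedged sketch of the kind of test being requested here: the export hash should change whenever a different KV-head count is exported. The real hash comes from QEfficient's hash_params machinery, which is not shown in this thread; a plain SHA-256 over the config dict stands in below just to demonstrate the property being asserted:

```python
import hashlib
import json


def export_hash(config: dict) -> str:
    # Stand-in for the library's export hash: a stable digest of the config.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


# repeat=1 (no replication) vs. repeat=2 on a model with 8 original KV heads
base = {"num_key_value_heads": 8, "orig_kv_heads": 8}
replicated = {"num_key_value_heads": 16, "orig_kv_heads": 8}

assert export_hash(base) != export_hash(replicated)
```

Because num_key_value_heads and orig_kv_heads both land in the hashed config, the same checkpoint exported with different n_kv_head_repeat values cannot collide.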
    # InternVL causes an error if we pass the num_kv_heads_repeat parameter
    num_kv_heads_repeat = kwargs.pop("num_kv_heads_repeat", 1)
    self.model, replicate_kv_transformed = ReplicateKVHeadTransform.apply(self.model, **kwargs)
    if replicate_kv_transformed:
        self.hash_params["config"] = model.config.to_diff_dict()
better add it to _pytorch_transforms if we are always going to call it.
    if replicate_kv_transformed:
        self.lang_model.hash_params["config"] = model.config.to_diff_dict()
        self.vision_model.hash_params["config"] = model.config.to_diff_dict()
don't we already dump config somewhere? in _generate_export_hash?
You can just always add repeat_kv_heads value to self.hash_params which will be 1 if nothing is passed.
        }

    class ReplicateKVHeadTransform:
Make this inherit ModuleMutatorTransform. You may need to implement a mutate method, similar to apply here.
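A rough sketch of the refactor the reviewer is suggesting. The real ModuleMutatorTransform lives in QEfficient's pytorch_transforms and its interface may differ; a minimal stand-in base class is stubbed below purely so the shape of the change is visible, and the match target is illustrative:

```python
import torch.nn as nn


class ModuleMutatorTransform:
    # Minimal stand-in for QEfficient's base class (assumed interface):
    # `apply` walks the model and calls `mutate` on every matching module.
    _match_class: type

    @classmethod
    def apply(cls, model: nn.Module):
        transformed = False
        for module in model.modules():
            if isinstance(module, cls._match_class):
                cls.mutate(module)
                transformed = True
        return model, transformed


class ReplicateKVHeadTransform(ModuleMutatorTransform):
    _match_class = nn.MultiheadAttention  # illustrative; real match is per-model attention

    @classmethod
    def mutate(cls, module):
        # The per-layer KV weight/bias replication currently in `apply`
        # would move here. Placeholder side effect for the sketch:
        module.replicated = True
```

This keeps the traversal logic in the shared base class and leaves only the per-layer mutation in the transform itself.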
@quic-dhirajku Please take up this PR after 595
    layer.bias.data = torch.repeat_interleave(
        layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0
    ).view(new_kv_heads * head_dim)
    if layer.bias is not None:
lines 782-785 are repeated here, please remove
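A deduplicated sketch of the bias handling this comment points at: the repeat should run exactly once, inside the None guard, instead of the block appearing twice. Variable names follow the quoted diff; `repeat_kv_bias` is a hypothetical helper wrapping that logic:

```python
import torch
import torch.nn as nn


def repeat_kv_bias(layer: nn.Linear, orig_kv_heads: int, repeat: int, head_dim: int) -> None:
    # Guard first, repeat once: avoids the duplicated block flagged above.
    if layer.bias is not None:
        new_kv_heads = orig_kv_heads * repeat
        layer.bias.data = torch.repeat_interleave(
            layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0
        ).view(new_kv_heads * head_dim)


# Example: a KV projection with 2 heads of head_dim 4, replicated 2x
layer = nn.Linear(8, 2 * 4, bias=True)
repeat_kv_bias(layer, orig_kv_heads=2, repeat=2, head_dim=4)
assert layer.bias.shape == (16,)
```

Layers without a bias (the common case for many quantized projections) are simply skipped.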