Add Support for Frequency Penalties in On Device Sampling #523

quic-sanising · 2025-07-24T22:06:52Z

✨ Add Frequency Penalty Support to On Device Sampling

This PR adds support for the frequency_penalty parameter in On Device Sampling for QEffForCausalLM models. This parameter adjusts token selection based on how often tokens have already appeared in the generated output:

Positive values discourage repetition and promote diversity.
Negative values encourage repetition.
Zero disables the penalty.

The implementation tracks token frequencies directly on the QAIC device in scratch buffers, so the counts are updated without a round trip to the host and the overhead stays low. The feature works with the existing include_sampler=True workflow and complements the other supported penalties (repetition and presence).
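As a rough illustration of the behaviour described above, here is a minimal host-side sketch in plain Python. It assumes the common convention of subtracting `frequency_penalty * count` from each token's logit; the function name `apply_frequency_penalty` and the list-based layout are hypothetical and are not taken from this PR, whose on-device implementation may differ.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_tokens, frequency_penalty):
    """Return a copy of `logits` with `frequency_penalty * count` subtracted.

    logits: list of floats, one entry per vocabulary id (assumed layout).
    generated_tokens: token ids already emitted for this sequence.
    """
    counts = Counter(generated_tokens)
    penalized = list(logits)  # copy so the caller's logits stay untouched
    for token_id, count in counts.items():
        penalized[token_id] -= frequency_penalty * count
    return penalized


logits = [2.0, 1.0, 0.5, 0.0]
history = [0, 0, 2]  # token 0 appeared twice, token 2 once

positive = apply_frequency_penalty(logits, history, 0.5)   # discourages repeats
negative = apply_frequency_penalty(logits, history, -0.5)  # encourages repeats
disabled = apply_frequency_penalty(logits, history, 0.0)   # leaves logits as-is
```

Under this convention the penalty grows with each repeat, which is what would distinguish it from a presence penalty that only checks whether a token has appeared at all.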

Signed-off-by: quic-sanising <[email protected]>

Signed-off-by: sanising <[email protected]>


quic-sanising · 2025-07-24T23:29:59Z

Depends on PR #463.

Signed-off-by: quic-sanising <[email protected]>

quic-sanising · 2025-09-18T19:19:56Z

QEfficient/transformers/sampler/sampler.py

-        scatter_values,
+        torch.ones(last_accepted_output_tokens.shape, dtype=torch.bool),
    )
+    gather_values = past_presence_penalty_buffer[batch_index, last_accepted_output_tokens]


For improved performance, I intended to use CtxGatherFuncCB3D, but it does not work here: last_accepted_output_tokens is a tensor of shape (batch_size, seq_len), whereas the function expects it to be of shape (batch_size, 1). Please let me know if there is a workaround.
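To make the shape issue concrete, the sketch below emulates in plain Python what the indexed gather in the diff is assumed to compute: for each batch row, one buffer lookup per accepted token, giving a (batch_size, seq_len) result. The helper name `gather_per_token` and the sample data are hypothetical, and the sketch assumes batch_index broadcasts across the seq_len dimension.

```python
def gather_per_token(buffer, token_ids):
    """Emulate buffer[batch_index, token_ids] for 2D token_ids.

    buffer: nested list of shape (batch_size, vocab_size).
    token_ids: nested list of shape (batch_size, seq_len).
    Returns a nested list of shape (batch_size, seq_len).
    """
    return [
        [buffer[b][t] for t in row]  # one lookup per accepted token
        for b, row in enumerate(token_ids)
    ]


# Presence flags for a batch of 2 sequences over a vocabulary of 4 tokens.
past_presence = [
    [False, True, False, False],
    [True, False, False, True],
]
# Three accepted tokens per sequence, i.e. seq_len = 3 rather than 1.
last_accepted = [[1, 2, 1], [3, 0, 1]]

gathered = gather_per_token(past_presence, last_accepted)
```

A gather that accepts only (batch_size, 1) indices would cover just the first column of `last_accepted`, which matches the limitation described in the comment.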

quic-rishinr · 2025-10-17T10:26:15Z

@quic-sanising is it ready for review? Can you rebase the PR?

quic-sanising and others added 17 commits June 18, 2025 13:38

Add sampler transform test

8417d8f

Signed-off-by: quic-sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

27d8dd5

Add example script

067f9b5

Signed-off-by: sanising <[email protected]>

Update docs

931860f

Signed-off-by: sanising <[email protected]>

Enable On Device Sampling for _continuous_batching_execution()

79b6c95

Signed-off-by: sanising <[email protected]>

Disable On Device Sampling for _regular_model_execution()

75eac30

Signed-off-by: sanising <[email protected]>

Use same sampling parameters for each sequence in a batch

eb6e2eb

Signed-off-by: sanising <[email protected]>

Enable On Device Sampling for _regular_model_execution()

48b35e3

Signed-off-by: sanising <[email protected]>

Add test for greedy sampling

c83a631

Signed-off-by: sanising <[email protected]>

Add test for random sampling

f698a24

Signed-off-by: sanising <[email protected]>

Remove else block

7b34a07

Signed-off-by: sanising <[email protected]>

Merge branch 'main' into ods-unit-tests

5fa7269

Signed-off-by: sanising <[email protected]>

Reformat code

0ee201a

Signed-off-by: sanising <[email protected]>

Add new sampling param: frequency_penalties

df4aadd

Signed-off-by: quic-sanising <[email protected]>

Merge branch 'ods-unit-tests' into freq

315ccaf

Enable frequency penalties

c3f827c

Signed-off-by: quic-sanising <[email protected]>

Remove CtxGatherFuncCB3D as it does not support 2D ctx_indices

574b8ad

Signed-off-by: quic-sanising <[email protected]>

quic-sanising added 2 commits September 18, 2025 14:09

Merge branch 'main' into freq

4dcaef1

Fix merge errors

bb290a8

Signed-off-by: quic-sanising <[email protected]>
