quic-sanising
Contributor

✨ Add Frequency Penalty Support to On Device Sampling

This PR adds support for the frequency_penalty parameter in On Device Sampling for QEffForCausalLM models. This parameter adjusts token selection based on how often tokens have already appeared in the generated output:

  • Positive values discourage repetition and promote diversity.
  • Negative values encourage repetition.
  • Zero disables the penalty.

The implementation tracks token frequencies directly on the QAIC device using optimized scratch buffers, ensuring minimal overhead and maintaining high throughput. This feature integrates seamlessly with the existing include_sampler=True workflow and complements other supported strategies like repetition and presence penalties.
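
As a rough illustration of the behaviour described above, here is a minimal plain-Python sketch of a frequency penalty. It assumes the common convention of subtracting `frequency_penalty * count` from each token's logit; that formula is an assumption and is not taken from this PR's diff. The function name `apply_frequency_penalty` is hypothetical, and the on-device scratch buffers are not modelled here.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_tokens, frequency_penalty):
    """Return penalized logits for one sequence (illustrative only).

    logits: list of floats, one per vocabulary id.
    generated_tokens: token ids already produced for this sequence.
    frequency_penalty: > 0 discourages repeats, < 0 encourages them, 0 is a no-op.
    """
    # How many times each token id has already been generated.
    counts = Counter(generated_tokens)
    # Assumed convention: the penalty scales linearly with the count.
    return [
        logit - frequency_penalty * counts.get(token_id, 0)
        for token_id, logit in enumerate(logits)
    ]


logits = [2.0, 1.0, 0.5, 0.0]
history = [0, 0, 2]  # token 0 appeared twice, token 2 once

discouraged = apply_frequency_penalty(logits, history, 0.5)
encouraged = apply_frequency_penalty(logits, history, -0.5)
unchanged = apply_frequency_penalty(logits, history, 0.0)
```

With a positive penalty, token 0 (generated twice) loses the most logit mass; with a negative penalty it gains the most; with zero, the logits are returned unchanged.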

quic-sanising and others added 17 commits June 18, 2025 13:38
Signed-off-by: quic-sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Signed-off-by: quic-sanising <[email protected]>
Contributor Author

quic-sanising commented Jul 24, 2025

Depends on PR #463.

scatter_values,
torch.ones(last_accepted_output_tokens.shape, dtype=torch.bool),
)
gather_values = past_presence_penalty_buffer[batch_index, last_accepted_output_tokens]
Contributor Author

For better performance, I intended to use CtxGatherFuncCB3D here, but it does not work because last_accepted_output_tokens is a tensor of shape (batch_size, seq_len), whereas the function expects an index tensor of shape (batch_size, 1). Please let me know if there is a workaround.
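
To make the shape mismatch concrete, here is a small plain-Python sketch of the indexing in the diff line above. It assumes batch_index has shape (batch_size, 1), which the snippet does not show. The helpers `gather_rows` and `gather_by_column` are hypothetical stand-ins and do not reproduce CtxGatherFuncCB3D. The sketch only shows that one gather with a (batch_size, seq_len) index gives the same values as seq_len gathers with (batch_size, 1) indices; whether that decomposition is valid or faster on the device is not established here.

```python
def gather_rows(buffer, batch_index, tokens):
    """Emulate buffer[batch_index, tokens] with nested lists.

    buffer: (batch_size, vocab_size); batch_index: (batch_size, 1);
    tokens: (batch_size, k). Returns a (batch_size, k) result.
    """
    return [
        [buffer[batch_index[b][0]][tok] for tok in tokens[b]]
        for b in range(len(tokens))
    ]


def gather_by_column(buffer, batch_index, tokens):
    """Same result, built from seq_len gathers that each use a (batch_size, 1) index."""
    seq_len = len(tokens[0])
    columns = []
    for s in range(seq_len):
        column_index = [[row[s]] for row in tokens]  # shape (batch_size, 1)
        columns.append(gather_rows(buffer, batch_index, column_index))
    # Stitch the (batch_size, 1) columns back into (batch_size, seq_len).
    return [
        [columns[s][b][0] for s in range(seq_len)]
        for b in range(len(tokens))
    ]


# Toy presence buffer: batch_size=2, vocab_size=4.
past_presence = [
    [False, True, False, True],
    [True, False, False, False],
]
batch_index = [[0], [1]]
last_accepted = [[1, 2, 3], [0, 0, 2]]  # shape (batch_size, seq_len)

direct = gather_rows(past_presence, batch_index, last_accepted)
per_column = gather_by_column(past_presence, batch_index, last_accepted)
```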

@quic-rishinr
Contributor

@quic-sanising, is this ready for review? Could you rebase the PR?

2 participants