Bug: logit_bias Persists Across Requests When cache_prompt Is Enabled in llama.cpp Server #9477

Open
jeanromainroy opened this issue Sep 14, 2024 · 1 comment
Labels
bug-unconfirmed, medium severity

Comments


jeanromainroy commented Sep 14, 2024

What happened?

When using the llama.cpp server with cache_prompt enabled, I've encountered an issue where the logit_bias specified in one request persists and influences subsequent requests, even when those requests do not include any logit_bias. This results in unexpected, biased outputs in later requests, where the model continues to favor tokens from a previous logit_bias setting.

Expected Behavior:

  • logit_bias specified in one request should not affect others.
  • Enabling cache_prompt should not cause parameters like logit_bias to carry over between requests.

Steps to Reproduce:

  1. Start the llama.cpp server with cache_prompt enabled.
  2. First Request with logit_bias:
{
  "prompt": "Is the sky blue?\nAnswer with 'Yes', 'No', or 'N/A':",
  "max_tokens": 1,
  "logit_bias": [["Yes", 20], ["No", 20], ["N/A", 20]],
  "cache_prompt": true
}
  • Expected Output: "Yes", "No", or "N/A".
  • Actual Output: "Yes" (as expected).
  3. Second Request without logit_bias:
{
  "prompt": "Is the sky blue?",
  "max_tokens": 10,
  "cache_prompt": true
}
  • Expected Output: An unbiased completion based solely on the prompt.
  • Actual Output: The model outputs "Yes" or remains skewed toward the previously boosted tokens, indicating that the earlier logit_bias is still being applied (a minimal reproduction script is sketched below).
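For convenience, here is a minimal reproduction sketch in Python. It assumes the server's native /completion endpoint at http://localhost:8080 and uses n_predict for the generation limit in place of the issue's max_tokens, per the llama.cpp server's documented native API; adjust the field names if you are hitting the OpenAI-compatible route instead.

```python
import requests

SERVER = "http://localhost:8080"  # assumed default llama-server address

def complete(payload):
    # POST to the server's native /completion endpoint and return the generated text
    r = requests.post(f"{SERVER}/completion", json=payload, timeout=60)
    r.raise_for_status()
    return r.json()["content"]

# Request 1: bias the answer tokens and cache the prompt
first = complete({
    "prompt": "Is the sky blue?\nAnswer with 'Yes', 'No', or 'N/A':",
    "n_predict": 1,
    "logit_bias": [["Yes", 20], ["No", 20], ["N/A", 20]],
    "cache_prompt": True,
})
print("request with logit_bias:   ", first)

# Request 2: overlapping prompt, no logit_bias; if the bias leaked across
# requests, this output will still be skewed toward "Yes"/"No"/"N/A"
second = complete({
    "prompt": "Is the sky blue?",
    "n_predict": 10,
    "cache_prompt": True,
})
print("request without logit_bias:", second)
```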

Name and Version

./llama-cli --version
version: 3733 (1b28061)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0

What operating system are you seeing the problem on?

No response

Relevant log output

No response

jeanromainroy added the bug-unconfirmed and medium severity labels on Sep 14, 2024

slaren (Collaborator) commented Sep 14, 2024

I cannot reproduce this. It is easy to test that the logit biases are applied in every request by giving a specific token a very high bias, effectively ensuring that it will be selected.
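As a concrete version of that check, here is a minimal sketch, again assuming the native /completion endpoint at http://localhost:8080 and using an arbitrary boosted string ("banana") purely for illustration: with a very large bias the boosted token should appear in every biased request, and it should stop appearing as soon as the bias is omitted if per-request parameters are being reset correctly.

```python
import requests

SERVER = "http://localhost:8080"  # assumed llama-server address

def complete(payload):
    r = requests.post(f"{SERVER}/completion", json=payload, timeout=60)
    r.raise_for_status()
    return r.json()["content"]

# A very high positive bias should make the boosted token dominate sampling.
with_bias = complete({
    "prompt": "The capital of France is",
    "n_predict": 4,
    "logit_bias": [["banana", 100]],  # hypothetical token, chosen to be obvious
    "cache_prompt": True,
})

# The same prompt without any bias: if biases are reset per request,
# the boosted token should no longer show up here.
without_bias = complete({
    "prompt": "The capital of France is",
    "n_predict": 4,
    "cache_prompt": True,
})

print("with bias:   ", with_bias)
print("without bias:", without_bias)
```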
