
Llama3.1-405B non-zero temperature #2116

Open
psyhtest opened this issue Feb 18, 2025 · 4 comments

Comments

@psyhtest
Contributor

psyhtest commented Feb 18, 2025

In the reference implementation of Llama3.1-405B, temperature is set to 1. Is this intentional?

Normally, the temperature should be set to zero for outputs to be more deterministic. Can an optimized implementation use a different temperature?
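
For illustration, here is a minimal vLLM sketch contrasting the two settings; the model id and prompt are placeholders, not the reference implementation's actual configuration:

```python
from vllm import LLM, SamplingParams

# Placeholder model id; the reference implementation wires this up differently.
llm = LLM(model="meta-llama/Llama-3.1-405B-Instruct")

# Reference-style setting: temperature=1 samples from the unscaled softmax distribution.
sampling = SamplingParams(temperature=1.0, max_tokens=128)

# Greedy decoding: temperature=0 always picks the most likely token,
# which is the usual choice when more deterministic outputs are wanted.
greedy = SamplingParams(temperature=0.0, max_tokens=128)

outputs = llm.generate(["What is the capital of France?"], greedy)
print(outputs[0].outputs[0].text)
```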

@psyhtest
Contributor Author

N.B.: We are aware of an open vLLM issue due to which setting the temperature to zero still results in non-determinism. We may need to recalibrate the reference accuracy for the next round (v5.1).

@psyhtest
Contributor Author

Inference WG 18/Feb/2025: multiple parties ran the reference implementation and obtained identical results. This may be a by-product of topk=1 (see the sketch below). Optimized submissions should use the same parameters.
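
To see why topk=1 would mask the temperature, here is a small self-contained sketch with made-up logits: once only the single most likely token is kept, sampling degenerates to argmax for any positive temperature.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # made-up token logits

def sample(logits, temperature, top_k):
    # Temperature-scaled softmax over the logits.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Keep the top_k most likely tokens, renormalise, and sample among them.
    top = np.argsort(probs)[::-1][:top_k]
    kept = probs[top] / probs[top].sum()
    return top[rng.choice(len(top), p=kept)]

# With top_k=1 the "sample" is always argmax(logits), whatever the temperature,
# so temperature=1 can still produce identical results across runs.
assert all(sample(logits, t, top_k=1) == logits.argmax() for t in (0.1, 1.0, 2.0))
```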

@psyhtest
Contributor Author

psyhtest commented Feb 21, 2025

> Does a temperature of 0 result in non-determinism?
>
> A common case of confusion is if a temperature of 0 generates non-deterministic replies. In theory, yes. In practice, no.
>
> As noted by this OpenAI Forum Thread, achieving non-determinism is impossible. A temperature of 0 does force the SoftMax function to choose the most likely response—which is the definition of greedy sampling and is non-deterministic. However, LLMs are not run in a vacuum; race conditions of multi-threaded code impacts the established likelihoods of tokens. Consequently, while temperature reduces randomness to a minimum, it doesn’t eliminate them.
>
> However, the randomness is minimized to the extent that developers can expect near non-determinism. For most queries that specify the structure of the expected output, this reduction in randomness is sufficient.

Am I alone in thinking that by "non-determinism" the authors of this article actually mean "determinism"? Surely, "the randomness is minimized" should mean "near determinism"?

@psyhtest
Contributor Author

It's like flammable vs inflammable.
