Is it possible to use a KV cache for autoregressive inference? #176

audreyeternal · 2025-03-03T21:54:50Z

Hi, is it possible to use a KV cache for autoregressive inference?
Right now autoregressive inference is very slow. When I set use_cache=True in model.generate() in app.py, I always get an error saying that attention_mask is None. Thank you!
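For context, a KV cache stores each token's key and value projections so that every new decoding step only projects the newest token, instead of re-encoding the whole prefix. Below is a toy single-head attention in plain Python, written only to illustrate the idea. The weights, inputs, and function names (`full_recompute`, `cached_decode`) are hypothetical. They are not taken from this repository's model or from any library API. The sketch checks that cached decoding reproduces full recomputation. Because the cache keeps entries for every earlier position, any attention mask a real model uses at step t has to cover the cached positions plus the new one.

```python
import math

# Toy single-head attention illustrating KV caching.
# All weights and inputs below are hypothetical; this is NOT the model in app.py.

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Hypothetical projection matrices (d_model = 2).
W_Q = [[1.0, 0.5], [0.0, 1.0]]
W_K = [[0.5, 0.0], [1.0, 1.0]]
W_V = [[1.0, 0.0], [0.0, 2.0]]

def full_recompute(xs):
    # No cache: at step t, re-project K and V for all t + 1 tokens.
    # Slicing the prefix acts as the causal mask.
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        q = matvec(W_Q, xs[t])
        outputs.append(attend(q, keys, values))
    return outputs

def cached_decode(xs):
    # KV cache: project only the newest token and append its K/V to the cache.
    k_cache, v_cache = [], []
    outputs = []
    for x in xs:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        q = matvec(W_Q, x)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
full = full_recompute(xs)
cached = cached_decode(xs)
print(max(abs(a - b) for fo, co in zip(full, cached) for a, b in zip(fo, co)))  # prints 0.0
```

The cached version does a constant amount of projection work per step, while the uncached version's work grows with the prefix length. That difference is where the speedup for long generations comes from.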


audreyeternal · 2025-03-03T21:58:10Z

#117
