Hi, is it possible to use a KV cache for auto-regressive inference?
At the moment, auto-regressive inference is very slow. When I pass `use_cache=True` to `model.generate()` in `app.py`, I always get an error saying that `attention_mask` is None. Thank you!
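For context, here is a minimal sketch of what I mean by a KV cache. It is a toy single-head attention in pure Python, not the code in `app.py` or the model's actual implementation. All names (`project_q`, `project_k`, `project_v`, `decode_no_cache`, `decode_with_cache`) and the fixed projections are hypothetical, chosen only for illustration. Without a cache, the keys and values of the whole prefix are recomputed at every step. With a cache, each token is projected once and the results are reused.

```python
import math

# Hypothetical fixed "projections" standing in for learned weight matrices.
def project_q(x):
    return [x[0] + x[1], x[0] - x[1]]

def project_k(x):
    return [2.0 * x[0], x[1]]

def project_v(x):
    return [x[1], x[0]]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def decode_no_cache(tokens):
    """Recompute keys/values for the whole prefix at every step."""
    outputs, kv_projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project_k(x) for x in prefix]
        values = [project_v(x) for x in prefix]
        kv_projections += 2 * len(prefix)
        outputs.append(attend(project_q(tokens[t]), keys, values))
    return outputs, kv_projections

def decode_with_cache(tokens):
    """Project each token once and append to the cached keys/values."""
    keys, values, outputs, kv_projections = [], [], [], 0
    for x in tokens:
        keys.append(project_k(x))
        values.append(project_v(x))
        kv_projections += 2
        outputs.append(attend(project_q(x), keys, values))
    return outputs, kv_projections

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full, full_cost = decode_no_cache(tokens)
cached, cached_cost = decode_with_cache(tokens)
print(full_cost, cached_cost)
```

Both functions produce the same outputs, but the number of key/value projections grows quadratically with sequence length without the cache and only linearly with it. That is the speedup I am hoping to get from `use_cache=True`.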