Learned KV Cache Prefix

Create a mode that learns the input embeddings for the prefix; the model then computes the prefix's KV cache entries from those embeddings.
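
A minimal sketch of this embedding mode, assuming a frozen model that accepts input embeddings directly. The names make_prefix and prepend_prefix and the sizes used are hypothetical, plain Python lists stand in for tensors, and the training loop is omitted. Only the prefix rows would be trainable.

```python
import random

def make_prefix(prefix_len, d_model, seed=0):
    """Create the trainable prefix: prefix_len vectors of size d_model (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(prefix_len)]

def prepend_prefix(prefix, token_embeddings):
    """Return the sequence the frozen model would see: prefix rows, then token rows."""
    d_model = len(prefix[0])
    if any(len(row) != d_model for row in token_embeddings):
        raise ValueError("token embeddings must match the prefix width")
    return prefix + token_embeddings

# Assumed toy sizes: 4 prefix positions, model width 8, 3 input tokens.
prefix = make_prefix(prefix_len=4, d_model=8)
tokens = [[0.0] * 8 for _ in range(3)]  # stand-in for embedded input tokens
inputs = prepend_prefix(prefix, tokens)

# The only trainable values are the prefix entries: prefix_len * d_model.
trainable_params = len(prefix) * len(prefix[0])
print(len(inputs), len(inputs[0]), trainable_params)
```

In this mode the parameter count does not depend on model depth, because every layer's keys and values for the prefix are derived from the same learned embeddings.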

Create a mode where each element of the prefix's KV cache is learned separately, i.e. the prefix keys and values of every layer are trained directly as parameters.
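
A minimal sketch of this KV mode, assuming single-head attention and toy sizes. The names make_kv_prefix, attend and attend_with_prefix are hypothetical, plain Python lists stand in for tensors, and training is omitted. Each layer owns its own learned prefix keys and values, which are placed ahead of the keys and values computed from the real input.

```python
import math
import random

def make_kv_prefix(n_layers, prefix_len, head_dim, seed=0):
    """One trainable (keys, values) pair per layer; every entry is its own parameter."""
    rng = random.Random(seed)

    def rows():
        return [[rng.gauss(0.0, 0.02) for _ in range(head_dim)] for _ in range(prefix_len)]

    return [{"k": rows(), "v": rows()} for _ in range(n_layers)]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def attend_with_prefix(layer_prefix, query, keys, values):
    """Prepend the learned prefix entries to the keys/values computed from the input."""
    return attend(query, layer_prefix["k"] + keys, layer_prefix["v"] + values)

# Assumed toy sizes: 2 layers, 4 prefix positions, head width 8, 3 input tokens.
kv_prefix = make_kv_prefix(n_layers=2, prefix_len=4, head_dim=8)
query = [0.1] * 8
keys = [[0.2] * 8 for _ in range(3)]
values = [[1.0] * 8 for _ in range(3)]
out = attend_with_prefix(kv_prefix[0], query, keys, values)

# Trainable values: (keys and values) * layers * prefix_len * head_dim.
kv_params = 2 * len(kv_prefix) * len(kv_prefix[0]["k"]) * len(kv_prefix[0]["k"][0])
print(len(out), kv_params)
```

Unlike the embedding mode, the parameter count here grows with the number of layers (and, in a multi-head model, with the total key/value width), since no layer's prefix entries are derived from another's.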

This can be done for STT (speech-to-text) and direct translation datasets.