Create a mode that learns the input embeddings for the prefix. Create a second mode in which each element of the KV cache is learned separately. Both modes can be applied to STT and direct translation datasets.
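
To make the difference between the two modes concrete, here is a minimal sketch that compares only the shape and size of what each mode would train. It assumes a standard transformer with a per-layer key/value cache. The names `PrefixConfig`, `trainable_shape`, `trainable_count`, the mode strings, and all sizes are hypothetical and chosen for illustration; they are not taken from an existing codebase.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class PrefixConfig:
    # All field names and sizes are hypothetical, for illustration only.
    prefix_len: int   # number of learned prefix positions
    d_model: int      # width of the input embeddings
    n_layers: int     # transformer layers that keep a KV cache
    n_kv_heads: int   # key/value heads per layer
    head_dim: int     # size of each key/value head


def trainable_shape(cfg, mode):
    """Return the shape of the trainable prefix tensor for the given mode."""
    if mode == "embedding":
        # Mode 1: learn only the prefix input embeddings. The frozen model
        # would compute every layer's keys and values from them.
        return (cfg.prefix_len, cfg.d_model)
    if mode == "kv_cache":
        # Mode 2: learn every KV-cache element directly. The axis of size 2
        # holds one key tensor and one value tensor per layer.
        return (cfg.n_layers, 2, cfg.n_kv_heads, cfg.prefix_len, cfg.head_dim)
    raise ValueError(f"unknown mode: {mode!r}")


def trainable_count(cfg, mode):
    """Return the number of trainable scalars for the given mode."""
    return prod(trainable_shape(cfg, mode))


# Assumed example sizes, not measured from any real model.
cfg = PrefixConfig(prefix_len=16, d_model=512, n_layers=6, n_kv_heads=8, head_dim=64)
for mode in ("embedding", "kv_cache"):
    print(mode, trainable_shape(cfg, mode), trainable_count(cfg, mode))
```

With these assumed sizes the embedding mode trains 8,192 values and the KV-cache mode trains 98,304. When `n_kv_heads * head_dim` equals `d_model`, the KV-cache mode has `2 * n_layers` times as many trainable values, which is the main trade-off to weigh between the two modes: more capacity per prefix position against a larger set of parameters to fit on each dataset.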