When running the model with a context length greater than 32K, I have observed that although inference speed does improve, the model's capabilities degrade significantly.
I believe this is an important issue that may affect the usability of Qwen2.5 and other models in scenarios that require longer context lengths. I would appreciate it if the relevant team or community members could look into this and provide some insights or possible solutions.
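One possible factor worth ruling out (assuming the model in question is Qwen2.5): the released `config.json` is set for a 32,768-token context, and the Qwen2.5 documentation recommends enabling YaRN rope scaling when feeding inputs beyond 32K. Without it, quality past 32K can drop sharply. A sketch of the documented change, added to the model's `config.json`:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

If the degradation persists even with this setting applied, that would help narrow the report down to an inference-backend or model issue rather than a configuration one.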