Hi,
Thank you for the great repo! I learned a lot from the implementation. Have you tried combining SeerAttention and SeerAttention-R to allow both sparse prefill and sparse decoding? I guess a simple way would be to use separate linear layers and token budgets for prefill and decoding. I'm curious about any results or your thoughts on this.
Thank you!