
Conversation


@pralay-das pralay-das commented Sep 19, 2025

In this PR I have added support for Rotary Position Embedding (RoPE), applied to both the Q and K tensors in global memory.
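For reference, here is a minimal CPU-side sketch of what applying RoPE to a Q or K tensor in global memory means, assuming a row-major `[seq_len, head_dim]` layout and the standard pairwise rotation; the function name, layout, and `base` parameter are illustrative, not the PR's actual API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate each consecutive pair (x[2i], x[2i+1]) of every row by an angle
// that depends on the token position and the pair index. This is the
// standard RoPE formulation; the PR performs the equivalent transform on
// device over the Q and K tensors before the attention kernel consumes them.
void apply_rope(std::vector<float>& x, std::size_t seq_len,
                std::size_t head_dim, float base = 10000.0f) {
  for (std::size_t pos = 0; pos < seq_len; ++pos) {
    for (std::size_t i = 0; i < head_dim / 2; ++i) {
      // theta = pos * base^(-2i / head_dim)
      float theta = pos * std::pow(base, -2.0f * static_cast<float>(i) /
                                             static_cast<float>(head_dim));
      float c = std::cos(theta), s = std::sin(theta);
      float& a = x[pos * head_dim + 2 * i];
      float& b = x[pos * head_dim + 2 * i + 1];
      float a0 = a, b0 = b;
      a = a0 * c - b0 * s;  // rotated even element
      b = a0 * s + b0 * c;  // rotated odd element
    }
  }
}
```

In practice the cos/sin values are usually precomputed per position rather than recomputed per element; the math above is the same either way.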

Build and run the flash attention test:

ninja cutlass_test_unit_flash_attention_prefill_bf16_fp32_bf16_h64_xe
./test/unit/flash_attention/flash_attention_prefill/cutlass_test_unit_flash_attention_prefill_bf16_fp32_bf16_h64_xe

@pralay-das pralay-das changed the title from "[WIP] Added support for Rotary Embedding in flash_attention" to "[PYTORCHDGQ-7000] Added support for Rotary Embedding in flash_attention" Oct 7, 2025
@pralay-das pralay-das marked this pull request as ready for review October 13, 2025 03:04
@Antonyvance Antonyvance requested a review from petercad October 15, 2025 05:15
@Antonyvance

Can you check PR #547 and redesign accordingly? @pralay-das

@Antonyvance Antonyvance added the "redesign required" label Oct 17, 2025