
Conversation


@pralay-das pralay-das commented Sep 19, 2025

In this PR I have added support for Rotary Position Embedding (RoPE), applied to both the Q and K tensors in global memory.
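For reference, here is a minimal CPU-side sketch of what applying RoPE to a Q or K tensor in global memory means, assuming a row-major `[seq_len, head_dim]` layout and the standard pairwise rotation; the function name, layout, and `base` parameter are illustrative, not the PR's actual API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate each consecutive pair (x[2i], x[2i+1]) of every row by an angle
// that depends on the token position and the pair index. This is the
// standard RoPE formulation; the PR performs the equivalent transform on
// device over the Q and K tensors before the attention kernel consumes them.
void apply_rope(std::vector<float>& x, std::size_t seq_len,
                std::size_t head_dim, float base = 10000.0f) {
  for (std::size_t pos = 0; pos < seq_len; ++pos) {
    for (std::size_t i = 0; i < head_dim / 2; ++i) {
      // theta = pos * base^(-2i / head_dim)
      float theta = pos * std::pow(base, -2.0f * static_cast<float>(i) /
                                             static_cast<float>(head_dim));
      float c = std::cos(theta), s = std::sin(theta);
      float& a = x[pos * head_dim + 2 * i];
      float& b = x[pos * head_dim + 2 * i + 1];
      float a0 = a, b0 = b;
      a = a0 * c - b0 * s;  // rotated even element
      b = a0 * s + b0 * c;  // rotated odd element
    }
  }
}
```

In practice the cos/sin values are usually precomputed per position rather than recomputed per element; the math above is the same either way.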

Build and run the flash attention test:

ninja cutlass_test_unit_flash_attention_prefill_bf16_fp32_bf16_h64_xe
./test/unit/flash_attention/flash_attention_prefill/cutlass_test_unit_flash_attention_prefill_bf16_fp32_bf16_h64_xe

@pralay-das pralay-das changed the title from "[WIP] Added support for Rotary Embedding in flash_attention" to "[PYTORCHDGQ-7000] Added support for Rotary Embedding in flash_attention" Oct 7, 2025
@pralay-das pralay-das marked this pull request as ready for review October 13, 2025 03:04
@Antonyvance Antonyvance requested a review from petercad October 15, 2025 05:15
@Antonyvance

Can you check PR #547 and redesign accordingly? @pralay-das

@Antonyvance Antonyvance added the "redesign required" label Oct 17, 2025