
Conversation

@Beichen-Ma

The ring attention implementation does not yet support Flash Attention 3. Using --attn-implementation flash_attention_3 together with --context-parallel-size > 1 silently causes NaN loss during training. This change adds an early validation check that raises a clear error with actionable guidance.
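A minimal sketch of what such an early validation check could look like, assuming argparse-style args with `attn_implementation` and `context_parallel_size` attributes (the function and attribute names are illustrative, not necessarily the ones used in this PR):

```python
# Illustrative sketch only; the actual argument names and validation hook
# in the repository may differ.

def validate_args(args):
    """Fail fast on known-bad flag combinations before training starts."""
    # Ring attention (used when context parallelism is enabled) does not yet
    # support Flash Attention 3; the combination silently produces NaN loss.
    if args.attn_implementation == "flash_attention_3" and args.context_parallel_size > 1:
        raise ValueError(
            "--attn-implementation flash_attention_3 is not supported with "
            "--context-parallel-size > 1 (ring attention). Use a supported "
            "attention implementation or set --context-parallel-size 1."
        )
```

Raising at argument-validation time surfaces the problem immediately instead of letting the job run until the loss turns NaN.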

@PopSoda2002 (Collaborator) left a comment


Thanks for the work. I think CP currently needs to be enabled with an enable_experimental flag.
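If CP is indeed gated behind an experimental flag, the check might also need to account for that opt-in. A hypothetical sketch, assuming a boolean `enable_experimental` argument as the reviewer describes (the exact flag name and wiring may differ):

```python
# Hypothetical sketch; flag name taken from the review comment above.

def validate_context_parallel(args):
    # Context parallelism is experimental and must be explicitly opted into.
    if args.context_parallel_size > 1 and not getattr(args, "enable_experimental", False):
        raise ValueError(
            "--context-parallel-size > 1 is experimental; pass "
            "--enable-experimental to opt in."
        )
```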
