-
Notifications
You must be signed in to change notification settings - Fork 76
[BACKEND] Enhance the remove layout implementation to reduce the duplicated values with different layout in scf.for. #4527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
486ed4a to
f42bd66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the remove layout implementation to better handle layout propagation across scf.for operations, addressing limitations that create performance bottlenecks on Intel GPU. The changes focus on reducing duplicated layout conversion operations by improving support for multi-result operations and nested basic blocks.
- Adds support for propagating layouts through
scf.foroperations with a newincludeForOpparameter - Refactors
mappedValuesto handle multiple attribute mappings per value instead of single mappings - Includes debug output and unreachable code handling for
scf.foroperations
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| Utility.h | Adds includeForOp parameter to getConvertBackwardSlice function signature |
| Utility.cpp | Implements scf.for layout propagation logic with early return check and debug output |
| RemoveLayoutConversions.cpp | Updates data structures to support multiple encodings per value and enables scf.for processing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Outdated
Show resolved
Hide resolved
f42bd66 to
7a8f858
Compare
|
The flex attn backward ttgir has been simplified by these changes. There are only two root tiling layout of the dpas and the transpose of dot of dpas. Another major in-efficient issue on Xe-Xe3 is that the regular pointer under different layout like: The simplified ttgir |
7a8f858 to
5828b55
Compare
I checkout this branch and run the benchmark code in flex attn bwdBut the TTGI I get is not the same as the one you mentioned, the IR around the Snippet from TTGIRThe IR still contains the convert_layout operations and there are other ops (e.g. ttg.local_alloc) which do not appear for you. How did you compile the benchmark? Does this PR contains all of your code ? |
5828b55 to
d85a2ba
Compare
|
@etiotto Added the missed changes for debug the backward. |
…d values with different layout in scf.for. Signed-off-by: Lu,Chengjun <[email protected]>
Signed-off-by: Lu,Chengjun <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Outdated
Show resolved
Hide resolved
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Outdated
Show resolved
Hide resolved
…ntel/intel-xpu-backend-for-triton into chengjun/enhance_remove_layout
…ntel/intel-xpu-backend-for-triton into chengjun/enhance_remove_layout
Signed-off-by: Ettore Tiotto <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Outdated
Show resolved
Hide resolved
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp
Outdated
Show resolved
Hide resolved
Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Ettore Tiotto <[email protected]>
|
I run inductor tests, there were 3 test that fails in the autotune suite, however the same failure show up using the latest main branch too. So I think this PR is clear. @whitneywhtsang can you give me a review please ? |
whitneywhtsang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a lit test
…icated values with different layout in scf.for. (#4527)
The code change cannot easily be tested using a LIT test. Given that no regression are detected, and that the code simply allow for more than one encoding (a list of encoding) for each mapped value I think the code can be reviewed as is. @chengjunlu this was originally your PR. Do you think this code is still valuable to land ? |
Yes, I think so. It has benifits for the regualr pointer. |
Can you please update the PR description? |

No description provided.