[AIE2][AIE2P] Add combiners related to extract/insert/broadcast #723
base: aie-public
Force-pushed from 78a35a0 to 595cb40.
      combine_extract_concat,
      combine_unmerge_concat,
    - combine_upd_to_concat,
    + combine_upd_to_concat,
whitespace change?
Yeah, I removed one unnecessary space after...
Have you guys figured out how to run clang-format on .td files?
No, sorry! I just saw this space and deleted it....
    // Check that the insert source vector is G_IMPLICIT_DEF
    const MachineInstr *InsertSrcMI = MRI.getVRegDef(InsertSrcVecReg);
    if (!InsertSrcMI || InsertSrcMI->getOpcode() != TargetOpcode::G_IMPLICIT_DEF)
Oh, I have been using broadcast instead of insert into implicit def for the FMUL implementation. It will not work there.
Hmm, I was working on the FADD MIR. It looks like we use different approaches in different legalizations.
…x), index) into copy

Postlegalizer combiner that matches a pattern where:

    %18:_(<16 x s32>) = COPY $x0
    %10:_(<16 x s32>) = G_IMPLICIT_DEF
    %9:_(s32) = G_CONSTANT i32 0
    %8:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT %18(<16 x s32>), %9(s32)
    %22:_(<16 x s32>) = G_AIE_INSERT_VECTOR_ELT %10, %8(s32), %9(s32)

And turns it into:

    %22:_(<16 x s32>) = COPY %18
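To make the matching conditions concrete, here is a minimal standalone sketch of the checks such a combiner would need. It is not the code from this PR and uses no LLVM APIs: `Op`, `Inst`, `Func`, `getDef`, `getConstant`, `matchInsertOfExtract`, and `makeExample` are hypothetical names in a toy model where each virtual register maps to its defining instruction. It assumes the rewrite is valid only when the insert base is an implicit def, both indices are the same constant, and the source and result vectors have the same lane count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy stand-ins for generic MIR; none of these types are LLVM's.
enum class Op { Copy, ImplicitDef, Constant, ExtractElt, InsertElt };

struct Inst {
  Op Opcode;
  std::vector<int> Srcs; // source virtual registers
  int64_t Imm;           // value, only meaningful for Op::Constant
  unsigned Lanes;        // lane count of the result (0 means scalar)
};

using Func = std::map<int, Inst>; // virtual register -> defining instruction

static const Inst *getDef(const Func &F, int Reg) {
  Func::const_iterator It = F.find(Reg);
  return It == F.end() ? nullptr : &It->second;
}

// Writes the constant into Out and returns true if Reg is defined by a constant.
static bool getConstant(const Func &F, int Reg, int64_t &Out) {
  const Inst *I = getDef(F, Reg);
  if (!I || I->Opcode != Op::Constant)
    return false;
  Out = I->Imm;
  return true;
}

// Returns the register to copy from when InsertReg is defined as
// insert(implicit_def, extract(Src, Idx), Idx); returns -1 otherwise.
int matchInsertOfExtract(const Func &F, int InsertReg) {
  assert(InsertReg >= 0 && "registers are non-negative in this model");
  const Inst *Ins = getDef(F, InsertReg);
  if (!Ins || Ins->Opcode != Op::InsertElt || Ins->Srcs.size() != 3)
    return -1;
  // The vector being inserted into must be undefined in every lane.
  const Inst *Base = getDef(F, Ins->Srcs[0]);
  if (!Base || Base->Opcode != Op::ImplicitDef)
    return -1;
  const Inst *Ext = getDef(F, Ins->Srcs[1]);
  if (!Ext || Ext->Opcode != Op::ExtractElt || Ext->Srcs.size() != 2)
    return -1;
  // Extract and insert must address the same constant lane.
  int64_t InsIdx = 0, ExtIdx = 0;
  if (!getConstant(F, Ins->Srcs[2], InsIdx) ||
      !getConstant(F, Ext->Srcs[1], ExtIdx) || InsIdx != ExtIdx)
    return -1;
  // The source vector must have the same shape as the insert result.
  const Inst *Src = getDef(F, Ext->Srcs[0]);
  if (!Src || Src->Lanes != Ins->Lanes)
    return -1;
  return Ext->Srcs[0];
}

// Builds the example from the commit message (%18, %10, %9, %8, %22).
Func makeExample() {
  Func F;
  F[18] = Inst{Op::Copy, {}, 0, 16};
  F[10] = Inst{Op::ImplicitDef, {}, 0, 16};
  F[9] = Inst{Op::Constant, {}, 0, 0};
  F[8] = Inst{Op::ExtractElt, {18, 9}, 0, 0};
  F[22] = Inst{Op::InsertElt, {10, 8, 9}, 0, 16};
  return F;
}
```

In this model, `matchInsertOfExtract(makeExample(), 22)` yields register 18, mirroring the `%22 = COPY %18` result above. The implicit-def check is what makes the rewrite safe: every lane other than the inserted one is undefined, so any value may stand in for it.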
This patch adds a new postlegalizer combiner that optimizes patterns where an element is extracted from position 0, broadcast to a vector, and then only position 0 is used through a chain of operations.

Pattern optimized:

    %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
    %broadcast = G_AIE_BROADCAST_VECTOR %extract
    ... (chain of concat/unmerge/vector ops)
    %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0

Transforms to:

    %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
    %broadcast = COPY %src // Broadcast eliminated!
    ... (chain of operations)
    %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0

The combiner uses a conservative whitelist approach to verify that operations in the chain don't shift vector elements. It handles:

- G_CONCAT_VECTORS (broadcast must be first operand)
- G_UNMERGE_VALUES (only first output used, others dead)
- Whitelisted operations: G_FADD, G_FMUL, G_FSUB, and specific AIE2P intrinsics
- Rejects: G_BITCAST, type mismatches, non-zero extracts, multiple uses

This optimization is particularly beneficial when broadcasts are used unnecessarily when only the first element is needed. The combiner runs as a postlegalizer custom combiner for both AIE2 and AIE2P.
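The whitelist walk described above can be sketched as a standalone check. This is a simplified illustration, not the combiner from this PR: `Kind`, `Step`, and `canReplaceBroadcastWithCopy` are hypothetical names. It assumes the use chain has already been linearized into a list of steps and that each step records how many uses the incoming value has. The AIE2P intrinsics on the whitelist are omitted because the commit message does not name them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy classification of the operations a value flows through.
enum class Kind { Concat, Unmerge, FAdd, FMul, FSub, Bitcast, Extract };

struct Step {
  Kind K;
  unsigned OperandIdx; // Concat: position of the tracked value among operands
  bool OtherDefsDead;  // Unmerge: true if every output but the first is unused
  unsigned NumUses;    // number of uses of the value flowing into this step
  long ExtractIdx;     // Extract: constant lane index
};

// Returns true if only lane 0 of the broadcast can ever be observed, so the
// broadcast may be replaced by a copy of the original source vector.
bool canReplaceBroadcastWithCopy(unsigned SrcLanes, unsigned BroadcastLanes,
                                 const std::vector<Step> &Chain) {
  assert(SrcLanes > 0 && BroadcastLanes > 0 && "vectors must be non-empty");
  // A copy is only possible when the source and broadcast types agree.
  if (SrcLanes != BroadcastLanes)
    return false;
  for (std::size_t I = 0; I < Chain.size(); ++I) {
    const Step &S = Chain[I];
    // A second user could observe lanes other than lane 0.
    if (S.NumUses != 1)
      return false;
    switch (S.K) {
    case Kind::Concat:
      // Lane 0 stays at lane 0 only if the value is the first operand.
      if (S.OperandIdx != 0)
        return false;
      break;
    case Kind::Unmerge:
      // Only the first output may be live.
      if (!S.OtherDefsDead)
        return false;
      break;
    case Kind::FAdd:
    case Kind::FMul:
    case Kind::FSub:
      // Element-wise: result lane 0 depends only on operand lane 0.
      break;
    case Kind::Extract:
      // The chain must end here, reading lane 0.
      return I + 1 == Chain.size() && S.ExtractIdx == 0;
    case Kind::Bitcast:
      // A bitcast may reinterpret lanes, so reject conservatively.
      return false;
    }
  }
  // The chain never reached a final extract.
  return false;
}
```

Rejecting anything not explicitly listed keeps the check conservative: a new opcode that moves elements between lanes fails the match instead of silently miscompiling.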
Force-pushed from 595cb40 to 36316a6.
…x), index) into copy
Postlegalizer combiner that matches a pattern where:
%18:_(<16 x s32>) = COPY $x0
%10:_(<16 x s32>) = G_IMPLICIT_DEF
%9:_(s32) = G_CONSTANT i32 0
%8:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT %18(<16 x s32>), %9(s32)
%22:_(<16 x s32>) = G_AIE_INSERT_VECTOR_ELT %10, %8(s32), %9(s32)
And turns it into:
%22:_(<16 x s32>) = COPY %18
FYI @martien-de-jong.