-
Notifications
You must be signed in to change notification settings - Fork 29
[AIE][Legalize] FMUL type minimization looks through fneg. FADD implemented in vector unit #715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
22b9f5c to
83610f4
Compare
83610f4 to
e1007fa
Compare
|
Could we also split this commit e1007fa in two, one for FMUL (combiner) and another one for the (FADD) legalizer? |
795096f to
ec36fe8
Compare
llvm/test/CodeGen/AIE/aie2p/GlobalIsel/prelegalizercombiner-widen-fmul16.mir
Outdated
Show resolved
Hide resolved
6d03445 to
a858ab0
Compare
…x), index) into copy Postlegalizer combiner that matches a pattern where: %18:_(<16 x s32>) = COPY $x0 %10:_(<16 x s32>) = G_IMPLICIT_DEF %9:_(s32) = G_CONSTANT i32 0 %8:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT %18(<16 x s32>), %9(s32) %22:_(<16 x s32>) = G_AIE_INSERT_VECTOR_ELT %10, %8(s32), %9(s32) And turns it into: %22:_(<16 x s32>) = COPY %18 Factor out helper function
This patch adds a new postlegalizer combiner that optimizes patterns where an element is extracted from position 0, broadcast to a vector, and then only position 0 is used through a chain of operations. Pattern optimized: %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0 %broadcast = G_AIE_BROADCAST_VECTOR %extract ... (chain of concat/unmerge/vector ops) %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0 Transforms to: %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0 %broadcast = COPY %src // Broadcast eliminated! ... (chain of operations) %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0 The combiner uses a conservative whitelist approach to verify that operations in the chain don't shift vector elements. It handles: - G_CONCAT_VECTORS (broadcast must be first operand) - G_UNMERGE_VALUES (only first output used, others dead) - Whitelisted operations: G_FADD, G_FMUL, G_FSUB, and specific AIE2P intrinsics - Rejects: G_BITCAST, type mismatches, non-zero extracts, multiple uses This optimization is particularly beneficial when broadcasts are used unnecessarily when only the first element is needed. The combiner runs as a postlegalizer custom combiner for both AIE2 and AIE2P.
…o S32 We legalize a scalar to one element in a vector. We lose some IEEE compliance, but we get more equivalance between vectors and scalars in return.
a858ab0 to
87e3368
Compare
| } | ||
|
|
||
| } // namespace | ||
| Register buildInsertInUndef(MachineIRBuilder &B, Register Src, LLT VecTy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are kind of generating legal code here. For example, we use AIE-specific G opcode that demands s32 types. On the other hand, we could also use standard generic opcods that can work on vector Elt type to insert - and let the legalizer do the rest.
I recommend to consider this for a follow up PR - not now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return Vector; | ||
| } | ||
|
|
||
| Register buildScalarAsVector(MachineIRBuilder &B, Register Src, LLT VecTy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we take this a common utility, we could refactor also legalization code (future).
| return Acc; | ||
| } | ||
|
|
||
| void buildFirstElement(MachineIRBuilder &B, Register DstReg, Register Vec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a good candidate for utility function. We could also have something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Register buildGetFirstElement ?
This is a combine pattern that pushes S32 FNEGs through FPEXT These FNEGs prevent standard InstrCombines, and appear in some kernels. Make buildScalarAsVector switchable by cl option to either use broadcast or insert in undef. The latter is easier for a follow-up combine, but is less clean because it may have NaN effects on flags. We wired the default to broadcast, since the combine to cover the important case is implemented already We also fix some cosmetics from earlier commits
87e3368 to
b56d439
Compare
andcarminati
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
A widened fmul (fpext(a), fpext(b)) with possible negates is legalized to a vector MUL.
fadd(s32) is mapped to VADD.f