Skip to content

Conversation

@martien-de-jong
Copy link
Collaborator

@martien-de-jong martien-de-jong commented Nov 20, 2025

A widened fmul (fpext(a), fpext(b)) with possible negates is legalized to a vector MUL.
fadd(s32) is mapped to VADD.f

@martien-de-jong martien-de-jong force-pushed the martien.vectorize-fadd branch 2 times, most recently from 22b9f5c to 83610f4 Compare November 27, 2025 16:58
@martien-de-jong martien-de-jong changed the title [InstCombine] FP type minimization looks through fneg. [AIE][Legalize] FMUL type minimization looks through fneg. FADD implemented in vector unit Nov 27, 2025
@andcarminati
Copy link
Collaborator

Could we also split this commit e1007fa in two, one for FMUL (combiner) and another one for the (FADD) legalizer?

@martien-de-jong martien-de-jong force-pushed the martien.vectorize-fadd branch 2 times, most recently from 795096f to ec36fe8 Compare November 28, 2025 13:28
@martien-de-jong martien-de-jong force-pushed the martien.vectorize-fadd branch 3 times, most recently from 6d03445 to a858ab0 Compare December 1, 2025 16:54
andcarminati and others added 4 commits December 2, 2025 09:22
…x), index) into copy

Postlegalizer combiner that matches a pattern where:
 %18:_(<16 x s32>) = COPY $x0
 %10:_(<16 x s32>) = G_IMPLICIT_DEF
 %9:_(s32) = G_CONSTANT i32 0
 %8:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT %18(<16 x s32>), %9(s32)
 %22:_(<16 x s32>) = G_AIE_INSERT_VECTOR_ELT %10, %8(s32), %9(s32)

And turns it into:
 %22:_(<16 x s32>) = COPY %18

Factor out helper function
This patch adds a new postlegalizer combiner that optimizes patterns where
an element is extracted from position 0, broadcast to a vector, and then
only position 0 is used through a chain of operations.

Pattern optimized:
  %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
  %broadcast = G_AIE_BROADCAST_VECTOR %extract
  ... (chain of concat/unmerge/vector ops)
  %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0

Transforms to:
  %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
  %broadcast = COPY %src  // Broadcast eliminated!
  ... (chain of operations)
  %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0

The combiner uses a conservative whitelist approach to verify that operations
in the chain don't shift vector elements. It handles:
- G_CONCAT_VECTORS (broadcast must be first operand)
- G_UNMERGE_VALUES (only first output used, others dead)
- Whitelisted operations: G_FADD, G_FMUL, G_FSUB, and specific AIE2P intrinsics
- Rejects: G_BITCAST, type mismatches, non-zero extracts, multiple uses

This optimization is particularly beneficial when broadcasts are used unnecessarily when
only the first element is needed.

The combiner runs as a postlegalizer custom combiner for both AIE2 and AIE2P.
…o S32

We legalize a scalar to one element in a vector. We lose some IEEE compliance,
but we get more equivalance between vectors and scalars in return.
}

} // namespace
Register buildInsertInUndef(MachineIRBuilder &B, Register Src, LLT VecTy) {
Copy link
Collaborator

@andcarminati andcarminati Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are kind of generating legal code here. For example, we use AIE-specific G opcode that demands s32 types. On the other hand, we could also use standard generic opcods that can work on vector Elt type to insert - and let the legalizer do the rest.

I recommend to consider this for a follow up PR - not now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Vector;
}

Register buildScalarAsVector(MachineIRBuilder &B, Register Src, LLT VecTy) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we take this a common utility, we could refactor also legalization code (future).

return Acc;
}

void buildFirstElement(MachineIRBuilder &B, Register DstReg, Register Vec) {
Copy link
Collaborator

@andcarminati andcarminati Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a good candidate for utility function. We could also have something like:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Register buildGetFirstElement ?

This is a combine pattern that pushes S32 FNEGs through FPEXT
These FNEGs prevent standard InstrCombines, and appear in some kernels.

Make buildScalarAsVector switchable by cl option to either
use broadcast or insert in undef.
The latter is easier for a follow-up combine, but is less clean because
it may have NaN effects on flags.
We wired the default to broadcast, since the combine to cover the
important case is implemented already

We also fix some cosmetics from earlier commits
Copy link
Collaborator

@andcarminati andcarminati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@martien-de-jong martien-de-jong merged commit 3e61fde into aie-public Dec 2, 2025
11 checks passed
@martien-de-jong martien-de-jong deleted the martien.vectorize-fadd branch December 2, 2025 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants