@andcarminati
Collaborator

…x), index) into copy

Postlegalizer combiner that matches a pattern where:
%18:_(<16 x s32>) = COPY $x0
%10:_(<16 x s32>) = G_IMPLICIT_DEF
%9:_(s32) = G_CONSTANT i32 0
%8:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT %18(<16 x s32>), %9(s32)
%22:_(<16 x s32>) = G_AIE_INSERT_VECTOR_ELT %10, %8(s32), %9(s32)

And turns it into:
%22:_(<16 x s32>) = COPY %18
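
To make the match conditions concrete, here is a minimal sketch of the pattern check on a toy instruction model. It is not the code from this patch: `Instr`, `DefMap`, `matchInsertOfExtract`, and `makeExample` are hypothetical names invented for illustration, and the sketch leaves out type checks a real combiner would presumably need (for example, that source and result vector types agree).

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Toy stand-ins for generic MIR. None of these types come from LLVM.
enum class Opcode { Copy, ImplicitDef, Constant, ExtractElt, InsertElt };

struct Instr {
  Opcode Op;
  std::vector<int> Srcs; // virtual registers used as operands
  int64_t Imm;           // value, only meaningful for Constant
};

using DefMap = std::map<int, Instr>; // vreg -> defining instruction
const int NoReg = -1;

// Returns true and sets Val if Reg is defined by a Constant.
bool getConstant(const DefMap &Defs, int Reg, int64_t &Val) {
  auto It = Defs.find(Reg);
  if (It == Defs.end() || It->second.Op != Opcode::Constant)
    return false;
  Val = It->second.Imm;
  return true;
}

// If Reg is insert(implicit_def, extract(Vec, Idx), Idx), returns Vec;
// otherwise returns NoReg.
int matchInsertOfExtract(const DefMap &Defs, int Reg) {
  auto InsIt = Defs.find(Reg);
  if (InsIt == Defs.end() || InsIt->second.Op != Opcode::InsertElt ||
      InsIt->second.Srcs.size() != 3)
    return NoReg;
  const Instr &Ins = InsIt->second; // Srcs: {vector, element, index}

  // The vector being inserted into must be undefined in every lane.
  auto VecIt = Defs.find(Ins.Srcs[0]);
  if (VecIt == Defs.end() || VecIt->second.Op != Opcode::ImplicitDef)
    return NoReg;

  // The inserted element must come from an extract.
  auto ExtIt = Defs.find(Ins.Srcs[1]);
  if (ExtIt == Defs.end() || ExtIt->second.Op != Opcode::ExtractElt ||
      ExtIt->second.Srcs.size() != 2)
    return NoReg;
  const Instr &Ext = ExtIt->second; // Srcs: {vector, index}

  // Both indices must be the same known constant.
  int64_t InsIdx = 0, ExtIdx = 0;
  if (!getConstant(Defs, Ins.Srcs[2], InsIdx) ||
      !getConstant(Defs, Ext.Srcs[1], ExtIdx) || InsIdx != ExtIdx)
    return NoReg;

  return Ext.Srcs[0]; // the insert can become a COPY of this register
}

// Builds the example above, reusing its register numbers.
DefMap makeExample() {
  DefMap Defs;
  Defs[18] = Instr{Opcode::Copy, {}, 0};
  Defs[10] = Instr{Opcode::ImplicitDef, {}, 0};
  Defs[9] = Instr{Opcode::Constant, {}, 0};
  Defs[8] = Instr{Opcode::ExtractElt, {18, 9}, 0};
  Defs[22] = Instr{Opcode::InsertElt, {10, 8, 9}, 0};
  return Defs;
}
```

On the example, `matchInsertOfExtract(makeExample(), 22)` yields register 18. The rewrite is sound because every lane other than the inserted one is undefined, so any value is acceptable there, including the corresponding lane of `%18`.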

FYI @martien-de-jong.

combine_extract_concat,
combine_unmerge_concat,
combine_upd_to_concat,
combine_upd_to_concat,

Collaborator

whitespace change?

Collaborator Author

Yeah, I removed one unnecessary space after...

Collaborator

Have you guys figured out how to run clang-format on .td files?

Collaborator Author

No, sorry! I just saw this space and deleted it....


// Check that the insert source vector is G_IMPLICIT_DEF
const MachineInstr *InsertSrcMI = MRI.getVRegDef(InsertSrcVecReg);
if (!InsertSrcMI || InsertSrcMI->getOpcode() != TargetOpcode::G_IMPLICIT_DEF)

Collaborator

Oh, I have been using broadcast instead of insert into implicit def for the FMUL implementation. It will not work there.

Collaborator Author

Hmm, I was working on the FADD MIR. It looks like we use different approaches in different legalizations.

This patch adds a new postlegalizer combiner that optimizes patterns where
an element is extracted from position 0, broadcast to a vector, and then
only position 0 is used through a chain of operations.

Pattern optimized:
  %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
  %broadcast = G_AIE_BROADCAST_VECTOR %extract
  ... (chain of concat/unmerge/vector ops)
  %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0

Transforms to:
  %extract = G_AIE_SEXT_EXTRACT_VECTOR_ELT %src(<N x T>), 0
  %broadcast = COPY %src  // Broadcast eliminated!
  ... (chain of operations)
  %result = G_AIE_SEXT_EXTRACT_VECTOR_ELT %final, 0
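
As a small numeric illustration of why this is safe, the sketch below models a 16-lane vector as a plain array and uses a lane-wise add as the only operation in the chain. The names (`Vec`, `broadcast`, `addLanes`, `lane0WithBroadcast`, `lane0WithCopy`) are hypothetical and stand in for the MIR operations; this is not code from the patch.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 16;
using Vec = std::array<float, N>;

// Models G_AIE_BROADCAST_VECTOR: every lane gets the same scalar.
Vec broadcast(float X) {
  Vec V;
  V.fill(X);
  return V;
}

// Models a lane-wise operation such as G_FADD: lane I of the result
// depends only on lane I of the inputs.
Vec addLanes(const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Original pattern: extract lane 0, broadcast, operate, extract lane 0.
float lane0WithBroadcast(const Vec &Src, const Vec &Other) {
  return addLanes(broadcast(Src[0]), Other)[0];
}

// After the combine: the broadcast is replaced by the source itself.
float lane0WithCopy(const Vec &Src, const Vec &Other) {
  return addLanes(Src, Other)[0];
}
```

Both functions return the same value for any inputs, because lane 0 of `broadcast(Src[0])` and lane 0 of `Src` are identical and the add never moves data between lanes. The other lanes generally differ, which is why the combine is only valid when nothing but lane 0 is observed.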

The combiner uses a conservative whitelist approach to verify that operations
in the chain don't shift vector elements. It handles:
- G_CONCAT_VECTORS (broadcast must be first operand)
- G_UNMERGE_VALUES (only first output used, others dead)
- Whitelisted operations: G_FADD, G_FMUL, G_FSUB, and specific AIE2P intrinsics
- Rejects: G_BITCAST, type mismatches, non-zero extracts, multiple uses
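
The rules in this list can be sketched as a walk over the users of the broadcast value. This is a simplified model under stated assumptions, not the patch's implementation: `Kind`, `User`, and `chainPreservesLane0` are hypothetical names, the chain is given as a pre-flattened list rather than discovered through use-def queries, and type-mismatch checks are omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of each user in the chain.
enum class Kind {
  Concat,        // G_CONCAT_VECTORS
  Unmerge,       // G_UNMERGE_VALUES
  LaneWise,      // whitelisted lane-wise op, e.g. G_FADD, G_FMUL, G_FSUB
  Bitcast,       // G_BITCAST, may reinterpret lanes
  ExtractAtZero, // extract of element 0
  ExtractAtOther // extract of any other element
};

struct User {
  Kind K;
  unsigned TrackedOperandIdx; // Concat: operand position of the tracked value
  bool OtherResultsDead;      // Unmerge: all results but the first are unused
  bool HasSingleUse;          // the tracked value has exactly one user
};

// Returns true if the chain ends in an extract of element 0 and no step
// before it can move lane 0 or expose any other lane.
bool chainPreservesLane0(const std::vector<User> &Chain) {
  for (std::size_t I = 0; I < Chain.size(); ++I) {
    const User &U = Chain[I];
    if (!U.HasSingleUse)
      return false; // another user could observe the other lanes
    const bool IsLast = I + 1 == Chain.size();
    switch (U.K) {
    case Kind::ExtractAtZero:
      return IsLast; // must terminate the chain
    case Kind::Concat:
      if (U.TrackedOperandIdx != 0)
        return false; // lane 0 would land at a non-zero position
      break;
    case Kind::Unmerge:
      if (!U.OtherResultsDead)
        return false; // upper pieces are live
      break;
    case Kind::LaneWise:
      break; // lanes stay in place
    case Kind::Bitcast:
    case Kind::ExtractAtOther:
      return false; // conservatively rejected
    }
  }
  return false; // chain did not end in an extract of element 0
}
```

A chain such as lane-wise op, concat with the tracked value as operand 0, unmerge with only the first result live, then extract of element 0 is accepted. Anything not explicitly recognised is rejected, which matches the conservative intent described above.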

This optimization is particularly beneficial when a broadcast is emitted even
though only the first element of the result is ever needed.

The combiner runs as a postlegalizer custom combiner for both AIE2 and AIE2P.
@andcarminati andcarminati force-pushed the andreu.extract.insert.postlegalizer branch from 595cb40 to 36316a6 Compare December 1, 2025 13:55
@andcarminati andcarminati changed the title [AIE2][AIE2P] Add combiner to convert insert(undef, extract(vec, inde… [AIE2][AIE2P] Add combiners related to extract/insert/broadcast Dec 1, 2025