@andcarminati
Collaborator

@andcarminati andcarminati commented Oct 21, 2025

This work is intended to avoid 2D/3D (when possible) register spills.

The idea and rationale behind this work is in a previous Draft PR: #442.

I recommend reviewing this PR commit by commit.

Credit also goes to the co-author, @krishnamtibrewala.

@andcarminati
Collaborator Author

andcarminati commented Oct 21, 2025

QoR results:

[Plots not captured: Core_Insn_Count, Core_StackSize_absolute, Core_PMSize_absolute]

@krishnamtibrewala
Collaborator

Thank you very much, @andcarminati!

@krishnamtibrewala
Collaborator

Also, do you think the following commit will help?
a681b6e

@andcarminati
Collaborator Author

> Also, do you think the following commit will help? a681b6e

Maybe, yes! As mentioned before, I prefer to keep just the minimal necessary changes. We can test it afterwards, on top of this PR.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 31b7e71 to a27561f Compare October 22, 2025 08:35
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from a27561f to e124649 Compare October 31, 2025 14:07
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 3 times, most recently from 80e6f7c to 4c47705 Compare October 31, 2025 14:36
const AIEBaseRegisterInfo &TRI,
std::set<Register> &VisitedVRegs);

SmallSet<int, 8> getRewritableSubRegs(Register Reg,
Collaborator

nit: can we have a comment describing what this function does?

Collaborator Author

Yes, it is a refactor but it is always good to document.

Collaborator

Could we swap the documentation here?
I think the high-level function should get the full documentation, and the actual implementation function (the one above this) should get the slimmed-down version.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 2 times, most recently from a88a541 to ab8c08d Compare November 7, 2025 13:02
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 3 times, most recently from 9e25710 to 8dba52e Compare November 19, 2025 10:11
}

/// Rewrite a full copy into multiple copies using the subregs in \p CopySubRegs
void rewriteFullCopy(MachineInstr &CopyMI, LiveIntervals &LIS,
Collaborator

note: in the current state we will never rewrite subreg copies, e.g. copies of d0 or d4 in the case of a 3d_0 register.
Do you know why they are not relevant for the Superregrewriter?

Collaborator Author

@andcarminati andcarminati Nov 21, 2025

I guess this will never occur because it will be prevented by getRewritableSubRegs.

++VRegIdx) {
const Register Reg = Register::index2VirtReg(VRegIdx);

// Ignore un-used od already allocated registers.
Collaborator

nit: or

//
// (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates
//
//===----------------------------------------------------------------------===//
Collaborator

Which registers aren't assigned after the previous 2D/3D reg allocs?
Are we changing the copies and moves here, but not the padds?
Could you add a comment explaining why this pass is necessary and why the previous passes do not pick this up?

Collaborator Author

I will expand on this part.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 8dba52e to b8476f5 Compare November 21, 2025 12:34
@F-Stuckmann
Collaborator

LGTM. @martien-de-jong, what do you say?

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from b8476f5 to 57f2573 Compare November 24, 2025 13:25
int SubReg = RegOp.getSubReg();
assert(SubReg);
RegOp.setReg(SubRegToVReg[SubReg]);
RegOp.setSubReg(0);
Collaborator

Note: 3D and 2D reg alloc will have already allocated the 3D and 2D users, except for the copies.
We spill them in line 178, so that we only work on subregs.
The users we now encounter are the subreg copies from rewriteFullCopy.

Collaborator Author

@andcarminati andcarminati Nov 27, 2025

Please note that rewriteFullCopy is there to prevent failures during the rewrite. We can also rewrite operands in load and move (including immediate) instructions. See the instructions covered by ItinRegClassPair.
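
For readers following the thread, the operand rewrite in the snippet above can be illustrated with a small standalone model. This is only a sketch and not code from this PR: `Operand` and `rewriteOperand` are hypothetical stand-ins for the LLVM operand API, and the map plays the role of `SubRegToVReg`.

```cpp
#include <cassert>
#include <map>

// Toy stand-in for a register operand; NOT the LLVM MachineOperand API.
struct Operand {
  unsigned Reg; // virtual register number
  int SubReg;   // sub-register index, 0 means "whole register"
};

// Redirect a sub-register use/def of a wide register to the dedicated
// virtual register created for that lane, then clear the sub-register index.
void rewriteOperand(Operand &Op, const std::map<int, unsigned> &SubRegToVReg) {
  assert(Op.SubReg != 0 && "only sub-register operands are rewritten");
  auto It = SubRegToVReg.find(Op.SubReg);
  assert(It != SubRegToVReg.end() && "lane has no replacement register");
  Op.Reg = It->second;
  Op.SubReg = 0;
}
```

After the call, the operand names the per-lane register directly, so nothing downstream needs the wide 2D/3D register for that use.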

andcarminati and others added 9 commits November 26, 2025 23:43
Now we filter by register class and usage. Basically, here we exclude
instructions such as copies and non-2D/3D ones.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
The goal of this test is to check that we properly insert the undef flag on the def side
of an expanded full copy. On a sub-register def operand, the flag refers to the part of the
register that isn't written. A sub-register def implicitly reads the other parts of the
register being redefined unless the <undef> flag is set, and a missing flag can
force the related register into the live-out set of the predecessor blocks,
causing dominance problems.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
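
As an illustration of the flag placement described in this commit message, here is a minimal standalone sketch. It is not code from this PR: `LaneCopy` and `expandFullCopy` are hypothetical names, and marking only the first lane copy as undef is an assumption based on the semantics described above (the first copy is the only one for which no other lane of the destination has been written yet).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a sub-register copy; NOT the LLVM MachineInstr API.
struct LaneCopy {
  std::string Dst; // destination virtual register name
  std::string Src; // source virtual register name
  int SubReg;      // hypothetical sub-register index (non-zero)
  bool DefIsUndef; // models the <undef> flag on the sub-register def
};

// Expand "Dst = COPY Src" into one copy per lane listed in SubRegs.
// Assumption: only the first lane copy carries <undef>; later copies
// legitimately read the lanes written by the earlier ones.
std::vector<LaneCopy> expandFullCopy(const std::string &Dst,
                                     const std::string &Src,
                                     const std::vector<int> &SubRegs) {
  std::vector<LaneCopy> Out;
  for (int SubReg : SubRegs) {
    assert(SubReg != 0 && "0 means 'no sub-register' in this model");
    Out.push_back({Dst, Src, SubReg, /*DefIsUndef=*/Out.empty()});
  }
  return Out;
}
```

Without the flag on the first copy, the model's first def would count as a read of the still-undefined lanes, which is the situation the commit message says can extend liveness into predecessor blocks.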
This properly handles uses of non-dominating definitions. We also
change the handling of the destination registers in two parts:

*Copy expansion: we replace the original index by the index of the first
lane copy to avoid creating LRs with just one instruction; in this
way we keep the LI correct.

*Rewrite: reset dead flags if necessary.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
If we don't need a full register, we can expand to individual lanes.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
This avoids cycles in bundles that appear in VirtRegRewriter.
We also update LIs related to src and dst operands of those
expanded copies.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 57f2573 to a08adb9 Compare November 27, 2025 07:04
@andcarminati andcarminati merged commit 74d4ba4 into aie-public Nov 27, 2025
7 checks passed
@andcarminati andcarminati deleted the andreu.extend.2d3d.allocation branch November 27, 2025 13:19
5 participants