[WaveTransform] Use S_AND_SAVEEXEC for the EXEC update at divergent OriginBranch nodes #3040
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/508
d047b7f to 8452297
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/530
vg0204 left a comment:
Just reviewed the code; still in the process of reviewing the LIT test.
// Donot mask CurReg if CurReg = S_AND_SAVEEXEC(_TERM) Reg
// Contributions from this Opc implies we are building the rejoin merge at
// secondary block and the contribution should be used as is , without EXEC
// AND masking.
Can you restructure this comment and add punctuation so that it reads more clearly?
This comment should clearly mention every place where SavedExec is used (the divergent block and the rejoin block), right?
I cannot see any S_OR using this savedExec, as depicted in the divergent block example between S_AND_SAVEEXEC and the branch instruction. Why?
It's used at L2345; the updater constructs the OR instruction when we call Updater.getValueInMiddleOfBlock().
Double XOR redundancy removed here
This work needs a proper discussion before we proceed further with this PR. We have to collect all the missing optimizations and cleanups discussed so far and agree on a clear plan for how to proceed. This is crucial for how we want to shape the EXEC mask operations that the wave transform pass inserts at various blocks. Let's do that after we fix the full set of LIT test cases and the extended PSDB is launched.
At each divergent `OriginBranch`, the Wave Transform pass previously emitted a separate XOR to compute the rejoin (exit) lane delta and an `S_MOV` to write the primary-successor mask into `EXEC`. This replaces that pair with a single fused `S_AND_SAVEEXEC`, which both writes the primary mask into `EXEC` and saves the old `EXEC` value in one instruction. The saved value is then reused directly as the rejoin contribution, eliminating the explicit `S_XOR`.

This also resolves a redundant double-XOR sequence that arose when the divergence condition was itself an inverted (XOR'd) mask: both XORs and the AND-to-`EXEC` now collapse into the single `S_AND_SAVEEXEC`.

Codegen change
Case 1 — a divergent branch
Before:
After:
Case 2 — a block that rejoins lanes, then diverges again
Before:
After:
How it works
From the `OriginBranch`, the primary lanes flow to `PrimarySucc`; all the lanes come back together (rejoin) at `SecSucc`. The only thing that changes is what we accumulate into `%RejoinAcc`:

- Before: `Rejoin = exec & ~PrimarySuccLanes` (only the lanes that are not taking the primary edge).
- After: `Rejoin = exec` (the full old `EXEC`, saved by `S_AND_SAVEEXEC`).

Since `PrimarySuccLanes` is a subset of `exec`, the rejoin step `exec = exec OR RejoinAcc` at `SecSucc` produces the same final mask either way: the extra primary-successor lanes carried in the "after" form are already present in `exec` at the rejoin point, so OR-ing them back in changes nothing. The fused form simply defers adding those lanes instead of masking them out up front.
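The equivalence argued above can be checked with a small bitmask model. This is only an illustrative sketch, not code from the pass: `LaneMask`, `sAndSaveExec`, `rejoinBefore`, `rejoinAfter`, and `execAtSecSucc` are hypothetical names, a 64-lane wave is assumed, and the SCC result that the real `S_AND_SAVEEXEC` also produces is ignored. It also assumes that the primary lanes are the ones active in `EXEC` when control reaches `SecSucc`.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane; a 64-lane wave is assumed for this sketch.
using LaneMask = std::uint64_t;

// Hypothetical result pair for the modeled instruction.
struct SaveExecResult {
  LaneMask Saved;   // old EXEC, written to the destination register
  LaneMask NewExec; // EXEC after the instruction
};

// Simplified model of S_AND_SAVEEXEC: save the old EXEC, then AND the
// condition mask into EXEC. The SCC side effect is not modeled.
SaveExecResult sAndSaveExec(LaneMask Exec, LaneMask Cond) {
  return {Exec, Exec & Cond};
}

// "Before" form: the rejoin contribution is the set of lanes that do not
// take the primary edge, i.e. exec & ~PrimarySuccLanes, computed via XOR.
LaneMask rejoinBefore(LaneMask Exec, LaneMask Cond) {
  LaneMask PrimarySuccLanes = Exec & Cond;
  return Exec ^ PrimarySuccLanes;
}

// "After" form: the rejoin contribution is the saved (full, old) EXEC.
LaneMask rejoinAfter(LaneMask Exec, LaneMask Cond) {
  return sAndSaveExec(Exec, Cond).Saved;
}

// Rejoin step at SecSucc: exec = exec OR RejoinAcc.
LaneMask execAtSecSucc(LaneMask ArrivingExec, LaneMask RejoinAcc) {
  return ArrivingExec | RejoinAcc;
}

// Returns true when both forms restore the same EXEC at SecSucc, assuming
// the primary lanes are the ones still active on arrival.
bool formsAgree(LaneMask Exec, LaneMask Cond) {
  LaneMask Arriving = sAndSaveExec(Exec, Cond).NewExec;
  return execAtSecSucc(Arriving, rejoinBefore(Exec, Cond)) ==
         execAtSecSucc(Arriving, rejoinAfter(Exec, Cond));
}
```

For example, with `Exec = 0xFF` and `Cond = 0x0F`, the "before" contribution is `0xF0` and the "after" contribution is `0xFF`, yet OR-ing either one into the arriving mask `0x0F` gives `0xFF`.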