JIT: Merge all RETURN/THROW blocks #128515
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

@AndyAyersMS PTAL.
// Avoid splitting a return away from a possible tail call
//
if (!block->hasSingleStmt())
if (block->isEmpty())
This check was here before, but I don't think we actually need it, because we only accept RETURN or THROW blocks and these should never be empty?
AndyAyersMS
left a comment
Can we do this without repeatedly searching all blocks for returns and throws?
do
{
    predInfo.Reset();
    for (BasicBlock* const block : Blocks())
The set of eligible return and throw blocks never changes, so do we need to repeatedly walk the entire block list here?
I would also think we don't need to; however, when I tried to hoist that, it caused asserts.
I didn't look further into it because the same approach is already used in iterateTailMerge(), and there are also multiple comments around this code about improving algorithmic efficiency.
So I'd prefer to properly understand the entire code and improve efficiency in a separate, future PR.
iterateTailMerge just walks the preds of a given block, not all blocks.
What asserts did you see?
> iterateTailMerge just walks the preds of a given block, not all blocks.

Yeah, it happens to be only the preds here, so it is less of an issue, but the fundamental point of not needing to regenerate the set still applies, I think.

> What asserts did you see?
runtime/src/coreclr/jit/fgstmt.cpp
Lines 542 to 550 in 7f58900
I took a quick look. The issue might be that we don't remove entries from predInfo after we have merged them.
So on the second iteration it tries to merge them again and never makes any progress.
Let me see if I can fix it...
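
To illustrate the progress problem described above, here is a minimal sketch, not the JIT's actual code. It assumes a hypothetical `Candidate` record (a block id plus the text of its last statement) in place of the real predInfo entries, and hypothetical helpers `mergeOneSet` and `mergeAllSets`. The point is only that each merged set is erased from the candidate list, so a loop over the same list eventually runs out of work instead of rediscovering the same set forever.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a candidate entry: a block id plus its last statement.
struct Candidate
{
    int         blockId;
    std::string lastStmt;
};

// Finds one set of candidates that end with the same statement, "merges" it,
// and erases its members from the list. Returns the number of merged entries,
// or 0 if no set of at least two candidates exists.
std::size_t mergeOneSet(std::vector<Candidate>& candidates)
{
    for (std::size_t i = 0; i < candidates.size(); i++)
    {
        std::vector<std::size_t> matches{i};
        for (std::size_t j = i + 1; j < candidates.size(); j++)
        {
            if (candidates[j].lastStmt == candidates[i].lastStmt)
            {
                matches.push_back(j);
            }
        }

        if (matches.size() < 2)
        {
            continue;
        }

        // Erase back to front so the remaining indices stay valid.
        // Skipping this step is what would make the outer loop spin forever.
        for (std::size_t k = matches.size(); k-- > 0;)
        {
            candidates.erase(candidates.begin() + static_cast<std::ptrdiff_t>(matches[k]));
        }
        return matches.size();
    }
    return 0;
}

// Gathers candidates once (the argument) and loops until no set is left.
int mergeAllSets(std::vector<Candidate> candidates)
{
    int sets = 0;
    while (mergeOneSet(candidates) > 0)
    {
        sets++;
    }
    return sets;
}
```

With candidates ending in `return 10`, `return 2`, `return 10`, `throw`, `throw`, the loop merges two sets and then stops, because the lone `return 2` entry has no partner.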
…cate-all-return-throw-blocks
…of reinvoking and re-gathering candidates every time * hack to suppress positive diffs
I recommend keeping refactoring/renaming changes and functionality in separate PRs, otherwise reviews are more likely to miss important things. Also, does tail merging returns lead to new tail merge opportunities like it does for other blocks (e.g. should we be populating "retry blocks")?
Yes, deduplicating return blocks often does expose new opportunities to tail merge. We are already pushing merged blocks to the retry list (runtime/src/coreclr/jit/fgopt.cpp, lines 5370 to 5372 in 5341a84).

Here is an example (for myself, to solidify my understanding):

static int Example(bool cond1, bool cond2, ref int x, ref int y)
{
if (cond1)
{
y = 8;
x = 9;
return 10;
}
if (cond2)
{
y = 8;
x = 9;
return 10;
}
return 2;
}

First we pull out the shared return statement:

A set of 2 return/throw blocks end with the same tree
STMT00005 ( 0x017[E--] ... 0x019 )
[000017] ----------- * RETURN int
[000016] ----------- \--* CNS_INT int 10
New Basic Block BB06 [0005] created.
setting likelihood of BB02 -> BB06 to 1
Will cross-jump to newly split off BB06
unlinking STMT00005 ( 0x017[E--] ... 0x019 )
[000017] ----------- * RETURN int
[000016] ----------- \--* CNS_INT int 10
from BB04
setting likelihood of BB04 -> BB06 to 1
Deduplicated 1 set of return/throw blocks

After that we look at the predecessors of the new block (BB06):

All 2 preds of BB06 end with the same tree, moving
STMT00004 ( 0x013[E--] ... 0x016 )
[000015] -A-XG------ * STOREIND int
[000013] ----------- +--* LCL_VAR byref V02 arg2
[000014] ----------- \--* CNS_INT int 9
unlinking STMT00004 ( 0x013[E--] ... 0x016 )
[000015] -A-XG------ * STOREIND int
[000013] ----------- +--* LCL_VAR byref V02 arg2
[000014] ----------- \--* CNS_INT int 9
from BB04
unlinking STMT00007 ( 0x006[E--] ... 0x009 )
[000023] -A-XG------ * STOREIND int
[000021] ----------- +--* LCL_VAR byref V02 arg2
[000022] ----------- \--* CNS_INT int 9
from BB02
Merged 1 set of tails going into BB06

And so, one by one, we work our way through the equivalent statements, regathering predecessors at each step.

Note: For some cases we might be able to consider tails equivalent even though their exact statement order isn't the same (?), provided they can be reordered accordingly.

Update: I just moved de-duplicating return/throw blocks before tail merging and no longer push them to the retry list.
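
The walkthrough above can be modelled with a small toy, offered as a sketch under loud assumptions rather than as the JIT's algorithm: blocks are plain vectors of statement strings (the hypothetical `Block` alias), statements are compared as text instead of as trees, and the return/throw de-duplication step and the tail-merge step are folded into one loop. `mergeCommonTail` is a made-up name. It peels one shared trailing statement per iteration, the way the dump peels `RETURN 10` and then the store of 9.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a block is just its statements in execution order.
using Block = std::vector<std::string>;

// Repeatedly removes the last statement from every predecessor while all of
// them end with the same statement, and collects those statements into a
// shared tail. Returns the tail in execution order.
Block mergeCommonTail(std::vector<Block>& preds)
{
    Block tail;
    while (!preds.empty() && !preds.front().empty())
    {
        const std::string candidate = preds.front().back();

        bool allMatch = true;
        for (const Block& b : preds)
        {
            if (b.empty() || (b.back() != candidate))
            {
                allMatch = false;
                break;
            }
        }
        if (!allMatch)
        {
            break;
        }

        // Unlink the statement from every predecessor and prepend it to the
        // shared tail, since we are walking the blocks backwards.
        for (Block& b : preds)
        {
            b.pop_back();
        }
        tail.insert(tail.begin(), candidate);
    }
    return tail;
}
```

For two blocks that both end in `y = 8; x = 9; return 10`, where the first has one extra leading statement, the loop moves the three shared statements into the tail and stops at the statement that differs.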
…f using a BitVec to sparsely mark them as processed * move de-duplication before tail-merging and then no longer add them to the retry list as it isn't needed * use stl iterator tag to be able to call std::stable_partition * and assert to vector indexer
…s in downstream phases because the way we choose the crossJumpVictim is order-dependent and non-optimal (for example we'd want to avoid new BBF_NEEDS_GCPOLL) * also remove the std::reverse - same reason
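
The commit notes above mention `std::stable_partition`. How the change uses it is not shown in this conversation, so the following is only a generic sketch of the standard algorithm, with a hypothetical `Entry` type and `compactUnprocessed` helper. It moves unprocessed entries to the front while preserving relative order within each group, which matters when later decisions are order-dependent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry: a block number plus a flag saying it was already handled.
struct Entry
{
    int  blockNum;
    bool processed;
};

// Reorders `entries` so that all unprocessed ones come first, keeping the
// original relative order inside both groups. Returns the unprocessed count.
std::size_t compactUnprocessed(std::vector<Entry>& entries)
{
    auto firstProcessed = std::stable_partition(entries.begin(), entries.end(),
                                                [](const Entry& e) { return !e.processed; });
    return static_cast<std::size_t>(firstProcessed - entries.begin());
}
```

A plain `std::partition` would also separate the two groups, but it gives no ordering guarantee inside them.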
Fix #128514
tailMergePreds(nullptr) was called once, but my understanding is that it needs to be called repeatedly, as it only processes one set at a time.