-
Notifications
You must be signed in to change notification settings - Fork 15.7k
[SimplifyCFG] Enable icmp-to-switch for direct ret cases #174473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-llvm-transforms Author: Jannik Glückert (Jannik2099) ChangesWhen an icmp chain directly leads to return (or select + return), convert it to a branch so that In practice, this allows us to optimize direct uses of a comparison chain into a switch - for example errno checks in #121262 I have used the inputs from #56087 as test cases. Since I reuse the constructed Note that this PR was made with the assistance of an LLM to identify the ideal location in SimplifyCFG for this change. Patch is 47.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/174473.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 2a957737697c3..417acb2b5b2c7 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -267,267 +267,6 @@ struct ValueEqualityComparisonCase {
bool operator==(BasicBlock *RHSDest) const { return Dest == RHSDest; }
};
-class SimplifyCFGOpt {
- const TargetTransformInfo &TTI;
- DomTreeUpdater *DTU;
- const DataLayout &DL;
- ArrayRef<WeakVH> LoopHeaders;
- const SimplifyCFGOptions &Options;
- bool Resimplify;
-
- Value *isValueEqualityComparison(Instruction *TI);
- BasicBlock *getValueEqualityComparisonCases(
- Instruction *TI, std::vector<ValueEqualityComparisonCase> &Cases);
- bool simplifyEqualityComparisonWithOnlyPredecessor(Instruction *TI,
- BasicBlock *Pred,
- IRBuilder<> &Builder);
- bool performValueComparisonIntoPredecessorFolding(Instruction *TI, Value *&CV,
- Instruction *PTI,
- IRBuilder<> &Builder);
- bool foldValueComparisonIntoPredecessors(Instruction *TI,
- IRBuilder<> &Builder);
-
- bool simplifyResume(ResumeInst *RI, IRBuilder<> &Builder);
- bool simplifySingleResume(ResumeInst *RI);
- bool simplifyCommonResume(ResumeInst *RI);
- bool simplifyCleanupReturn(CleanupReturnInst *RI);
- bool simplifyUnreachable(UnreachableInst *UI);
- bool simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder);
- bool simplifyDuplicateSwitchArms(SwitchInst *SI, DomTreeUpdater *DTU);
- bool simplifyIndirectBr(IndirectBrInst *IBI);
- bool simplifyBranch(BranchInst *Branch, IRBuilder<> &Builder);
- bool simplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder);
- bool simplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder);
- bool foldCondBranchOnValueKnownInPredecessor(BranchInst *BI);
-
- bool tryToSimplifyUncondBranchWithICmpInIt(ICmpInst *ICI,
- IRBuilder<> &Builder);
- bool tryToSimplifyUncondBranchWithICmpSelectInIt(ICmpInst *ICI,
- SelectInst *Select,
- IRBuilder<> &Builder);
- bool hoistCommonCodeFromSuccessors(Instruction *TI, bool AllInstsEqOnly);
- bool hoistSuccIdenticalTerminatorToSwitchOrIf(
- Instruction *TI, Instruction *I1,
- SmallVectorImpl<Instruction *> &OtherSuccTIs,
- ArrayRef<BasicBlock *> UniqueSuccessors);
- bool speculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB);
- bool simplifyTerminatorOnSelect(Instruction *OldTerm, Value *Cond,
- BasicBlock *TrueBB, BasicBlock *FalseBB,
- uint32_t TrueWeight, uint32_t FalseWeight);
- bool simplifyBranchOnICmpChain(BranchInst *BI, IRBuilder<> &Builder,
- const DataLayout &DL);
- bool simplifySwitchOnSelect(SwitchInst *SI, SelectInst *Select);
- bool simplifyIndirectBrOnSelect(IndirectBrInst *IBI, SelectInst *SI);
- bool turnSwitchRangeIntoICmp(SwitchInst *SI, IRBuilder<> &Builder);
-
-public:
- SimplifyCFGOpt(const TargetTransformInfo &TTI, DomTreeUpdater *DTU,
- const DataLayout &DL, ArrayRef<WeakVH> LoopHeaders,
- const SimplifyCFGOptions &Opts)
- : TTI(TTI), DTU(DTU), DL(DL), LoopHeaders(LoopHeaders), Options(Opts) {
- assert((!DTU || !DTU->hasPostDomTree()) &&
- "SimplifyCFG is not yet capable of maintaining validity of a "
- "PostDomTree, so don't ask for it.");
- }
-
- bool simplifyOnce(BasicBlock *BB);
- bool run(BasicBlock *BB);
-
- // Helper to set Resimplify and return change indication.
- bool requestResimplify() {
- Resimplify = true;
- return true;
- }
-};
-
-// we synthesize a || b as select a, true, b
-// we synthesize a && b as select a, b, false
-// this function determines if SI is playing one of those roles.
-[[maybe_unused]] bool
-isSelectInRoleOfConjunctionOrDisjunction(const SelectInst *SI) {
- return ((isa<ConstantInt>(SI->getTrueValue()) &&
- (dyn_cast<ConstantInt>(SI->getTrueValue())->isOne())) ||
- (isa<ConstantInt>(SI->getFalseValue()) &&
- (dyn_cast<ConstantInt>(SI->getFalseValue())->isNullValue())));
-}
-
-} // end anonymous namespace
-
-/// Return true if all the PHI nodes in the basic block \p BB
-/// receive compatible (identical) incoming values when coming from
-/// all of the predecessor blocks that are specified in \p IncomingBlocks.
-///
-/// Note that if the values aren't exactly identical, but \p EquivalenceSet
-/// is provided, and *both* of the values are present in the set,
-/// then they are considered equal.
-static bool incomingValuesAreCompatible(
- BasicBlock *BB, ArrayRef<BasicBlock *> IncomingBlocks,
- SmallPtrSetImpl<Value *> *EquivalenceSet = nullptr) {
- assert(IncomingBlocks.size() == 2 &&
- "Only for a pair of incoming blocks at the time!");
-
- // FIXME: it is okay if one of the incoming values is an `undef` value,
- // iff the other incoming value is guaranteed to be a non-poison value.
- // FIXME: it is okay if one of the incoming values is a `poison` value.
- return all_of(BB->phis(), [IncomingBlocks, EquivalenceSet](PHINode &PN) {
- Value *IV0 = PN.getIncomingValueForBlock(IncomingBlocks[0]);
- Value *IV1 = PN.getIncomingValueForBlock(IncomingBlocks[1]);
- if (IV0 == IV1)
- return true;
- if (EquivalenceSet && EquivalenceSet->contains(IV0) &&
- EquivalenceSet->contains(IV1))
- return true;
- return false;
- });
-}
-
-/// Return true if it is safe to merge these two
-/// terminator instructions together.
-static bool
-safeToMergeTerminators(Instruction *SI1, Instruction *SI2,
- SmallSetVector<BasicBlock *, 4> *FailBlocks = nullptr) {
- if (SI1 == SI2)
- return false; // Can't merge with self!
-
- // It is not safe to merge these two switch instructions if they have a common
- // successor, and if that successor has a PHI node, and if *that* PHI node has
- // conflicting incoming values from the two switch blocks.
- BasicBlock *SI1BB = SI1->getParent();
- BasicBlock *SI2BB = SI2->getParent();
-
- SmallPtrSet<BasicBlock *, 16> SI1Succs(llvm::from_range, successors(SI1BB));
- bool Fail = false;
- for (BasicBlock *Succ : successors(SI2BB)) {
- if (!SI1Succs.count(Succ))
- continue;
- if (incomingValuesAreCompatible(Succ, {SI1BB, SI2BB}))
- continue;
- Fail = true;
- if (FailBlocks)
- FailBlocks->insert(Succ);
- else
- break;
- }
-
- return !Fail;
-}
-
-/// Update PHI nodes in Succ to indicate that there will now be entries in it
-/// from the 'NewPred' block. The values that will be flowing into the PHI nodes
-/// will be the same as those coming in from ExistPred, an existing predecessor
-/// of Succ.
-static void addPredecessorToBlock(BasicBlock *Succ, BasicBlock *NewPred,
- BasicBlock *ExistPred,
- MemorySSAUpdater *MSSAU = nullptr) {
- for (PHINode &PN : Succ->phis())
- PN.addIncoming(PN.getIncomingValueForBlock(ExistPred), NewPred);
- if (MSSAU)
- if (auto *MPhi = MSSAU->getMemorySSA()->getMemoryAccess(Succ))
- MPhi->addIncoming(MPhi->getIncomingValueForBlock(ExistPred), NewPred);
-}
-
-/// Compute an abstract "cost" of speculating the given instruction,
-/// which is assumed to be safe to speculate. TCC_Free means cheap,
-/// TCC_Basic means less cheap, and TCC_Expensive means prohibitively
-/// expensive.
-static InstructionCost computeSpeculationCost(const User *I,
- const TargetTransformInfo &TTI) {
- return TTI.getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency);
-}
-
-/// If we have a merge point of an "if condition" as accepted above,
-/// return true if the specified value dominates the block. We don't handle
-/// the true generality of domination here, just a special case which works
-/// well enough for us.
-///
-/// If AggressiveInsts is non-null, and if V does not dominate BB, we check to
-/// see if V (which must be an instruction) and its recursive operands
-/// that do not dominate BB have a combined cost lower than Budget and
-/// are non-trapping. If both are true, the instruction is inserted into the
-/// set and true is returned.
-///
-/// The cost for most non-trapping instructions is defined as 1 except for
-/// Select whose cost is 2.
-///
-/// After this function returns, Cost is increased by the cost of
-/// V plus its non-dominating operands. If that cost is greater than
-/// Budget, false is returned and Cost is undefined.
-static bool dominatesMergePoint(
- Value *V, BasicBlock *BB, Instruction *InsertPt,
- SmallPtrSetImpl<Instruction *> &AggressiveInsts, InstructionCost &Cost,
- InstructionCost Budget, const TargetTransformInfo &TTI, AssumptionCache *AC,
- SmallPtrSetImpl<Instruction *> &ZeroCostInstructions, unsigned Depth = 0) {
- // It is possible to hit a zero-cost cycle (phi/gep instructions for example),
- // so limit the recursion depth.
- // TODO: While this recursion limit does prevent pathological behavior, it
- // would be better to track visited instructions to avoid cycles.
- if (Depth == MaxSpeculationDepth)
- return false;
-
- Instruction *I = dyn_cast<Instruction>(V);
- if (!I) {
- // Non-instructions dominate all instructions and can be executed
- // unconditionally.
- return true;
- }
- BasicBlock *PBB = I->getParent();
-
- // We don't want to allow weird loops that might have the "if condition" in
- // the bottom of this block.
- if (PBB == BB)
- return false;
-
- // If this instruction is defined in a block that contains an unconditional
- // branch to BB, then it must be in the 'conditional' part of the "if
- // statement". If not, it definitely dominates the region.
- BranchInst *BI = dyn_cast<BranchInst>(PBB->getTerminator());
- if (!BI || BI->isConditional() || BI->getSuccessor(0) != BB)
- return true;
-
- // If we have seen this instruction before, don't count it again.
- if (AggressiveInsts.count(I))
- return true;
-
- // Okay, it looks like the instruction IS in the "condition". Check to
- // see if it's a cheap instruction to unconditionally compute, and if it
- // only uses stuff defined outside of the condition. If so, hoist it out.
- if (!isSafeToSpeculativelyExecute(I, InsertPt, AC))
- return false;
-
- // Overflow arithmetic instruction plus extract value are usually generated
- // when a division is being replaced. But, in this case, the zero check may
- // still be kept in the code. In that case it would be worth to hoist these
- // two instruction out of the basic block. Let's treat this pattern as one
- // single cheap instruction here!
- WithOverflowInst *OverflowInst;
- if (match(I, m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(OverflowInst))))) {
- ZeroCostInstructions.insert(OverflowInst);
- Cost += 1;
- } else if (!ZeroCostInstructions.contains(I))
- Cost += computeSpeculationCost(I, TTI);
-
- // Allow exactly one instruction to be speculated regardless of its cost
- // (as long as it is safe to do so).
- // This is intended to flatten the CFG even if the instruction is a division
- // or other expensive operation. The speculation of an expensive instruction
- // is expected to be undone in CodeGenPrepare if the speculation has not
- // enabled further IR optimizations.
- if (Cost > Budget &&
- (!SpeculateOneExpensiveInst || !AggressiveInsts.empty() || Depth > 0 ||
- !Cost.isValid()))
- return false;
-
- // Okay, we can only really hoist these out if their operands do
- // not take us over the cost threshold.
- for (Use &Op : I->operands())
- if (!dominatesMergePoint(Op, BB, InsertPt, AggressiveInsts, Cost, Budget,
- TTI, AC, ZeroCostInstructions, Depth + 1))
- return false;
- // Okay, it's safe to do this! Remember this instruction.
- AggressiveInsts.insert(I);
- return true;
-}
-
/// Extract ConstantInt from value, looking through IntToPtr
/// and PointerNullValue. Return NULL if value is not a constant int.
static ConstantInt *getConstantInt(Value *V, const DataLayout &DL) {
@@ -564,8 +303,6 @@ static ConstantInt *getConstantInt(Value *V, const DataLayout &DL) {
return nullptr;
}
-namespace {
-
/// Given a chain of or (||) or and (&&) comparison of a value against a
/// constant, this will try to recover the information required for a switch
/// structure.
@@ -577,7 +314,7 @@ namespace {
/// while for a chain of '&&' it will build the set elements that make the test
/// fail.
struct ConstantComparesGatherer {
- const DataLayout &DL;
+ const DataLayout *DL;
/// Value found for the switch comparison
Value *CompValue = nullptr;
@@ -599,7 +336,7 @@ struct ConstantComparesGatherer {
bool MultipleMatches = false;
/// Construct and compute the result for the comparison instruction Cond
- ConstantComparesGatherer(Instruction *Cond, const DataLayout &DL) : DL(DL) {
+ ConstantComparesGatherer(Instruction *Cond, const DataLayout &DL) : DL(&DL) {
gather(Cond);
if (CompValue || !MultipleMatches)
return;
@@ -613,6 +350,9 @@ struct ConstantComparesGatherer {
ConstantComparesGatherer(const ConstantComparesGatherer &) = delete;
ConstantComparesGatherer &
operator=(const ConstantComparesGatherer &) = delete;
+ ConstantComparesGatherer(ConstantComparesGatherer &&) = default;
+ ConstantComparesGatherer &
+ operator=(ConstantComparesGatherer &&) = default;
private:
/// Try to set the current value used for the comparison, it succeeds only if
@@ -654,7 +394,7 @@ struct ConstantComparesGatherer {
ICmpInst *ICI;
ConstantInt *C;
if (!((ICI = dyn_cast<ICmpInst>(I)) &&
- (C = getConstantInt(I->getOperand(1), DL)))) {
+ (C = getConstantInt(I->getOperand(1), *DL)))) {
return false;
}
@@ -748,101 +488,365 @@ struct ConstantComparesGatherer {
if (!setValueOnce(ICI->getOperand(0)))
return false;
- UsedICmps++;
- Vals.push_back(C);
+ UsedICmps++;
+ Vals.push_back(C);
+ return true;
+ }
+
+ // If we have "x ult 3", for example, then we can add 0,1,2 to the set.
+ ConstantRange Span =
+ ConstantRange::makeExactICmpRegion(ICI->getPredicate(), C->getValue());
+
+ // Shift the range if the compare is fed by an add. This is the range
+ // compare idiom as emitted by instcombine.
+ Value *CandidateVal = I->getOperand(0);
+ if (match(I->getOperand(0), m_Add(m_Value(RHSVal), m_APInt(RHSC)))) {
+ Span = Span.subtract(*RHSC);
+ CandidateVal = RHSVal;
+ }
+
+ // If this is an and/!= check, then we are looking to build the set of
+ // value that *don't* pass the and chain. I.e. to turn "x ugt 2" into
+ // x != 0 && x != 1.
+ if (!isEQ)
+ Span = Span.inverse();
+
+ // If there are a ton of values, we don't want to make a ginormous switch.
+ if (Span.isSizeLargerThan(8) || Span.isEmptySet()) {
+ return false;
+ }
+
+ // If we already have a value for the switch, it has to match!
+ if (!setValueOnce(CandidateVal))
+ return false;
+
+ // Add all values from the range to the set
+ APInt Tmp = Span.getLower();
+ do
+ Vals.push_back(ConstantInt::get(I->getContext(), Tmp));
+ while (++Tmp != Span.getUpper());
+
+ UsedICmps++;
+ return true;
+ }
+
+ /// Given a potentially 'or'd or 'and'd together collection of icmp
+ /// eq/ne/lt/gt instructions that compare a value against a constant, extract
+ /// the value being compared, and stick the list constants into the Vals
+ /// vector.
+ /// One "Extra" case is allowed to differ from the other.
+ void gather(Value *V) {
+ Value *Op0, *Op1;
+ if (match(V, m_LogicalOr(m_Value(Op0), m_Value(Op1))))
+ IsEq = true;
+ else if (match(V, m_LogicalAnd(m_Value(Op0), m_Value(Op1))))
+ IsEq = false;
+ else
+ return;
+ // Keep a stack (SmallVector for efficiency) for depth-first traversal
+ SmallVector<Value *, 8> DFT{Op0, Op1};
+ SmallPtrSet<Value *, 8> Visited{V, Op0, Op1};
+
+ while (!DFT.empty()) {
+ V = DFT.pop_back_val();
+
+ if (Instruction *I = dyn_cast<Instruction>(V)) {
+ // If it is a || (or && depending on isEQ), process the operands.
+ if (IsEq ? match(I, m_LogicalOr(m_Value(Op0), m_Value(Op1)))
+ : match(I, m_LogicalAnd(m_Value(Op0), m_Value(Op1)))) {
+ if (Visited.insert(Op1).second)
+ DFT.push_back(Op1);
+ if (Visited.insert(Op0).second)
+ DFT.push_back(Op0);
+
+ continue;
+ }
+
+ // Try to match the current instruction
+ if (matchInstruction(I, IsEq))
+ // Match succeed, continue the loop
+ continue;
+ }
+
+ // One element of the sequence of || (or &&) could not be match as a
+ // comparison against the same value as the others.
+ // We allow only one "Extra" case to be checked before the switch
+ if (!Extra) {
+ Extra = V;
+ continue;
+ }
+ // Failed to parse a proper sequence, abort now
+ CompValue = nullptr;
+ break;
+ }
+ }
+};
+
+class SimplifyCFGOpt {
+ const TargetTransformInfo &TTI;
+ DomTreeUpdater *DTU;
+ const DataLayout &DL;
+ ArrayRef<WeakVH> LoopHeaders;
+ const SimplifyCFGOptions &Options;
+ bool Resimplify;
+
+ Value *isValueEqualityComparison(Instruction *TI);
+ BasicBlock *getValueEqualityComparisonCases(
+ Instruction *TI, std::vector<ValueEqualityComparisonCase> &Cases);
+ bool simplifyEqualityComparisonWithOnlyPredecessor(Instruction *TI,
+ BasicBlock *Pred,
+ IRBuilder<> &Builder);
+ bool performValueComparisonIntoPredecessorFolding(Instruction *TI, Value *&CV,
+ Instruction *PTI,
+ IRBuilder<> &Builder);
+ bool foldValueComparisonIntoPredecessors(Instruction *TI,
+ IRBuilder<> &Builder);
+
+ bool simplifyResume(ResumeInst *RI, IRBuilder<> &Builder);
+ bool simplifySingleResume(ResumeInst *RI);
+ bool simplifyCommonResume(ResumeInst *RI);
+ bool simplifyCleanupReturn(CleanupReturnInst *RI);
+ bool simplifyUnreachable(UnreachableInst *UI);
+ bool simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder);
+ bool simplifyDuplicateSwitchArms(SwitchInst *SI, DomTreeUpdater *DTU);
+ bool simplifyIndirectBr(IndirectBrInst *IBI);
+ bool simplifyBranch(BranchInst *Branch, IRBuilder<> &Builder);
+ bool simplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder);
+ bool simplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder);
+ bool foldCondBranchOnValueKnownInPredecessor(BranchInst *BI);
+
+ bool tryToSimplifyUncondBranchWithICmpInIt(ICmpInst *ICI,
+ IRBuilder<> &Builder);
+ bool tryToSimplifyUncondBranchWithICmpSelectInIt(ICmpInst *ICI,
+ SelectInst *Select,
+ IRBuilder<> &Builder);
+ bool hoistCommonCodeFromSuccessors(Instruction *TI, bool AllInstsEqOnly);
+ bool hoistSuccIdenticalTerminatorToSwitchOrIf(
+ Instruction *TI, Instruction *I1,
+ SmallVectorImpl<Instruction *> &OtherSuccTIs,
+ ArrayRef<BasicBlock *> UniqueSuccessors);
+ bool speculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB);
+ bool simplifyTerminatorOnSelect(Instruction *OldTerm, Value *Cond,
+ BasicBlock *TrueBB, BasicBlock *FalseBB,
+ uint32_t TrueWeight, uint32_t FalseWeight);
+ bool simplifyBranchOnICmpChain(BranchInst *BI, IRBuilder<> &Builder,
+ const DataLayout &DL,
+ std::optional<ConstantComparesGatherer>
+ ExistingConstantComparesGatherer = std::nullopt);
+ bool simplifyRetOfIcmpChain(ReturnInst &RI,
+ ...
[truncated]
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
When an icmp chain directly leads to return (or select + return), convert it to a branch so that simplifyBranchOnICmpChain can convert it to a switch. Fixes: llvm#56087 Fixes: llvm#121262 Assisted-by: Claude Opus 4.5 via Github Copilot
bfa4bdd to
a4f14e0
Compare
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has some overlap with #169557 I think. Does that PR apply to your motivating case?
|
Good spot, thanks. That PR only applies to switch...case blocks / resulting icmps thereof that can be transformed to bitwise operations. It then does all the bit ops in IR. My PR transforms any suitable icmp chain into a switch. Backends can then lower that into bit ops if possible. So for operations that can be universally expressed as bit ops, that PR is indeed ideal. However, for all others, reordering an icmp chain to a switch is likely still beneficial as it's more accessible for other transformations / backend lowering. Tldr: there's overlap, but I think my transformation solves a more general case. I'm currently building the PR and will compare results. |
|
#169557 does not fix any of the missing optimizations from the original issue, no. Likely because the cases in question do not form a bitmask. |
When an icmp chain directly leads to return (or select + return), convert it to a branch so that
simplifyBranchOnICmpChaincan convert it to a switch.In practice, this allows us to optimize direct uses of a comparison chain into a switch - for example errno checks in #121262
I have used the inputs from #56087 as test cases.
Since I reuse the constructed
ConstantComparesGathererforsimplifyBranchOnICmpChain, I had to give it a move ctor & add it to the function signature ofsimplifyBranchOnICmpChain. This required reordering the class definition aboveSimplifyCFGOpt, and as a result the diff looks horrendeous.As a result, clang-format also fails on the moved-around parts.
Suggestions welcome?
Note that this PR was made with the assistance of an LLM to identify the ideal location in SimplifyCFG for this change.