Conversation

@gandhi56
Contributor

Avoid inserting depth-limited, unsimplified SCEVTruncateExprs into the UniqueSCEVs cache. Caching these prevents later queries at lower depth from simplifying properly (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr on an AddRecExpr returned a cached SCEVTruncateExpr instead of the expected SCEVAddRecExpr.
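
A minimal sketch of the failure mode (hypothetical driver code; `SE`, `WideAR`, and `Int32Ty` are assumed to be in scope, and the depth limit is ScalarEvolution's internal `MaxCastDepth` option):

```cpp
// 1) An inner query arrives with Depth already past the cast-depth limit,
//    so getTruncateExpr gives up and builds a raw SCEVTruncateExpr(WideAR).
//    Before this patch, that unsimplified node was inserted into UniqueSCEVs.
// 2) A later top-level query for the same (operand, type) pair then hits the
//    FoldingSet entry and returns the cached, unsimplified truncate:
const llvm::SCEV *Top = SE.getTruncateExpr(WideAR, Int32Ty); // Depth = 0
// Expected by IndVarSimplify: isa<SCEVAddRecExpr>(Top)
// Observed before the patch:  isa<SCEVTruncateExpr>(Top), tripping the assert
```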

@gandhi56 gandhi56 self-assigned this Dec 14, 2025
@gandhi56 gandhi56 requested a review from nikic as a code owner December 14, 2025 22:43
@gandhi56 gandhi56 requested review from Copilot and removed request for nikic December 14, 2025 22:44
@llvmbot llvmbot added the llvm:analysis and llvm:transforms labels Dec 14, 2025
@gandhi56 gandhi56 requested a review from nikic December 14, 2025 22:44
@llvmbot
Member

llvmbot commented Dec 14, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Anshil Gandhi (gandhi56)

Changes

Avoid inserting depth-limited, unsimplified SCEVTruncateExprs into the UniqueSCEVs cache. Caching these prevents later queries at lower depth from simplifying properly (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr on an AddRecExpr returned a cached SCEVTruncateExpr instead of the expected SCEVAddRecExpr.


Full diff: https://github.com/llvm/llvm-project/pull/172234.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+4-1)
  • (added) llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll (+223)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index e1f90264be7a2..518aba36c5e6d 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -1172,9 +1172,12 @@ const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,
     return getTruncateOrZeroExtend(SZ->getOperand(), Ty, Depth + 1);
 
   if (Depth > MaxCastDepth) {
+    // Avoid caching depth-limited unsimplified results. Later queries at lower
+    // depth should be able to simplify (e.g., truncate of AddRec -> AddRec).
+    // Caching here would return this unsimplified SCEVTruncateExpr for future
+    // queries, preventing proper simplification.
     SCEV *S =
         new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator), Op, Ty);
-    UniqueSCEVs.InsertNode(S, IP);
     registerUser(S, Op);
     return S;
   }
diff --git a/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll b/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll
new file mode 100644
index 0000000000000..4f994f2458a36
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll
@@ -0,0 +1,223 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt %s -passes="loop(loop-idiom,indvars,loop-deletion,loop-unroll-full)" -S | FileCheck %s
+; REQUIRES: asserts
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @_ZNK4Test29TestViewOperator_LeftAndRightIiN6Kokkos6SerialELj7EEclEmRi(i32 %conv5, i1 %cmp13, i1 %cmp20, i1 %cmp27, i1 %cmp34, i1 %cmp41) local_unnamed_addr {
+; CHECK-LABEL: define void @_ZNK4Test29TestViewOperator_LeftAndRightIiN6Kokkos6SerialELj7EEclEmRi(
+; CHECK-SAME: i32 [[CONV5:%.*]], i1 [[CMP13:%.*]], i1 [[CMP20:%.*]], i1 [[CMP27:%.*]], i1 [[CMP34:%.*]], i1 [[CMP41:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[CONV5]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
+; CHECK-NEXT:    br label %[[FOR_COND:.*]]
+; CHECK:       [[FOR_COND_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND]]
+; CHECK:       [[FOR_COND]]:
+; CHECK-NEXT:    [[TMP2:%.*]] = zext i32 [[CONV5]] to i64
+; CHECK-NEXT:    br label %[[FOR_COND2:.*]]
+; CHECK:       [[FOR_COND2]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_COND_CLEANUP14:.*]] ], [ 0, %[[FOR_COND]] ]
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV]], [[TMP2]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[FOR_COND9_PREHEADER:.*]], label %[[FOR_COND_LOOPEXIT]]
+; CHECK:       [[FOR_COND9_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND9:.*]]
+; CHECK:       [[FOR_COND9_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND9]]
+; CHECK:       [[FOR_COND9]]:
+; CHECK-NEXT:    br i1 [[CMP13]], label %[[FOR_COND16_PREHEADER:.*]], label %[[FOR_COND_CLEANUP14]]
+; CHECK:       [[FOR_COND16_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND16:.*]]
+; CHECK:       [[FOR_COND_CLEANUP14]]:
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    br label %[[FOR_COND2]]
+; CHECK:       [[FOR_COND16_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND16]]
+; CHECK:       [[FOR_COND16]]:
+; CHECK-NEXT:    br i1 [[CMP20]], label %[[FOR_COND23_PREHEADER:.*]], label %[[FOR_COND9_LOOPEXIT]]
+; CHECK:       [[FOR_COND23_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND23:.*]]
+; CHECK:       [[FOR_COND23_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND23]]
+; CHECK:       [[FOR_COND23]]:
+; CHECK-NEXT:    br i1 [[CMP27]], label %[[FOR_COND30_PREHEADER:.*]], label %[[FOR_COND16_LOOPEXIT]]
+; CHECK:       [[FOR_COND30_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND30:.*]]
+; CHECK:       [[FOR_COND30_LOOPEXIT_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND30_LOOPEXIT:.*]]
+; CHECK:       [[FOR_COND30_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[FOR_COND30]]
+; CHECK:       [[FOR_COND30]]:
+; CHECK-NEXT:    br i1 [[CMP34]], label %[[FOR_COND37_PREHEADER:.*]], label %[[FOR_COND23_LOOPEXIT]]
+; CHECK:       [[FOR_COND37_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_BEGIN:.*]]
+; CHECK:       [[FOR_COND37_PEEL_BEGIN]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL:.*]]
+; CHECK:       [[FOR_COND37_PEEL]]:
+; CHECK-NEXT:    br i1 [[CMP41]], label %[[FOR_BODY43_PEEL:.*]], label %[[FOR_COND30_LOOPEXIT]]
+; CHECK:       [[FOR_BODY43_PEEL]]:
+; CHECK-NEXT:    [[CONV45_PEEL:%.*]] = zext i32 0 to i64
+; CHECK-NEXT:    [[CALL31_I_I_PEEL:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[MUL79_I_I_PEEL:%.*]] = mul i64 [[CALL31_I_I_PEEL]], [[INDVARS_IV]]
+; CHECK-NEXT:    [[DOTIDX1_PEEL:%.*]] = add i64 [[CONV45_PEEL]], [[MUL79_I_I_PEEL]]
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_PEEL:%.*]] = shl i64 [[DOTIDX1_PEEL]], 2
+; CHECK-NEXT:    [[SUB_PTR_DIV_PEEL:%.*]] = ashr exact i64 [[SUB_PTR_LHS_CAST_PEEL]], 1
+; CHECK-NEXT:    [[CMP55_PEEL:%.*]] = icmp sgt i64 0, 0
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP55_PEEL]])
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_NEXT:.*]]
+; CHECK:       [[FOR_COND37_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_NEXT1:.*]]
+; CHECK:       [[FOR_COND37_PEEL_NEXT1]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PREHEADER_PEEL_NEWPH:.*]]
+; CHECK:       [[FOR_COND37_PREHEADER_PEEL_NEWPH]]:
+; CHECK-NEXT:    br label %[[FOR_COND37:.*]]
+; CHECK:       [[FOR_COND37]]:
+; CHECK-NEXT:    [[OFFSET_619:%.*]] = phi i64 [ [[SUB_PTR_DIV:%.*]], %[[FOR_BODY43:.*]] ], [ [[SUB_PTR_DIV_PEEL]], %[[FOR_COND37_PREHEADER_PEEL_NEWPH]] ]
+; CHECK-NEXT:    br i1 [[CMP41]], label %[[FOR_BODY43]], label %[[FOR_COND30_LOOPEXIT_LOOPEXIT]]
+; CHECK:       [[FOR_BODY43]]:
+; CHECK-NEXT:    [[CALL31_I_I:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33_I_I:%.*]] = add i64 [[INDVARS_IV]], [[CALL31_I_I]]
+; CHECK-NEXT:    [[MUL42_I_I:%.*]] = mul i64 [[TMP1]], [[ADD33_I_I]]
+; CHECK-NEXT:    [[ADD43_I_I:%.*]] = add i64 [[MUL42_I_I]], 1
+; CHECK-NEXT:    [[MUL52_I_I:%.*]] = mul i64 [[TMP1]], [[ADD43_I_I]]
+; CHECK-NEXT:    [[ADD53_I_I:%.*]] = add i64 [[MUL52_I_I]], 1
+; CHECK-NEXT:    [[MUL62_I_I:%.*]] = mul i64 [[TMP1]], [[ADD53_I_I]]
+; CHECK-NEXT:    [[ADD63_I_I:%.*]] = add i64 [[MUL62_I_I]], 1
+; CHECK-NEXT:    [[MUL72_I_I:%.*]] = mul i64 [[INDVARS_IV]], [[ADD63_I_I]]
+; CHECK-NEXT:    [[MUL79_I_I:%.*]] = mul i64 [[CALL31_I_I]], [[MUL72_I_I]]
+; CHECK-NEXT:    [[DOTIDX1:%.*]] = add i64 [[TMP1]], [[MUL79_I_I]]
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST:%.*]] = shl i64 [[DOTIDX1]], 2
+; CHECK-NEXT:    [[SUB_PTR_DIV]] = ashr exact i64 [[SUB_PTR_LHS_CAST]], 1
+; CHECK-NEXT:    [[CMP55:%.*]] = icmp sgt i64 [[OFFSET_619]], 0
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP55]])
+; CHECK-NEXT:    br label %[[FOR_COND37]], !llvm.loop [[LOOP0:![0-9]+]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %for.cond2, %entry
+  br label %for.cond2
+
+for.cond2:                                        ; preds = %for.cond.cleanup14, %for.cond
+  %i5.0 = phi i32 [ 0, %for.cond ], [ %inc70, %for.cond.cleanup14 ]
+  %cmp6 = icmp ult i32 %i5.0, %conv5
+  br i1 %cmp6, label %for.cond9, label %for.cond
+
+for.cond9:                                        ; preds = %for.cond16, %for.cond2
+  br i1 %cmp13, label %for.cond16, label %for.cond.cleanup14
+
+for.cond.cleanup14:                               ; preds = %for.cond9
+  %inc70 = add i32 %i5.0, 1
+  br label %for.cond2
+
+for.cond16:                                       ; preds = %for.cond23, %for.cond9
+  br i1 %cmp20, label %for.cond23, label %for.cond9
+
+for.cond23:                                       ; preds = %for.cond30, %for.cond16
+  br i1 %cmp27, label %for.cond30, label %for.cond16
+
+for.cond30:                                       ; preds = %for.cond37, %for.cond23
+  br i1 %cmp34, label %for.cond37, label %for.cond23
+
+for.cond37:                                       ; preds = %for.body43, %for.cond30
+  %i0.018 = phi i32 [ %inc, %for.body43 ], [ 0, %for.cond30 ]
+  %offset.619 = phi i64 [ %sub.ptr.div, %for.body43 ], [ 0, %for.cond30 ]
+  br i1 %cmp41, label %for.body43, label %for.cond30
+
+for.body43:                                       ; preds = %for.cond37
+  %conv45 = zext i32 %i0.018 to i64
+  %conv50 = zext i32 %i5.0 to i64
+  %call31.i.i = load volatile i64, ptr null, align 8
+  %add33.i.i = add i64 %conv50, %call31.i.i
+  %mul42.i.i = mul i64 %conv45, %add33.i.i
+  %add43.i.i = add i64 %mul42.i.i, 1
+  %mul52.i.i = mul i64 %conv45, %add43.i.i
+  %add53.i.i = add i64 %mul52.i.i, 1
+  %mul62.i.i = mul i64 %conv45, %add53.i.i
+  %add63.i.i = add i64 %mul62.i.i, 1
+  %mul72.i.i = mul i64 %conv50, %add63.i.i
+  %mul79.i.i = mul i64 %call31.i.i, %mul72.i.i
+  %.idx1 = add i64 %conv45, %mul79.i.i
+  %sub.ptr.lhs.cast = shl i64 %.idx1, 2
+  %sub.ptr.div = ashr exact i64 %sub.ptr.lhs.cast, 1
+  %cmp55 = icmp sgt i64 %offset.619, 0
+  call void @llvm.assume(i1 %cmp55)
+  %inc = add i32 %conv5, 1
+  br label %for.cond37
+}
+
+define ptr @_ZNK6Kokkos11DynRankViewIiJNS_10LayoutLeftENS_6SerialEEEclEmmmmmmm(i64 %i0, i64 %i5) local_unnamed_addr {
+; CHECK-LABEL: define ptr @_ZNK6Kokkos11DynRankViewIiJNS_10LayoutLeftENS_6SerialEEEclEmmmmmmm(
+; CHECK-SAME: i64 [[I0:%.*]], i64 [[I5:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CALL31_I:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33_I:%.*]] = add i64 [[I5]], [[CALL31_I]]
+; CHECK-NEXT:    [[MUL42_I:%.*]] = mul i64 [[I0]], [[ADD33_I]]
+; CHECK-NEXT:    [[ADD43_I:%.*]] = add i64 [[MUL42_I]], 1
+; CHECK-NEXT:    [[MUL52_I:%.*]] = mul i64 [[I0]], [[ADD43_I]]
+; CHECK-NEXT:    [[ADD53_I:%.*]] = add i64 [[MUL52_I]], 1
+; CHECK-NEXT:    [[MUL62_I:%.*]] = mul i64 [[I0]], [[ADD53_I]]
+; CHECK-NEXT:    [[ADD63_I:%.*]] = add i64 [[MUL62_I]], 1
+; CHECK-NEXT:    [[MUL72_I:%.*]] = mul i64 [[I5]], [[ADD63_I]]
+; CHECK-NEXT:    [[MUL79_I:%.*]] = mul i64 [[CALL31_I]], [[MUL72_I]]
+; CHECK-NEXT:    [[ADD80_I:%.*]] = add i64 [[I0]], [[MUL79_I]]
+; CHECK-NEXT:    [[ARRAYIDX_I:%.*]] = getelementptr i32, ptr null, i64 [[ADD80_I]]
+; CHECK-NEXT:    ret ptr [[ARRAYIDX_I]]
+;
+entry:
+  %call31.i = load volatile i64, ptr null, align 8
+  %add33.i = add i64 %i5, %call31.i
+  %mul42.i = mul i64 %i0, %add33.i
+  %add43.i = add i64 %mul42.i, 1
+  %mul52.i = mul i64 %i0, %add43.i
+  %add53.i = add i64 %mul52.i, 1
+  %mul62.i = mul i64 %i0, %add53.i
+  %add63.i = add i64 %mul62.i, 1
+  %mul72.i = mul i64 %i5, %add63.i
+  %mul79.i = mul i64 %call31.i, %mul72.i
+  %add80.i = add i64 %i0, %mul79.i
+  %arrayidx.i = getelementptr i32, ptr null, i64 %add80.i
+  ret ptr %arrayidx.i
+}
+
+define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(i64 %index_offsets, i64 %index_offsets9) local_unnamed_addr {
+; CHECK-LABEL: define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(
+; CHECK-SAME: i64 [[INDEX_OFFSETS:%.*]], i64 [[INDEX_OFFSETS9:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CALL31:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33:%.*]] = add i64 [[INDEX_OFFSETS9]], [[CALL31]]
+; CHECK-NEXT:    [[MUL42:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD33]]
+; CHECK-NEXT:    [[ADD43:%.*]] = add i64 [[MUL42]], 1
+; CHECK-NEXT:    [[MUL52:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD43]]
+; CHECK-NEXT:    [[ADD53:%.*]] = add i64 [[MUL52]], 1
+; CHECK-NEXT:    [[MUL62:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD53]]
+; CHECK-NEXT:    [[ADD63:%.*]] = add i64 [[MUL62]], 1
+; CHECK-NEXT:    [[MUL72:%.*]] = mul i64 [[INDEX_OFFSETS9]], [[ADD63]]
+; CHECK-NEXT:    [[MUL79:%.*]] = mul i64 [[CALL31]], [[MUL72]]
+; CHECK-NEXT:    [[ADD80:%.*]] = add i64 [[INDEX_OFFSETS]], [[MUL79]]
+; CHECK-NEXT:    ret i64 [[ADD80]]
+;
+entry:
+  %call31 = load volatile i64, ptr null, align 8
+  %add33 = add i64 %index_offsets9, %call31
+  %mul42 = mul i64 %index_offsets, %add33
+  %add43 = add i64 %mul42, 1
+  %mul52 = mul i64 %index_offsets, %add43
+  %add53 = add i64 %mul52, 1
+  %mul62 = mul i64 %index_offsets, %add53
+  %add63 = add i64 %mul62, 1
+  %mul72 = mul i64 %index_offsets9, %add63
+  %mul79 = mul i64 %call31, %mul72
+  %add80 = add i64 %index_offsets, %mul79
+  ret i64 %add80
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
+declare void @llvm.assume(i1 noundef) #0
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.peeled.count", i32 1}
+;.

@llvmbot
Member

llvmbot commented Dec 14, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Anshil Gandhi (gandhi56)


Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a cache poisoning issue in SCEV where depth-limited unsimplified truncate expressions were being cached, preventing later queries at lower depths from properly simplifying expressions (e.g., preventing truncate of AddRecExpr from becoming AddRecExpr).

  • Removes caching of depth-limited SCEVTruncateExpr results in getTruncateExpr
  • Adds comprehensive test case demonstrating the fix prevents assertion failures in IndVarSimplify

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • llvm/lib/Analysis/ScalarEvolution.cpp — removes the UniqueSCEVs.InsertNode(S, IP) call for depth-limited truncate expressions to prevent cache poisoning
  • llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll — adds a regression test with nested loops that previously triggered the assertion failure


Comment on lines 1174 to 1183
if (Depth > MaxCastDepth) {
  // Avoid caching depth-limited unsimplified results. Later queries at lower
  // depth should be able to simplify (e.g., truncate of AddRec -> AddRec).
  // Caching here would return this unsimplified SCEVTruncateExpr for future
  // queries, preventing proper simplification.
  SCEV *S =
      new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator), Op, Ty);
  UniqueSCEVs.InsertNode(S, IP);
  registerUser(S, Op);
  return S;
}

Copilot AI Dec 14, 2025


The fix removes caching of depth-limited truncate expressions to prevent cache poisoning. However, the same issue exists in getZeroExtendExprImpl (line 1603) and getSignExtendExprImpl (line 1943), where depth-limited unsimplified zero/sign-extend expressions are still inserted into the UniqueSCEVs cache. So while truncate expressions are now handled, zero- and sign-extend expressions can still hit the same problem: a cached, depth-limited, unsimplified expression prevents later queries at lower depth from simplifying properly.
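
For reference, a paraphrase of the analogous depth-limit block in getZeroExtendExprImpl (not an exact copy of the current source; getSignExtendExprImpl follows the same shape):

```cpp
if (Depth > MaxCastDepth) {
  // Same pattern as the truncate case: an unsimplified cast node is
  // built at the depth limit and inserted into the cache.
  SCEV *S = new (SCEVAllocator)
      SCEVZeroExtendExpr(ID.Intern(SCEVAllocator), Op, Ty);
  UniqueSCEVs.InsertNode(S, IP);
  registerUser(S, Op);
  return S;
}
```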

@gandhi56 gandhi56 force-pushed the scev/avoid-caching-unsimplified-trunc branch from c71fae1 to c8d18ae Compare December 14, 2025 23:08
Don't insert depth-limited unsimplified SCEVTruncateExpr into the
UniqueSCEVs cache. Caching these prevents later queries at lower depth
from properly simplifying (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr
on an AddRecExpr returned a cached SCEVTruncateExpr instead of the
expected SCEVAddRecExpr.
@gandhi56 gandhi56 force-pushed the scev/avoid-caching-unsimplified-trunc branch from c8d18ae to 5e1e5d0 Compare December 14, 2025 23:09
Contributor

@nikic nikic left a comment


Please reference the issue you are fixing. I remember there being one for this.

I initially thought that this change may cause compile-time issues for the pathological cases the depth limit is intended to address, but after considering more carefully, I don't think it will really affect them. The cache will only be hit when using the argument after the recursion has already happened. (We have a different cache that will use the original argument, but that's not used for trunc.)

@@ -0,0 +1,223 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt %s -passes="loop(loop-idiom,indvars,loop-deletion,loop-unroll-full)" -S | FileCheck %s
; REQUIRES: asserts
Contributor


Does not need asserts.

Contributor Author


I added an assertion in IndVarSimplify to ensure that the truncated expression is a SCEVAddRecExpr.

Contributor Author


The original issue caused the following assertion to fail:

assert(ExitCnt->getType()->isPointerTy() ==

Contributor


You only need to add REQUIRES: asserts if the test will fail in a build without assertions, e.g. because you're using -debug. It's not intended for tests that happened to assert in the past.

; REQUIRES: asserts

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
Contributor


Do not use triple.

ret ptr %arrayidx.i
}

define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(i64 %index_offsets, i64 %index_offsets9) local_unnamed_addr {
Contributor


This does not look like a reduced test case -- these functions don't look relevant.

Contributor

@nikic nikic left a comment


Hm, but thinking about this again, this change is going to break SCEV uniquing. It means that we will be creating this node multiple times, breaking equality comparisons -- and creating a memory leak. If the query that hits this depth limit is run many times, it's going to create a new, different SCEV node every time, and these will never get cleaned up. I don't think we can do this.
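
A sketch of the concern under the patched behavior (`Op`, `Int32Ty`, and `Limit`, standing in for the internal `MaxCastDepth`, are assumptions):

```cpp
// With the InsertNode call removed, two identical depth-limited queries
// allocate two distinct nodes from the bump allocator:
const llvm::SCEV *A = SE.getTruncateExpr(Op, Int32Ty, /*Depth=*/Limit + 1);
const llvm::SCEV *B = SE.getTruncateExpr(Op, Int32Ty, /*Depth=*/Limit + 1);
// Before the patch: A == B (both resolve to the node uniqued in UniqueSCEVs).
// After the patch:  A != B, so pointer-equality comparisons between SCEVs
// break, and each query leaks a node that can never be found or reclaimed.
```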

@nikic
Contributor

nikic commented Dec 15, 2025

I think the right way to fix the IndVarSimplify issue is to not assume the truncation is going to return a specific type. If it does not return an addrec, we can use a fallback implementation that truncates at the end instead of the start. That's less efficient but shouldn't really matter for this edge case.
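
A hedged sketch of that fallback on the IndVarSimplify side (`computeExitCountFrom` is a hypothetical helper, not existing API):

```cpp
const llvm::SCEV *Trunc = SE.getTruncateExpr(WideAR, NarrowTy);
if (const auto *TruncAR = llvm::dyn_cast<llvm::SCEVAddRecExpr>(Trunc)) {
  // Common case: the truncate folded into the recurrence; compute the
  // exit count directly in the narrow type.
  ExitCnt = computeExitCountFrom(TruncAR);
} else {
  // Fallback: do the arithmetic in the wide type and truncate the final
  // value instead ("truncate at the end instead of the start").
  const llvm::SCEV *WideCnt = computeExitCountFrom(WideAR);
  ExitCnt = SE.getTruncateExpr(WideCnt, NarrowTy);
}
```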

@nikic nikic requested review from efriedma-quic and fhahn December 15, 2025 09:04
@gandhi56
Contributor Author

Please reference the issue you are fixing. I remember there being one for this.

This issue showed up during a Kokkos build. A quick search through GitHub issues didn't turn up a related one; I will reference it as soon as I find it.

@nikic
Contributor

nikic commented Dec 15, 2025

Please reference the issue you are fixing. I remember there being one for this.

This issue showed up during a kokkos build. Upon a quick iteration over github issues, I didn't find a related issue. I will mention them as soon as I find one.

I think it may be #153090.

@efriedma-quic
Collaborator

We can force an addrec with truncated operands by expanding out the code to create it: instead of truncating the AddRec itself, truncate the operands of the AddRec. (Assuming the step is non-zero after we truncate it, which it is in this case because we assert it's exactly one.)
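
Roughly, assuming `AR`, `NarrowTy`, and `SE` are in scope:

```cpp
// Build the narrow addrec directly from truncated operands instead of
// truncating the addrec itself.
llvm::SmallVector<const llvm::SCEV *, 4> Ops;
for (const llvm::SCEV *Op : AR->operands())
  Ops.push_back(SE.getTruncateExpr(Op, NarrowTy));
// Sound here because the caller asserts the step is exactly one, so the
// truncated step cannot become zero.
const llvm::SCEV *NarrowAR =
    SE.getAddRecExpr(Ops, AR->getLoop(), llvm::SCEV::FlagAnyWrap);
```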

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

@gandhi56
Contributor Author

I think the right way to fix the IndVarSimplify issue is to not assume the truncation is going to return a specific type. If it does not return an addrec, we can use a fallback implementation that truncates at the end instead of the start. That's less efficient but shouldn't really matter for this edge case.

I would prefer to perform the truncation within the SCEV framework instead of IndVarSimplify. This way, all consumers of SCEV only need to handle canonical SCEV expressions. Furthermore, this problem of caching unsimplified expressions occurs elsewhere in ScalarEvolution as well, such as in getZeroExtendExprImpl and getSignExtendExprImpl. So handling this corner case in IndVarSimplify may resolve this bug, but other bugs due to unsimplified zext/sext expressions will still be lying around.

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

Converting SCEV's recursive algorithms into iterative ones seems worth a try.

@gandhi56
Contributor Author

Hm, but thinking about this again, this change is going to break SCEV uniquing. It means that we will be creating this node multiple times, breaking equality comparisons -- and creating a memory leak. If the query that hits this depth limit is run many times, it's going to create a new, different SCEV node every time, and these will never get cleaned up. I don't think we can do this.

This memory leak can be avoided with an iterative algorithm, by caching only the expressions that matter and ensuring the worklist is cleaned up before the algorithm terminates.

@nikic
Contributor

nikic commented Dec 16, 2025

We can force an addrec with truncated operands by expanding out the code to create it: instead of truncating the AddRec itself, truncate the operands of the AddRec. (Assuming the step is non-zero after we truncate it, which it is in this case because we assert it's exactly one.)

Another option would be to only check for an existing node after trying to push the trunc into the addrec. It should be fine to perform any unconditional transforms before checking for an existing node. We just shouldn't perform conditional ones (like the add/mul variant) before checking to avoid unnecessary work.
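
A sketch of that ordering inside getTruncateExpr (simplified; the addrec fold already exists in the real code, it is only moved ahead of the lookup):

```cpp
// Perform the unconditional trunc(AddRec) -> AddRec fold before the
// FoldingSet lookup, so a cached depth-limited node can never shadow it.
if (const auto *AR = dyn_cast<SCEVAddRecExpr>(Op)) {
  SmallVector<const SCEV *, 4> Ops;
  for (const SCEV *O : AR->operands())
    Ops.push_back(getTruncateExpr(O, Ty, Depth + 1));
  return getAddRecExpr(Ops, AR->getLoop(), SCEV::FlagAnyWrap);
}
// Only afterwards: consult UniqueSCEVs and, at the depth limit, insert
// the unsimplified SCEVTruncateExpr.
```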

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

Not sure I follow what you have in mind here. How would such an IR marker look like?

@efriedma-quic
Collaborator

Not sure I follow what you have in mind here. How would such an IR marker look like?

The depth limit sort of serves two purposes: one is avoiding stack overflow, but the other is just to limit the total amount of work we do in various transforms. The marker is targeted at the latter.

i32 @llvm.scev.marker(i32) just returns the argument... but SCEV treats it as SCEVUnknown. When we detect SCEVs with a high depth, we insert this to reduce the total depth. Or something like that. This allows limiting complex SCEVs while still ensuring SCEV computations are reproducible: the cutoff is always in the same place. I'm not completely sure how we'd go about inserting the markers.
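
A purely hypothetical sketch of how SCEV construction might honor such a marker (`llvm.scev.marker` and its intrinsic ID do not exist today):

```cpp
// Inside ScalarEvolution's expression construction: treat the marker call
// as an opaque leaf, hiding the arithmetic tree behind its argument.
if (auto *II = llvm::dyn_cast<llvm::IntrinsicInst>(V))
  if (II->getIntrinsicID() == llvm::Intrinsic::scev_marker) // hypothetical
    return getUnknown(II);
```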

The underlying idea is that SCEV operations become complex mostly due to pulling in trees of irrelevant arithmetic; if we hide the complexity, we get the clean result we want. (This is sort of inspired by some work I did in polly... although the penalties for overly complex expressions are a lot more extreme there.)

Having this in the IR isn't great, maybe, but I can't think of a better way to represent it. We could maybe keep it in SCEV itself, but I'm not sure how we ensure we choose the location of the barrier consistently if it's stored in a transient datastructure.
