Conversation

@gandhi56
Contributor

Avoid inserting depth-limited, unsimplified SCEVTruncateExprs into the UniqueSCEVs cache. Caching these prevents later queries at lower depth from simplifying properly (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr on an AddRecExpr returned a cached SCEVTruncateExpr instead of the expected SCEVAddRecExpr.
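
A minimal sketch of the failure mode (hypothetical driver code; `SE`, `WideAR`, and `Int32Ty` are assumed to be in scope, and the depth limit is ScalarEvolution's internal `MaxCastDepth` option):

```cpp
// 1) An inner query arrives with Depth already past the cast-depth limit,
//    so getTruncateExpr gives up and builds a raw SCEVTruncateExpr(WideAR).
//    Before this patch, that unsimplified node was inserted into UniqueSCEVs.
// 2) A later top-level query for the same (operand, type) pair then hits the
//    FoldingSet entry and returns the cached, unsimplified truncate:
const llvm::SCEV *Top = SE.getTruncateExpr(WideAR, Int32Ty); // Depth = 0
// Expected by IndVarSimplify: isa<SCEVAddRecExpr>(Top)
// Observed before the patch:  isa<SCEVTruncateExpr>(Top), tripping the assert
```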

@gandhi56 gandhi56 self-assigned this Dec 14, 2025
@gandhi56 gandhi56 requested a review from nikic as a code owner December 14, 2025 22:43
@gandhi56 gandhi56 requested review from Copilot and removed request for nikic December 14, 2025 22:44
@llvmbot llvmbot added the llvm:analysis and llvm:transforms labels Dec 14, 2025
@gandhi56 gandhi56 requested a review from nikic December 14, 2025 22:44
@llvmbot
Member

llvmbot commented Dec 14, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Anshil Gandhi (gandhi56)

Changes

Avoid inserting depth-limited, unsimplified SCEVTruncateExprs into the UniqueSCEVs cache. Caching these prevents later queries at lower depth from simplifying properly (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr on an AddRecExpr returned a cached SCEVTruncateExpr instead of the expected SCEVAddRecExpr.


Full diff: https://github.com/llvm/llvm-project/pull/172234.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+4-1)
  • (added) llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll (+223)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index e1f90264be7a2..518aba36c5e6d 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -1172,9 +1172,12 @@ const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,
     return getTruncateOrZeroExtend(SZ->getOperand(), Ty, Depth + 1);
 
   if (Depth > MaxCastDepth) {
+    // Avoid caching depth-limited unsimplified results. Later queries at lower
+    // depth should be able to simplify (e.g., truncate of AddRec -> AddRec).
+    // Caching here would return this unsimplified SCEVTruncateExpr for future
+    // queries, preventing proper simplification.
     SCEV *S =
         new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator), Op, Ty);
-    UniqueSCEVs.InsertNode(S, IP);
     registerUser(S, Op);
     return S;
   }
diff --git a/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll b/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll
new file mode 100644
index 0000000000000..4f994f2458a36
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll
@@ -0,0 +1,223 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt %s -passes="loop(loop-idiom,indvars,loop-deletion,loop-unroll-full)" -S | FileCheck %s
+; REQUIRES: asserts
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @_ZNK4Test29TestViewOperator_LeftAndRightIiN6Kokkos6SerialELj7EEclEmRi(i32 %conv5, i1 %cmp13, i1 %cmp20, i1 %cmp27, i1 %cmp34, i1 %cmp41) local_unnamed_addr {
+; CHECK-LABEL: define void @_ZNK4Test29TestViewOperator_LeftAndRightIiN6Kokkos6SerialELj7EEclEmRi(
+; CHECK-SAME: i32 [[CONV5:%.*]], i1 [[CMP13:%.*]], i1 [[CMP20:%.*]], i1 [[CMP27:%.*]], i1 [[CMP34:%.*]], i1 [[CMP41:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[CONV5]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
+; CHECK-NEXT:    br label %[[FOR_COND:.*]]
+; CHECK:       [[FOR_COND_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND]]
+; CHECK:       [[FOR_COND]]:
+; CHECK-NEXT:    [[TMP2:%.*]] = zext i32 [[CONV5]] to i64
+; CHECK-NEXT:    br label %[[FOR_COND2:.*]]
+; CHECK:       [[FOR_COND2]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_COND_CLEANUP14:.*]] ], [ 0, %[[FOR_COND]] ]
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV]], [[TMP2]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[FOR_COND9_PREHEADER:.*]], label %[[FOR_COND_LOOPEXIT]]
+; CHECK:       [[FOR_COND9_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND9:.*]]
+; CHECK:       [[FOR_COND9_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND9]]
+; CHECK:       [[FOR_COND9]]:
+; CHECK-NEXT:    br i1 [[CMP13]], label %[[FOR_COND16_PREHEADER:.*]], label %[[FOR_COND_CLEANUP14]]
+; CHECK:       [[FOR_COND16_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND16:.*]]
+; CHECK:       [[FOR_COND_CLEANUP14]]:
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    br label %[[FOR_COND2]]
+; CHECK:       [[FOR_COND16_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND16]]
+; CHECK:       [[FOR_COND16]]:
+; CHECK-NEXT:    br i1 [[CMP20]], label %[[FOR_COND23_PREHEADER:.*]], label %[[FOR_COND9_LOOPEXIT]]
+; CHECK:       [[FOR_COND23_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND23:.*]]
+; CHECK:       [[FOR_COND23_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND23]]
+; CHECK:       [[FOR_COND23]]:
+; CHECK-NEXT:    br i1 [[CMP27]], label %[[FOR_COND30_PREHEADER:.*]], label %[[FOR_COND16_LOOPEXIT]]
+; CHECK:       [[FOR_COND30_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND30:.*]]
+; CHECK:       [[FOR_COND30_LOOPEXIT_LOOPEXIT:.*]]:
+; CHECK-NEXT:    br label %[[FOR_COND30_LOOPEXIT:.*]]
+; CHECK:       [[FOR_COND30_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[FOR_COND30]]
+; CHECK:       [[FOR_COND30]]:
+; CHECK-NEXT:    br i1 [[CMP34]], label %[[FOR_COND37_PREHEADER:.*]], label %[[FOR_COND23_LOOPEXIT]]
+; CHECK:       [[FOR_COND37_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_BEGIN:.*]]
+; CHECK:       [[FOR_COND37_PEEL_BEGIN]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL:.*]]
+; CHECK:       [[FOR_COND37_PEEL]]:
+; CHECK-NEXT:    br i1 [[CMP41]], label %[[FOR_BODY43_PEEL:.*]], label %[[FOR_COND30_LOOPEXIT]]
+; CHECK:       [[FOR_BODY43_PEEL]]:
+; CHECK-NEXT:    [[CONV45_PEEL:%.*]] = zext i32 0 to i64
+; CHECK-NEXT:    [[CALL31_I_I_PEEL:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[MUL79_I_I_PEEL:%.*]] = mul i64 [[CALL31_I_I_PEEL]], [[INDVARS_IV]]
+; CHECK-NEXT:    [[DOTIDX1_PEEL:%.*]] = add i64 [[CONV45_PEEL]], [[MUL79_I_I_PEEL]]
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST_PEEL:%.*]] = shl i64 [[DOTIDX1_PEEL]], 2
+; CHECK-NEXT:    [[SUB_PTR_DIV_PEEL:%.*]] = ashr exact i64 [[SUB_PTR_LHS_CAST_PEEL]], 1
+; CHECK-NEXT:    [[CMP55_PEEL:%.*]] = icmp sgt i64 0, 0
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP55_PEEL]])
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_NEXT:.*]]
+; CHECK:       [[FOR_COND37_PEEL_NEXT]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PEEL_NEXT1:.*]]
+; CHECK:       [[FOR_COND37_PEEL_NEXT1]]:
+; CHECK-NEXT:    br label %[[FOR_COND37_PREHEADER_PEEL_NEWPH:.*]]
+; CHECK:       [[FOR_COND37_PREHEADER_PEEL_NEWPH]]:
+; CHECK-NEXT:    br label %[[FOR_COND37:.*]]
+; CHECK:       [[FOR_COND37]]:
+; CHECK-NEXT:    [[OFFSET_619:%.*]] = phi i64 [ [[SUB_PTR_DIV:%.*]], %[[FOR_BODY43:.*]] ], [ [[SUB_PTR_DIV_PEEL]], %[[FOR_COND37_PREHEADER_PEEL_NEWPH]] ]
+; CHECK-NEXT:    br i1 [[CMP41]], label %[[FOR_BODY43]], label %[[FOR_COND30_LOOPEXIT_LOOPEXIT]]
+; CHECK:       [[FOR_BODY43]]:
+; CHECK-NEXT:    [[CALL31_I_I:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33_I_I:%.*]] = add i64 [[INDVARS_IV]], [[CALL31_I_I]]
+; CHECK-NEXT:    [[MUL42_I_I:%.*]] = mul i64 [[TMP1]], [[ADD33_I_I]]
+; CHECK-NEXT:    [[ADD43_I_I:%.*]] = add i64 [[MUL42_I_I]], 1
+; CHECK-NEXT:    [[MUL52_I_I:%.*]] = mul i64 [[TMP1]], [[ADD43_I_I]]
+; CHECK-NEXT:    [[ADD53_I_I:%.*]] = add i64 [[MUL52_I_I]], 1
+; CHECK-NEXT:    [[MUL62_I_I:%.*]] = mul i64 [[TMP1]], [[ADD53_I_I]]
+; CHECK-NEXT:    [[ADD63_I_I:%.*]] = add i64 [[MUL62_I_I]], 1
+; CHECK-NEXT:    [[MUL72_I_I:%.*]] = mul i64 [[INDVARS_IV]], [[ADD63_I_I]]
+; CHECK-NEXT:    [[MUL79_I_I:%.*]] = mul i64 [[CALL31_I_I]], [[MUL72_I_I]]
+; CHECK-NEXT:    [[DOTIDX1:%.*]] = add i64 [[TMP1]], [[MUL79_I_I]]
+; CHECK-NEXT:    [[SUB_PTR_LHS_CAST:%.*]] = shl i64 [[DOTIDX1]], 2
+; CHECK-NEXT:    [[SUB_PTR_DIV]] = ashr exact i64 [[SUB_PTR_LHS_CAST]], 1
+; CHECK-NEXT:    [[CMP55:%.*]] = icmp sgt i64 [[OFFSET_619]], 0
+; CHECK-NEXT:    call void @llvm.assume(i1 [[CMP55]])
+; CHECK-NEXT:    br label %[[FOR_COND37]], !llvm.loop [[LOOP0:![0-9]+]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %for.cond2, %entry
+  br label %for.cond2
+
+for.cond2:                                        ; preds = %for.cond.cleanup14, %for.cond
+  %i5.0 = phi i32 [ 0, %for.cond ], [ %inc70, %for.cond.cleanup14 ]
+  %cmp6 = icmp ult i32 %i5.0, %conv5
+  br i1 %cmp6, label %for.cond9, label %for.cond
+
+for.cond9:                                        ; preds = %for.cond16, %for.cond2
+  br i1 %cmp13, label %for.cond16, label %for.cond.cleanup14
+
+for.cond.cleanup14:                               ; preds = %for.cond9
+  %inc70 = add i32 %i5.0, 1
+  br label %for.cond2
+
+for.cond16:                                       ; preds = %for.cond23, %for.cond9
+  br i1 %cmp20, label %for.cond23, label %for.cond9
+
+for.cond23:                                       ; preds = %for.cond30, %for.cond16
+  br i1 %cmp27, label %for.cond30, label %for.cond16
+
+for.cond30:                                       ; preds = %for.cond37, %for.cond23
+  br i1 %cmp34, label %for.cond37, label %for.cond23
+
+for.cond37:                                       ; preds = %for.body43, %for.cond30
+  %i0.018 = phi i32 [ %inc, %for.body43 ], [ 0, %for.cond30 ]
+  %offset.619 = phi i64 [ %sub.ptr.div, %for.body43 ], [ 0, %for.cond30 ]
+  br i1 %cmp41, label %for.body43, label %for.cond30
+
+for.body43:                                       ; preds = %for.cond37
+  %conv45 = zext i32 %i0.018 to i64
+  %conv50 = zext i32 %i5.0 to i64
+  %call31.i.i = load volatile i64, ptr null, align 8
+  %add33.i.i = add i64 %conv50, %call31.i.i
+  %mul42.i.i = mul i64 %conv45, %add33.i.i
+  %add43.i.i = add i64 %mul42.i.i, 1
+  %mul52.i.i = mul i64 %conv45, %add43.i.i
+  %add53.i.i = add i64 %mul52.i.i, 1
+  %mul62.i.i = mul i64 %conv45, %add53.i.i
+  %add63.i.i = add i64 %mul62.i.i, 1
+  %mul72.i.i = mul i64 %conv50, %add63.i.i
+  %mul79.i.i = mul i64 %call31.i.i, %mul72.i.i
+  %.idx1 = add i64 %conv45, %mul79.i.i
+  %sub.ptr.lhs.cast = shl i64 %.idx1, 2
+  %sub.ptr.div = ashr exact i64 %sub.ptr.lhs.cast, 1
+  %cmp55 = icmp sgt i64 %offset.619, 0
+  call void @llvm.assume(i1 %cmp55)
+  %inc = add i32 %conv5, 1
+  br label %for.cond37
+}
+
+define ptr @_ZNK6Kokkos11DynRankViewIiJNS_10LayoutLeftENS_6SerialEEEclEmmmmmmm(i64 %i0, i64 %i5) local_unnamed_addr {
+; CHECK-LABEL: define ptr @_ZNK6Kokkos11DynRankViewIiJNS_10LayoutLeftENS_6SerialEEEclEmmmmmmm(
+; CHECK-SAME: i64 [[I0:%.*]], i64 [[I5:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CALL31_I:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33_I:%.*]] = add i64 [[I5]], [[CALL31_I]]
+; CHECK-NEXT:    [[MUL42_I:%.*]] = mul i64 [[I0]], [[ADD33_I]]
+; CHECK-NEXT:    [[ADD43_I:%.*]] = add i64 [[MUL42_I]], 1
+; CHECK-NEXT:    [[MUL52_I:%.*]] = mul i64 [[I0]], [[ADD43_I]]
+; CHECK-NEXT:    [[ADD53_I:%.*]] = add i64 [[MUL52_I]], 1
+; CHECK-NEXT:    [[MUL62_I:%.*]] = mul i64 [[I0]], [[ADD53_I]]
+; CHECK-NEXT:    [[ADD63_I:%.*]] = add i64 [[MUL62_I]], 1
+; CHECK-NEXT:    [[MUL72_I:%.*]] = mul i64 [[I5]], [[ADD63_I]]
+; CHECK-NEXT:    [[MUL79_I:%.*]] = mul i64 [[CALL31_I]], [[MUL72_I]]
+; CHECK-NEXT:    [[ADD80_I:%.*]] = add i64 [[I0]], [[MUL79_I]]
+; CHECK-NEXT:    [[ARRAYIDX_I:%.*]] = getelementptr i32, ptr null, i64 [[ADD80_I]]
+; CHECK-NEXT:    ret ptr [[ARRAYIDX_I]]
+;
+entry:
+  %call31.i = load volatile i64, ptr null, align 8
+  %add33.i = add i64 %i5, %call31.i
+  %mul42.i = mul i64 %i0, %add33.i
+  %add43.i = add i64 %mul42.i, 1
+  %mul52.i = mul i64 %i0, %add43.i
+  %add53.i = add i64 %mul52.i, 1
+  %mul62.i = mul i64 %i0, %add53.i
+  %add63.i = add i64 %mul62.i, 1
+  %mul72.i = mul i64 %i5, %add63.i
+  %mul79.i = mul i64 %call31.i, %mul72.i
+  %add80.i = add i64 %i0, %mul79.i
+  %arrayidx.i = getelementptr i32, ptr null, i64 %add80.i
+  ret ptr %arrayidx.i
+}
+
+define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(i64 %index_offsets, i64 %index_offsets9) local_unnamed_addr {
+; CHECK-LABEL: define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(
+; CHECK-SAME: i64 [[INDEX_OFFSETS:%.*]], i64 [[INDEX_OFFSETS9:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CALL31:%.*]] = load volatile i64, ptr null, align 8
+; CHECK-NEXT:    [[ADD33:%.*]] = add i64 [[INDEX_OFFSETS9]], [[CALL31]]
+; CHECK-NEXT:    [[MUL42:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD33]]
+; CHECK-NEXT:    [[ADD43:%.*]] = add i64 [[MUL42]], 1
+; CHECK-NEXT:    [[MUL52:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD43]]
+; CHECK-NEXT:    [[ADD53:%.*]] = add i64 [[MUL52]], 1
+; CHECK-NEXT:    [[MUL62:%.*]] = mul i64 [[INDEX_OFFSETS]], [[ADD53]]
+; CHECK-NEXT:    [[ADD63:%.*]] = add i64 [[MUL62]], 1
+; CHECK-NEXT:    [[MUL72:%.*]] = mul i64 [[INDEX_OFFSETS9]], [[ADD63]]
+; CHECK-NEXT:    [[MUL79:%.*]] = mul i64 [[CALL31]], [[MUL72]]
+; CHECK-NEXT:    [[ADD80:%.*]] = add i64 [[INDEX_OFFSETS]], [[MUL79]]
+; CHECK-NEXT:    ret i64 [[ADD80]]
+;
+entry:
+  %call31 = load volatile i64, ptr null, align 8
+  %add33 = add i64 %index_offsets9, %call31
+  %mul42 = mul i64 %index_offsets, %add33
+  %add43 = add i64 %mul42, 1
+  %mul52 = mul i64 %index_offsets, %add43
+  %add53 = add i64 %mul52, 1
+  %mul62 = mul i64 %index_offsets, %add53
+  %add63 = add i64 %mul62, 1
+  %mul72 = mul i64 %index_offsets9, %add63
+  %mul79 = mul i64 %call31, %mul72
+  %add80 = add i64 %index_offsets, %mul79
+  ret i64 %add80
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
+declare void @llvm.assume(i1 noundef) #0
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.peeled.count", i32 1}
+;.

@llvmbot
Member

llvmbot commented Dec 14, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Anshil Gandhi (gandhi56)


Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a cache poisoning issue in SCEV where depth-limited unsimplified truncate expressions were being cached, preventing later queries at lower depths from properly simplifying expressions (e.g., preventing truncate of AddRecExpr from becoming AddRecExpr).

  • Removes caching of depth-limited SCEVTruncateExpr results in getTruncateExpr
  • Adds comprehensive test case demonstrating the fix prevents assertion failures in IndVarSimplify

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • llvm/lib/Analysis/ScalarEvolution.cpp — removes the UniqueSCEVs.InsertNode(S, IP) call for depth-limited truncate expressions to prevent cache poisoning
  • llvm/test/Transforms/IndVarSimplify/scev-update-loop-opt.ll — adds a regression test with nested loops that previously triggered the assertion failure


Comment on lines 1174 to 1183
if (Depth > MaxCastDepth) {
  // Avoid caching depth-limited unsimplified results. Later queries at lower
  // depth should be able to simplify (e.g., truncate of AddRec -> AddRec).
  // Caching here would return this unsimplified SCEVTruncateExpr for future
  // queries, preventing proper simplification.
  SCEV *S =
      new (SCEVAllocator) SCEVTruncateExpr(ID.Intern(SCEVAllocator), Op, Ty);
  UniqueSCEVs.InsertNode(S, IP);
  registerUser(S, Op);
  return S;
}

Copilot AI Dec 14, 2025


The fix removes caching of depth-limited truncate expressions to prevent cache poisoning. However, the same issue exists in getZeroExtendExprImpl (line 1603) and getSignExtendExprImpl (line 1943), where depth-limited unsimplified zero/sign-extend expressions are still inserted into the UniqueSCEVs cache. So while truncate expressions are now handled, zero- and sign-extend expressions can still hit the same problem: a cached, depth-limited, unsimplified expression prevents later queries at lower depth from simplifying properly.
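
For reference, a paraphrase of the analogous depth-limit block in getZeroExtendExprImpl (not an exact copy of the current source; getSignExtendExprImpl follows the same shape):

```cpp
if (Depth > MaxCastDepth) {
  // Same pattern as the truncate case: an unsimplified cast node is
  // built at the depth limit and inserted into the cache.
  SCEV *S = new (SCEVAllocator)
      SCEVZeroExtendExpr(ID.Intern(SCEVAllocator), Op, Ty);
  UniqueSCEVs.InsertNode(S, IP);
  registerUser(S, Op);
  return S;
}
```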

@gandhi56 gandhi56 force-pushed the scev/avoid-caching-unsimplified-trunc branch from c71fae1 to c8d18ae Compare December 14, 2025 23:08
Don't insert depth-limited unsimplified SCEVTruncateExpr into the
UniqueSCEVs cache. Caching these prevents later queries at lower depth
from properly simplifying (e.g., trunc(AddRec) -> AddRec).

This fixes an assertion failure in IndVarSimplify where getTruncateExpr
on an AddRecExpr returned a cached SCEVTruncateExpr instead of the
expected SCEVAddRecExpr.
@gandhi56 gandhi56 force-pushed the scev/avoid-caching-unsimplified-trunc branch from c8d18ae to 5e1e5d0 Compare December 14, 2025 23:09
Contributor

@nikic nikic left a comment


Please reference the issue you are fixing. I remember there being one for this.

I initially thought that this change may cause compile-time issues for the pathological cases the depth limit is intended to address, but after considering more carefully, I don't think it will really affect them. The cache will only be hit when using the argument after the recursion has already happened. (We have a different cache that will use the original argument, but that's not used for trunc.)

@@ -0,0 +1,223 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt %s -passes="loop(loop-idiom,indvars,loop-deletion,loop-unroll-full)" -S | FileCheck %s
; REQUIRES: asserts
Contributor


Does not need asserts.

Contributor Author


I added an assertion in IndVarSimplify to ensure that the truncated expression is a SCEVAddRecExpr.

Contributor Author


The original issue caused the following assertion to fail:

assert(ExitCnt->getType()->isPointerTy() ==

Contributor


You only need to add REQUIRES: asserts if the test will fail in a build without assertions, e.g. because you're using -debug. It's not intended for tests that happened to assert in the past.

; REQUIRES: asserts

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
Contributor


Do not use triple.

ret ptr %arrayidx.i
}

define i64 @_ZNK6Kokkos4ViewIPPPPPPPiJNS_10LayoutLeftENS_6SerialEEE14compute_offsetIJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEJmmmmmmmEEEDaSt16integer_sequenceImJXspT_EEEDpT0_(i64 %index_offsets, i64 %index_offsets9) local_unnamed_addr {
Contributor


This does not look like a reduced test case -- these functions don't look relevant.

Contributor

@nikic nikic left a comment


Hm, but thinking about this again, this change is going to break SCEV uniquing. It means that we will be creating this node multiple times, breaking equality comparisons -- and creating a memory leak. If the query that hits this depth limit is run many times, it's going to create a new, different SCEV node every time, and these will never get cleaned up. I don't think we can do this.
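
A sketch of the concern under the patched behavior (`Op`, `Int32Ty`, and `Limit`, standing in for the internal `MaxCastDepth`, are assumptions):

```cpp
// With the InsertNode call removed, two identical depth-limited queries
// allocate two distinct nodes from the bump allocator:
const llvm::SCEV *A = SE.getTruncateExpr(Op, Int32Ty, /*Depth=*/Limit + 1);
const llvm::SCEV *B = SE.getTruncateExpr(Op, Int32Ty, /*Depth=*/Limit + 1);
// Before the patch: A == B (both resolve to the node uniqued in UniqueSCEVs).
// After the patch:  A != B, so pointer-equality comparisons between SCEVs
// break, and each query leaks a node that can never be found or reclaimed.
```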

@nikic
Contributor

nikic commented Dec 15, 2025

I think the right way to fix the IndVarSimplify issue is to not assume the truncation is going to return a specific type. If it does not return an addrec, we can use a fallback implementation that truncates at the end instead of the start. That's less efficient but shouldn't really matter for this edge case.
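
A hedged sketch of that fallback on the IndVarSimplify side (`computeExitCountFrom` is a hypothetical helper, not existing API):

```cpp
const llvm::SCEV *Trunc = SE.getTruncateExpr(WideAR, NarrowTy);
if (const auto *TruncAR = llvm::dyn_cast<llvm::SCEVAddRecExpr>(Trunc)) {
  // Common case: the truncate folded into the recurrence; compute the
  // exit count directly in the narrow type.
  ExitCnt = computeExitCountFrom(TruncAR);
} else {
  // Fallback: do the arithmetic in the wide type and truncate the final
  // value instead ("truncate at the end instead of the start").
  const llvm::SCEV *WideCnt = computeExitCountFrom(WideAR);
  ExitCnt = SE.getTruncateExpr(WideCnt, NarrowTy);
}
```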

@nikic nikic requested review from efriedma-quic and fhahn December 15, 2025 09:04
@gandhi56
Contributor Author

Please reference the issue you are fixing. I remember there being one for this.

This issue showed up during a Kokkos build. A quick search through GitHub issues didn't turn up a related one; I will reference it as soon as I find it.

@nikic
Contributor

nikic commented Dec 15, 2025

Please reference the issue you are fixing. I remember there being one for this.

This issue showed up during a kokkos build. Upon a quick iteration over github issues, I didn't find a related issue. I will mention them as soon as I find one.

I think it may be #153090.

@efriedma-quic
Collaborator

We can force an addrec with truncated operands by expanding out the code to create it: instead of truncating the AddRec itself, truncate the operands of the AddRec. (Assuming the step is non-zero after we truncate it, which it is in this case because we assert it's exactly one.)
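
Roughly, assuming `AR`, `NarrowTy`, and `SE` are in scope:

```cpp
// Build the narrow addrec directly from truncated operands instead of
// truncating the addrec itself.
llvm::SmallVector<const llvm::SCEV *, 4> Ops;
for (const llvm::SCEV *Op : AR->operands())
  Ops.push_back(SE.getTruncateExpr(Op, NarrowTy));
// Sound here because the caller asserts the step is exactly one, so the
// truncated step cannot become zero.
const llvm::SCEV *NarrowAR =
    SE.getAddRecExpr(Ops, AR->getLoop(), llvm::SCEV::FlagAnyWrap);
```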

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

@gandhi56
Contributor Author

I think the right way to fix the IndVarSimplify issue is to not assume the truncation is going to return a specific type. If it does not return an addrec, we can use a fallback implementation that truncates at the end instead of the start. That's less efficient but shouldn't really matter for this edge case.

I would prefer to perform the truncation within the SCEV framework instead of IndVarSimplify. This way, all consumers of SCEV only need to handle canonical SCEV expressions. Furthermore, this problem of caching unsimplified expressions occurs elsewhere in ScalarEvolution as well, such as in getZeroExtendExprImpl and getSignExtendExprImpl. So handling this corner case in IndVarSimplify may resolve this bug, but other bugs due to unsimplified zext/sext expressions will still be lying around.

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

Converting SCEV's recursive algorithms into iterative ones seems worth a try.

@gandhi56
Contributor Author

Hm, but thinking about this again, this change is going to break SCEV uniquing. It means that we will be creating this node multiple times, breaking equality comparisons -- and creating a memory leak. If the query that hits this depth limit is run many times, it's going to create a new, different SCEV node every time, and these will never get cleaned up. I don't think we can do this.

This memory leak can be avoided with an iterative algorithm, by caching only the expressions that matter and ensuring the worklist is cleaned up before the algorithm terminates.

@nikic
Contributor

nikic commented Dec 16, 2025

We can force an addrec with truncated operands by expanding out the code to create it: instead of truncating the AddRec itself, truncate the operands of the AddRec. (Assuming the step is non-zero after we truncate it, which it is in this case because we assert it's exactly one.)

Another option would be to only check for an existing node after trying to push the trunc into the addrec. It should be fine to perform any unconditional transforms before checking for an existing node. We just shouldn't perform conditional ones (like the add/mul variant) before checking to avoid unnecessary work.
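
A sketch of that ordering inside getTruncateExpr (simplified; the addrec fold already exists in the real code, it is only moved ahead of the lookup):

```cpp
// Perform the unconditional trunc(AddRec) -> AddRec fold before the
// FoldingSet lookup, so a cached depth-limited node can never shadow it.
if (const auto *AR = dyn_cast<SCEVAddRecExpr>(Op)) {
  SmallVector<const SCEV *, 4> Ops;
  for (const SCEV *O : AR->operands())
    Ops.push_back(getTruncateExpr(O, Ty, Depth + 1));
  return getAddRecExpr(Ops, AR->getLoop(), SCEV::FlagAnyWrap);
}
// Only afterwards: consult UniqueSCEVs and, at the depth limit, insert
// the unsimplified SCEVTruncateExpr.
```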

Long-term I suspect we just don't want to limit depth the way we do. We should use iterative algorithms to avoid stack overflow... and if computations are still too expensive at that point, we should consider adding explicit markers to the IR to limit SCEV depth.

Not sure I follow what you have in mind here. How would such an IR marker look like?

@efriedma-quic
Collaborator

Not sure I follow what you have in mind here. How would such an IR marker look like?

The depth limit sort of serves two purposes: one is avoiding stack overflow, but the other is just to limit the total amount of work we do in various transforms. The marker is targeted at the latter.

i32 @llvm.scev.marker(i32) just returns the argument... but SCEV treats it as SCEVUnknown. When we detect SCEVs with a high depth, we insert this to reduce the total depth. Or something like that. This allows limiting complex SCEVs while still ensuring SCEV computations are reproducible: the cutoff is always in the same place. I'm not completely sure how we'd go about inserting the markers.
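
A purely hypothetical sketch of how SCEV construction might honor such a marker (`llvm.scev.marker` and its intrinsic ID do not exist today):

```cpp
// Inside ScalarEvolution's expression construction: treat the marker call
// as an opaque leaf, hiding the arithmetic tree behind its argument.
if (auto *II = llvm::dyn_cast<llvm::IntrinsicInst>(V))
  if (II->getIntrinsicID() == llvm::Intrinsic::scev_marker) // hypothetical
    return getUnknown(II);
```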

The underlying idea is that SCEV operations become complex mostly due to pulling in trees of irrelevant arithmetic; if we hide the complexity, we get the clean result we want. (This is sort of inspired by some work I did in polly... although the penalties for overly complex expressions are a lot more extreme there.)

Having this in the IR isn't great, maybe, but I can't think of a better way to represent it. We could maybe keep it in SCEV itself, but I'm not sure how we ensure we choose the location of the barrier consistently if it's stored in a transient datastructure.
