
[CIR] Add BinOpOverflowOp and basic pointer arithmetic support #133118


Open
wants to merge 3 commits into main

Conversation

@mmha (Contributor) commented Mar 26, 2025

This was part of #132420 originally but split off to reduce the diff size.

This patch adds support for overflowing binary operators and all the plumbing required for pointer arithmetic except for the actual Ops (PtrDiffOp and PtrStrideOp - we emit dummy values instead).
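
To illustrate what the new plumbing covers, here is a minimal C++ sketch (function and variable names are purely illustrative):

void f(int *p, int n) {
  int *q = p + n; // routed through emitPointerArithmetic; since PtrStrideOp is
                  // still missing, emitCheckedInBoundsGEP returns the base pointer
  auto d = q - p; // pointer difference; PtrDiffOp is still missing, so emitSub
                  // produces a dummy value of the result type
}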

@llvmbot added the clang (Clang issues not falling into any other category) and ClangIR (Anything related to the ClangIR project) labels on Mar 26, 2025
@llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-clang

Author: Morris Hafner (mmha)

Changes

This was part of #132420 originally but split off to reduce the diff size.

This patch adds support for overflowing binary operators and all the plumbing required for pointer arithmetic except for the actual Ops (PtrDiffOp and PtrStrideOp - we emit dummy values instead).


Patch is 22.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133118.diff

8 Files Affected:

  • (modified) clang/include/clang/CIR/Dialect/IR/CIROps.td (+59)
  • (modified) clang/include/clang/CIR/MissingFeatures.h (+1)
  • (modified) clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp (+122-3)
  • (modified) clang/lib/CIR/CodeGen/CIRGenFunction.h (+10)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp (+118)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h (+22)
  • (modified) clang/test/CIR/CodeGen/binop.cpp (+10)
  • (added) clang/test/CIR/Lowering/binop-overflow.cir (+63)
diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index 455cc2b8b0277..3d49f88b032cb 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -889,6 +889,65 @@ def BinOp : CIR_Op<"binop", [Pure,
   let hasVerifier = 1;
 }
 
+
+//===----------------------------------------------------------------------===//
+// BinOpOverflowOp
+//===----------------------------------------------------------------------===//
+
+def BinOpOverflowKind : I32EnumAttr<
+    "BinOpOverflowKind",
+    "checked binary arithmetic operation kind",
+    [BinOpKind_Add, BinOpKind_Sub, BinOpKind_Mul]> {
+  let cppNamespace = "::cir";
+}
+
+def BinOpOverflowOp : CIR_Op<"binop.overflow", [Pure, SameTypeOperands]> {
+  let summary = "Perform binary integral arithmetic with overflow checking";
+  let description = [{
+    `cir.binop.overflow` performs binary arithmetic operations with overflow
+    checking on integral operands.
+
+    The `kind` argument specifies the kind of arithmetic operation to perform.
+    It can be either `add`, `sub`, or `mul`. The `lhs` and `rhs` arguments
+    specify the input operands of the arithmetic operation. The types of `lhs`
+    and `rhs` must be the same.
+
+    `cir.binop.overflow` produces two SSA values. `result` is the result of the
+    arithmetic operation truncated to its specified type. `overflow` is a
+    boolean value indicating whether overflow happens during the operation.
+
+    The exact semantic of this operation is as follows:
+
+      - `lhs` and `rhs` are promoted to an imaginary integral type that has
+        infinite precision.
+      - The arithmetic operation is performed on the promoted operands.
+      - The infinite-precision result is truncated to the type of `result`. The
+        truncated result is assigned to `result`.
+      - If the truncated result is equal to the un-truncated result, `overflow`
+        is assigned to false. Otherwise, `overflow` is assigned to true.
+  }];
+
+  let arguments = (ins Arg<BinOpOverflowKind, "arithmetic kind">:$kind,
+                       CIR_IntType:$lhs, CIR_IntType:$rhs);
+  let results = (outs CIR_IntType:$result, CIR_BoolType:$overflow);
+
+  let assemblyFormat = [{
+    `(` $kind `,` $lhs `,` $rhs `)` `:` type($lhs) `,`
+    `(` type($result) `,` type($overflow) `)`
+    attr-dict
+  }];
+
+  let builders = [
+    OpBuilder<(ins "cir::IntType":$resultTy,
+                   "cir::BinOpOverflowKind":$kind,
+                   "mlir::Value":$lhs,
+                   "mlir::Value":$rhs), [{
+      auto overflowTy = cir::BoolType::get($_builder.getContext());
+      build($_builder, $_state, resultTy, overflowTy, kind, lhs, rhs);
+    }]>
+  ];
+}
+
 //===----------------------------------------------------------------------===//
 // GlobalOp
 //===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/CIR/MissingFeatures.h b/clang/include/clang/CIR/MissingFeatures.h
index 3a102d90aba8f..338006b351da0 100644
--- a/clang/include/clang/CIR/MissingFeatures.h
+++ b/clang/include/clang/CIR/MissingFeatures.h
@@ -109,6 +109,7 @@ struct MissingFeatures {
   static bool cgFPOptionsRAII() { return false; }
   static bool metaDataNode() { return false; }
   static bool fastMathFlags() { return false; }
+  static bool vlas() { return false; }
 
   // Missing types
   static bool dataMemberType() { return false; }
diff --git a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
index 52bd3b2933744..6e1f5aa0dc600 100644
--- a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
@@ -936,8 +936,107 @@ getUnwidenedIntegerType(const ASTContext &astContext, const Expr *e) {
 static mlir::Value emitPointerArithmetic(CIRGenFunction &cgf,
                                          const BinOpInfo &op,
                                          bool isSubtraction) {
-  cgf.cgm.errorNYI(op.loc, "pointer arithmetic");
-  return {};
+  // Must have binary (not unary) expr here.  Unary pointer
+  // increment/decrement doesn't use this path.
+  const BinaryOperator *expr = cast<BinaryOperator>(op.e);
+
+  mlir::Value pointer = op.lhs;
+  Expr *pointerOperand = expr->getLHS();
+  mlir::Value index = op.rhs;
+  Expr *indexOperand = expr->getRHS();
+
+  // In a subtraction, the LHS is always the pointer.
+  if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
+    std::swap(pointer, index);
+    std::swap(pointerOperand, indexOperand);
+  }
+
+  bool isSigned = indexOperand->getType()->isSignedIntegerOrEnumerationType();
+
+  // Some versions of glibc and gcc use idioms (particularly in their malloc
+  // routines) that add a pointer-sized integer (known to be a pointer value)
+  // to a null pointer in order to cast the value back to an integer or as
+  // part of a pointer alignment algorithm.  This is undefined behavior, but
+  // we'd like to be able to compile programs that use it.
+  //
+  // Normally, we'd generate a GEP with a null-pointer base here in response
+  // to that code, but it's also UB to dereference a pointer created that
+  // way.  Instead (as an acknowledged hack to tolerate the idiom) we will
+  // generate a direct cast of the integer value to a pointer.
+  //
+  // The idiom (p = nullptr + N) is not met if any of the following are true:
+  //
+  //   The operation is subtraction.
+  //   The index is not pointer-sized.
+  //   The pointer type is not byte-sized.
+  //
+  if (BinaryOperator::isNullPointerArithmeticExtension(
+          cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
+    return cgf.getBuilder().createIntToPtr(index, pointer.getType());
+
+  // Differently from LLVM codegen, ABI bits for index sizes is handled during
+  // LLVM lowering.
+
+  // If this is subtraction, negate the index.
+  if (isSubtraction)
+    index = cgf.getBuilder().createNeg(index);
+
+  if (cgf.sanOpts.has(SanitizerKind::ArrayBounds))
+    cgf.cgm.errorNYI("array bounds sanitizer");
+
+  const PointerType *pointerType =
+      pointerOperand->getType()->getAs<PointerType>();
+  if (!pointerType) {
+    cgf.cgm.errorNYI("ObjC");
+    return {};
+  }
+
+  QualType elementType = pointerType->getPointeeType();
+  if (const VariableArrayType *vla =
+          cgf.getContext().getAsVariableArrayType(elementType)) {
+
+    // The element count here is the total number of non-VLA elements.
+    // TODO(cir): Get correct VLA size here
+    assert(!cir::MissingFeatures::vlas());
+    mlir::Value numElements = cgf.getBuilder().getConstAPInt(
+        cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));
+
+    // GEP indexes are signed, and scaling an index isn't permitted to
+    // signed-overflow, so we use the same semantics for our explicit
+    // multiply.  We suppress this if overflow is not undefined behavior.
+    mlir::Type elemTy = cgf.convertTypeForMem(vla->getElementType());
+
+    index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
+                                        numElements.getType());
+    index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
+
+    if (cgf.getLangOpts().isSignedOverflowDefined()) {
+      assert(!cir::MissingFeatures::ptrStrideOp());
+      cgf.cgm.errorNYI("pointer stride");
+    } else {
+      pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
+                                           isSubtraction, op.e->getExprLoc());
+    }
+
+    return pointer;
+  }
+  // Explicitly handle GNU void* and function pointer arithmetic extensions. The
+  // GNU void* casts amount to no-ops since our void* type is i8*, but this is
+  // future proof.
+  mlir::Type elemTy;
+  if (elementType->isVoidType() || elementType->isFunctionType())
+    elemTy = cgf.UInt8Ty;
+  else
+    elemTy = cgf.convertTypeForMem(elementType);
+
+  if (cgf.getLangOpts().isSignedOverflowDefined()) {
+    assert(!cir::MissingFeatures::ptrStrideOp());
+    cgf.cgm.errorNYI("pointer stride");
+    return pointer;
+  }
+
+  return cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
+                                    isSubtraction, op.e->getExprLoc());
 }
 
 mlir::Value ScalarExprEmitter::emitMul(const BinOpInfo &ops) {
@@ -1106,7 +1205,7 @@ mlir::Value ScalarExprEmitter::emitSub(const BinOpInfo &ops) {
   // See more in `EmitSub` in CGExprScalar.cpp.
   assert(!cir::MissingFeatures::ptrDiffOp());
   cgf.cgm.errorNYI("ptrdiff");
-  return {};
+  return cgf.createDummyValue(loc, ops.fullType);
 }
 
 mlir::Value ScalarExprEmitter::emitShl(const BinOpInfo &ops) {
@@ -1402,3 +1501,23 @@ mlir::Value CIRGenFunction::emitScalarPrePostIncDec(const UnaryOperator *e,
   return ScalarExprEmitter(*this, builder)
       .emitScalarPrePostIncDec(e, lv, isInc, isPre);
 }
+
+mlir::Value CIRGenFunction::emitCheckedInBoundsGEP(
+    mlir::Type elemTy, mlir::Value ptr, ArrayRef<mlir::Value> idxList,
+    bool signedIndices, bool isSubtraction, SourceLocation loc) {
+  assert(!cir::MissingFeatures::ptrStrideOp());
+  if (idxList.size() != 1)
+    cgm.errorNYI("multi-index ptr arithmetic");
+
+  // TODO(cir): This should be a PtrStrideOp. For now we simply return the base
+  // pointer
+  mlir::Value gepVal = ptr;
+
+  // If the pointer overflow sanitizer isn't enabled, do nothing.
+  if (!sanOpts.has(SanitizerKind::PointerOverflow))
+    return gepVal;
+
+  assert(!cir::MissingFeatures::pointerOverflowSanitizer());
+  cgm.errorNYI("pointer overflow sanitizer");
+  return gepVal;
+}
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.h b/clang/lib/CIR/CodeGen/CIRGenFunction.h
index cc04610f23fcb..0cb6c72710d3e 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.h
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.h
@@ -223,6 +223,16 @@ class CIRGenFunction : public CIRGenTypeCache {
 
   void emitDecl(const clang::Decl &d);
 
+  /// Same as IRBuilder::CreateInBoundsGEP, but additionally emits a check to
+  /// detect undefined behavior when the pointer overflow sanitizer is enabled.
+  /// \p SignedIndices indicates whether any of the GEP indices are signed.
+  /// \p IsSubtraction indicates whether the expression used to form the GEP
+  /// is a subtraction.
+  mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
+                                     llvm::ArrayRef<mlir::Value> idxList,
+                                     bool signedIndices, bool isSubtraction,
+                                     SourceLocation loc);
+
   void emitScalarInit(const clang::Expr *init, mlir::Location loc,
                       LValue lvalue, bool capturedByInit = false);
 
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index a16840cc6bfef..b77750798f1ed 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -20,6 +20,7 @@
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/IR/BuiltinDialect.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/Types.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Target/LLVMIR/Dialect/Builtin/BuiltinToLLVMIRTranslation.h"
@@ -1117,6 +1118,122 @@ mlir::LogicalResult CIRToLLVMBinOpLowering::matchAndRewrite(
   return mlir::LogicalResult::success();
 }
 
+mlir::LogicalResult CIRToLLVMBinOpOverflowOpLowering::matchAndRewrite(
+    cir::BinOpOverflowOp op, OpAdaptor adaptor,
+    mlir::ConversionPatternRewriter &rewriter) const {
+  mlir::Location loc = op.getLoc();
+  BinOpOverflowKind arithKind = op.getKind();
+  IntType operandTy = op.getLhs().getType();
+  IntType resultTy = op.getResult().getType();
+
+  EncompassedTypeInfo encompassedTyInfo =
+      computeEncompassedTypeWidth(operandTy, resultTy);
+  mlir::IntegerType encompassedLLVMTy =
+      rewriter.getIntegerType(encompassedTyInfo.width);
+
+  mlir::Value lhs = adaptor.getLhs();
+  mlir::Value rhs = adaptor.getRhs();
+  if (operandTy.getWidth() < encompassedTyInfo.width) {
+    if (operandTy.isSigned()) {
+      lhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, lhs);
+      rhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, rhs);
+    } else {
+      lhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, lhs);
+      rhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, rhs);
+    }
+  }
+
+  std::string intrinName = getLLVMIntrinName(arithKind, encompassedTyInfo.sign,
+                                             encompassedTyInfo.width);
+  auto intrinNameAttr = mlir::StringAttr::get(op.getContext(), intrinName);
+
+  mlir::IntegerType overflowLLVMTy = rewriter.getI1Type();
+  auto intrinRetTy = mlir::LLVM::LLVMStructType::getLiteral(
+      rewriter.getContext(), {encompassedLLVMTy, overflowLLVMTy});
+
+  auto callLLVMIntrinOp = rewriter.create<mlir::LLVM::CallIntrinsicOp>(
+      loc, intrinRetTy, intrinNameAttr, mlir::ValueRange{lhs, rhs});
+  mlir::Value intrinRet = callLLVMIntrinOp.getResult(0);
+
+  mlir::Value result = rewriter
+                           .create<mlir::LLVM::ExtractValueOp>(
+                               loc, intrinRet, ArrayRef<int64_t>{0})
+                           .getResult();
+  mlir::Value overflow = rewriter
+                             .create<mlir::LLVM::ExtractValueOp>(
+                                 loc, intrinRet, ArrayRef<int64_t>{1})
+                             .getResult();
+
+  if (resultTy.getWidth() < encompassedTyInfo.width) {
+    mlir::Type resultLLVMTy = getTypeConverter()->convertType(resultTy);
+    auto truncResult =
+        rewriter.create<mlir::LLVM::TruncOp>(loc, resultLLVMTy, result);
+
+    // Extend the truncated result back to the encompassing type to check for
+    // any overflows during the truncation.
+    mlir::Value truncResultExt;
+    if (resultTy.isSigned())
+      truncResultExt = rewriter.create<mlir::LLVM::SExtOp>(
+          loc, encompassedLLVMTy, truncResult);
+    else
+      truncResultExt = rewriter.create<mlir::LLVM::ZExtOp>(
+          loc, encompassedLLVMTy, truncResult);
+    auto truncOverflow = rewriter.create<mlir::LLVM::ICmpOp>(
+        loc, mlir::LLVM::ICmpPredicate::ne, truncResultExt, result);
+
+    result = truncResult;
+    overflow = rewriter.create<mlir::LLVM::OrOp>(loc, overflow, truncOverflow);
+  }
+
+  mlir::Type boolLLVMTy =
+      getTypeConverter()->convertType(op.getOverflow().getType());
+  if (boolLLVMTy != rewriter.getI1Type())
+    overflow = rewriter.create<mlir::LLVM::ZExtOp>(loc, boolLLVMTy, overflow);
+
+  rewriter.replaceOp(op, mlir::ValueRange{result, overflow});
+
+  return mlir::success();
+}
+
+std::string CIRToLLVMBinOpOverflowOpLowering::getLLVMIntrinName(
+    cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
+  // The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`
+
+  std::string name = "llvm.";
+
+  if (isSigned)
+    name.push_back('s');
+  else
+    name.push_back('u');
+
+  switch (opKind) {
+  case cir::BinOpOverflowKind::Add:
+    name.append("add.");
+    break;
+  case cir::BinOpOverflowKind::Sub:
+    name.append("sub.");
+    break;
+  case cir::BinOpOverflowKind::Mul:
+    name.append("mul.");
+    break;
+  }
+
+  name.append("with.overflow.i");
+  name.append(std::to_string(width));
+
+  return name;
+}
+
+CIRToLLVMBinOpOverflowOpLowering::EncompassedTypeInfo
+CIRToLLVMBinOpOverflowOpLowering::computeEncompassedTypeWidth(
+    cir::IntType operandTy, cir::IntType resultTy) {
+  bool sign = operandTy.getIsSigned() || resultTy.getIsSigned();
+  unsigned width =
+      std::max(operandTy.getWidth() + (sign && operandTy.isUnsigned()),
+               resultTy.getWidth() + (sign && resultTy.isUnsigned()));
+  return {sign, width};
+}
+
 static void prepareTypeConverter(mlir::LLVMTypeConverter &converter,
                                  mlir::DataLayout &dataLayout) {
   converter.addConversion([&](cir::PointerType type) -> mlir::Type {
@@ -1256,6 +1373,7 @@ void ConvertCIRToLLVMPass::runOnOperation() {
   patterns.add<
       // clang-format off
                CIRToLLVMBinOpLowering,
+               CIRToLLVMBinOpOverflowOpLowering,
                CIRToLLVMBrCondOpLowering,
                CIRToLLVMBrOpLowering,
                CIRToLLVMFuncOpLowering,
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
index ef0bb2deaccdf..9fb8babe3dd6c 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
@@ -189,6 +189,28 @@ class CIRToLLVMBinOpLowering : public mlir::OpConversionPattern<cir::BinOp> {
                   mlir::ConversionPatternRewriter &) const override;
 };
 
+class CIRToLLVMBinOpOverflowOpLowering
+    : public mlir::OpConversionPattern<cir::BinOpOverflowOp> {
+public:
+  using mlir::OpConversionPattern<cir::BinOpOverflowOp>::OpConversionPattern;
+
+  mlir::LogicalResult
+  matchAndRewrite(cir::BinOpOverflowOp op, OpAdaptor,
+                  mlir::ConversionPatternRewriter &) const override;
+
+private:
+  static std::string getLLVMIntrinName(cir::BinOpOverflowKind opKind,
+                                       bool isSigned, unsigned width);
+
+  struct EncompassedTypeInfo {
+    bool sign;
+    unsigned width;
+  };
+
+  static EncompassedTypeInfo computeEncompassedTypeWidth(cir::IntType operandTy,
+                                                         cir::IntType resultTy);
+};
+
 class CIRToLLVMBrOpLowering : public mlir::OpConversionPattern<cir::BrOp> {
 public:
   using mlir::OpConversionPattern<cir::BrOp>::OpConversionPattern;
diff --git a/clang/test/CIR/CodeGen/binop.cpp b/clang/test/CIR/CodeGen/binop.cpp
index 4c20f79600fac..e617e54133e45 100644
--- a/clang/test/CIR/CodeGen/binop.cpp
+++ b/clang/test/CIR/CodeGen/binop.cpp
@@ -31,3 +31,13 @@ void testFloatingPointBinOps(float a, float b) {
   a - b;
   // CHECK: cir.binop(sub, %{{.+}}, %{{.+}}) : !cir.float
 }
+
+void testPointerArith(int *a, int *b, int c) {
+  auto add = a + c;
+  // CHECK: cir.load %{{.+}} : !cir.ptr<!cir.ptr<!s32i>>, !cir.ptr<!s32i>
+  // CHECK: cir.store %{{.+}}, %{{.+}} : !cir.ptr<!s32i>, !cir.ptr<!cir.ptr<!s32i>>
+
+  auto sub = a - c;
+  // CHECK: cir.load %{{.+}} : !cir.ptr<!cir.ptr<!s32i>>, !cir.ptr<!s32i>
+  // CHECK: cir.store %{{.+}}, %{{.+}} : !cir.ptr<!s32i>, !cir.ptr<!cir.ptr<!s32i>>
+}
\ No newline at end of file
diff --git a/clang/test/CIR/Lowering/binop-overflow.cir b/clang/test/CIR/Lowering/binop-overflow.cir
new file mode 100644
index 0000000000000..68af70aa6abb6
--- /dev/null
+++ b/clang/test/CIR/Lowering/binop-overflow.cir
@@ -0,0 +1,63 @@
+// RUN: cir-opt %s -cir-to-llvm -o - | FileCheck %s -check-prefix=MLIR
+// RUN: cir-translate %s -cir-to-llvmir --target x86_64-unknown-linux-gnu --disable-cc-lowering -o -  | FileCheck %s -check-prefix=LLVM
+
+!u32i = !cir.int<u, 32>
+!s32i = !cir.int<s, 32>
+
+module {
+  cir.func @test_add_u32_u32_u32(%lhs: !u32i, %rhs: !u32i, %res: !cir.ptr<!u32i>) -> !cir.bool {
+    %result, %overflow = cir.binop.overflow(add, %lhs, %rhs) : !u32i, (!u32i, !cir.bool)
+    cir.store %result, %res : !u32i, !cir.ptr<!u32i>
+    cir.return %overflow : !cir.bool
+  }
+
+  //      MLIR: llvm.func @test_add_u32_u32_u32(%[[LHS:.+]]: i32, %[[RHS:.+]]: i32, %[[RES_PTR:.+]]: !llvm.ptr) -> i1
+  // MLIR-NEXT:   %[[#INTRIN_RET:]] = llvm.call_intrinsic "llvm.uadd.with.overflow.i32"(%[[LHS]], %[[RHS]]) : (i32, i32) -> !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   %[[#RES:]] = llvm.extractvalue %[[#INTRIN_RET]][0] : !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   %[[#OVFL:]] = llvm.extractvalue %[[#INTRIN_RET]][1] : !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   llvm.store %[[#RES]], %[[RES_PTR]] {{.*}} : i32, !llvm.ptr
+  // MLIR-NEXT:   llvm.return %[[#OVFL]] : i1
+  // MLIR-NEXT: }
+
+  //      LLVM: define i1 @test_add_u32_u32_u32(i32 %[[#LHS:]], i32 %[[#RHS:]], ptr %[[#RES_PTR:]])
+  // LLVM-NEXT:   %[[#INTRIN_RET:]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %[[#LHS]], i32 %[[#RHS]])
+  // LLVM-NEXT:   %[[#RES:]] = extractvalue { i32, i1 } %[[#INTRIN_RET]], 0
+  // LLVM-NEXT:   %[[#OVFL:]] = extractvalue { i32, i1 } %[[#INTRIN_RET]], 1
+  // LLVM-NEXT:   store i32 %[[#RES]], ptr %[[#RES_PTR]], align 4
+  // LLVM-NEXT:   ret i1 %[[#OVFL]]
+  // LLVM-NEXT: }
+
+  cir.func ...
[truncated]

@llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-clangir

Author: Morris Hafner (mmha)

(The patch summary and diff in this notification are identical to the @llvm/pr-subscribers-clang comment above.)

@mmha (Contributor, Author) commented Mar 26, 2025

cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
// The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`

std::string name = "llvm.";

I wonder if this should use llvm::raw_string_ostream or Twine.

Contributor

I don't think so, but it could use llvm::Intrinsic::getName() with logic to figure out the needed intrinsic ID instead of building the name as a composite.

Member

Good point; the incubator's CIRGenFunction::emitBuiltinExpr has a few examples of that.

Collaborator

There is definitely value to at least doing a reserve here if the above suggestions don't help.
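
For reference, a rough sketch of the raw_string_ostream plus reserve idea discussed above (illustrative only, not what the patch currently does; same parameters as getLLVMIntrinName):

// Needs llvm/Support/raw_ostream.h.
std::string name;
name.reserve(32); // "llvm.uadd.with.overflow.i128" fits comfortably
llvm::raw_string_ostream os(name);
os << "llvm." << (isSigned ? 's' : 'u');
switch (opKind) {
case cir::BinOpOverflowKind::Add:
  os << "add";
  break;
case cir::BinOpOverflowKind::Sub:
  os << "sub";
  break;
case cir::BinOpOverflowKind::Mul:
  os << "mul";
  break;
}
os << ".with.overflow.i" << width;
return name;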

/// \p SignedIndices indicates whether any of the GEP indices are signed.
/// \p IsSubtraction indicates whether the expression used to form the GEP
/// is a subtraction.
mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
Contributor

I'm reorganizing the emit functions in #133017. Can we hold this until that is merged?

Contributor Author

Sure 👍

@andykaylor (Contributor) left a comment

I don't see the value of this PR right now. The pointer arithmetic isn't performing the correct arithmetic, and nothing generates the binop.overflow op. It would be more useful to remove the binop.overflow part and add PtrDiffOp.
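
For illustration, a caller of the new builder would look roughly like this; nothing in the patch emits it yet, and builder, loc, resultTy, lhs, and rhs are hypothetical values in scope:

auto checkedAdd = builder.create<cir::BinOpOverflowOp>(
    loc, resultTy, cir::BinOpOverflowKind::Add, lhs, rhs);
mlir::Value sum = checkedAdd.getResult();          // truncated arithmetic result
mlir::Value overflowed = checkedAdd.getOverflow(); // !cir.bool overflow flag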

cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
return cgf.getBuilder().createIntToPtr(index, pointer.getType());

// Differently from LLVM codegen, ABI bits for index sizes is handled during
Contributor

This comment could use some clarification. I was going to say that I didn't think the "Differently from LLVM codegen" part of the comment added anything useful, but then before I even submitted my comment I found I had to go to the LLVM IR codegen to see what the comment was talking about. Anyway, here's my suggested rewording.

"The LLVM IR codegen compares the size of the index variable to what the data layout requires, and extends it if necessary. Like other ABI-specific processing, this is deferred until lowering to the LLVM dialect."

mlir::Value numElements = cgf.getBuilder().getConstAPInt(
cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));

// GEP indexes are signed, and scaling an index isn't permitted to
Contributor

The first line of this comment from classic codegen was lost in the transition to the incubator. I think it's relevant and should be added here. The wording of the comment is also a bit awkward. How about this?

// Effectively, the multiply by the VLA size is part of the GEP.
// GEP indexes are signed, and signed-overflow while scaling an
// index isn't permitted, so we set the NSW flag on our explicit
// multiply if overflow is undefined behavior.

Collaborator

there are a few extra words perhaps in Andy's suggestion, but I'm generally in favor of clarifying this.


index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
numElements.getType());
index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
Contributor

Based on the comment above (with or without my rewording), this should be sunk into the if-else below and should be createNSWMul in the else case. That's what happens in classic codegen.
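
A rough sketch of that restructuring (createNSWMul is assumed here; the patch only provides createMul):

index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
                                    numElements.getType());
if (cgf.getLangOpts().isSignedOverflowDefined()) {
  // Overflow is defined behavior, so a plain multiply is fine.
  index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
  assert(!cir::MissingFeatures::ptrStrideOp());
  cgf.cgm.errorNYI("pointer stride");
} else {
  // Scaling the index must not signed-overflow, so use an NSW multiply.
  index = cgf.getBuilder().createNSWMul(index.getLoc(), index, numElements);
  pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
                                       isSubtraction, op.e->getExprLoc());
}
return pointer;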

// future proof.
mlir::Type elemTy;
if (elementType->isVoidType() || elementType->isFunctionType())
elemTy = cgf.UInt8Ty;
Contributor

This looks a bit suspect. Before the transition to opaque pointers in ClangIR, the code here in the classic codegen did this:

if (elementType->isVoidType() || elementType->isFunctionType()) {
  Value *result = CGF.EmitCastToVoidPtr(pointer);
  result = CGF.Builder.CreateGEP(CGF.Int8Ty, result, index, "add.ptr");
  return CGF.Builder.CreateBitCast(result, pointer->getType());
}

That looks much more consistent with the comment above. The original code here was added in this commit:

42a8cd3

We should probably look at the test cases there to make sure this is doing the right thing.


// TODO(cir): This should be a PtrStrideOp. For now we simply return the base
// pointer
mlir::Value gepVal = ptr;
Contributor

It seems like we should be emitting an NYI error here.

@@ -109,6 +109,7 @@ struct MissingFeatures {
static bool cgFPOptionsRAII() { return false; }
static bool metaDataNode() { return false; }
static bool fastMathFlags() { return false; }
static bool vlas() { return false; }
Collaborator

Suggested change:
-  static bool vlas() { return false; }
+  static bool variableLengthArrays() { return false; }

vla has a handful of other meanings that this could get confused with. Also, it gives me an immediate gag-reflex to see vla in code, so spelling it out eases me into them better :)

Expr *indexOperand = expr->getRHS();

// In a subtraction, the LHS is always the pointer.
if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
Collaborator

In the case of subtraction, index is no longer accurate either, right? Since some_ptr - other_ptr is valid.
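
A minimal C++ illustration of the case being raised (names are made up):

int buffer[8];
int *some_ptr = buffer + 4;
int *other_ptr = buffer;
auto distance = some_ptr - other_ptr; // valid: the RHS is a pointer, not an integral index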


// If this is subtraction, negate the index.
if (isSubtraction)
index = cgf.getBuilder().createNeg(index);
Collaborator

As I mentioned above, this assumes that the index is integral, which isn't necessarily true. Or did we do a conversion somewhere?

const PointerType *pointerType =
pointerOperand->getType()->getAs<PointerType>();
if (!pointerType) {
cgf.cgm.errorNYI("ObjC");
Collaborator

I'd love this to be a slightly more detailed message. Just saying ObjC is pretty vague.

Labels: clang (Clang issues not falling into any other category), ClangIR (Anything related to the ClangIR project)
Projects: None yet
5 participants