
[CIR] Add BinOpOverflowOp and basic pointer arithmetic support #133118


Open
wants to merge 3 commits into main

Conversation

@mmha (Contributor) commented Mar 26, 2025

This was part of #132420 originally but split off to reduce the diff size.

This patch adds support for overflowing binary operators and all the plumbing required for pointer arithmetic except for the actual Ops (PtrDiffOp and PtrStrideOp - we emit dummy values instead).
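
To illustrate what the new plumbing covers, here is a minimal C++ sketch (function and variable names are purely illustrative):

void f(int *p, int n) {
  int *q = p + n; // routed through emitPointerArithmetic; since PtrStrideOp is
                  // still missing, emitCheckedInBoundsGEP returns the base pointer
  auto d = q - p; // pointer difference; PtrDiffOp is still missing, so emitSub
                  // produces a dummy value of the result type
}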

@llvmbot added the clang (Clang issues not falling into any other category) and ClangIR (Anything related to the ClangIR project) labels on Mar 26, 2025
@llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-clang

Author: Morris Hafner (mmha)

Changes

This was part of #132420 originally but split off to reduce the diff size.

This patch adds support for overflowing binary operators and all the plumbing required for pointer arithmetic except for the actual Ops (PtrDiffOp and PtrStrideOp - we emit dummy values instead).


Patch is 22.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133118.diff

8 Files Affected:

  • (modified) clang/include/clang/CIR/Dialect/IR/CIROps.td (+59)
  • (modified) clang/include/clang/CIR/MissingFeatures.h (+1)
  • (modified) clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp (+122-3)
  • (modified) clang/lib/CIR/CodeGen/CIRGenFunction.h (+10)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp (+118)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h (+22)
  • (modified) clang/test/CIR/CodeGen/binop.cpp (+10)
  • (added) clang/test/CIR/Lowering/binop-overflow.cir (+63)
diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index 455cc2b8b0277..3d49f88b032cb 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -889,6 +889,65 @@ def BinOp : CIR_Op<"binop", [Pure,
   let hasVerifier = 1;
 }
 
+
+//===----------------------------------------------------------------------===//
+// BinOpOverflowOp
+//===----------------------------------------------------------------------===//
+
+def BinOpOverflowKind : I32EnumAttr<
+    "BinOpOverflowKind",
+    "checked binary arithmetic operation kind",
+    [BinOpKind_Add, BinOpKind_Sub, BinOpKind_Mul]> {
+  let cppNamespace = "::cir";
+}
+
+def BinOpOverflowOp : CIR_Op<"binop.overflow", [Pure, SameTypeOperands]> {
+  let summary = "Perform binary integral arithmetic with overflow checking";
+  let description = [{
+    `cir.binop.overflow` performs binary arithmetic operations with overflow
+    checking on integral operands.
+
+    The `kind` argument specifies the kind of arithmetic operation to perform.
+    It can be either `add`, `sub`, or `mul`. The `lhs` and `rhs` arguments
+    specify the input operands of the arithmetic operation. The types of `lhs`
+    and `rhs` must be the same.
+
+    `cir.binop.overflow` produces two SSA values. `result` is the result of the
+    arithmetic operation truncated to its specified type. `overflow` is a
+    boolean value indicating whether overflow happens during the operation.
+
+    The exact semantic of this operation is as follows:
+
+      - `lhs` and `rhs` are promoted to an imaginary integral type that has
+        infinite precision.
+      - The arithmetic operation is performed on the promoted operands.
+      - The infinite-precision result is truncated to the type of `result`. The
+        truncated result is assigned to `result`.
+      - If the truncated result is equal to the un-truncated result, `overflow`
+        is assigned to false. Otherwise, `overflow` is assigned to true.
+  }];
+
+  let arguments = (ins Arg<BinOpOverflowKind, "arithmetic kind">:$kind,
+                       CIR_IntType:$lhs, CIR_IntType:$rhs);
+  let results = (outs CIR_IntType:$result, CIR_BoolType:$overflow);
+
+  let assemblyFormat = [{
+    `(` $kind `,` $lhs `,` $rhs `)` `:` type($lhs) `,`
+    `(` type($result) `,` type($overflow) `)`
+    attr-dict
+  }];
+
+  let builders = [
+    OpBuilder<(ins "cir::IntType":$resultTy,
+                   "cir::BinOpOverflowKind":$kind,
+                   "mlir::Value":$lhs,
+                   "mlir::Value":$rhs), [{
+      auto overflowTy = cir::BoolType::get($_builder.getContext());
+      build($_builder, $_state, resultTy, overflowTy, kind, lhs, rhs);
+    }]>
+  ];
+}
+
 //===----------------------------------------------------------------------===//
 // GlobalOp
 //===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/CIR/MissingFeatures.h b/clang/include/clang/CIR/MissingFeatures.h
index 3a102d90aba8f..338006b351da0 100644
--- a/clang/include/clang/CIR/MissingFeatures.h
+++ b/clang/include/clang/CIR/MissingFeatures.h
@@ -109,6 +109,7 @@ struct MissingFeatures {
   static bool cgFPOptionsRAII() { return false; }
   static bool metaDataNode() { return false; }
   static bool fastMathFlags() { return false; }
+  static bool vlas() { return false; }
 
   // Missing types
   static bool dataMemberType() { return false; }
diff --git a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
index 52bd3b2933744..6e1f5aa0dc600 100644
--- a/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
@@ -936,8 +936,107 @@ getUnwidenedIntegerType(const ASTContext &astContext, const Expr *e) {
 static mlir::Value emitPointerArithmetic(CIRGenFunction &cgf,
                                          const BinOpInfo &op,
                                          bool isSubtraction) {
-  cgf.cgm.errorNYI(op.loc, "pointer arithmetic");
-  return {};
+  // Must have binary (not unary) expr here.  Unary pointer
+  // increment/decrement doesn't use this path.
+  const BinaryOperator *expr = cast<BinaryOperator>(op.e);
+
+  mlir::Value pointer = op.lhs;
+  Expr *pointerOperand = expr->getLHS();
+  mlir::Value index = op.rhs;
+  Expr *indexOperand = expr->getRHS();
+
+  // In a subtraction, the LHS is always the pointer.
+  if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
+    std::swap(pointer, index);
+    std::swap(pointerOperand, indexOperand);
+  }
+
+  bool isSigned = indexOperand->getType()->isSignedIntegerOrEnumerationType();
+
+  // Some versions of glibc and gcc use idioms (particularly in their malloc
+  // routines) that add a pointer-sized integer (known to be a pointer value)
+  // to a null pointer in order to cast the value back to an integer or as
+  // part of a pointer alignment algorithm.  This is undefined behavior, but
+  // we'd like to be able to compile programs that use it.
+  //
+  // Normally, we'd generate a GEP with a null-pointer base here in response
+  // to that code, but it's also UB to dereference a pointer created that
+  // way.  Instead (as an acknowledged hack to tolerate the idiom) we will
+  // generate a direct cast of the integer value to a pointer.
+  //
+  // The idiom (p = nullptr + N) is not met if any of the following are true:
+  //
+  //   The operation is subtraction.
+  //   The index is not pointer-sized.
+  //   The pointer type is not byte-sized.
+  //
+  if (BinaryOperator::isNullPointerArithmeticExtension(
+          cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
+    return cgf.getBuilder().createIntToPtr(index, pointer.getType());
+
+  // Differently from LLVM codegen, ABI bits for index sizes is handled during
+  // LLVM lowering.
+
+  // If this is subtraction, negate the index.
+  if (isSubtraction)
+    index = cgf.getBuilder().createNeg(index);
+
+  if (cgf.sanOpts.has(SanitizerKind::ArrayBounds))
+    cgf.cgm.errorNYI("array bounds sanitizer");
+
+  const PointerType *pointerType =
+      pointerOperand->getType()->getAs<PointerType>();
+  if (!pointerType) {
+    cgf.cgm.errorNYI("ObjC");
+    return {};
+  }
+
+  QualType elementType = pointerType->getPointeeType();
+  if (const VariableArrayType *vla =
+          cgf.getContext().getAsVariableArrayType(elementType)) {
+
+    // The element count here is the total number of non-VLA elements.
+    // TODO(cir): Get correct VLA size here
+    assert(!cir::MissingFeatures::vlas());
+    mlir::Value numElements = cgf.getBuilder().getConstAPInt(
+        cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));
+
+    // GEP indexes are signed, and scaling an index isn't permitted to
+    // signed-overflow, so we use the same semantics for our explicit
+    // multiply.  We suppress this if overflow is not undefined behavior.
+    mlir::Type elemTy = cgf.convertTypeForMem(vla->getElementType());
+
+    index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
+                                        numElements.getType());
+    index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
+
+    if (cgf.getLangOpts().isSignedOverflowDefined()) {
+      assert(!cir::MissingFeatures::ptrStrideOp());
+      cgf.cgm.errorNYI("pointer stride");
+    } else {
+      pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
+                                           isSubtraction, op.e->getExprLoc());
+    }
+
+    return pointer;
+  }
+  // Explicitly handle GNU void* and function pointer arithmetic extensions. The
+  // GNU void* casts amount to no-ops since our void* type is i8*, but this is
+  // future proof.
+  mlir::Type elemTy;
+  if (elementType->isVoidType() || elementType->isFunctionType())
+    elemTy = cgf.UInt8Ty;
+  else
+    elemTy = cgf.convertTypeForMem(elementType);
+
+  if (cgf.getLangOpts().isSignedOverflowDefined()) {
+    assert(!cir::MissingFeatures::ptrStrideOp());
+    cgf.cgm.errorNYI("pointer stride");
+    return pointer;
+  }
+
+  return cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
+                                    isSubtraction, op.e->getExprLoc());
 }
 
 mlir::Value ScalarExprEmitter::emitMul(const BinOpInfo &ops) {
@@ -1106,7 +1205,7 @@ mlir::Value ScalarExprEmitter::emitSub(const BinOpInfo &ops) {
   // See more in `EmitSub` in CGExprScalar.cpp.
   assert(!cir::MissingFeatures::ptrDiffOp());
   cgf.cgm.errorNYI("ptrdiff");
-  return {};
+  return cgf.createDummyValue(loc, ops.fullType);
 }
 
 mlir::Value ScalarExprEmitter::emitShl(const BinOpInfo &ops) {
@@ -1402,3 +1501,23 @@ mlir::Value CIRGenFunction::emitScalarPrePostIncDec(const UnaryOperator *e,
   return ScalarExprEmitter(*this, builder)
       .emitScalarPrePostIncDec(e, lv, isInc, isPre);
 }
+
+mlir::Value CIRGenFunction::emitCheckedInBoundsGEP(
+    mlir::Type elemTy, mlir::Value ptr, ArrayRef<mlir::Value> idxList,
+    bool signedIndices, bool isSubtraction, SourceLocation loc) {
+  assert(!cir::MissingFeatures::ptrStrideOp());
+  if (idxList.size() != 1)
+    cgm.errorNYI("multi-index ptr arithmetic");
+
+  // TODO(cir): This should be a PtrStrideOp. For now we simply return the base
+  // pointer
+  mlir::Value gepVal = ptr;
+
+  // If the pointer overflow sanitizer isn't enabled, do nothing.
+  if (!sanOpts.has(SanitizerKind::PointerOverflow))
+    return gepVal;
+
+  assert(!cir::MissingFeatures::pointerOverflowSanitizer());
+  cgm.errorNYI("pointer overflow sanitizer");
+  return gepVal;
+}
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.h b/clang/lib/CIR/CodeGen/CIRGenFunction.h
index cc04610f23fcb..0cb6c72710d3e 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.h
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.h
@@ -223,6 +223,16 @@ class CIRGenFunction : public CIRGenTypeCache {
 
   void emitDecl(const clang::Decl &d);
 
+  /// Same as IRBuilder::CreateInBoundsGEP, but additionally emits a check to
+  /// detect undefined behavior when the pointer overflow sanitizer is enabled.
+  /// \p SignedIndices indicates whether any of the GEP indices are signed.
+  /// \p IsSubtraction indicates whether the expression used to form the GEP
+  /// is a subtraction.
+  mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
+                                     llvm::ArrayRef<mlir::Value> idxList,
+                                     bool signedIndices, bool isSubtraction,
+                                     SourceLocation loc);
+
   void emitScalarInit(const clang::Expr *init, mlir::Location loc,
                       LValue lvalue, bool capturedByInit = false);
 
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index a16840cc6bfef..b77750798f1ed 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -20,6 +20,7 @@
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/IR/BuiltinDialect.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/Types.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Target/LLVMIR/Dialect/Builtin/BuiltinToLLVMIRTranslation.h"
@@ -1117,6 +1118,122 @@ mlir::LogicalResult CIRToLLVMBinOpLowering::matchAndRewrite(
   return mlir::LogicalResult::success();
 }
 
+mlir::LogicalResult CIRToLLVMBinOpOverflowOpLowering::matchAndRewrite(
+    cir::BinOpOverflowOp op, OpAdaptor adaptor,
+    mlir::ConversionPatternRewriter &rewriter) const {
+  mlir::Location loc = op.getLoc();
+  BinOpOverflowKind arithKind = op.getKind();
+  IntType operandTy = op.getLhs().getType();
+  IntType resultTy = op.getResult().getType();
+
+  EncompassedTypeInfo encompassedTyInfo =
+      computeEncompassedTypeWidth(operandTy, resultTy);
+  mlir::IntegerType encompassedLLVMTy =
+      rewriter.getIntegerType(encompassedTyInfo.width);
+
+  mlir::Value lhs = adaptor.getLhs();
+  mlir::Value rhs = adaptor.getRhs();
+  if (operandTy.getWidth() < encompassedTyInfo.width) {
+    if (operandTy.isSigned()) {
+      lhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, lhs);
+      rhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, rhs);
+    } else {
+      lhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, lhs);
+      rhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, rhs);
+    }
+  }
+
+  std::string intrinName = getLLVMIntrinName(arithKind, encompassedTyInfo.sign,
+                                             encompassedTyInfo.width);
+  auto intrinNameAttr = mlir::StringAttr::get(op.getContext(), intrinName);
+
+  mlir::IntegerType overflowLLVMTy = rewriter.getI1Type();
+  auto intrinRetTy = mlir::LLVM::LLVMStructType::getLiteral(
+      rewriter.getContext(), {encompassedLLVMTy, overflowLLVMTy});
+
+  auto callLLVMIntrinOp = rewriter.create<mlir::LLVM::CallIntrinsicOp>(
+      loc, intrinRetTy, intrinNameAttr, mlir::ValueRange{lhs, rhs});
+  mlir::Value intrinRet = callLLVMIntrinOp.getResult(0);
+
+  mlir::Value result = rewriter
+                           .create<mlir::LLVM::ExtractValueOp>(
+                               loc, intrinRet, ArrayRef<int64_t>{0})
+                           .getResult();
+  mlir::Value overflow = rewriter
+                             .create<mlir::LLVM::ExtractValueOp>(
+                                 loc, intrinRet, ArrayRef<int64_t>{1})
+                             .getResult();
+
+  if (resultTy.getWidth() < encompassedTyInfo.width) {
+    mlir::Type resultLLVMTy = getTypeConverter()->convertType(resultTy);
+    auto truncResult =
+        rewriter.create<mlir::LLVM::TruncOp>(loc, resultLLVMTy, result);
+
+    // Extend the truncated result back to the encompassing type to check for
+    // any overflows during the truncation.
+    mlir::Value truncResultExt;
+    if (resultTy.isSigned())
+      truncResultExt = rewriter.create<mlir::LLVM::SExtOp>(
+          loc, encompassedLLVMTy, truncResult);
+    else
+      truncResultExt = rewriter.create<mlir::LLVM::ZExtOp>(
+          loc, encompassedLLVMTy, truncResult);
+    auto truncOverflow = rewriter.create<mlir::LLVM::ICmpOp>(
+        loc, mlir::LLVM::ICmpPredicate::ne, truncResultExt, result);
+
+    result = truncResult;
+    overflow = rewriter.create<mlir::LLVM::OrOp>(loc, overflow, truncOverflow);
+  }
+
+  mlir::Type boolLLVMTy =
+      getTypeConverter()->convertType(op.getOverflow().getType());
+  if (boolLLVMTy != rewriter.getI1Type())
+    overflow = rewriter.create<mlir::LLVM::ZExtOp>(loc, boolLLVMTy, overflow);
+
+  rewriter.replaceOp(op, mlir::ValueRange{result, overflow});
+
+  return mlir::success();
+}
+
+std::string CIRToLLVMBinOpOverflowOpLowering::getLLVMIntrinName(
+    cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
+  // The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`
+
+  std::string name = "llvm.";
+
+  if (isSigned)
+    name.push_back('s');
+  else
+    name.push_back('u');
+
+  switch (opKind) {
+  case cir::BinOpOverflowKind::Add:
+    name.append("add.");
+    break;
+  case cir::BinOpOverflowKind::Sub:
+    name.append("sub.");
+    break;
+  case cir::BinOpOverflowKind::Mul:
+    name.append("mul.");
+    break;
+  }
+
+  name.append("with.overflow.i");
+  name.append(std::to_string(width));
+
+  return name;
+}
+
+CIRToLLVMBinOpOverflowOpLowering::EncompassedTypeInfo
+CIRToLLVMBinOpOverflowOpLowering::computeEncompassedTypeWidth(
+    cir::IntType operandTy, cir::IntType resultTy) {
+  bool sign = operandTy.getIsSigned() || resultTy.getIsSigned();
+  unsigned width =
+      std::max(operandTy.getWidth() + (sign && operandTy.isUnsigned()),
+               resultTy.getWidth() + (sign && resultTy.isUnsigned()));
+  return {sign, width};
+}
+
 static void prepareTypeConverter(mlir::LLVMTypeConverter &converter,
                                  mlir::DataLayout &dataLayout) {
   converter.addConversion([&](cir::PointerType type) -> mlir::Type {
@@ -1256,6 +1373,7 @@ void ConvertCIRToLLVMPass::runOnOperation() {
   patterns.add<
       // clang-format off
                CIRToLLVMBinOpLowering,
+               CIRToLLVMBinOpOverflowOpLowering,
                CIRToLLVMBrCondOpLowering,
                CIRToLLVMBrOpLowering,
                CIRToLLVMFuncOpLowering,
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
index ef0bb2deaccdf..9fb8babe3dd6c 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
@@ -189,6 +189,28 @@ class CIRToLLVMBinOpLowering : public mlir::OpConversionPattern<cir::BinOp> {
                   mlir::ConversionPatternRewriter &) const override;
 };
 
+class CIRToLLVMBinOpOverflowOpLowering
+    : public mlir::OpConversionPattern<cir::BinOpOverflowOp> {
+public:
+  using mlir::OpConversionPattern<cir::BinOpOverflowOp>::OpConversionPattern;
+
+  mlir::LogicalResult
+  matchAndRewrite(cir::BinOpOverflowOp op, OpAdaptor,
+                  mlir::ConversionPatternRewriter &) const override;
+
+private:
+  static std::string getLLVMIntrinName(cir::BinOpOverflowKind opKind,
+                                       bool isSigned, unsigned width);
+
+  struct EncompassedTypeInfo {
+    bool sign;
+    unsigned width;
+  };
+
+  static EncompassedTypeInfo computeEncompassedTypeWidth(cir::IntType operandTy,
+                                                         cir::IntType resultTy);
+};
+
 class CIRToLLVMBrOpLowering : public mlir::OpConversionPattern<cir::BrOp> {
 public:
   using mlir::OpConversionPattern<cir::BrOp>::OpConversionPattern;
diff --git a/clang/test/CIR/CodeGen/binop.cpp b/clang/test/CIR/CodeGen/binop.cpp
index 4c20f79600fac..e617e54133e45 100644
--- a/clang/test/CIR/CodeGen/binop.cpp
+++ b/clang/test/CIR/CodeGen/binop.cpp
@@ -31,3 +31,13 @@ void testFloatingPointBinOps(float a, float b) {
   a - b;
   // CHECK: cir.binop(sub, %{{.+}}, %{{.+}}) : !cir.float
 }
+
+void testPointerArith(int *a, int *b, int c) {
+  auto add = a + c;
+  // CHECK: cir.load %{{.+}} : !cir.ptr<!cir.ptr<!s32i>>, !cir.ptr<!s32i>
+  // CHECK: cir.store %{{.+}}, %{{.+}} : !cir.ptr<!s32i>, !cir.ptr<!cir.ptr<!s32i>>
+
+  auto sub = a - c;
+  // CHECK: cir.load %{{.+}} : !cir.ptr<!cir.ptr<!s32i>>, !cir.ptr<!s32i>
+  // CHECK: cir.store %{{.+}}, %{{.+}} : !cir.ptr<!s32i>, !cir.ptr<!cir.ptr<!s32i>>
+}
\ No newline at end of file
diff --git a/clang/test/CIR/Lowering/binop-overflow.cir b/clang/test/CIR/Lowering/binop-overflow.cir
new file mode 100644
index 0000000000000..68af70aa6abb6
--- /dev/null
+++ b/clang/test/CIR/Lowering/binop-overflow.cir
@@ -0,0 +1,63 @@
+// RUN: cir-opt %s -cir-to-llvm -o - | FileCheck %s -check-prefix=MLIR
+// RUN: cir-translate %s -cir-to-llvmir --target x86_64-unknown-linux-gnu --disable-cc-lowering -o -  | FileCheck %s -check-prefix=LLVM
+
+!u32i = !cir.int<u, 32>
+!s32i = !cir.int<s, 32>
+
+module {
+  cir.func @test_add_u32_u32_u32(%lhs: !u32i, %rhs: !u32i, %res: !cir.ptr<!u32i>) -> !cir.bool {
+    %result, %overflow = cir.binop.overflow(add, %lhs, %rhs) : !u32i, (!u32i, !cir.bool)
+    cir.store %result, %res : !u32i, !cir.ptr<!u32i>
+    cir.return %overflow : !cir.bool
+  }
+
+  //      MLIR: llvm.func @test_add_u32_u32_u32(%[[LHS:.+]]: i32, %[[RHS:.+]]: i32, %[[RES_PTR:.+]]: !llvm.ptr) -> i1
+  // MLIR-NEXT:   %[[#INTRIN_RET:]] = llvm.call_intrinsic "llvm.uadd.with.overflow.i32"(%[[LHS]], %[[RHS]]) : (i32, i32) -> !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   %[[#RES:]] = llvm.extractvalue %[[#INTRIN_RET]][0] : !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   %[[#OVFL:]] = llvm.extractvalue %[[#INTRIN_RET]][1] : !llvm.struct<(i32, i1)>
+  // MLIR-NEXT:   llvm.store %[[#RES]], %[[RES_PTR]] {{.*}} : i32, !llvm.ptr
+  // MLIR-NEXT:   llvm.return %[[#OVFL]] : i1
+  // MLIR-NEXT: }
+
+  //      LLVM: define i1 @test_add_u32_u32_u32(i32 %[[#LHS:]], i32 %[[#RHS:]], ptr %[[#RES_PTR:]])
+  // LLVM-NEXT:   %[[#INTRIN_RET:]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %[[#LHS]], i32 %[[#RHS]])
+  // LLVM-NEXT:   %[[#RES:]] = extractvalue { i32, i1 } %[[#INTRIN_RET]], 0
+  // LLVM-NEXT:   %[[#OVFL:]] = extractvalue { i32, i1 } %[[#INTRIN_RET]], 1
+  // LLVM-NEXT:   store i32 %[[#RES]], ptr %[[#RES_PTR]], align 4
+  // LLVM-NEXT:   ret i1 %[[#OVFL]]
+  // LLVM-NEXT: }
+
+  cir.func ...
[truncated]

@llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-clangir

Author: Morris Hafner (mmha)

(The patch summary and diff in this notification are identical to the @llvm/pr-subscribers-clang comment above.)

@mmha (Contributor, Author) commented Mar 26, 2025

cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
// The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`

std::string name = "llvm.";

I wonder if this should use llvm::raw_string_ostream or Twine.

Contributor

I don't think so, but it could use llvm::Intrinsic::getName() with logic to figure out the needed intrinsic ID instead of building the name as a composite.

Member

Good point; the incubator's CIRGenFunction::emitBuiltinExpr has a few examples of that.

Collaborator

There is definitely value to at least doing a reserve here if the above suggestions don't help.
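
For reference, a rough sketch of the raw_string_ostream plus reserve idea discussed above (illustrative only, not what the patch currently does; same parameters as getLLVMIntrinName):

// Needs llvm/Support/raw_ostream.h.
std::string name;
name.reserve(32); // "llvm.uadd.with.overflow.i128" fits comfortably
llvm::raw_string_ostream os(name);
os << "llvm." << (isSigned ? 's' : 'u');
switch (opKind) {
case cir::BinOpOverflowKind::Add:
  os << "add";
  break;
case cir::BinOpOverflowKind::Sub:
  os << "sub";
  break;
case cir::BinOpOverflowKind::Mul:
  os << "mul";
  break;
}
os << ".with.overflow.i" << width;
return name;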

/// \p SignedIndices indicates whether any of the GEP indices are signed.
/// \p IsSubtraction indicates whether the expression used to form the GEP
/// is a subtraction.
mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
Contributor

I'm reorganizing the emit functions in #133017. Can we hold this until that is merged?

Contributor Author

Sure 👍

@andykaylor (Contributor) left a comment

I don't see the value of this PR right now. The pointer arithmetic isn't performing the correct arithmetic, and nothing generates the binop.overflow op. It would be more useful to remove the binop.overflow part and add PtrDiffOp.
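
For illustration, a caller of the new builder would look roughly like this; nothing in the patch emits it yet, and builder, loc, resultTy, lhs, and rhs are hypothetical values in scope:

auto checkedAdd = builder.create<cir::BinOpOverflowOp>(
    loc, resultTy, cir::BinOpOverflowKind::Add, lhs, rhs);
mlir::Value sum = checkedAdd.getResult();          // truncated arithmetic result
mlir::Value overflowed = checkedAdd.getOverflow(); // !cir.bool overflow flag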

cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
return cgf.getBuilder().createIntToPtr(index, pointer.getType());

// Differently from LLVM codegen, ABI bits for index sizes is handled during
Contributor

This comment could use some clarification. I was going to say that I didn't think the "Differently from LLVM codegen" part of the comment added anything useful, but then before I even submitted my comment I found I had to go to the LLVM IR codegen to see what the comment was talking about. Anyway, here's my suggested rewording.

"The LLVM IR codegen compares the size of the index variable to what the data layout requires, and extends it if necessary. Like other ABI-specific processing, this is deferred until lowering to the LLVM dialect."

mlir::Value numElements = cgf.getBuilder().getConstAPInt(
cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));

// GEP indexes are signed, and scaling an index isn't permitted to
Contributor

The first line of this comment from classic codegen was lost in the transition to the incubator. I think it's relevant and should be added here. The wording of the comment is also a bit awkward. How about this?

// Effectively, the multiply by the VLA size is part of the GEP.
// GEP indexes are signed, and signed-overflow while scaling an
// index isn't permitted, so we set the NSW flag on our explicit
// multiply if overflow is undefined behavior.

Collaborator

there are a few extra words perhaps in Andy's suggestion, but I'm generally in favor of clarifying this.


index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
numElements.getType());
index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
Contributor

Based on the comment above (with or without my rewording), this should be sunk into the if-else below and should be createNSWMul in the else case. That's what happens in classic codegen.
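
A rough sketch of that restructuring (createNSWMul is assumed here; the patch only provides createMul):

index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
                                    numElements.getType());
if (cgf.getLangOpts().isSignedOverflowDefined()) {
  // Overflow is defined behavior, so a plain multiply is fine.
  index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
  assert(!cir::MissingFeatures::ptrStrideOp());
  cgf.cgm.errorNYI("pointer stride");
} else {
  // Scaling the index must not signed-overflow, so use an NSW multiply.
  index = cgf.getBuilder().createNSWMul(index.getLoc(), index, numElements);
  pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
                                       isSubtraction, op.e->getExprLoc());
}
return pointer;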

// future proof.
mlir::Type elemTy;
if (elementType->isVoidType() || elementType->isFunctionType())
elemTy = cgf.UInt8Ty;
Contributor

This looks a bit suspect. Before the transition to opaque pointers in ClangIR, the code here in the classic codegen did this:

if (elementType->isVoidType() || elementType->isFunctionType()) {
  Value *result = CGF.EmitCastToVoidPtr(pointer);
  result = CGF.Builder.CreateGEP(CGF.Int8Ty, result, index, "add.ptr");
  return CGF.Builder.CreateBitCast(result, pointer->getType());
}

That looks much more consistent with the comment above. The original code here was added in this commit:

42a8cd3

We should probably look at the test cases there to make sure this is doing the right thing.


// TODO(cir): This should be a PtrStrideOp. For now we simply return the base
// pointer
mlir::Value gepVal = ptr;
Contributor

It seems like we should be emitting an NYI error here.

@@ -109,6 +109,7 @@ struct MissingFeatures {
static bool cgFPOptionsRAII() { return false; }
static bool metaDataNode() { return false; }
static bool fastMathFlags() { return false; }
static bool vlas() { return false; }
Collaborator

Suggested change:
-  static bool vlas() { return false; }
+  static bool variableLengthArrays() { return false; }

vla has a handful of other meanings that this could get confused with. Also, it gives me an immediate gag-reflex to see vla in code, so spelling it out eases me into them better :)

Expr *indexOperand = expr->getRHS();

// In a subtraction, the LHS is always the pointer.
if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
Collaborator

In the case of subtraction, index is no longer accurate either, right? Since some_ptr - other_ptr is valid.
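
A minimal C++ illustration of the case being raised (names are made up):

int buffer[8];
int *some_ptr = buffer + 4;
int *other_ptr = buffer;
auto distance = some_ptr - other_ptr; // valid: the RHS is a pointer, not an integral index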


// If this is subtraction, negate the index.
if (isSubtraction)
index = cgf.getBuilder().createNeg(index);
Collaborator

As I mentioned above, this assumes that the index is integral, which isn't necessarily true. Or did we do a conversion somewhere?

const PointerType *pointerType =
pointerOperand->getType()->getAs<PointerType>();
if (!pointerType) {
cgf.cgm.errorNYI("ObjC");
Collaborator

I'd love this to be a slightly more detailed message. Just saying ObjC is pretty vague.

Labels: clang (Clang issues not falling into any other category), ClangIR (Anything related to the ClangIR project)
Projects: None yet
5 participants