[Clang][objectsize] Generate object size calculation for sub-objects #86858

bwendling · 2024-03-27T19:23:09Z

The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:

  struct s {
    int foo;
    char bar[2][40];
    int baz;
    int qux;
  };

  int main(int argc, char **argv) {
    struct s f;

  #define report(x) printf(#x ": %zu\n", x)

    argc = 1;
    report(__builtin_dynamic_object_size(f.bar[argc], 0));
    report(__builtin_dynamic_object_size(f.bar[argc], 1));
    return 0;
  }

should return:

__builtin_dynamic_object_size(f.bar[argc], 0): 48
__builtin_dynamic_object_size(f.bar[argc], 1): 40

determined by the least significant bit of the TYPE.

The LLVM IR isn't sufficient to determine what could be considered a
"sub-object". However, the front-end does have enough information to
determine the size of a sub-object and the offset into that sub-object.

We therefore try to convert the intrinsic into a calculation in the
front-end so that we can avoid the information issue.
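As a sanity check on the two numbers in the description, the arithmetic can be reproduced with sizeof and offsetof. This is only an illustrative sketch and not code from the patch: the helper names are hypothetical, and the 48-byte figure assumes a 4-byte int with no padding inside struct s.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct in the description.
struct s {
  int foo;
  char bar[2][40];
  int baz;
  int qux;
};

// Size of one row of bar, i.e. the closest surrounding sub-object of bar[idx].
constexpr std::size_t kRowSize = sizeof(char[40]);

// Type bit 0 clear: bytes from &f.bar[idx] to the end of the whole struct.
// (Hypothetical helper, named here for illustration only.)
constexpr std::size_t whole_object_remaining(std::size_t idx) {
  assert(idx <= 2); // bar has two rows; idx == 2 is one past the end
  return sizeof(s) - (offsetof(s, bar) + idx * kRowSize);
}

// Type bit 0 set: bytes in the row bar[idx] itself.
constexpr std::size_t sub_object_size() { return kRowSize; }

// With a 4-byte int: 92 - (4 + 40) = 48 for the whole object.
static_assert(sub_object_size() == 40, "one row of bar is 40 bytes");
```

Under those assumptions whole_object_remaining(1) covers the second row plus baz and qux, while sub_object_size() stops at the end of the row.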


llvmbot · 2024-03-27T19:23:39Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Bill Wendling (bwendling)


Patch is 51.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86858.diff

5 Files Affected:

(modified) clang/lib/CodeGen/CGBuiltin.cpp (+169-6)
(modified) clang/lib/CodeGen/CodeGenFunction.h (+6)
(modified) clang/test/CodeGen/attr-counted-by.c (+51-51)
(added) clang/test/CodeGen/object-size-sub-object.c (+303)
(modified) clang/test/CodeGen/object-size.c (+12-7)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 2eaceeba617700..b49311459fda6d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -26,6 +26,7 @@
 #include "clang/AST/Decl.h"
 #include "clang/AST/OSLog.h"
 #include "clang/AST/OperationKinds.h"
+#include "clang/AST/StmtVisitor.h"
 #include "clang/Basic/TargetBuiltins.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
@@ -1052,6 +1053,165 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, IsSigned));
 }
 
+namespace {
+
+class ObjectSizeVisitor
+    : public ConstStmtVisitor<ObjectSizeVisitor, const Expr *> {
+  bool SkipASE;
+
+public:
+  ObjectSizeVisitor(bool SkipASE = false) : SkipASE(SkipASE) {}
+
+  const Expr *Visit(const Expr *E) {
+    return ConstStmtVisitor<ObjectSizeVisitor, const Expr *>::Visit(E);
+  }
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+    return SkipASE ? Visit(E->getBase()) : E;
+  }
+
+  const Expr *VisitCastExpr(const CastExpr *E) {
+    return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+    return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+    return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+    return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// getLastDecl - Return the last FieldDecl in the struct.
+static const FieldDecl *getLastDecl(const RecordDecl *RD) {
+  const Decl *LastDecl = nullptr;
+  for (const Decl *D : RD->decls())
+    if (isa<FieldDecl>(D) || isa<RecordDecl>(D))
+      LastDecl = D;
+
+  if (const auto *LastRD = dyn_cast_if_present<RecordDecl>(LastDecl)) {
+    LastDecl = getLastDecl(LastRD);
+  } else if (const auto *LastFD = dyn_cast_if_present<FieldDecl>(LastDecl)) {
+    QualType Ty = LastFD->getType();
+    if (Ty->isPointerType())
+      Ty = Ty->getPointeeType();
+
+    if (const RecordDecl *Rec = Ty->getAsRecordDecl())
+      // The last FieldDecl is a structure. Look into that struct to find its
+      // last FieldDecl.
+      LastDecl = getLastDecl(Rec);
+  }
+
+  return dyn_cast_if_present<FieldDecl>(LastDecl);
+}
+
+/// tryToCalculateSubObjectSize - It may be possible to calculate the
+/// sub-object size of an array and skip the generation of the llvm.objectsize
+/// intrinsic. This avoids the complication in conveying the sub-object's
+/// information to the backend. This calculation works for an N-dimensional
+/// array.
+llvm::Value *
+CodeGenFunction::tryToCalculateSubObjectSize(const Expr *E, unsigned Type,
+                                             llvm::IntegerType *ResType) {
+  if ((Type & 0x01) != 1)
+    // Only support sub-object calculation.
+    return nullptr;
+
+  const Expr *ObjectRef = ObjectSizeVisitor().Visit(E);
+  if (!ObjectRef)
+    return nullptr;
+
+  QualType ObjectRefType = ObjectRef->getType();
+  if (ObjectRefType->isPointerType())
+    ObjectRefType = ObjectRefType->getPointeeType();
+
+  // Collect the base and index from the array.
+  QualType ObjectBaseRefTy;
+  const Expr *ArrayIdx = nullptr;
+
+  if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(ObjectRef)) {
+    ArrayIdx = ASE->getIdx()->IgnoreParenImpCasts();
+
+    const Expr *ArrayRefBase = ASE->getBase()->IgnoreParenImpCasts();
+    if (isa<ArraySubscriptExpr>(ArrayRefBase)) {
+      ObjectBaseRefTy = ArrayRefBase->getType();
+      if (ObjectBaseRefTy->isPointerType())
+        ObjectBaseRefTy = ObjectBaseRefTy->getPointeeType();
+    }
+  }
+
+  ASTContext &Ctx = getContext();
+  if (!ArrayIdx || ArrayIdx->HasSideEffects(Ctx))
+    return nullptr;
+
+  // Check to see if the Decl is a flexible array member. Processing of the
+  // 'counted_by' attribute is done by now. So we don't have any information on
+  // its size, so return MAX_INT.
+  //
+  // Rerun the visitor to find the base expr: MemberExpr or DeclRefExpr.
+  ObjectRef = ObjectSizeVisitor(true).Visit(ObjectRef);
+  if (!ObjectRef)
+    return nullptr;
+
+  if (const auto *ME = dyn_cast<MemberExpr>(ObjectRef)) {
+    if (const auto *FD = dyn_cast<FieldDecl>(ME->getMemberDecl())) {
+      const LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
+          getLangOpts().getStrictFlexArraysLevel();
+      const RecordDecl *OuterRD =
+          FD->getParent()->getOuterLexicalRecordContext();
+      const FieldDecl *LastFD = getLastDecl(OuterRD);
+
+      if (LastFD == FD && Decl::isFlexibleArrayMemberLike(
+                              Ctx, FD, FD->getType(), StrictFlexArraysLevel,
+                              /*IgnoreTemplateOrMacroSubstitution=*/true))
+        return ConstantInt::get(ResType, -1, /*isSigned=*/true);
+    }
+  }
+
+  if (ObjectBaseRefTy.isNull()) {
+    ObjectBaseRefTy = ObjectRef->getType();
+    if (ObjectBaseRefTy->isPointerType())
+      ObjectBaseRefTy = ObjectBaseRefTy->getPointeeType();
+  }
+
+  // Generate the calculation:
+  //
+  //     S Object[n_1][n_2]...[n_m]; /* M-dimensional array */
+  //
+  //     ObjectRef = Object[n_1]...[n_x]; /* 0 < x < m */
+  //     ObjectBaseRef = Object[n_1]...[n_{x-1}];
+  //
+  //     ArrayRefSize = sizeof( typeof( ObjectRef ) );
+  //     ArrayRefBaseSize = sizeof( typeof( ObjectBaseRef ) );
+  //
+  //     Size = ArrayRefBaseSize - (ArrayRefSize * ArrayIdx);
+  //     return Size > 0 ? Size : 0;
+  //
+  Value *ArrayRefSize = ConstantInt::get(
+      ResType, Ctx.getTypeSizeInChars(ObjectRefType).getQuantity(),
+      /*isSigned=*/true);
+  Value *ArrayRefBaseSize = ConstantInt::get(
+      ResType, Ctx.getTypeSizeInChars(ObjectBaseRefTy).getQuantity(),
+      /*isSigned=*/true);
+
+  Value *Res = EmitScalarExpr(ArrayIdx);
+
+  Res = Builder.CreateIntCast(Res, ResType, /*isSigned=*/true);
+  Res =
+      Builder.CreateSub(ArrayRefBaseSize, Builder.CreateMul(ArrayRefSize, Res));
+
+  return Builder.CreateSelect(Builder.CreateIsNotNeg(Res), Res,
+                              ConstantInt::get(ResType, 0, /*isSigned=*/true));
+}
+
 /// Returns a Value corresponding to the size of the given expression.
 /// This Value may be either of the following:
 ///   - A llvm::Argument (if E is a param with the pass_object_size attribute on
@@ -1084,18 +1244,21 @@ CodeGenFunction::emitBuiltinObjectSize(const Expr *E, unsigned Type,
     }
   }
 
+  // LLVM can't handle Type=3 appropriately, and __builtin_object_size shouldn't
+  // evaluate E for side-effects. In either case, we shouldn't lower to
+  // @llvm.objectsize.
+  if (Type == 3 || (!EmittedE && E->HasSideEffects(getContext())))
+    return getDefaultBuiltinObjectSizeResult(Type, ResType);
+
   if (IsDynamic) {
     // Emit special code for a flexible array member with the "counted_by"
     // attribute.
     if (Value *V = emitFlexibleArrayMemberSize(E, Type, ResType))
       return V;
-  }
 
-  // LLVM can't handle Type=3 appropriately, and __builtin_object_size shouldn't
-  // evaluate E for side-effects. In either case, we shouldn't lower to
-  // @llvm.objectsize.
-  if (Type == 3 || (!EmittedE && E->HasSideEffects(getContext())))
-    return getDefaultBuiltinObjectSizeResult(Type, ResType);
+    if (Value *V = tryToCalculateSubObjectSize(E, Type, ResType))
+      return V;
+  }
 
   Value *Ptr = EmittedE ? EmittedE : EmitScalarExpr(E);
   assert(Ptr->getType()->isPointerTy() &&
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index e8f8aa601ed017..640f2cf2c51b56 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4919,6 +4919,12 @@ class CodeGenFunction : public CodeGenTypeCache {
                                                llvm::Value *EmittedE,
                                                bool IsDynamic);
 
+  /// Try to calculate the sub-object size (i.e. \p Type's least significant
+  /// bit is set). It avoids the complication in conveying the sub-object
+  /// information to the backend.
+  llvm::Value *tryToCalculateSubObjectSize(const Expr *E, unsigned Type,
+                                           llvm::IntegerType *ResType);
+
   /// Emits the size of E, as required by __builtin_object_size. This
   /// function is aware of pass_object_size parameters, and will act accordingly
   /// if E is a parameter with the pass_object_size attribute.
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 1fb39f9a346667..16e586baaced97 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -66,7 +66,7 @@ struct anon_struct {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = icmp ult i64 [[IDXPROM]], [[TMP0]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP1]], label [[CONT3:%.*]], label [[HANDLER_OUT_OF_BOUNDS:%.*]], !prof [[PROF3:![0-9]+]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB2:[0-9]+]], i64 [[IDXPROM]]) #[[ATTR10:[0-9]+]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB1:[0-9]+]], i64 [[IDXPROM]]) #[[ATTR10:[0-9]+]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 12
@@ -114,7 +114,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = icmp ugt i64 [[TMP0]], [[INDEX]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP1]], label [[CONT3:%.*]], label [[HANDLER_OUT_OF_BOUNDS:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB4:[0-9]+]], i64 [[INDEX]]) #[[ATTR10]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB3:[0-9]+]], i64 [[INDEX]]) #[[ATTR10]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 12
@@ -203,7 +203,7 @@ size_t test2_bdos(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = icmp ugt i64 [[TMP0]], [[INDEX]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP1]], label [[CONT3:%.*]], label [[HANDLER_OUT_OF_BOUNDS:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB5:[0-9]+]], i64 [[INDEX]]) #[[ATTR10]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB4:[0-9]+]], i64 [[INDEX]]) #[[ATTR10]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 12
@@ -308,7 +308,7 @@ size_t test3_bdos(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = icmp ult i64 [[IDXPROM]], [[TMP0]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP1]], label [[CONT4:%.*]], label [[HANDLER_OUT_OF_BOUNDS:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB6:[0-9]+]], i64 [[IDXPROM]]) #[[ATTR10]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB5:[0-9]+]], i64 [[IDXPROM]]) #[[ATTR10]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont4:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD]], 2
@@ -325,7 +325,7 @@ size_t test3_bdos(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP7:%.*]] = icmp ult i64 [[IDXPROM13]], [[TMP6]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP7]], label [[CONT20:%.*]], label [[HANDLER_OUT_OF_BOUNDS16:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds16:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB7:[0-9]+]], i64 [[IDXPROM13]]) #[[ATTR10]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB6:[0-9]+]], i64 [[IDXPROM13]]) #[[ATTR10]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont20:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP8:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD7]], 3
@@ -342,7 +342,7 @@ size_t test3_bdos(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP13:%.*]] = icmp ult i64 [[IDXPROM30]], [[TMP12]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[TMP13]], label [[CONT37:%.*]], label [[HANDLER_OUT_OF_BOUNDS33:%.*]], !prof [[PROF3]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       handler.out_of_bounds33:
-// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB8:[0-9]+]], i64 [[IDXPROM30]]) #[[ATTR10]], !nosanitize [[META2]]
+// SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB7:[0-9]+]], i64 [[IDXPROM30]]) #[[ATTR10]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont37:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX35:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM30]]
@@ -405,33 +405,33 @@ size_t test3_bdos(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[ADD]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM17]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX18]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD31:%.*]] = add nsw i32 [[INDEX]], 2
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM32:%.*]] = sext i32 [[ADD31]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM32]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX33]], align 4, !tbaa [[TBAA2]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM6:%.*]] = sext i32 [[ADD]] to i64
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM6]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX7]], align 4, !tbaa [[TBAA2]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD13:%.*]] = add nsw i32 [[INDEX]], 2
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM14:%.*]] = sext i32 [[ADD13]] to i64
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM14]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX15]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
 // NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local void @test4(
-// NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i32 noundef [[INDEX:%.*]], i32 noundef [[FAM_IDX:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// NO-SANITIZE-WITHOUT-ATTR-SAME: ptr nocapture noundef writeonly [[P:%.*]], i32 noundef [[INDEX:%.*]], i32 noundef [[FAM_IDX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM9:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM9]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD17:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM18:%.*]] = sext i32 [[ADD17]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM18]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX19]], align 4, !tbaa [[TBAA2]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM2:%.*]] = sext i32 [[ADD]] to i64
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM2]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD5:%.*]] = add nsw i32 [[INDEX]], 2
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM6:%.*]] = sext i32 [[ADD5]] to i64
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM6]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX7]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
 void test4(struct annotated *p, int index, int fam_idx) {
@@ -471,13 +471,13 @@ void test4(struct annotated *p, int index, int fam_idx) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP7:%.*]] = select i1 [[TMP6]], i64 [[TMP3]], i64 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret i64 [[TMP7]]
 //
-// SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i64 @test4_bdos(
-// SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i32 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// SANITIZE-WITHOUT-ATTR-LABEL: define dso_local noundef i64 @test4_bdos(
+// SANITIZE-WITHOUT-ATTR-SAME: ptr nocapture noundef readnone [[P:%.*]], i32 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret i64 -1
 //
-// NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local i64 @test4_bdos(
-// NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef readnone [[P:%.*]], i32 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR1]] {
+// NO-SANITIZE-WITHOUT-ATTR-LABEL: define dso_local noundef i64 @test4_bdos(
+// NO-SANITIZE-WITHOUT-ATTR-SAME: ptr nocapture noundef readnone [[P:%.*]], i32 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR1]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret i64 -1
 //
@@ -494,7 +494,7 @@ size_t test4_bdos(struct annotated *p, int index) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = icmp ugt i64 [[DO...
[truncated]
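The instructions emitted at the end of tryToCalculateSubObjectSize in the patch above subtract the index times the element size from the size of the indexed array, then clamp negative results to zero. A minimal model of that arithmetic, with a hypothetical function name and plain integers standing in for the llvm::Value operands, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// base_size: sizeof the array being indexed (ArrayRefBaseSize in the patch)
// elem_size: sizeof one element of it       (ArrayRefSize in the patch)
// idx:       the signed array index         (ArrayIdx in the patch)
// Hypothetical helper; it only mirrors the Sub/Mul/Select sequence.
constexpr std::int64_t remaining_sub_object_size(std::int64_t base_size,
                                                 std::int64_t elem_size,
                                                 std::int64_t idx) {
  assert(base_size >= 0 && elem_size >= 0);
  const std::int64_t size = base_size - elem_size * idx;
  return size >= 0 ? size : 0; // mirrors the select on "is not negative"
}

// f.bar[1] from the description: char bar[2][40] is 80 bytes, a row is 40.
static_assert(remaining_sub_object_size(80, 40, 1) == 40, "second row");
static_assert(remaining_sub_object_size(80, 40, 3) == 0, "past the end");
```

Note that only the lower bound is clamped in this model: a negative index produces a value larger than base_size, which appears to be what the emitted sequence would compute as well.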

kees

I can't speak to the implementation details, but this passes my PoC tests that examine subobjects.

bwendling · 2024-04-10T19:52:02Z

Friendly ping.

bwendling · 2024-04-22T22:23:44Z

Friendly ping 2: Electric Boogaloo

bwendling · 2024-04-30T00:08:55Z

Another ping...

bwendling · 2024-05-06T20:45:38Z

Ping again.

bwendling · 2024-05-22T18:11:55Z

I think I addressed your concerns. PTAL.

efriedma-quic · 2024-05-23T20:53:22Z

clang/lib/CodeGen/CGBuiltin.cpp

+    return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+    return Visit(E->getSubExpr());


For this kind of recursion, it probably makes sense to track whether we're looking at an lvalue or an rvalue. If "x" is an int*, __bdos(x) is very different from __bdos(&x).

I see what you mean. I kinda want to wave my hands and say that "it's okay, because we're only interested in arrays at this point," but that sounds like Famous Last Words(tm). Do you think that allowing & only on an l-value is okay, too restrictive, or not what you had in mind?

If you have an array of pointers int* a[10];, and someone writes __bdos(a[b]), that's equivalent to void* p = a[b]; __bdos(p)... and this code can't resolve that kind of construct. So you need to first look for an "&" (or the equivalent array-to-pointer decay), then look for an array subscript expression inside of that.

If you restrict things enough so each expression you handle is inherently either an lvalue or an rvalue, the result would be logically sound, I guess. This requires ensuring you don't IgnoreParenImpCasts anywhere because that can look through an lvalue-to-rvalue conversion. I think it would be more clear with two separate visitors, though.

My knowledge of the front-end's architecture isn't perfect, but it seems like supporting that type of aliasing in the front-end isn't easy. For your example, the code reverts to using the llvm.objectsize intrinsic on the result of EmitScalarExpr(p). This can be seen as a deficiency, but according to others it's "acceptable" to return such in this case. I intend to expand the use of this function with future changes, where it may be possible to support this type of aliasing. But we'll always have issues with things like:

  unsigned __attribute__((always_inline)) foo(void *p) {
    return __builtin_dynamic_object_size(p, 1);
  }

  void bar(int y) {
    int *x[10];
    foo((void *)x[y]);
  }

because we don't perform inlining in the front-end. For that, we'll need to "fix" the llvm.objectsize intrinsic, but it's tougher than it looks.

With the latest version of your patch, with the following code, both __builtin_dynamic_object_size() calls fold to 319, which is pretty clearly wrong. (gcc folds the first call to 312, and the second to -1.)

  void report(long);

  int main(int argc, char **argv) {
    char *bar[40];

    argc = 1;
    report(__builtin_dynamic_object_size(&bar[argc], 1));
    report(__builtin_dynamic_object_size(bar[argc], 1));
    return 0;
  }
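The 312 that gcc reports for the first call is consistent with 39 remaining pointers of 8 bytes each, and the second call differs because bar[argc] is a loaded pointer value rather than an address inside bar. The sketch below illustrates both points without using the builtin; the helper names are hypothetical, and the 312 figure assumes 8-byte pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCount = 40; // matches `char *bar[40]` in the example

// Bytes from &bar[idx] to the end of bar (hypothetical helper).
constexpr std::size_t bytes_from_element(std::size_t idx) {
  assert(idx <= kCount);
  return (kCount - idx) * sizeof(char *);
}

// True if p points somewhere inside the storage of arr itself.
inline bool points_into(const void *p, char *const (&arr)[kCount]) {
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  const auto begin = reinterpret_cast<std::uintptr_t>(&arr[0]);
  return addr >= begin && addr - begin < sizeof(arr);
}

static_assert(bytes_from_element(1) == 39 * sizeof(char *),
              "312 when pointers are 8 bytes wide");
```

For an array whose element 1 has been set to point at some other buffer, points_into(&bar[1], bar) holds while points_into(bar[1], bar) does not, so a size derived from the layout of bar says nothing about the second expression.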

efriedma-quic · 2024-05-23T20:54:02Z

clang/lib/CodeGen/CGBuiltin.cpp

+  const Expr *VisitCastExpr(const CastExpr *E) {
+    return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {


IgnoreParens() will give you better coverage of things we're supposed to treat like parentheses.

Good to know. Done.

I'll remove this visit and rely on IgnoreParenImpCasts instead. How does that sound?

efriedma-quic · 2024-05-30T22:09:25Z

clang/lib/CodeGen/CGBuiltin.cpp

+      ResType, Ctx.getTypeSizeInChars(ObjectBaseRefTy).getQuantity(),
+      /*isSigned=*/true);
+
+  Value *Res = EmitScalarExpr(ArrayIdx);


Do we need to check whether ArrayIdx has side-effects? (If EmittedE is non-null, we don't check for side-effects otherwise.)

I do that on line 1156.

bwendling · 2024-07-08T19:30:21Z

That last commit message was a mistake. :-/ This is almost ready for another look. Please stay tuned.

bwendling · 2024-07-09T21:56:02Z

I believe this is ready for another review. PTAL.

efriedma-quic

As a matter of ensuring the behavior is predictable, I don't like IgnoreParenImpCasts(), and more generally looking through casts without checking the CastKind; it very easily leads to bugs because some casts have important semantics. Particularly lvalue-to-rvalue casts in this context.

IgnoreParens() is safe.

bwendling · 2024-07-25T23:36:39Z

> As a matter of ensuring the behavior is predictable, I don't like IgnoreParenImpCasts(), and more generally looking through casts without checking the CastKind; it very easily leads to bugs because some casts have important semantics. Particularly lvalue-to-rvalue casts in this context.

Roger that. Some of the visitors are meant to completely ignore all casts. That's intentional. I want to get to the MemberExpr/DeclRefExpr to get its type. Casts only get in the way of that. Some discussion about why is here: #96737

efriedma-quic · 2024-07-26T00:38:56Z

I agree you want to look through some casts... but I'd like to see a check of getCastKind() to ensure the cast you're looking through is a cast you're expecting to see. Skipping over CK_BitCast should be fine; I'm concerned you'll end up skipping over something that actually affects the pointer value like CK_DerivedToBase.
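To illustrate why a derived-to-base cast is not safe to skip blindly, here is a small sketch (types and function name are hypothetical, not taken from the patch) in which the implicit conversion from a derived pointer to its second base changes the pointer value. The exact offset is ABI-dependent; the sketch only relies on it being non-zero for a non-empty first base, which holds on common ABIs.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a[4]; };
struct B { int b[2]; };
struct D : A, B { int d; };

// Byte offset added by the implicit D* -> B* conversion, which Clang's AST
// represents as an implicit cast of kind CK_DerivedToBase.
inline std::ptrdiff_t derived_to_base_adjustment(D &obj) {
  B *base = &obj; // implicit derived-to-base conversion
  assert(static_cast<D *>(base) == &obj); // the reverse cast undoes it
  return reinterpret_cast<char *>(base) - reinterpret_cast<char *>(&obj);
}
```

A visitor that looked through this cast would compute sizes relative to the start of D rather than the start of the B sub-object, whereas a bit cast leaves the address unchanged.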

bwendling added 5 commits March 22, 2024 14:07

Make sure the field Decl we're looking at is the actual FAM before returning MAX_INT.

81b5a84

Make sure we're looking at the very last field in the struct, even if it's in a substruct.

3a59a35

Follow the flexible array member through a pointer.

0b66243

Don't calculate from the last index to the end of the entire N-dimensional array taken as a whole. Instead treat it as referencing the sub-type of the array the indices point to.

d2c4328

bwendling requested review from nikic, zygoloid and kees March 27, 2024 19:23

llvmbot added the clang (Clang issues not falling into any other category) and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels Mar 27, 2024

kees reviewed Mar 28, 2024

bwendling requested a review from efriedma-quic April 22, 2024 22:23

efriedma-quic reviewed May 15, 2024

bwendling and others added 2 commits May 17, 2024 14:54

Merge branch 'llvm:main' into __bdos-subobject

a3362e8

Remove aggressive 'dereference' from the visitor, and simplify 'getLastDecl' so that it only looks at 'FieldDecls'.

206b305

efriedma-quic reviewed May 23, 2024

Restrict 'VisitCastExpr' to no-op casts. Use 'IgnoreParenImpCasts' instead of using a 'VisitParenExpr', which is too general.

98d4823

efriedma-quic reviewed May 30, 2024

bwendling and others added 2 commits June 24, 2024 14:40

Merge branch 'llvm:main' into __bdos-subobject

75e9037

WIP

ed1604b

Merge branch 'llvm:main' into __bdos-subobject

aa0406d

bwendling mentioned this pull request Jul 8, 2024

How should __builtin_object_size treat casts in the type & 1 == 1 case? #96737

Open

The array base may not be an array itself.

82bb459

bwendling added 3 commits July 8, 2024 14:39

Fix testcases.

5168824

Simplify the cast visitor.

e38caf9

Add testcase.

da1fd97

efriedma-quic reviewed Jul 25, 2024

Use IgnoreParens instead of IgnoreParenImpCasts. It's safer when we're not testing the cast type.

b4dc683
