Skip to content

[Clang] Verify data layout consistency #144720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jun 18, 2025

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.

Verify that the alignments specified by clang TargetInfo match
the alignments specified by LLVM data layout, which will hopefully
prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like `-malign-double`
are used, or a language that mandates sizes/alignments that differ
from C.
@nikic nikic requested review from rjmccall and efriedma-quic June 18, 2025 14:59
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jun 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-clang-codegen

Author: Nikita Popov (nikic)

Changes

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.


Full diff: https://github.com/llvm/llvm-project/pull/144720.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+73)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c27168e4c4bfe..aabc872e22df1 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -332,6 +332,76 @@ const TargetCodeGenInfo &CodeGenModule::getTargetCodeGenInfo() {
   return *TheTargetCodeGenInfo;
 }
 
+static void checkDataLayoutConsistency(const TargetInfo &Target,
+                                       llvm::LLVMContext &Context,
+                                       const LangOptions &Opts) {
+#ifndef NDEBUG
+  // Don't verify non-standard ABI configurations.
+  if (Opts.AlignDouble || Opts.OpenCL || Opts.HLSL)
+    return;
+
+  llvm::Triple Triple = Target.getTriple();
+  llvm::DataLayout DL(Target.getDataLayoutString());
+  auto Check = [&](const char *Name, llvm::Type *Ty, unsigned Alignment) {
+    llvm::Align DLAlign = DL.getABITypeAlign(Ty);
+    llvm::Align ClangAlign(Alignment / 8);
+    if (DLAlign != ClangAlign) {
+      llvm::errs() << "For target " << Triple.str() << " type " << Name
+                   << " mapping to " << *Ty << " has data layout alignment "
+                   << DLAlign.value() << " while clang specifies "
+                   << ClangAlign.value() << "\n";
+      abort();
+    }
+  };
+
+  Check("bool", llvm::Type::getIntNTy(Context, Target.BoolWidth),
+        Target.BoolAlign);
+  Check("short", llvm::Type::getIntNTy(Context, Target.ShortWidth),
+        Target.ShortAlign);
+  // FIXME: M68k specifies incorrect wrong int and long alignments in Clang
+  // and incorrect long long alignment in both LLVM and Clang.
+  if (Triple.getArch() != llvm::Triple::m68k) {
+    Check("int", llvm::Type::getIntNTy(Context, Target.IntWidth),
+          Target.IntAlign);
+    Check("long", llvm::Type::getIntNTy(Context, Target.LongWidth),
+          Target.LongAlign);
+    Check("long long", llvm::Type::getIntNTy(Context, Target.LongLongWidth),
+          Target.LongLongAlign);
+  }
+  // FIXME: There are int128 alignment mismatches on multiple targets.
+  if (Target.hasInt128Type() && !Target.getTargetOpts().ForceEnableInt128 &&
+      !Triple.isAMDGPU() && !Triple.isSPIRV() &&
+      Triple.getArch() != llvm::Triple::ve)
+    Check("__int128", llvm::Type::getIntNTy(Context, 128), Target.Int128Align);
+
+  if (Target.hasFloat16Type())
+    Check("half", llvm::Type::getFloatingPointTy(Context, *Target.HalfFormat),
+          Target.HalfAlign);
+  if (Target.hasBFloat16Type())
+    Check("bfloat", llvm::Type::getBFloatTy(Context), Target.BFloat16Align);
+  Check("float", llvm::Type::getFloatingPointTy(Context, *Target.FloatFormat),
+        Target.FloatAlign);
+  // FIXME: AIX specifies wrong double alignment in DataLayout
+  if (!Triple.isOSAIX()) {
+    Check("double",
+          llvm::Type::getFloatingPointTy(Context, *Target.DoubleFormat),
+          Target.DoubleAlign);
+    Check("long double",
+          llvm::Type::getFloatingPointTy(Context, *Target.LongDoubleFormat),
+          Target.LongDoubleAlign);
+  }
+  // FIXME: Wasm has a mismatch in f128 alignment between Clang and LLVM.
+  if (Target.hasFloat128Type() && !Triple.isWasm())
+    Check("__float128", llvm::Type::getFP128Ty(Context), Target.Float128Align);
+  if (Target.hasIbm128Type())
+    Check("__ibm128", llvm::Type::getPPC_FP128Ty(Context), Target.Ibm128Align);
+
+  // FIXME: Clang specifies incorrect pointer alignment for m68k.
+  if (Triple.getArch() != llvm::Triple::m68k)
+    Check("void*", llvm::PointerType::getUnqual(Context), Target.PointerAlign);
+#endif
+}
+
 CodeGenModule::CodeGenModule(ASTContext &C,
                              IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS,
                              const HeaderSearchOptions &HSO,
@@ -458,6 +528,9 @@ CodeGenModule::CodeGenModule(ASTContext &C,
   if (Context.getTargetInfo().getTriple().getArch() == llvm::Triple::x86)
     getModule().addModuleFlag(llvm::Module::Error, "NumRegisterParameters",
                               CodeGenOpts.NumRegisterParameters);
+
+  if (!Context.getAuxTargetInfo())
+    checkDataLayoutConsistency(Context.getTargetInfo(), LLVMContext, LangOpts);
 }
 
 CodeGenModule::~CodeGenModule() {}

@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.


Full diff: https://github.com/llvm/llvm-project/pull/144720.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+73)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c27168e4c4bfe..aabc872e22df1 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -332,6 +332,76 @@ const TargetCodeGenInfo &CodeGenModule::getTargetCodeGenInfo() {
   return *TheTargetCodeGenInfo;
 }
 
+static void checkDataLayoutConsistency(const TargetInfo &Target,
+                                       llvm::LLVMContext &Context,
+                                       const LangOptions &Opts) {
+#ifndef NDEBUG
+  // Don't verify non-standard ABI configurations.
+  if (Opts.AlignDouble || Opts.OpenCL || Opts.HLSL)
+    return;
+
+  llvm::Triple Triple = Target.getTriple();
+  llvm::DataLayout DL(Target.getDataLayoutString());
+  auto Check = [&](const char *Name, llvm::Type *Ty, unsigned Alignment) {
+    llvm::Align DLAlign = DL.getABITypeAlign(Ty);
+    llvm::Align ClangAlign(Alignment / 8);
+    if (DLAlign != ClangAlign) {
+      llvm::errs() << "For target " << Triple.str() << " type " << Name
+                   << " mapping to " << *Ty << " has data layout alignment "
+                   << DLAlign.value() << " while clang specifies "
+                   << ClangAlign.value() << "\n";
+      abort();
+    }
+  };
+
+  Check("bool", llvm::Type::getIntNTy(Context, Target.BoolWidth),
+        Target.BoolAlign);
+  Check("short", llvm::Type::getIntNTy(Context, Target.ShortWidth),
+        Target.ShortAlign);
+  // FIXME: M68k specifies incorrect wrong int and long alignments in Clang
+  // and incorrect long long alignment in both LLVM and Clang.
+  if (Triple.getArch() != llvm::Triple::m68k) {
+    Check("int", llvm::Type::getIntNTy(Context, Target.IntWidth),
+          Target.IntAlign);
+    Check("long", llvm::Type::getIntNTy(Context, Target.LongWidth),
+          Target.LongAlign);
+    Check("long long", llvm::Type::getIntNTy(Context, Target.LongLongWidth),
+          Target.LongLongAlign);
+  }
+  // FIXME: There are int128 alignment mismatches on multiple targets.
+  if (Target.hasInt128Type() && !Target.getTargetOpts().ForceEnableInt128 &&
+      !Triple.isAMDGPU() && !Triple.isSPIRV() &&
+      Triple.getArch() != llvm::Triple::ve)
+    Check("__int128", llvm::Type::getIntNTy(Context, 128), Target.Int128Align);
+
+  if (Target.hasFloat16Type())
+    Check("half", llvm::Type::getFloatingPointTy(Context, *Target.HalfFormat),
+          Target.HalfAlign);
+  if (Target.hasBFloat16Type())
+    Check("bfloat", llvm::Type::getBFloatTy(Context), Target.BFloat16Align);
+  Check("float", llvm::Type::getFloatingPointTy(Context, *Target.FloatFormat),
+        Target.FloatAlign);
+  // FIXME: AIX specifies wrong double alignment in DataLayout
+  if (!Triple.isOSAIX()) {
+    Check("double",
+          llvm::Type::getFloatingPointTy(Context, *Target.DoubleFormat),
+          Target.DoubleAlign);
+    Check("long double",
+          llvm::Type::getFloatingPointTy(Context, *Target.LongDoubleFormat),
+          Target.LongDoubleAlign);
+  }
+  // FIXME: Wasm has a mismatch in f128 alignment between Clang and LLVM.
+  if (Target.hasFloat128Type() && !Triple.isWasm())
+    Check("__float128", llvm::Type::getFP128Ty(Context), Target.Float128Align);
+  if (Target.hasIbm128Type())
+    Check("__ibm128", llvm::Type::getPPC_FP128Ty(Context), Target.Ibm128Align);
+
+  // FIXME: Clang specifies incorrect pointer alignment for m68k.
+  if (Triple.getArch() != llvm::Triple::m68k)
+    Check("void*", llvm::PointerType::getUnqual(Context), Target.PointerAlign);
+#endif
+}
+
 CodeGenModule::CodeGenModule(ASTContext &C,
                              IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS,
                              const HeaderSearchOptions &HSO,
@@ -458,6 +528,9 @@ CodeGenModule::CodeGenModule(ASTContext &C,
   if (Context.getTargetInfo().getTriple().getArch() == llvm::Triple::x86)
     getModule().addModuleFlag(llvm::Module::Error, "NumRegisterParameters",
                               CodeGenOpts.NumRegisterParameters);
+
+  if (!Context.getAuxTargetInfo())
+    checkDataLayoutConsistency(Context.getTargetInfo(), LLVMContext, LangOpts);
 }
 
 CodeGenModule::~CodeGenModule() {}

@rjmccall
Copy link
Contributor

So, devil's advocate: do we actually care about IR data layout alignments? Clang should be putting explicit alignments on everything in the IR. And, I mean, I can certainly imagine that a target might end up in an impossible situation for terrible historical reasons, because e.g. it has an int128_t with 8-byte alignment but an __int128_t with 16-byte alignment.

@nikic
Copy link
Contributor Author

nikic commented Jun 19, 2025

@rjmccall From a Clang perspective, largely no -- but not entirely. There are cases when LLVM materializes allocations based on the data layout in ways that are observable and not controllable by Clang. One of these is certain inline asm constraints (which can end up passing constant pool addresses that don't satisfy alignment requirements) and another is certain edge cases around libcall lowering (libcalls that result in sret demotion, which is why clang used to miscompile f128 libcalls).

From an LLVM perspective, we do care because non-Clang frontends use LLVM's data layout as their source of truth. For most targets, the information there is sufficient -- as long as it's actually correct. It seems that sometimes Clang developers also expect that the data layout is the source of truth, which is how we end up with situations like m68k forgetting to specify Clang-side alignments entirely (#144740).

@rjmccall
Copy link
Contributor

Okay. I'm not really opposed to trying to make this accurate, even if it were purely on general principle. I have to say that things like inline asm constraints and libcall lowering sound like gaps that we should be fixing to allow an explicit alignment, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants