[AArch64] Add support for -mlong-calls code generation #142982
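@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-aarch64
Author: dong jianqiang (dongjianqiang2)
Changes
This patch implements backend support for -mlong-calls on AArch64 targets. When enabled, calls to external functions are lowered to an indirect call via an address computed using `adrp` and `add` rather than a direct `bl` instruction, which is limited to a ±128MB PC-relative offset.
This is particularly useful when code and/or data exceeds the 26-bit immediate range of `bl`, such as in large binaries or link-time-optimized builds.
Key changes:
- In SelectionDAG lowering (`LowerCall`), detect `-mlong-calls` and emit:
  - `adrp + add` address calculation
  - `blr` indirect call instruction
This patch ensures that long-calls are emitted correctly for both GlobalAddress and ExternalSymbol call targets.
Tested:
- New codegen tests under `llvm/test/CodeGen/AArch64/aarch64-long-calls.ll`
- Verified `adrp + add + blr` output in `.s` for global and external functions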
Full diff: https://github.com/llvm/llvm-project/pull/142982.diff 4 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/Arch/AArch64.cpp b/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
index eaae9f876e3ad..2463bcdae2f4f 100644
--- a/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/AArch64.cpp
@@ -466,6 +466,12 @@ void aarch64::getAArch64TargetFeatures(const Driver &D,
if (Args.getLastArg(options::OPT_mno_bti_at_return_twice))
Features.push_back("+no-bti-at-return-twice");
+
+ if (Arg *A = Args.getLastArg(options::OPT_mlong_calls,
+ options::OPT_mno_long_calls)) {
+ if (A->getOption().matches(options::OPT_mlong_calls))
+ Features.push_back("+long-calls");
+ }
}
void aarch64::setPAuthABIInTriple(const Driver &D, const ArgList &Args,
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 469c76752c78c..5af6ed5f1ffa2 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -825,6 +825,10 @@ def FeatureDisableFastIncVL : SubtargetFeature<"disable-fast-inc-vl",
"HasDisableFastIncVL", "true",
"Do not prefer INC/DEC, ALL, { 1, 2, 4 } over ADDVL">;
+def FeatureLongCalls : SubtargetFeature<"long-calls", "GenLongCalls", "true",
+ "Generate calls via indirect call "
+ "instructions">;
+
//===----------------------------------------------------------------------===//
// Architectures.
//
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 9f51caef6d228..d6015ccf94afc 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9286,8 +9286,12 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
Callee = DAG.getTargetGlobalAddress(CalledGlobal, DL, PtrVT, 0, OpFlags);
Callee = DAG.getNode(AArch64ISD::LOADgot, DL, PtrVT, Callee);
} else {
- const GlobalValue *GV = G->getGlobal();
- Callee = DAG.getTargetGlobalAddress(GV, DL, PtrVT, 0, OpFlags);
+ if (Subtarget->genLongCalls())
+ Callee = getAddr(G, DAG, OpFlags);
+ else {
+ const GlobalValue *GV = G->getGlobal();
+ Callee = DAG.getTargetGlobalAddress(GV, DL, PtrVT, 0, OpFlags);
+ }
}
} else if (auto *S = dyn_cast<ExternalSymbolSDNode>(Callee)) {
bool UseGot = (getTargetMachine().getCodeModel() == CodeModel::Large &&
@@ -9298,7 +9302,10 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, AArch64II::MO_GOT);
Callee = DAG.getNode(AArch64ISD::LOADgot, DL, PtrVT, Callee);
} else {
- Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, 0);
+ if (Subtarget->genLongCalls())
+ Callee = getAddr(S, DAG, 0);
+ else
+ Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, 0);
}
}
diff --git a/llvm/test/CodeGen/AArch64/aarch64-long-calls.ll b/llvm/test/CodeGen/AArch64/aarch64-long-calls.ll
new file mode 100644
index 0000000000000..cb41c3cf519e0
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-long-calls.ll
@@ -0,0 +1,26 @@
+; RUN: llc -O2 -mtriple=aarch64-linux-gnu -mcpu=generic -mattr=+long-calls < %s | FileCheck %s
+
+declare void @far_func()
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
+
+define void @test() {
+entry:
+ call void @far_func()
+ ret void
+}
+
+define void @test2(ptr %dst, i8 %val, i64 %len) {
+entry:
+ call void @llvm.memset.p0.i64(ptr %dst, i8 %val, i64 %len, i1 false)
+ ret void
+}
+
+; CHECK-LABEL: test:
+; CHECK: adrp {{x[0-9]+}}, far_func
+; CHECK: add {{x[0-9]+}}, {{x[0-9]+}}, :lo12:far_func
+; CHECK: blr {{x[0-9]+}}
+
+; CHECK-LABEL: test2:
+; CHECK: adrp {{x[0-9]+}}, memset
+; CHECK: add {{x[0-9]+}}, {{x[0-9]+}}, :lo12:memset
+; CHECK: blr {{x[0-9]+}}
|
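The driver hunk in the diff passes both option spellings to `Args.getLastArg`, so whichever of -mlong-calls and -mno-long-calls appears last on the command line decides whether `+long-calls` is pushed. A minimal standalone sketch of that last-one-wins rule follows; `longCallsEnabled` is a hypothetical helper written for illustration and is not part of clang.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not clang code): models the last-one-wins rule that
// getLastArg(OPT_mlong_calls, OPT_mno_long_calls) gives the driver.
bool longCallsEnabled(const std::vector<std::string> &args) {
  bool enabled = false; // neither flag present: feature stays off
  for (const std::string &arg : args) {
    if (arg == "-mlong-calls")
      enabled = true;  // a later -mno-long-calls can still override this
    else if (arg == "-mno-long-calls")
      enabled = false; // a later -mlong-calls can still override this
  }
  return enabled;
}
```

With this model, `-mlong-calls -mno-long-calls` leaves the feature off, while the reverse order turns it on.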
My understanding is that this will make all calls to global functions into long calls. On AArch64, static linkers are required to insert range extension thunks for out-of-range BLs. In the best case this is just another direct branch, at worst case for I note that with |
This option is explicitly designed to enable reliable patching workflows when compiling object files: it guarantees call range safety in patches. When individual object files are modified or recompiled (e.g., during security patches), the final memory layout is unknown at compile time, so patched functions might end up more than 128MB away from their callers. -mlong-calls forces all cross-object calls to use 64-bit absolute addressing. |
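To make the 128MB figure concrete: `bl` encodes a signed 26-bit offset counted in 4-byte instructions, so the reachable byte displacement is a multiple of 4 between -2^27 and 2^27 - 4. The sketch below checks whether a direct call could reach its target; `fitsInBl` and the two constants are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// BL holds a signed 26-bit word offset, i.e. a byte displacement that is a
// multiple of 4 in [-2^27, 2^27 - 4] (about +/-128 MiB).
constexpr int64_t kBlMinOffset = -(int64_t{1} << 27);
constexpr int64_t kBlMaxOffset = (int64_t{1} << 27) - 4;

// Hypothetical helper: true if a direct BL located at `caller` can reach
// `callee` without a linker-inserted thunk.
constexpr bool fitsInBl(uint64_t caller, uint64_t callee) {
  const int64_t delta = static_cast<int64_t>(callee - caller);
  return delta % 4 == 0 && delta >= kBlMinOffset && delta <= kBlMaxOffset;
}
```

A patched function placed further away than this fails the check, which is the case where either a linker thunk or an indirect call is needed.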
-mlong-calls is an old-fashioned compiler option. I think it was added before linkers knew about range extension thunks (aka stubs, veneers, etc.). Can you use -fno-plt instead? It works with both SelectionDAG and GlobalISel. You will get a GOT-generating code sequence that can be optimized to adrp+add by the linker. The proposed -mlong-calls is a -fno-pic hack that works only in limited scenarios and has a large performance downside. I don't think we should support it. |
If I've understood object patching, this would mean inserting a new function implementation and binary patching all the call sites to point to the new implementation. As an aside to this patch, I'd be tempted to see if I could indirect all the calls via the PLT. Then you'd be able to add the new function and alter the dynamic symbol table entry to point to the new implementation, and the dynamic linker would do the rest. That might need some fiddling in the linker or compiler driver to force it to create a PLT entry; --shared would do it, but for an executable we'd need a PT_INTERP segment. There was a Discourse thread on ROM patching for embedded systems, https://discourse.llvm.org/t/rfc-a-user-guided-rom-patching-mechanism-for-embedded-applications/78467, which had a similar idea. |
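The indirection idea can be illustrated with a plain function-pointer table: callers always go through a slot, so redirecting one slot retargets every call site at once. This is only an analogy for a PLT/GOT entry, not how a dynamic linker is implemented, and every name in it (`slotTable`, `callViaSlot`, `patchSlot`, the two implementations) is hypothetical.

```cpp
#include <cstddef>

using Fn = int (*)(int);

int originalImpl(int x) { return x + 1; } // implementation shipped first
int patchedImpl(int x) { return x + 2; }  // replacement added by a patch

// One slot per patchable function; stands in for a PLT/GOT entry.
Fn slotTable[] = {originalImpl};

// Every call site goes through the slot instead of naming the callee.
int callViaSlot(std::size_t slot, int arg) { return slotTable[slot](arg); }

// Patching rewrites a single slot; no call site has to be modified.
void patchSlot(std::size_t slot, Fn newImpl) { slotTable[slot] = newImpl; }
```

After `patchSlot(0, patchedImpl)`, existing callers of slot 0 reach the new implementation without being rewritten.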
Yes, we are indeed still using the -mlong-calls option in our older embedded systems. This is necessary because these environments lack support for GOT-based relocation types, so we have incorporated this option to ensure compatibility and functionality. Moving forward, it is important to add support in both SelectionDAG and GlobalISel for these scenarios. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
This patch implements backend support for -mlong-calls on AArch64 targets. When enabled, calls to external functions are lowered to an indirect call via an address computed using `adrp` and `add` rather than a direct `bl` instruction, which is limited to a ±128MB PC-relative offset.

This is particularly useful when code and/or data exceeds the 26-bit immediate range of `bl`, such as in large binaries or link-time-optimized builds.

Key changes:
- In SelectionDAG lowering (`LowerCall`), detect `-mlong-calls` and emit:
  - `adrp + add` address calculation
  - `blr` indirect call instruction

This patch ensures that long-calls are emitted correctly for both GlobalAddress and ExternalSymbol call targets.

Tested:
- New codegen tests under `llvm/test/CodeGen/AArch64/aarch64-long-calls.ll`
- Verified `adrp + add + blr` output in `.s` for global and external functions
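As a rough model of the `adrp + add` pair: `adrp` produces the 4 KiB page base of the target relative to the page of the current PC, using a signed 21-bit page delta, and `add ... :lo12:sym` supplies the low 12 bits. The pair is therefore still PC-relative, with a reach of about ±4 GiB rather than a full 64-bit range. The sketch below emulates that arithmetic; all function and type names are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// What a linker would encode for one adrp+add pair (illustrative only).
struct AdrpAdd {
  int64_t pageDelta; // signed number of 4 KiB pages between PC and target
  uint32_t lo12;     // low 12 bits of the target address
};

// Split a target address into the two immediates, as seen from `pc`.
AdrpAdd splitAddress(uint64_t pc, uint64_t target) {
  const int64_t pageDelta = static_cast<int64_t>((target >> 12) - (pc >> 12));
  return {pageDelta, static_cast<uint32_t>(target & 0xFFF)};
}

// ADRP carries a signed 21-bit page delta, hence roughly +/-4 GiB of reach.
bool pageDeltaFits(int64_t pageDelta) {
  return pageDelta >= -(int64_t{1} << 20) && pageDelta < (int64_t{1} << 20);
}

// Emulate ADRP: page base of PC plus the page delta scaled to bytes.
uint64_t adrp(uint64_t pc, int64_t pageDelta) {
  return (pc & ~uint64_t{0xFFF}) + (static_cast<uint64_t>(pageDelta) << 12);
}

// Emulate adrp followed by add: yields the full address handed to blr.
uint64_t materialize(uint64_t pc, uint64_t target) {
  const AdrpAdd parts = splitAddress(pc, target);
  return adrp(pc, parts.pageDelta) + parts.lo12;
}
```

For any target whose page delta passes `pageDeltaFits`, `materialize` returns the target address unchanged, which is the value the `blr` in the test's CHECK lines would branch to.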
Thanks @smithp35 for your solution! I would like to kindly ask for your expertise in reviewing the following code, which implements backend support for |
I'm mostly a linker/ABI person, so I'm not much of an expert in code generation; if I can find some time, I can check to see if I can spot any obvious mistakes. The thing I'd want to check for is that the rest of the backend has recorded these additional indirect calls. I'm thinking in particular of BTI, which the compiler can sometimes omit when it can show there are no indirect calls to a symbol. I can't help with this being merged, as this is a maintainers' call. Even if the code is correct today, it will need to be maintained, and future changes/transformations will need to make sure it doesn't break. The maintainers have to decide whether the use case is worth it for a wide variety of use cases or whether it should be a downstream change.
The two linkers I'm most familiar with, lld and Arm's proprietary linker armlink, will statically resolve the GOT relocations when doing a static link; I would expect GNU ld to do this too. LLD will even transform the GOT access to a PC-relative one when the definition is local: https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#579relocation-optimization . |