Skip to content

[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv #144620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open
2 changes: 1 addition & 1 deletion cross-project-tests/lit.cfg.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)

# suffixes: A list of file extensions to treat as test files.
config.suffixes = [".c", ".cl", ".cpp", ".m"]
config.suffixes = [".c", ".cl", ".cpp", ".m", ".s"]

# excludes: A list of directories to exclude from the testsuite. The 'Inputs'
# subdirectories contain auxiliary inputs for various tests in their parent
Expand Down
7 changes: 7 additions & 0 deletions llvm/include/llvm/MC/MCInstrAnalysis.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/Compiler.h"
#include <cstdint>
#include <vector>
Expand Down Expand Up @@ -182,6 +183,12 @@ class LLVM_ABI MCInstrAnalysis {
evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const;

/// Given an instruction that accesses a memory address, try to compute
/// the target address. Return true on success, and the address in \p Target.
virtual bool evaluateInstruction(const MCInst &Inst, uint64_t Addr,
uint64_t Size, uint64_t &Target,
const MCSubtargetInfo &STI) const;

/// Given an instruction tries to get the address of a memory operand. Returns
/// the address on success.
virtual std::optional<uint64_t>
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/MC/MCInstrAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,12 @@ bool MCInstrAnalysis::evaluateBranch(const MCInst & /*Inst*/, uint64_t /*Addr*/,
return false;
}

bool MCInstrAnalysis::evaluateInstruction(const MCInst &Inst, uint64_t Addr,
uint64_t Size, uint64_t &Target,
const MCSubtargetInfo &STI) const {
return false;
}

std::optional<uint64_t> MCInstrAnalysis::evaluateMemoryOperandAddress(
const MCInst &Inst, const MCSubtargetInfo *STI, uint64_t Addr,
uint64_t Size) const {
Expand Down
105 changes: 99 additions & 6 deletions llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,9 @@
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include <bitset>
#include <cstdint>

#define GET_INSTRINFO_MC_DESC
#define ENABLE_INSTR_PREDICATE_VERIFIER
Expand Down Expand Up @@ -182,6 +184,17 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
}

switch (Inst.getOpcode()) {
case RISCV::C_LUI:
case RISCV::LUI: {
setGPRState(Inst.getOperand(0).getReg(),
SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
break;
}
case RISCV::AUIPC: {
setGPRState(Inst.getOperand(0).getReg(),
Addr + SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
break;
}
default: {
// Clear the state of all defined registers for instructions that we don't
// explicitly support.
Expand All @@ -193,10 +206,6 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
}
break;
}
case RISCV::AUIPC:
setGPRState(Inst.getOperand(0).getReg(),
Addr + SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
break;
}
}

Expand Down Expand Up @@ -234,6 +243,91 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
return false;
}

bool evaluateInstruction(const MCInst &Inst, uint64_t Addr, uint64_t Size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a distinct function to evaluateBranch in the virtual interface? It feels like the two are fundamentally the same from a high-level point of view, since a branch is an instruction. Of course, within the target-specific layer, you could split it then.

If the purpose is purely so you know when a thing is branch specifically, it feels like you could achieve that through some other means, for example an enum return code that has values like Invalid, Branch, Other, or maybe a bitfield with some basic flags.

Copy link
Author

@arjunUpatel arjunUpatel Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes more sense to merge the two since their functionality overlaps heavily. The purpose of splitting the two is that RISCV addi instruction behaves differently when the subtargetinfo is different (32 bit vs 64 bit). Hence, the subtargetinfo is passed as a parameter to evaluateInstruction, while evaluateBranch does not accept this parameter. We can either set the subtargetinfo as a default parameter in the declaration of evaluateBranch or make the subtargetinfo an attribute of MCInstrAnalysis. I like the latter because off ofa quick skim it looks like across all the invocations of the functions that are attributes of MCInstrAnalysis and have subtargetinfo as a parameter, none actively change the subtargetinfo once the parent object is initialized. How about I experiment with these two approaches in a separate PR?

uint64_t &Target,
const MCSubtargetInfo &STI) const override {
unsigned int ArchRegWidth = STI.getTargetTriple().getArchPointerBitWidth();
switch(Inst.getOpcode()) {
default:
return false;
case RISCV::C_ADDI:
case RISCV::ADDI: {
MCRegister Reg = Inst.getOperand(1).getReg();
auto TargetRegState = getGPRState(Reg);
if (TargetRegState && Reg != RISCV::X0) {
Target = *TargetRegState + Inst.getOperand(2).getImm();
Target &= maskTrailingOnes<uint64_t>(ArchRegWidth);
return true;
}
break;
}
case RISCV::C_ADDIW:
case RISCV::ADDIW: {
MCRegister Reg = Inst.getOperand(1).getReg();
auto TargetRegState = getGPRState(Reg);
if (TargetRegState && Reg != RISCV::X0) {
Target = *TargetRegState + Inst.getOperand(2).getImm();
Target = SignExtend64<32>(Target);
return true;
}
break;
}
case RISCV::LB:
case RISCV::LH:
case RISCV::LD:
case RISCV::LW:
case RISCV::LBU:
case RISCV::LHU:
case RISCV::LWU:
case RISCV::SB:
case RISCV::SH:
case RISCV::SW:
case RISCV::SD:
case RISCV::FLH:
case RISCV::FLW:
case RISCV::FLD:
case RISCV::FSH:
case RISCV::FSW:
case RISCV::FSD:
case RISCV::C_LD:
case RISCV::C_SD:
case RISCV::C_FLD:
case RISCV::C_FSD:
case RISCV::C_SW:
case RISCV::C_LW:
case RISCV::C_FSW:
case RISCV::C_FLW:
case RISCV::C_LBU:
case RISCV::C_LH:
case RISCV::C_LHU:
case RISCV::C_SB:
case RISCV::C_SH:
case RISCV::C_LWSP:
case RISCV::C_SWSP:
case RISCV::C_LDSP:
case RISCV::C_SDSP:
case RISCV::C_FLWSP:
case RISCV::C_FSWSP:
case RISCV::C_FLDSP:
case RISCV::C_FSDSP:
case RISCV::C_LD_RV32:
case RISCV::C_SD_RV32:
case RISCV::C_SDSP_RV32:
case RISCV::LD_RV32:
case RISCV::C_LDSP_RV32:
case RISCV::SD_RV32: {
MCRegister Reg = Inst.getOperand(1).getReg();
auto TargetRegState = getGPRState(Reg);
if (TargetRegState) {
Target = *TargetRegState + Inst.getOperand(2).getImm();
return true;
}
break;
}
}
return false;
}

bool isTerminator(const MCInst &Inst) const override {
if (MCInstrAnalysis::isTerminator(Inst))
return true;
Expand Down Expand Up @@ -344,12 +438,11 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTargetMC() {
TargetRegistry::RegisterELFStreamer(*T, createRISCVELFStreamer);
TargetRegistry::RegisterObjectTargetStreamer(
*T, createRISCVObjectTargetStreamer);
TargetRegistry::RegisterMCInstrAnalysis(*T, createRISCVInstrAnalysis);

// Register the asm target streamer.
TargetRegistry::RegisterAsmTargetStreamer(*T, createRISCVAsmTargetStreamer);
// Register the null target streamer.
TargetRegistry::RegisterNullTargetStreamer(*T,
createRISCVNullTargetStreamer);
TargetRegistry::RegisterMCInstrAnalysis(*T, createRISCVInstrAnalysis);
}
}
2 changes: 2 additions & 0 deletions llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
if "RISCV" not in config.targets_to_build:
config.unsupported = True
32 changes: 32 additions & 0 deletions llvm/test/tools/llvm-objdump/RISCV/riscv32-ar-coverage.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# RUN: llvm-mc -riscv-add-build-attributes -triple=riscv32 -filetype=obj -mattr=+zclsd,+zilsd %s -o %t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "ar-coverage" mean in these test file names?

# RUN: llvm-objdump -d %t | FileCheck %s

# CHECK: 00000000 <_start>:
# CHECK-NEXT 0: 00000517 auipc a0, 0x0
# CHECK-NEXT 4: 0559 addi a0, a0, 0x16 <target>
# CHECK-NEXT 6: 00000517 auipc a0, 0x0
# CHECK-NEXT a: 6910 ld a2, 0x10(a0) <target>
# CHECK-NEXT c: 00000517 auipc a0, 0x0
# CHECK-NEXT 10: 00c53523 sd a2, 0xa(a0) <target>
# CHECK-NEXT 14: 0000 unimp

# The structure of this test file is similar to that of riscv64-ar-coverage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in newer llvm-objdump tests, we use ## for true comment markers, versus # for lit and FileCheck directives, to help them stand out better.

# with the major difference being that these tests are focused on instructions
# for 32 bit architecture.

.global _start
.text
_start:
auipc a0, 0x0
addi a0, a0, 0x16 # addi -- behavior changes with different architectures.

# Test Zclsd and Zilsd instructions respectively
auipc a0, 0x0
c.ld a2, 0x10(a0)

auipc a0, 0x0
sd a2, 0xa(a0)

.skip 0x2
target:
ret:
99 changes: 99 additions & 0 deletions llvm/test/tools/llvm-objdump/RISCV/riscv64-ar-coverage.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# RUN: llvm-mc -riscv-add-build-attributes -triple=riscv64 -filetype=obj -mattr=+f,+c,+zcb %s -o %t
# RUN: llvm-objdump -d %t | FileCheck %s

# CHECK: 0000000000000000 <_start>:
# CHECK-NEXT: 0: 00010517 auipc a0, 0x10
# CHECK-NEXT: 4: 00450513 addi a0, a0, 0x4 <target>
# CHECK-NEXT: 8: 00010517 auipc a0, 0x10
# CHECK-NEXT: c: 1571 addi a0, a0, -0x4 <target>
# CHECK-NEXT: e: 6541 lui a0, 0x10
# CHECK-NEXT: 10: 0045059b addiw a1, a0, 0x4 <target>
# CHECK-NEXT: 14: 6541 lui a0, 0x10
# CHECK-NEXT: 16: 2511 addiw a0, a0, 0x4 <target>
# CHECK-NEXT: 18: 00110537 lui a0, 0x110
# CHECK-NEXT: 1c: c50c sw a1, 0x8(a0) <far_target>
# CHECK-NEXT: 1e: 00110537 lui a0, 0x110
# CHECK-NEXT: 22: 4508 lw a0, 0x8(a0) <far_target>
# CHECK-NEXT: 24: 6541 lui a0, 0x10
# CHECK-NEXT: 26: 6585 lui a1, 0x1
# CHECK-NEXT: 28: 0306 slli t1, t1, 0x1
# CHECK-NEXT: 2a: 0511 addi a0, a0, 0x4 <target>
# CHECK-NEXT: 2c: 0505 addi a0, a0, 0x1
# CHECK-NEXT: 2e: 00002427 fsw ft0, 0x8(zero) <_start+0x8>
# CHECK-NEXT: 32: 00110097 auipc ra, 0x110
# CHECK-NEXT: 36: fda080e7 jalr -0x26(ra) <func>
# CHECK-NEXT: 3a: 6445 lui s0, 0x11
# CHECK-NEXT: 3c: 8800 sb s0, 0x0(s0) <zcb>
# CHECK-NEXT: 3e: 4522 lw a0, 0x8(sp)

# The core of the feature being added was address resolution for instruction
# sequences where a register is populated by immediate values via two
# separate instructions. First by an instruction that provides the upper bits
# (auipc, lui, etc) followed by another instruction for the lower bits (addi,
# jalr, ld, etc.).

.global _start
.text

_start:
# Test block 1-3 each focus on a certain starting instruction in a sequence.
# Starting instructions are the ones that provide the upper bits. The other
# instruction in the sequence is the one that provides the lower bits. The
# second instruction is arbitrarily chosen to increase code coverage.

# Test block #1.
lla a0, target
auipc a0, 0x10
c.addi a0, -0x4

# Test block #2.
c.lui a0, 0x10
addiw a1, a0, 0x4
c.lui a0, 0x10
c.addiw a0, 0x4

# Test block #3.
lui a0, 0x110
sw a1, 0x8(a0)
lui a0, 0x110
c.lw a0, 0x8(a0)

# Test block 4 tests instruction interleaving. Essentially the code's
# ability to keep track of a valid sequence even if multiple other unrelated
# instructions separate the two.
lui a0, 0x10
lui a1, 0x1 # Unrelated instruction.
slli t1, t1, 0x1 # Unrelated instruction.
addi a0, a0, 0x4
addi a0, a0, 0x1 # Verify register tracking terminates.

# Test 5 checks instructions providing upper bits do not change the tracked
# value of zero register. Also ensures load/store instructions accessing data
# relative to the zero register trigger address resolution. The latter kind
# of instructions are essentially memory accesses relative to the zero
# register.
fsw f0, 0x8(x0)

# Test 6 ensures that the newly added functionality is compatible with
# code that already worked for branch instructions.
call func

# Test #7 -- zcb extension.
lui x8, 0x11
c.sb x8, 0(x8)

# Test #8 -- stack based load/stores.
c.lwsp a0, 0x8(sp)

# These are the labels that the instructions above are expected to resolve to.
.skip 0xffc4
target:
.word 1
.skip 0xff8
zcb:
.word 1
.skip 0xff004
far_target:
.word 2
func:
ret
12 changes: 7 additions & 5 deletions llvm/tools/llvm-objdump/llvm-objdump.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1520,8 +1520,8 @@ collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
if (MIA) {
if (Disassembled) {
uint64_t Target;
bool TargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
if (TargetKnown && (Target >= Start && Target < End) &&
bool BranchTargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
if (BranchTargetKnown && (Target >= Start && Target < End) &&
!Targets.count(Target)) {
// On PowerPC and AIX, a function call is encoded as a branch to 0.
// On other PowerPC platforms (ELF), a function call is encoded as
Expand Down Expand Up @@ -2356,8 +2356,10 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
llvm::raw_ostream *TargetOS = &FOS;
uint64_t Target;
bool PrintTarget = DT->InstrAnalysis->evaluateBranch(
Inst, SectionAddr + Index, Size, Target);

Inst, SectionAddr + Index, Size, Target) ||
DT->InstrAnalysis->evaluateInstruction(
Inst, SectionAddr + Index, Size, Target,
*DT->SubtargetInfo);
if (!PrintTarget) {
if (std::optional<uint64_t> MaybeTarget =
DT->InstrAnalysis->evaluateMemoryOperandAddress(
Expand Down Expand Up @@ -2430,7 +2432,7 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
break;
}

// Branch targets are printed just after the instructions.
// Instruction targets are printed just after the instructions.
// Print the labels corresponding to the target if there's any.
bool BBAddrMapLabelAvailable = BBAddrMapLabels.count(Target);
bool LabelAvailable = AllLabels.count(Target);
Expand Down
Loading