
[llvm-exegesis] [AArch64] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions #142552


Open · wants to merge 12 commits into main
Conversation

@lakshayk-nv (Contributor) commented Jun 3, 2025

We want to support load and store instructions for AArch64, but they currently cause a segmentation fault: registers are initialized to 0, while a load instruction requires its register to hold a valid memory address.
This is a WIP patch; it is not expected to be merged in its current state but is meant to gather feedback.

Loading registers that require a memory address

There are two possible ways to support load instructions (i.e., to set registers to a valid memory address):

1. With the address of an auxiliary mmap:

The prerequisite for this is supporting --mode=subprocess on AArch64.

This adds support for --mode=subprocess and memory annotations for manual snippets.

  • Add the memory setup required by subprocess mode on AArch64.
  • Implement the auxiliary memory mmap, the manual-snippet mmap, and configurePerfCounter().
  • Add functions for syscall generation.

Generate a syscall for the auxiliary mmap, save its return value on the stack, and load the required registers with the memory address.
This is what is currently implemented.

TODO: how do we differentiate registers that require an address from those that do not?

For example, the LD1B opcode (ld1b    {z6.b}, p4/z, [x14, x2]) expects the first register to contain an address and the second an offset value.
Temporary fix: initialize the first register queried by the instruction with the address, and the rest via setRegTo as done previously; a sketch follows below.
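A minimal sketch of this temporary fix, reusing this patch's helpers (generateRegisterStackPop, setRegTo); initSnippetRegister and FirstRegInited are illustrative names, not code from the PR:

// Illustrative only: the first GPR the instruction queries receives the
// auxiliary-mmap address that generateMemoryInitialSetup pushed on the
// stack; every other register keeps the old setRegTo path.
static std::vector<MCInst>
initSnippetRegister(const ExegesisTarget &ET, const MCSubtargetInfo &STI,
                    MCRegister Reg, const APInt &Value, bool &FirstRegInited) {
  if (!FirstRegInited && AArch64::GPR64RegClass.contains(Reg)) {
    FirstRegInited = true;
    std::vector<MCInst> Code;
    // Pop the saved mmap return value into the base-address register.
    generateRegisterStackPop(Reg, Code);
    return Code;
  }
  return ET.setRegTo(STI, Reg, Value); // previous behaviour
}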

2. Utilize fillMemoryOperands

Found fillMemoryOperands() used by the x86 implementation; it seems relevant for initializing the registers required by load instructions.
Implementations of fillMemoryOperands() and getScratchMemoryRegister() are missing for AArch64.
First, the code flow checks IsMemory (i.e. OPERAND_MEMORY), which never matches on AArch64.
Thus, we experimentally extended IsMemory to also OR in mayLoadOrStore (MCInstrDescView.cpp); a sketch of the change follows below.
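A sketch of that experimental change, assuming the existing Operand helpers; the exact wiring in the patch may differ, and MayLoadOrStore stands for a flag derived from the instruction's MCID::MayLoad/MCID::MayStore bits:

// MCInstrDescView.cpp (sketch): an operand used to count as "memory"
// only for OPERAND_MEMORY, which no AArch64 operand carries. The
// experiment also treats operands of may-load/may-store instructions
// as memory so that fillMemoryOperands() is reached on AArch64.
bool Operand::isMemory() const {
  return isExplicit() &&
         (getExplicitOperandInfo().OperandType == MCOI::OPERAND_MEMORY ||
          MayLoadOrStore); // assumed new member, set from MCInstrDesc flags
}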

TODO: Implement getScratchMemoryRegister() correctly

  • Returning MCRegister() makes the register invalid, and the run exits with "Infeasible : target does not support memory instructions".
  • Returning X14 (or any hardcoded register) results in an illegal instruction being generated: undefined physical register.

TODO: Implement fillMemoryOperands

PS: Changes still required by this WIP patch:
Pathway 1: Enable load instructions in --mode=inprocess

  • Update isMemory implementation
  • Implement getScratchMemoryRegister()
  • Implement fillMemoryOperands()

Pathway 2: Enable load instructions in --mode=subprocess

  • Implement syscall functions
  • Implement stack push and pop functions
  • Implement memory setup functions
    • Upper and lower munmap (motivation unclear); currently not called
    • Auxiliary mmap (file descriptor missing); currently fd=-1
    • Manual-snippet mmap (file descriptor missing); currently fd=-1
  • configurePerfCounter (syscall fails: invalid FD, file descriptor missing); currently not called
  • Store the auxiliary memory address on the stack
  • Distinguish registers that require an address from the rest (offset, ...)
    • Temporary fix: the first register of a load instruction gets the address
  • Load each register accordingly, from the stack or via setRegTo

Please review: @sjoerdmeijer, @boomanaiden154, @davemgreen
Looking forward to your feedback.
Thanks,

@llvmbot (Member) commented Jun 3, 2025

@llvm/pr-subscribers-tools-llvm-exegesis

Author: Lakshay Kumar (lakshayk-nv)

Changes


Patch is 32.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142552.diff

7 Files Affected:

  • (modified) llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp (+504-1)
  • (modified) llvm/tools/llvm-exegesis/lib/Assembler.cpp (+35-1)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.cpp (+15-6)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.h (+1)
  • (modified) llvm/tools/llvm-exegesis/lib/SerialSnippetGenerator.cpp (+6)
  • (modified) llvm/tools/llvm-exegesis/lib/SnippetGenerator.cpp (+8)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.h (+4)
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index a1eb5a46f21fc..48a22d011a491 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -6,10 +6,26 @@
 //
 //===----------------------------------------------------------------------===//
 #include "../Target.h"
+#include "../Error.h"
+#include "../MmapUtils.h"
+#include "../SerialSnippetGenerator.h"
+#include "../SnippetGenerator.h"
+#include "../SubprocessMemory.h"
 #include "AArch64.h"
 #include "AArch64RegisterInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/MC/MCInstBuilder.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include <vector>
 
+#define DEBUG_TYPE "exegesis-aarch64-target"
 #if defined(__aarch64__) && defined(__linux__)
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <unistd.h> // for getpagesize()
+#ifdef HAVE_LIBPFM
+#include <perfmon/perf_event.h>
+#endif                   // HAVE_LIBPFM
 #include <linux/prctl.h> // For PR_PAC_* constants
 #include <sys/prctl.h>
 #ifndef PR_PAC_SET_ENABLED_KEYS
@@ -120,7 +136,7 @@ static MCInst loadPPRImmediate(MCRegister Reg, unsigned RegBitWidth,
 // Generates instructions to load an immediate value into an FPCR register.
 static std::vector<MCInst>
 loadFPCRImmediate(MCRegister Reg, unsigned RegBitWidth, const APInt &Value) {
-  MCRegister TempReg = AArch64::X8;
+  MCRegister TempReg = AArch64::X16;
   MCInst LoadImm = MCInstBuilder(AArch64::MOVi64imm).addReg(TempReg).addImm(0);
   MCInst MoveToFPCR =
       MCInstBuilder(AArch64::MSR).addImm(AArch64SysReg::FPCR).addReg(TempReg);
@@ -153,6 +169,89 @@ static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
   return Instructions;
 }
 
+static void generateRegisterStackPush(unsigned int RegToPush,
+                                      std::vector<MCInst> &GeneratedCode,
+                                      int imm = -16) {
+  // STR [X|W]t, [SP, #simm]!: SP is decremented by default 16 bytes
+  //                           before the store to maintain 16-bytes alignment.
+  if (AArch64::GPR64RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRXpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRWpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack push");
+  }
+}
+
+static void generateRegisterStackPop(unsigned int RegToPopTo,
+                                     std::vector<MCInst> &GeneratedCode,
+                                     int imm = 16) {
+  // LDR Xt, [SP], #simm: SP is incremented by default 16 bytes after the load.
+  if (AArch64::GPR64RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRXpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRWpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack pop");
+  }
+}
+
+void generateSysCall(long SyscallNumber, std::vector<MCInst> &GeneratedCode) {
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X8, 64, APInt(64, SyscallNumber)));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SVC).addImm(0));
+}
+
+/// Functions to save/restore system call registers
+#ifdef __linux__
+constexpr std::array<unsigned, 6> SyscallArgumentRegisters{
+    AArch64::X0, AArch64::X1, AArch64::X2,
+    AArch64::X3, AArch64::X4, AArch64::X5,
+};
+
+static void saveSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                 unsigned ArgumentCount) {
+  // AArch64 Linux typically uses X0-X5 for the first 6 arguments.
+  // Some syscalls can take up to 8 arguments in X0-X7.
+  assert(ArgumentCount <= 6 &&
+         "This implementation saves up to 6 argument registers (X0-X5)");
+  // generateRegisterStackPush(AArch64::X16, GeneratedCode);
+  // Preserve X8 (used for the syscall number/return value).
+  generateRegisterStackPush(AArch64::X8, GeneratedCode);
+  // Preserve the registers used to pass arguments to the system call.
+  for (unsigned I = 0; I < ArgumentCount; ++I) {
+    generateRegisterStackPush(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+}
+
+static void restoreSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                    unsigned ArgumentCount) {
+  assert(ArgumentCount <= 6 &&
+         "This implementation restores up to 6 argument registers (X0-X5)");
+  // Restore argument registers, in opposite order of the way they are saved.
+  for (int I = ArgumentCount - 1; I >= 0; --I) {
+    generateRegisterStackPop(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+  generateRegisterStackPop(AArch64::X8, GeneratedCode);
+  // generateRegisterStackPop(AArch64::X16, GeneratedCode);
+}
+#endif // __linux__
 #include "AArch64GenExegesis.inc"
 
 namespace {
@@ -162,7 +261,44 @@ class ExegesisAArch64Target : public ExegesisTarget {
   ExegesisAArch64Target()
       : ExegesisTarget(AArch64CpuPfmCounters, AArch64_MC::isOpcodeAvailable) {}
 
+  enum ArgumentRegisters {
+    CodeSize = AArch64::X12,
+    AuxiliaryMemoryFD = AArch64::X13
+  };
+
+  std::vector<MCInst> _generateRegisterStackPop(MCRegister Reg,
+                                                int imm = 0) const override {
+    std::vector<MCInst> Insts;
+    if (AArch64::GPR32RegClass.contains(Reg)) {
+      generateRegisterStackPop(Reg, Insts, imm);
+      return Insts;
+    }
+    if (AArch64::GPR64RegClass.contains(Reg)) {
+      generateRegisterStackPop(Reg, Insts, imm);
+      return Insts;
+    }
+    return {};
+  }
+
 private:
+#ifdef __linux__
+  void generateLowerMunmap(std::vector<MCInst> &GeneratedCode) const override;
+  void generateUpperMunmap(std::vector<MCInst> &GeneratedCode) const override;
+  std::vector<MCInst> generateExitSyscall(unsigned ExitCode) const override;
+  std::vector<MCInst>
+  generateMmap(uintptr_t Address, size_t Length,
+               uintptr_t FileDescriptorAddress) const override;
+  void generateMmapAuxMem(std::vector<MCInst> &GeneratedCode) const override;
+  void moveArgumentRegisters(std::vector<MCInst> &GeneratedCode) const override;
+  std::vector<MCInst> generateMemoryInitialSetup() const override;
+  std::vector<MCInst> setStackRegisterToAuxMem() const override;
+  uintptr_t getAuxiliaryMemoryStartAddress() const override;
+  std::vector<MCInst> configurePerfCounter(long Request,
+                                           bool SaveRegisters) const override;
+  std::vector<MCRegister> getArgumentRegisters() const override;
+  std::vector<MCRegister> getRegistersNeedSaving() const override;
+#endif // __linux__
+
   std::vector<MCInst> setRegTo(const MCSubtargetInfo &STI, MCRegister Reg,
                                const APInt &Value) const override {
     if (AArch64::GPR32RegClass.contains(Reg))
@@ -227,10 +363,377 @@ class ExegesisAArch64Target : public ExegesisTarget {
 
     return nullptr;
   }
+  MCRegister getScratchMemoryRegister(const Triple &) const override;
+  void fillMemoryOperands(InstructionTemplate &IT, MCRegister Reg,
+                          unsigned Offset) const override;
 };
 
 } // namespace
 
+// Implementation follows RISCV pattern for memory operand handling.
+// Note: This implementation requires validation for AArch64-specific
+// requirements.
+void ExegesisAArch64Target::fillMemoryOperands(InstructionTemplate &IT,
+                                               MCRegister Reg,
+                                               unsigned Offset) const {
+  LLVM_DEBUG(dbgs() << "Executing fillMemoryOperands");
+  // AArch64 memory operands typically have the following structure:
+  // [base_register, offset]
+  auto &I = IT.getInstr();
+  auto MemOpIt =
+      find_if(I.Operands, [](const Operand &Op) { return Op.isMemory(); });
+  assert(MemOpIt != I.Operands.end() &&
+         "Instruction must have memory operands");
+
+  const Operand &MemOp = *MemOpIt;
+
+  assert(MemOp.isReg() && "Memory operand expected to be register");
+
+  IT.getValueFor(MemOp) = MCOperand::createReg(Reg);
+  IT.getValueFor(MemOp) = MCOperand::createImm(Offset);
+}
+enum ScratchMemoryRegister {
+  Z = AArch64::Z14,
+  X = AArch64::X14,
+  W = AArch64::W14,
+};
+
+MCRegister
+ExegesisAArch64Target::getScratchMemoryRegister(const Triple &TT) const {
+  // return MCRegister();   // Implemented in target.h
+  // return hardcoded scratch memory register, similar to RISCV (uses a0)
+  return ScratchMemoryRegister::X;
+}
+
+#ifdef __linux__
+// true : let use of fixed address to Virtual Address Space Ceiling
+// false: let kernel choose the address of the auxiliary memory
+bool UseFixedAddress = true; // TODO: Remove this later
+
+static constexpr const uintptr_t VAddressSpaceCeiling = 0x0000800000000000;
+
+static void generateRoundToNearestPage(unsigned int TargetRegister,
+                                       std::vector<MCInst> &GeneratedCode) {
+  int PageSizeShift = static_cast<int>(round(log2(getpagesize())));
+  // Round down to the nearest page by getting rid of the least significant bits
+  // representing location in the page.
+
+  // Single instruction using AND with inverted mask (effectively BIC)
+  uint64_t BitsToClearMask = (1ULL << PageSizeShift) - 1; // 0xFFF
+  uint64_t AndMask = ~BitsToClearMask;                    // ...FFFFFFFFFFFF000
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ANDXri)
+                              .addReg(TargetRegister) // Xd
+                              .addReg(TargetRegister) // Xn
+                              .addImm(AndMask)        // imm bitmask
+  );
+}
+static void generateGetInstructionPointer(unsigned int ResultRegister,
+                                          std::vector<MCInst> &GeneratedCode) {
+  // ADR X[ResultRegister], . : loads address of current instruction
+  // ADR : Form PC-relative address
+  // This instruction adds an immediate value to the PC value to form a
+  // PC-relative address, and writes the result to the destination register.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADR)
+                              .addReg(ResultRegister) // Xd
+                              .addImm(0));            // Offset
+}
+
+// TODO: This implementation mirrors the x86 version and requires validation.
+// The purpose of this memory unmapping needs to be verified for AArch64
+void ExegesisAArch64Target::generateLowerMunmap(
+    std::vector<MCInst> &GeneratedCode) const {
+  // Unmap starting at address zero
+  GeneratedCode.push_back(loadImmediate(AArch64::X0, 64, APInt(64, 0)));
+  // Get the current instruction pointer so we know where to unmap up to.
+  generateGetInstructionPointer(AArch64::X1, GeneratedCode);
+  generateRoundToNearestPage(AArch64::X1, GeneratedCode);
+  // Subtract a page from the end of the unmap so we don't unmap the currently
+  // executing section.
+  long page_size = getpagesize();
+  // Load page_size into a temporary register (e.g., X16)
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X16, 64, APInt(64, page_size)));
+  // Subtract X16 (containing page_size) from X1
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SUBXrr)
+                              .addReg(AArch64::X1)    // Dest
+                              .addReg(AArch64::X1)    // Src
+                              .addReg(AArch64::X16)); // page_size
+  generateSysCall(SYS_munmap, GeneratedCode);
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The purpose of this memory unmapping needs to be verified for AArch64
+// The correctness of this implementation needs to be verified.
+void ExegesisAArch64Target::generateUpperMunmap(
+    std::vector<MCInst> &GeneratedCode) const {
+  generateGetInstructionPointer(AArch64::X4, GeneratedCode);
+  // Load the size of the snippet from the argument register into X0
+  // FIXME: Argument register seems not be initialized.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::XZR)
+                              .addReg(ArgumentRegisters::CodeSize));
+  // Add the length of the snippet (in X0) to the current instruction pointer
+  // (in X4) to get the address where we should start unmapping at.
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADDXrr)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::X0)
+                              .addReg(AArch64::X4));
+  generateRoundToNearestPage(AArch64::X0, GeneratedCode);
+  // Add one page to the start address to ensure the address is above snippet.
+  // Since the above function rounds down.
+  long page_size = getpagesize();
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X16, 64, APInt(64, page_size)));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ADDXrr)
+                              .addReg(AArch64::X0)    // Dest
+                              .addReg(AArch64::X0)    // Src
+                              .addReg(AArch64::X16)); // page_size
+  // Unmap to just one page under the ceiling of the address space.
+  GeneratedCode.push_back(loadImmediate(
+      AArch64::X1, 64, APInt(64, VAddressSpaceCeiling - getpagesize())));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SUBXrr)
+                              .addReg(AArch64::X1)
+                              .addReg(AArch64::X1)
+                              .addReg(AArch64::X0));
+  generateSysCall(SYS_munmap, GeneratedCode); // SYS_munmap is 215
+}
+
+std::vector<MCInst>
+ExegesisAArch64Target::generateExitSyscall(unsigned ExitCode) const {
+  std::vector<MCInst> ExitCallCode;
+  ExitCallCode.push_back(loadImmediate(AArch64::X0, 64, APInt(64, ExitCode)));
+  generateSysCall(SYS_exit, ExitCallCode); // SYS_exit is 93
+  return ExitCallCode;
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The correctness of this implementation needs to be verified.
+// mmap(address, length, prot, flags, fd, offset=0)
+std::vector<MCInst>
+ExegesisAArch64Target::generateMmap(uintptr_t Address, size_t Length,
+                                    uintptr_t FileDescriptorAddress) const {
+  int flags = MAP_SHARED;
+  if (Address != 0) {
+    flags |= MAP_FIXED_NOREPLACE;
+  }
+  std::vector<MCInst> MmapCode;
+  MmapCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, Address))); // map adr
+  MmapCode.push_back(
+      loadImmediate(AArch64::X1, 64, APInt(64, Length))); // length
+  MmapCode.push_back(loadImmediate(AArch64::X2, 64,
+                                   APInt(64, PROT_READ | PROT_WRITE))); // prot
+  MmapCode.push_back(loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  // FIXME: File descriptor address is not initialized.
+  // Copy file descriptor location from aux memory into X4
+  MmapCode.push_back(
+      loadImmediate(AArch64::X4, 64, APInt(64, FileDescriptorAddress))); // fd
+  // // Dereference file descriptor into FD argument register (TODO: Why? &
+  // correct?) MmapCode.push_back(
+  //   MCInstBuilder(AArch64::LDRWui)
+  //       .addReg(AArch64::W4)   // Destination register
+  //       .addReg(AArch64::X4)   // Base register (address)
+  //       .addImm(0)             // Offset (in 4-byte words, so 0 means no
+  //       offset)
+  // );
+  MmapCode.push_back(loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, MmapCode); // SYS_mmap is 222
+  return MmapCode;
+}
+
+// FIXME: This implementation mirrors the x86 version and requires validation.
+// The correctness of this implementation needs to be verified.
+void ExegesisAArch64Target::generateMmapAuxMem(
+    std::vector<MCInst> &GeneratedCode) const {
+  int fd = -1;
+  int flags = MAP_SHARED;
+  uintptr_t address = getAuxiliaryMemoryStartAddress();
+  if (fd == -1)
+    flags |= MAP_ANONYMOUS;
+  if (address != 0)
+    flags |= MAP_FIXED_NOREPLACE;
+  int prot = PROT_READ | PROT_WRITE;
+
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, address))); // map adr
+  GeneratedCode.push_back(loadImmediate(
+      AArch64::X1, 64,
+      APInt(64, SubprocessMemory::AuxiliaryMemorySize))); // length
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X2, 64, APInt(64, prot))); // prot
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  GeneratedCode.push_back(loadImmediate(AArch64::X4, 64, APInt(64, fd))); // fd
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, GeneratedCode);          // SYS_mmap is 222
+}
+
+void ExegesisAArch64Target::moveArgumentRegisters(
+    std::vector<MCInst> &GeneratedCode) const {
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(ArgumentRegisters::CodeSize)
+                              .addReg(AArch64::XZR)
+                              .addReg(AArch64::X0));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ORRXrr)
+                              .addReg(ArgumentRegisters::AuxiliaryMemoryFD)
+                              .addReg(AArch64::XZR)
+                              .addReg(AArch64::X1));
+}
+
+std::vector<MCInst> ExegesisAArch64Target::generateMemoryInitialSetup() const {
+  std::vector<MCInst> MemoryInitialSetupCode;
+  // moveArgumentRegisters(MemoryInitialSetupCode);
+  // generateLowerMunmap(MemoryInitialSetupCode);   // TODO: Motivation Unclear
+  // generateUpperMunmap(MemoryInitialSetupCode);   // FIXME: Motivation Unclear
+  // TODO: Revert argument registers value, if munmap is used.
+
+  generateMmapAuxMem(MemoryInitialSetupCode); // FIXME: Uninit file descriptor
+
+  // If using fixed address for auxiliary memory skip this step,
+  // When using dynamic memory allocation (non-fixed address), we must preserve
+  // the mmap return value (X0) which contains the allocated memory address.
+  // This value is saved to the stack to ensure registers requiring memory
+  // access can retrieve the correct address even if X0 is modified by
+  // intermediate code.
+  generateRegisterStackPush(AArch64::X0, MemoryInitialSetupCode);
+  // FIXME: Ensure stack pointer remains stable to prevent loss of saved address
+  return MemoryInitialSetupCode;
+}
+
+// TODO: This implementation mirrors the x86 version and requires validation.
+// The purpose of moving stack pointer to aux memory needs to be verified for
+// AArch64
+std::vector<MCInst> ExegesisAArch64Target::setStackRegisterToAuxMem() const {
+  return std::vector<MCInst>(); // NOP
+
+  // Below is implementation for AArch64 but motivation unclear
+  // std::vector<MCInst> instructions; // NOP
+  // const uint64_t targetSPValue = getAuxiliaryMemoryStartAddress() +
+  //                               SubprocessMemory::AuxiliaryMemorySize;
+  // // sub, stack args and local storage
+  // // Use X16 as a temporary register since it's a scratch register
+  // const MCRegister TempReg = AArch64::X16;
+
+  // // Load the 64-bit immediate into TempReg using MOVZ/MOVK sequence
+  // // MOVZ Xd, #imm16, LSL #(shift_val * 16)
+  // // MOVK Xd, #imm16, LSL #(shift_val * 16) (* 3 times for 64-bit immediate)
+
+  // // 1. MOVZ TmpReg, #(targetSPValue & 0xFFFF), LSL #0
+  // instructions.push_back(
+  //     MCInstBuilder(AArch64::MOVZXi)
+  //         .addReg(TempReg)
+  //         .addImm(static_cast<uint16_t>(targetSPValue & 0xFFFF)) // imm16
+  //         .addImm(0));                               // hw (shift/16) = 0
+  // // 2. MOVK TmpReg,...
[truncated]

@lakshayk-nv lakshayk-nv changed the title [llvm-exegesis] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions [llvm-exegesis] [AArch64] Resolving "snippet crashed while running: Segmentation fault" for Load Instructions Jun 3, 2025
@boomanaiden154 (Contributor)

The prerequisite for this is supporting --mode=subprocess on AArch64.

Why exactly did you need to support --mode=subprocess for this? It gives you some extra flexibility, but the scratch memory register/block is a lot simpler.


github-actions bot commented Jun 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@lakshayk-nv (Contributor, Author) commented Jun 9, 2025

Why exactly did you need to support --mode=subprocess for this?

It is a prerequisite only when loading registers with the address of the auxiliary mmap (I updated the initial comment accordingly).

but the scratch memory register/block is a lot simpler.

We want some feedback on the scratch-memory-register route.
First, this requires changing the definition of isMemory() (currently true only for OPERAND_MEMORY) to OPERAND_MEMORY || mayLoad || mayStore, where mayLoad checks the instruction's MCID::MayLoad flag.
This change seems reasonable, but I am unsure about its spillover onto other architectures.

// The call stack for `fillMemoryOperands()` is the following:
main()
generateSnippets()
SnippetGenerator::generateConfigurations()
SerialSnippetGenerator::appendCodeTemplates()
switch (ExecutionModeBit)
case SERIAL_VIA_MEMORY_INSTR  // i.e. Instr.hasMemoryOperands, i.e. isMemory, i.e. OPERAND_MEMORY (previously)
    if mayLoad
         if (!RegClass.contains(ScratchMemoryRegister)) return;
         fillMemoryOperands()

TODO: implement getScratchMemoryRegister() correctly.

Returning MCRegister() makes the register invalid, and the run exits with "Infeasible : target does not support memory instructions".
Returning X14 (or any hardcoded register) results in an illegal instruction being generated: undefined physical register (specifically when that scratch register is not used in the generated instruction).

So, to enable load instructions in --mode=inprocess:

  • Update isMemory implementation
  • Implement getScratchMemoryRegister()
  • Implement fillMemoryOperands()

Q0. Is updating isMemory to OPERAND_MEMORY || mayLoad || mayStore correct and without unintended consequences?

Q1.1. [sanity check] Is it correct for the scratch memory register to be an arbitrary register (e.g. X14)?
Q1.2. Any pointers on the undefined physical register error: why is it usually thrown, and what is a potential way out?

Q2. fillMemoryOperands() implementation
Q2.1. How does fillMemoryOperands() differentiate memory-address registers from registers containing an offset value?
Q2.2. How does fillMemoryOperands() obtain a valid memory address with which to initialize/fill a register?

@sjoerdmeijer (Collaborator) commented Jun 9, 2025

This is a large change, which shows that there are a lot of moving parts involved here. I understand and agree that it is not easy to see what is necessary to fully support loads/stores for AArch64. Thanks for putting up this WIP patch; maybe we can use it to define the minimum change and use case that we want to add first. I don't think we need to strive for complete support. So, again, I would like to see a minimal change that improves some opcodes, doesn't regress others, and very clearly states the assumptions and the things that can or should be picked up in follow-ups.

Correct me if I am wrong, or if you have other opinions, but this is the first scenario I would like to see supported:

  • we generate a syscall to mmap with ADDR set to NULL, so that it returns the address of a block of memory that it has reserved,
  • we use that memory address and store it in a register as the base address for the load/store operations,
  • for loads/stores that have a register + register addressing mode, i.e. a base + offset calculation, I am happy if we use the mmap return address as the base and 0 for the offset, for now, just to get something working.

If we can get some new opcodes working with this strategy, and if they give some sensible results, then I think that is forward progress. If you agree with this exercise, then let's go ahead with it; the first thing you can do is strip out everything that is not needed to support this minimal change. I don't mind either way whether you implement this minimal change in this merge request here, or keep this one around as a reference and open a new one for the minimal change.
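To make that concrete, here is a minimal sketch of the proposed setup, reusing loadImmediate and generateSysCall from this patch; BaseReg and OffsetReg are illustrative parameters, and the constants come from <sys/mman.h>:

// Sketch only: mmap(NULL, pagesize, R|W, ANONYMOUS|PRIVATE, -1, 0), then
// use the returned address (X0) as the base register and zero the offset.
static std::vector<MCInst> generateMinimalLoadSetup(MCRegister BaseReg,
                                                    MCRegister OffsetReg) {
  std::vector<MCInst> Code;
  Code.push_back(loadImmediate(AArch64::X0, 64, APInt(64, 0))); // addr = NULL
  Code.push_back(loadImmediate(AArch64::X1, 64, APInt(64, getpagesize())));
  Code.push_back(
      loadImmediate(AArch64::X2, 64, APInt(64, PROT_READ | PROT_WRITE)));
  Code.push_back(
      loadImmediate(AArch64::X3, 64, APInt(64, MAP_PRIVATE | MAP_ANONYMOUS)));
  Code.push_back(loadImmediate(AArch64::X4, 64, APInt(64, -1))); // fd
  Code.push_back(loadImmediate(AArch64::X5, 64, APInt(64, 0)));  // offset
  generateSysCall(SYS_mmap, Code);
  // BaseReg <- X0 (mmap return value); OffsetReg <- 0.
  Code.push_back(MCInstBuilder(AArch64::ORRXrr)
                     .addReg(BaseReg)
                     .addReg(AArch64::XZR)
                     .addReg(AArch64::X0));
  Code.push_back(loadImmediate(OffsetReg, 64, APInt(64, 0)));
  return Code;
}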

@davemgreen or @boomanaiden154:, if you have opinions on any of this, please let us know.

@boomanaiden154 (Contributor)

Is updating isMemory to OPERAND_MEMORY || mayLoad || mayStore correct and without unintended consequences?

Could be. I'm not sure why exactly this would be different between x86 and AArch64, though. It's probably something that should be understood before landing the patch, but it doesn't hurt to try.

[sanity check] Is it correct for the scratch memory register to be an arbitrary register (e.g. X14)?

No. The scratch memory register is defined by the calling convention. It's the first argument of the call that jumps to the assembly snippet.
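For reference, a sketch of what that suggests for AArch64 (a proposed direction, not the patch's current code, which hardcodes X14); under AAPCS64 the first integer argument arrives in X0, analogous to RDI in the x86-64 SysV implementation:

MCRegister
ExegesisAArch64Target::getScratchMemoryRegister(const Triple &TT) const {
  // The scratch memory pointer is the first argument of the call into the
  // snippet, i.e. the first argument register of the calling convention.
  return AArch64::X0;
}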

Any pointers on the undefined physical register error: why is it usually thrown, and what is a potential way out?

That usually comes up because a physical register is undefined at the MC verification stage. You'll probably need to define the register somewhere.

This is a large change, which shows that there are a lot of moving parts involved here. I understand and agree that it is not easy to see what is necessary to fully support loads/stores for AArch64. Thanks for putting up this WIP patch; maybe we can use it to define the minimum change and use case that we want to add first.

The scratch memory register support should already work on AArch64 and would thus be a lot simpler. For --execution_mode=subprocess, this is about the minimum support needed. It's mostly all or nothing due to what we need to properly generate setup and teardown code. It could probably be done incrementally, though, as some features like mmap generation could be pulled out.

@lakshayk-nv (Contributor, Author)

It's probably something that should be understood before landing the patch, but it doesn't hurt to try.

I checked and found no regression on x86 (-mcpu=znver4) and AArch64 (-mcpu=neoverse-v2) machines. I just wanted to be cautious about introducing architecture-independent changes.

AArch64 doesn't have any instructions with OPERAND_MEMORY operands; its load and store instructions only carry the MCID::MayLoad or MCID::MayStore flag. Thus, I want to change IsMemory so that the check trickles down to fillMemoryOperands.

No. The scratch memory register is defined by the calling convention

Sure, I will update the implementation so that getScratchMemoryRegister returns X0 for AArch64.

Moreover, we can put in some effort to resolve the illegal-instruction error when using fillMemoryOperands.
We previously left this approach because we were unsure about it, and moved to the subprocess pathway instead.

This was due to the question:
Q. "How does exegesis set up the register to hold a correct memory address from which the instruction can load a value?" (if fillMemoryOperands is supposed to fill it, then how?)
If you can expand on this, that would be great.

@lakshayk-nv (Contributor, Author) commented Jun 10, 2025

For --execution_mode=subprocess, this is about the minimum support needed.

The added changes include implementations of the required functions (mmap, munmap, and configurePerfCounter), saving the auxiliary memory address on the stack, and a temporary fix that loads the first register with the mmap address and the rest via setRegTo as done previously, which resolves most AArch64 load instructions.

Moreover, I wanted to understand the x86 implementation of the memory annotations (from @boomanaiden154, as their owner/implementor).
Q1. What is the motivation behind the munmap calls?

Q2. How is the file descriptor being managed?
Q2.1. The mmap for the manual snippet requires a file descriptor at an offset from the auxiliary memory start address
(llvm/tools/llvm-exegesis/lib/Assembler.cpp:70):

ET.getAuxiliaryMemoryStartAddress() + sizeof(int) * (MemVal.Index + SubprocessMemory::AuxiliaryMemoryOffset)

But the file descriptor at this (or any) address does not seem to be populated, i.e. there is no call to __NR_perf_event_open, which initializes the fd.

Q2.2. Similarly, the auxiliary memory mmap requires X86::RSI to contain the fd value, i.e. RSI is moved into ArgumentRegisters::AuxiliaryMemoryFD, which is used by the mmap syscall.

Q3. What about the file descriptor for configurePerfCounter?
configurePerfCounter calls SYS_ioctl with the file descriptor at getAuxiliaryMemoryStartAddress(), but it seems not to be initialized.

@boomanaiden154 (Contributor)

"How is exegesis setting up the register to have correct memory address from which instruction can load value?" (if fillMemoryOperands is supposed to fill then, how?).

We don't support putting an address directly into a register in subprocess mode currently. For now you have to specify a specific address and then add an instruction/snippet setup to set a register to that value. This has been something I've wanted to add for a while, but I've never had a need for it.
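For context, a hedged example of what that looks like with a manual snippet today: the LLVM-EXEGESIS-MEM-DEF/MEM-MAP annotations are the documented snippet syntax, while the AArch64 instructions, the value name, and the 8192 address are illustrative:

# Define a 4096-byte memory value and map it at address 8192; the snippet
# itself then sets x0 to that address before the load under measurement.
# LLVM-EXEGESIS-MEM-DEF test1 4096 2147483647
# LLVM-EXEGESIS-MEM-MAP test1 8192
mov x0, #8192
ldr x1, [x0]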

What is the motivation behind the munmap calls?

Probably depends upon the specific call. Most likely just to clean things up at the end. It's probably not strictly necessary in a lot of cases.

How is the file descriptor being managed?

Which file descriptor? I believe there is an FD for the shared memory that is created before we fork the subprocess and thus just exists in the child. The perf counter FD needs to be created afterwards to target the PID of the subprocess. We send it to the child process through a socket to perform FD translation. There was an original implementation using pidfd_getfd, but that's still a relatively new syscall.

But the file descriptor at this (or any) address does not seem to be populated, i.e. there is no call to __NR_perf_event_open, which initializes the fd.

It's done in the parent process and sent over. Most of the subprocess memory stuff should be handled by SubprocessMemory.

configurePerfCounter calls SYS_ioctl with the file descriptor at getAuxiliaryMemoryStartAddress(), but it seems not to be initialized.

Not initialized as in you have a debugger attached and the value is zero/random, or not initialized as in you can't see where it's getting initialized? The tests pass, so it's definitely getting initialized, at least in some contexts.
