[llvm-exegesis] [AArch64] Add support for Load Instructions in subprocess execution mode #144895

Open: wants to merge 8 commits into base: main

lakshayk-nv (Contributor) commented Jun 19, 2025

We want to support load and store instructions for AArch64, but they currently cause a segmentation fault: registers are initialized to 0, whereas a load instruction needs its address register to hold a valid memory address.

Loading registers that require a memory address

  1. With the address of the auxiliary mmap:
    A prerequisite is support for --mode=subprocess on AArch64.
    This patch adds support for --mode=subprocess and for memory annotations in manual snippets.

This adds the memory setup that subprocess mode requires on AArch64.
It implements the auxiliary memory mmap, the manual snippet mmap, and configurePerfCounter(),
along with functions for generating syscalls.
The generated code issues the syscall for the aux mmap, saves its return value on the stack, and loads the registers that need it with the memory address.
This is what the patch currently implements.
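
As a rough illustration of the aux-memory step, the sketch below does in plain C++ what the generated syscall sequence is meant to do: request an anonymous shared mapping with fd=-1 and keep the returned address as a load base. `mapAuxiliaryMemory` and `loadFirstByte` are hypothetical names that do not appear in the patch, and letting the kernel choose the address is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper (not from the patch): map an anonymous, shared,
// read/write region, as the generated mmap syscall does while fd is -1.
// Returns nullptr on failure.
void *mapAuxiliaryMemory(std::size_t Length) {
  // Passing nullptr lets the kernel choose the address (an assumption here).
  void *Addr = mmap(nullptr, Length, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, /*fd=*/-1, /*offset=*/0);
  return Addr == MAP_FAILED ? nullptr : Addr;
}

// A load through the mapped address succeeds, whereas a load through a
// zero-initialized register would fault.
unsigned char loadFirstByte(const void *Base) {
  return *static_cast<const unsigned char *>(Base);
}
```

The point of the sketch is only that the mmap return value is the address a load's base register has to receive.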

TODO: How should we distinguish registers that require an address from those that do not?
For example, the LD1B opcode (ld1b {z6.b}, p4/z, [x14, x2]) expects the first register (x14) to contain the address and the second (x2) to hold an offset value.
The current fix initializes the first register queried by the instruction with the address and the rest via setRegTo, as before.
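
To make the temporary heuristic concrete, here is a minimal model of it in standalone C++. All names (`InitSource`, `RegisterInit`, `isGeneralPurpose`, `planRegisterInit`) are hypothetical and independent of llvm-exegesis. The sketch assumes that only general-purpose registers can receive the address, which is how the patch appears to behave.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the temporary heuristic; none of these names come
// from llvm-exegesis.
enum class InitSource { AuxMemoryAddress, Constant };

struct RegisterInit {
  std::string Name;
  InitSource Source;
};

// Assumption: only general-purpose registers (named X* or W* here) can
// receive the address.
bool isGeneralPurpose(const std::string &Name) {
  return !Name.empty() && (Name[0] == 'X' || Name[0] == 'W');
}

// The first general-purpose register gets the aux memory address; every
// other register is initialized with a constant.
std::vector<RegisterInit>
planRegisterInit(const std::vector<std::string> &Registers) {
  std::vector<RegisterInit> Plan;
  bool AddressAssigned = false;
  for (const std::string &Name : Registers) {
    if (!AddressAssigned && isGeneralPurpose(Name)) {
      Plan.push_back({Name, InitSource::AuxMemoryAddress});
      AddressAssigned = true;
    } else {
      Plan.push_back({Name, InitSource::Constant});
    }
  }
  return Plan;
}
```

For the LD1B example, `planRegisterInit({"P4", "X14", "X2"})` gives X14 the address and the other two a constant. The heuristic would pick the wrong register for any instruction whose offset register is queried before its base register, which is why the TODO remains open.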

Enable load instructions in --mode=subprocess

  • Implement syscall functions
  • Implement stack push and pop functions
  • Implement memory setup functions
    • Auxiliary mmap (file descriptor missing); currently fd=-1
    • Manual snippet mmap (file descriptor missing); currently fd=-1
  • configurePerfCounter (syscall fails with an invalid FD because the file descriptor is missing); currently a NOP
  • Store the aux memory address on the stack
  • Distinguish registers that require an address from the others (offset, ...)
    • Temporary fix: the first register receives the address for load instructions
  • Load each register accordingly, from the stack or via setRegTo
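
The stack items in the list can be pictured with a small model. This is a sketch under stated assumptions, not code from the patch: `StackModel`, `push`, and `pop` are hypothetical names, and the 16-byte step follows the default offsets of the patch's push and pop helpers, which keep SP 16-byte aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of a 16-byte-aligned stack; not code from the patch.
struct StackModel {
  std::uint64_t SP;
  std::map<std::uint64_t, std::uint64_t> Memory;
};

// Models a pre-indexed store: decrement SP by 16, then store the value.
void push(StackModel &S, std::uint64_t Value) {
  S.SP -= 16;
  S.Memory[S.SP] = Value;
}

// Models a post-indexed load: load the value, then increment SP by 16.
std::uint64_t pop(StackModel &S) {
  std::uint64_t Value = S.Memory.at(S.SP);
  S.SP += 16;
  return Value;
}
```

In this model, pushing the aux memory address and later popping it into the address register returns the same value only if every intermediate push has been matched by a pop, which is the stack-stability concern the patch flags.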

Please review: @sjoerdmeijer, @boomanaiden154, @davemgreen
This is the minimal patch required.
Thanks,

llvmbot (Member) commented Jun 19, 2025

@llvm/pr-subscribers-tools-llvm-exegesis

Author: Lakshay Kumar (lakshayk-nv)

Changes

We want to support load and store instructions for AArch64, but they currently cause a segmentation fault: registers are initialized to 0, whereas a load instruction needs its address register to hold a valid memory address.

Loading registers that require a memory address

  1. With the address of the auxiliary mmap:
    A prerequisite is support for --mode=subprocess on AArch64.
    This patch adds support for --mode=subprocess and for memory annotations in manual snippets.

This adds the memory setup that subprocess mode requires on AArch64.
It implements the auxiliary memory mmap, the manual snippet mmap, and configurePerfCounter(),
along with functions for generating syscalls.
The generated code issues the syscall for the aux mmap, saves its return value on the stack, and loads the registers that need it with the memory address.
This is what the patch currently implements.

TODO: How should we distinguish registers that require an address from those that do not?
For example, the LD1B opcode (ld1b {z6.b}, p4/z, [x14, x2]) expects the first register (x14) to contain the address and the second (x2) to hold an offset value.
The current fix initializes the first register queried by the instruction with the address and the rest via setRegTo, as before.

Enable load instructions in --mode=subprocess

  • Implement syscall functions
  • Implement stack push and pop functions
  • Implement memory setup functions
    • Upper and lower munmap (motivation unclear); currently not called
    • Auxiliary mmap (file descriptor missing); currently fd=-1
    • Manual snippet mmap (file descriptor missing); currently fd=-1
  • configurePerfCounter (syscall fails with an invalid FD because the file descriptor is missing); currently not called
  • Store the aux memory address on the stack
  • Distinguish registers that require an address from the others (offset, ...)
    • Temporary fix: the first register receives the address for load instructions
  • Load each register accordingly, from the stack or via setRegTo

Please review: @sjoerdmeijer, @boomanaiden154, @davemgreen
Looking forward to your feedback.
Thanks,


Patch is 20.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144895.diff

4 Files Affected:

  • (modified) llvm/test/tools/llvm-exegesis/AArch64/setReg_init_check.s (+2-2)
  • (modified) llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp (+278-74)
  • (modified) llvm/tools/llvm-exegesis/lib/Assembler.cpp (+34)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.h (+4)
diff --git a/llvm/test/tools/llvm-exegesis/AArch64/setReg_init_check.s b/llvm/test/tools/llvm-exegesis/AArch64/setReg_init_check.s
index a4350fc6dc2cb..d0dc5c744ab80 100644
--- a/llvm/test/tools/llvm-exegesis/AArch64/setReg_init_check.s
+++ b/llvm/test/tools/llvm-exegesis/AArch64/setReg_init_check.s
@@ -70,6 +70,6 @@ RUN: llvm-objdump -d %d > %t.s
 RUN: FileCheck %s --check-prefix=FPCR-ASM < %t.s
 FPCR-ASM:         <foo>:
 FPCR-ASM:         movi    d{{[0-9]+}}, #0000000000000000
-FPCR-ASM-NEXT:    mov     x8, #0x0
-FPCR-ASM-NEXT:    msr     FPCR, x8
+FPCR-ASM-NEXT:    mov     x16, #0x0
+FPCR-ASM-NEXT:    msr     FPCR, x16
 FPCR-ASM-NEXT:    bfcvt   h{{[0-9]+}}, s{{[0-9]+}}
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index a1eb5a46f21fc..df9fbf6946005 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -6,18 +6,28 @@
 //
 //===----------------------------------------------------------------------===//
 #include "../Target.h"
+#include "../Error.h"
+#include "../MmapUtils.h"
+#include "../SerialSnippetGenerator.h"
+#include "../SnippetGenerator.h"
+#include "../SubprocessMemory.h"
 #include "AArch64.h"
 #include "AArch64RegisterInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/MC/MCInstBuilder.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include <vector>
+#define DEBUG_TYPE "exegesis-aarch64-target"
 
 #if defined(__aarch64__) && defined(__linux__)
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <unistd.h> // for getpagesize()
+#ifdef HAVE_LIBPFM
+#include <perfmon/perf_event.h>
+#endif                   // HAVE_LIBPFM
 #include <linux/prctl.h> // For PR_PAC_* constants
 #include <sys/prctl.h>
-#ifndef PR_PAC_SET_ENABLED_KEYS
-#define PR_PAC_SET_ENABLED_KEYS 60
-#endif
-#ifndef PR_PAC_GET_ENABLED_KEYS
-#define PR_PAC_GET_ENABLED_KEYS 61
-#endif
 #ifndef PR_PAC_APIAKEY
 #define PR_PAC_APIAKEY (1UL << 0)
 #endif
@@ -38,47 +48,6 @@
 namespace llvm {
 namespace exegesis {
 
-bool isPointerAuth(unsigned Opcode) {
-  switch (Opcode) {
-  default:
-    return false;
-
-  // FIXME: Pointer Authentication instructions.
-  // We would like to measure these instructions, but they can behave
-  // differently on different platforms, and maybe the snippets need to look
-  // different for these instructions,
-  // Platform-specific handling:  On Linux, we disable authentication, may
-  // interfere with measurements. On non-Linux platforms, disable opcodes for
-  // now.
-  case AArch64::AUTDA:
-  case AArch64::AUTDB:
-  case AArch64::AUTDZA:
-  case AArch64::AUTDZB:
-  case AArch64::AUTIA:
-  case AArch64::AUTIA1716:
-  case AArch64::AUTIASP:
-  case AArch64::AUTIAZ:
-  case AArch64::AUTIB:
-  case AArch64::AUTIB1716:
-  case AArch64::AUTIBSP:
-  case AArch64::AUTIBZ:
-  case AArch64::AUTIZA:
-  case AArch64::AUTIZB:
-    return true;
-  }
-}
-
-bool isLoadTagMultiple(unsigned Opcode) {
-  switch (Opcode) {
-  default:
-    return false;
-
-  // Load tag multiple instruction
-  case AArch64::LDGM:
-    return true;
-  }
-}
-
 static unsigned getLoadImmediateOpcode(unsigned RegBitWidth) {
   switch (RegBitWidth) {
   case 32:
@@ -120,7 +89,7 @@ static MCInst loadPPRImmediate(MCRegister Reg, unsigned RegBitWidth,
 // Generates instructions to load an immediate value into an FPCR register.
 static std::vector<MCInst>
 loadFPCRImmediate(MCRegister Reg, unsigned RegBitWidth, const APInt &Value) {
-  MCRegister TempReg = AArch64::X8;
+  MCRegister TempReg = AArch64::X16;
   MCInst LoadImm = MCInstBuilder(AArch64::MOVi64imm).addReg(TempReg).addImm(0);
   MCInst MoveToFPCR =
       MCInstBuilder(AArch64::MSR).addImm(AArch64SysReg::FPCR).addReg(TempReg);
@@ -153,6 +122,89 @@ static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
   return Instructions;
 }
 
+static void generateRegisterStackPush(unsigned int RegToPush,
+                                      std::vector<MCInst> &GeneratedCode,
+                                      int imm = -16) {
+  // STR [X|W]t, [SP, #simm]!: SP is decremented by default 16 bytes
+  //                           before the store to maintain 16-bytes alignment.
+  if (AArch64::GPR64RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRXpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPush)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::STRWpre)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPush)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack push");
+  }
+}
+
+static void generateRegisterStackPop(unsigned int RegToPopTo,
+                                     std::vector<MCInst> &GeneratedCode,
+                                     int imm = 16) {
+  // LDR Xt, [SP], #simm: SP is incremented by default 16 bytes after the load.
+  if (AArch64::GPR64RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRXpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else if (AArch64::GPR32RegClass.contains(RegToPopTo)) {
+    GeneratedCode.push_back(MCInstBuilder(AArch64::LDRWpost)
+                                .addReg(AArch64::SP)
+                                .addReg(RegToPopTo)
+                                .addReg(AArch64::SP)
+                                .addImm(imm));
+  } else {
+    llvm_unreachable("Unsupported register class for stack pop");
+  }
+}
+
+void generateSysCall(long SyscallNumber, std::vector<MCInst> &GeneratedCode) {
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X8, 64, APInt(64, SyscallNumber)));
+  GeneratedCode.push_back(MCInstBuilder(AArch64::SVC).addImm(0));
+}
+
+/// Functions to save/restore system call registers
+#ifdef __linux__
+constexpr std::array<unsigned, 6> SyscallArgumentRegisters{
+    AArch64::X0, AArch64::X1, AArch64::X2,
+    AArch64::X3, AArch64::X4, AArch64::X5,
+};
+
+static void saveSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                 unsigned ArgumentCount) {
+  // AArch64 Linux typically uses X0-X5 for the first 6 arguments.
+  // Some syscalls can take up to 8 arguments in X0-X7.
+  assert(ArgumentCount <= 6 &&
+         "This implementation saves up to 6 argument registers (X0-X5)");
+  // generateRegisterStackPush(ArgumentRegisters::TempRegister, GeneratedCode);
+  // Preserve X8 (used for the syscall number/return value).
+  generateRegisterStackPush(AArch64::X8, GeneratedCode);
+  // Preserve the registers used to pass arguments to the system call.
+  for (unsigned I = 0; I < ArgumentCount; ++I) {
+    generateRegisterStackPush(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+}
+
+static void restoreSysCallRegisters(std::vector<MCInst> &GeneratedCode,
+                                    unsigned ArgumentCount) {
+  assert(ArgumentCount <= 6 &&
+         "This implementation restores up to 6 argument registers (X0-X5)");
+  // Restore argument registers, in opposite order of the way they are saved.
+  for (int I = ArgumentCount - 1; I >= 0; --I) {
+    generateRegisterStackPop(SyscallArgumentRegisters[I], GeneratedCode);
+  }
+  generateRegisterStackPop(AArch64::X8, GeneratedCode);
+  // generateRegisterStackPop(ArgumentRegisters::TempRegister, GeneratedCode);
+}
+#endif // __linux__
 #include "AArch64GenExegesis.inc"
 
 namespace {
@@ -162,7 +214,39 @@ class ExegesisAArch64Target : public ExegesisTarget {
   ExegesisAArch64Target()
       : ExegesisTarget(AArch64CpuPfmCounters, AArch64_MC::isOpcodeAvailable) {}
 
+  enum ArgumentRegisters {
+    CodeSize = AArch64::X12,
+    AuxiliaryMemoryFD = AArch64::X13,
+    TempRegister = AArch64::X16,
+  };
+
+  std::vector<MCInst> _generateRegisterStackPop(MCRegister Reg,
+                                                int imm = 0) const override {
+    std::vector<MCInst> Insts;
+    if (AArch64::GPR32RegClass.contains(Reg) || 
+        AArch64::GPR64RegClass.contains(Reg)) {
+      generateRegisterStackPop(Reg, Insts, imm);
+      return Insts;
+    }
+    return {};
+  }
+
 private:
+#ifdef __linux__
+  std::vector<MCInst> generateExitSyscall(unsigned ExitCode) const override;
+  std::vector<MCInst>
+  generateMmap(uintptr_t Address, size_t Length,
+               uintptr_t FileDescriptorAddress) const override;
+  void generateMmapAuxMem(std::vector<MCInst> &GeneratedCode) const override;
+  std::vector<MCInst> generateMemoryInitialSetup() const override;
+  std::vector<MCInst> setStackRegisterToAuxMem() const override;
+  uintptr_t getAuxiliaryMemoryStartAddress() const override;
+  std::vector<MCInst> configurePerfCounter(long Request,
+                                           bool SaveRegisters) const override;
+  std::vector<MCRegister> getArgumentRegisters() const override;
+  std::vector<MCRegister> getRegistersNeedSaving() const override;
+#endif // __linux__
+
   std::vector<MCInst> setRegTo(const MCSubtargetInfo &STI, MCRegister Reg,
                                const APInt &Value) const override {
     if (AArch64::GPR32RegClass.contains(Reg))
@@ -199,37 +283,157 @@ class ExegesisAArch64Target : public ExegesisTarget {
     PM.add(createAArch64ExpandPseudoPass());
   }
 
-  const char *getIgnoredOpcodeReasonOrNull(const LLVMState &State,
-                                           unsigned Opcode) const override {
-    if (const char *Reason =
-            ExegesisTarget::getIgnoredOpcodeReasonOrNull(State, Opcode))
-      return Reason;
+} // namespace
 
-    if (isPointerAuth(Opcode)) {
-#if defined(__aarch64__) && defined(__linux__)
-      // Disable all PAC keys. Note that while we expect the measurements to
-      // be the same with PAC keys disabled, they could potentially be lower
-      // since authentication checks are bypassed.
-      if (prctl(PR_PAC_SET_ENABLED_KEYS,
-                PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY |
-                    PR_PAC_APDBKEY, // all keys
-                0,                  // disable all
-                0, 0) < 0) {
-        return "Failed to disable PAC keys";
-      }
-#else
-      return "Unsupported opcode: isPointerAuth";
-#endif
-    }
+#ifdef __linux__
+// true : let use of fixed address to Virtual Address Space Ceiling
+// false: let kernel choose the address of the auxiliary memory
+bool UseFixedAddress = true;
+
+static constexpr const uintptr_t VAddressSpaceCeiling = 0x0000800000000000;
 
-    if (isLoadTagMultiple(Opcode))
-      return "Unsupported opcode: load tag multiple";
+static void generateRoundToNearestPage(unsigned int TargetRegister,
+                                       std::vector<MCInst> &GeneratedCode) {
+  int PageSizeShift = static_cast<int>(round(log2(getpagesize())));
+  // Round down to the nearest page by getting rid of the least significant bits
+  // representing location in the page.
+
+  // Single instruction using AND with inverted mask (effectively BIC)
+  uint64_t BitsToClearMask = (1ULL << PageSizeShift) - 1; // 0xFFF
+  uint64_t AndMask = ~BitsToClearMask;                    // ...FFFFFFFFFFFF000
+  GeneratedCode.push_back(MCInstBuilder(AArch64::ANDXri)
+                              .addReg(TargetRegister) // Xd
+                              .addReg(TargetRegister) // Xn
+                              .addImm(AndMask)        // imm bitmask
+  );
+}
 
-    return nullptr;
+std::vector<MCInst>
+ExegesisAArch64Target::generateExitSyscall(unsigned ExitCode) const {
+  std::vector<MCInst> ExitCallCode;
+  ExitCallCode.push_back(loadImmediate(AArch64::X0, 64, APInt(64, ExitCode)));
+  generateSysCall(SYS_exit, ExitCallCode); // SYS_exit is 93
+  return ExitCallCode;
+}
+
+std::vector<MCInst>
+ExegesisAArch64Target::generateMmap(uintptr_t Address, size_t Length,
+                                    uintptr_t FileDescriptorAddress) const {
+  // mmap(address, length, prot, flags, fd, offset=0)
+  int flags = MAP_SHARED;
+  if (Address != 0) {
+    flags |= MAP_FIXED_NOREPLACE;
   }
-};
+  std::vector<MCInst> MmapCode;
+  MmapCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, Address))); // map adr
+  MmapCode.push_back(
+      loadImmediate(AArch64::X1, 64, APInt(64, Length))); // length
+  MmapCode.push_back(loadImmediate(AArch64::X2, 64,
+                                   APInt(64, PROT_READ | PROT_WRITE))); // prot
+  MmapCode.push_back(loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  // FIXME: File descriptor address is not initialized.
+  // Copy file descriptor location from aux memory into X4
+  MmapCode.push_back(
+      loadImmediate(AArch64::X4, 64, APInt(64, FileDescriptorAddress))); // fd
+  // Dereference file descriptor into FD argument register
+  // MmapCode.push_back(MCInstBuilder(AArch64::LDRWui)
+  //                        .addReg(AArch64::W4)   // Destination register
+  //                        .addReg(AArch64::X4)   // Base register (address)
+  //                        .addImm(0));           // Offset (-byte words)
+  // FIXME: This is not correct.
+  MmapCode.push_back(loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, MmapCode); // SYS_mmap is 222
+  return MmapCode;
+}
 
-} // namespace
+void ExegesisAArch64Target::generateMmapAuxMem(
+    std::vector<MCInst> &GeneratedCode) const {
+  int fd = -1;
+  int flags = MAP_SHARED;
+  uintptr_t address = getAuxiliaryMemoryStartAddress();
+  if (fd == -1)
+    flags |= MAP_ANONYMOUS;
+  if (address != 0)
+    flags |= MAP_FIXED_NOREPLACE;
+  int prot = PROT_READ | PROT_WRITE;
+
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X0, 64, APInt(64, address))); // map adr
+  GeneratedCode.push_back(loadImmediate(
+      AArch64::X1, 64,
+      APInt(64, SubprocessMemory::AuxiliaryMemorySize))); // length
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X2, 64, APInt(64, prot))); // prot
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X3, 64, APInt(64, flags))); // flags
+  GeneratedCode.push_back(loadImmediate(AArch64::X4, 64, APInt(64, fd))); // fd
+  GeneratedCode.push_back(
+      loadImmediate(AArch64::X5, 64, APInt(64, 0))); // offset
+  generateSysCall(SYS_mmap, GeneratedCode);          // SYS_mmap is 222
+}
+
+std::vector<MCInst> ExegesisAArch64Target::generateMemoryInitialSetup() const {
+  std::vector<MCInst> MemoryInitialSetupCode;
+  generateMmapAuxMem(MemoryInitialSetupCode); // FIXME: Uninit file descriptor
+
+  // If using fixed address for auxiliary memory skip this step,
+  // When using dynamic memory allocation (non-fixed address), we must preserve
+  // the mmap return value (X0) which contains the allocated memory address.
+  // This value is saved to the stack to ensure registers requiring memory
+  // access can retrieve the correct address even if X0 is modified by
+  // intermediate code.
+  generateRegisterStackPush(AArch64::X0, MemoryInitialSetupCode);
+  // FIXME: Ensure stack pointer remains stable to prevent loss of saved address
+  return MemoryInitialSetupCode;
+}
+
+std::vector<MCInst> ExegesisAArch64Target::setStackRegisterToAuxMem() const {
+  std::vector<MCInst> instructions; // NOP
+  // TODO: Implement this, Found no need for this in AArch64.
+  return instructions;
+}
+
+uintptr_t ExegesisAArch64Target::getAuxiliaryMemoryStartAddress() const {
+  if (!UseFixedAddress)
+    // Allow kernel to select an appropriate memory address
+    return 0;
+  // Return the second to last page in the virtual address space
+  // to try and prevent interference with memory annotations in the snippet
+  // VAddressSpaceCeiling = 0x0000800000000000
+  // FIXME: Why 2 pages?
+  return VAddressSpaceCeiling - (2 * getpagesize());
+}
+
+std::vector<MCInst>
+ExegesisAArch64Target::configurePerfCounter(long Request,
+                                            bool SaveRegisters) const {
+  std::vector<MCInst> ConfigurePerfCounterCode; // NOP
+  // FIXME: SYSCALL exits with EBADF error - file descriptor is invalid
+  // No file is opened previosly to add as file descriptor
+  return ConfigurePerfCounterCode;
+}
+
+std::vector<MCRegister> ExegesisAArch64Target::getArgumentRegisters() const {
+  return {AArch64::X0, AArch64::X1};
+}
+
+std::vector<MCRegister> ExegesisAArch64Target::getRegistersNeedSaving() const {
+  return {
+      AArch64::X0,
+      AArch64::X1,
+      AArch64::X2,
+      AArch64::X3,
+      AArch64::X4,
+      AArch64::X5,
+      AArch64::X8,
+      ArgumentRegisters::TempRegister,
+      ArgumentRegisters::CodeSize,
+      ArgumentRegisters::AuxiliaryMemoryFD,
+  };
+}
+
+#endif // __linux__
 
 static ExegesisTarget *getTheExegesisAArch64Target() {
   static ExegesisAArch64Target Target;
diff --git a/llvm/tools/llvm-exegesis/lib/Assembler.cpp b/llvm/tools/llvm-exegesis/lib/Assembler.cpp
index fd7924db08441..a73eaf76a46d7 100644
--- a/llvm/tools/llvm-exegesis/lib/Assembler.cpp
+++ b/llvm/tools/llvm-exegesis/lib/Assembler.cpp
@@ -66,6 +66,8 @@ static bool generateSnippetSetupCode(const ExegesisTarget &ET,
       assert(MM.Address % getpagesize() == 0 &&
              "Memory mappings need to be aligned to page boundaries.");
 #endif
+      // FIXME: file descriptor for aux memory seems not initialized.
+      // TODO: Invoke openat syscall to get correct fd for aux memory
       const MemoryValue &MemVal = Key.MemoryValues.at(MM.MemoryValueName);
       BBF.addInstructions(ET.generateMmap(
           MM.Address, MemVal.SizeBytes,
@@ -78,15 +80,47 @@ static bool generateSnippetSetupCode(const ExegesisTarget &ET,
   Register StackPointerRegister = BBF.MF.getSubtarget()
                                       .getTargetLowering()
                                       ->getStackPointerRegisterToSaveRestore();
+#define DEBUG_TYPE "register-initial-values"
+  // FIXME: Only loading first register with memory address is hacky.
+  bool isFirstRegister = true;
   for (const RegisterValue &RV : Key.RegisterInitialValues) {
+    // Debug: register name and class name and value from BenchmarkKey
+    const MCRegisterInfo *RegInfo = BBF.MF.getTarget().getMCRegisterInfo();
+    const char *RegName = RegInfo->getName(RV.Register);
+    const char *regClassName = "Unknown";
+    for (unsigned i = 0, e = RegInfo->getNumRegClasses(); i < e; ++i) {
+      const MCRegisterClass &RC = RegInfo->getRegClass(i);
+      if (RC.contains(RV.Register)) {
+        regClassName = RegInfo->getRegClassName(&RC);
+        break;
+      }
+    }
+    LLVM_DEBUG(
+        dbgs() << "Setting register (Class: " << regClassName << ") " << RegName
+               << std::string(
+                      std::max(0, 3 - static_cast<int>(strlen(RegName))), ' '));
+
     if (GenerateMemoryInstructions) {
       // If we're generating memory instructions, don't load in the value for
       // the register with the stack pointer as it will be used later to finish
       // the setup.
       if (Register(RV.Register) == StackPointerRegister)
         continue;
+#if defined(__aarch64__)
+      auto StackLoadInsts = ET._generateRegisterStackPop(RV.Register, 16);
+      if (!StackLoadInsts.empty() && isFirstRegister) {
+        for (const auto &Inst : StackLoadInsts)
+          BBF.addInstruction(Inst);
+        isFirstRegister = false;
+        LLVM_DEBUG(dbgs() << "from stack with post-increment offset of " << 16
+                          << " bytes\n");
+        continue;
+      }
+#endif
     }
     // Load a constant in the register.
+    LLVM_DEBUG(dbgs() << " to " << RV.Value << "\n");
+#undef DEBUG_TYPE
     const auto SetRegisterCode = ET.setRegTo(*MSI, RV.Register, RV.Value);
     if (SetRegisterCode.empty())
       IsSnippetSetupComplete = false;
diff --git a/llvm/tools/llvm-exegesis/lib/Target.h b/llvm/tools/llvm-exegesis/lib/Target.h
index 77fbaa6e95412..736c9d9ff6c23 100644
--- a/llvm/tools/llvm-exegesis/lib/Target.h
+++ b/llvm/tools/llvm-exegesis/lib/Target.h
@@ -308,6 +308,10 @@ class ExegesisTarget {
     return std::make_unique<SavedState>();
   }
 
+  virtual st...
[truncated]

github-actions bot commented Jun 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.
