Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Allow unaligned VGPR for ds_read_b96_tr_b6 #125169

Merged

Conversation

pravinjagtap
Copy link
Contributor

@pravinjagtap pravinjagtap commented Jan 31, 2025

All load transpose instructions follow gfx950 standard of even aligned VGPR except ds_read_b96_tr_b6, which allows unaligned VGPR.

Co-authored-by: Sirish Pande [email protected]

@llvmbot llvmbot added backend:AMDGPU mc Machine (object) code labels Jan 31, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 31, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-amdgpu

Author: Pravin Jagtap (pravinjagtap)

Changes

All load transpose instructions follow gfx950 standard of even aligned VGPR except ds_read_b96_tr_b6, which allows unaligned VGPR.


Full diff: https://github.com/llvm/llvm-project/pull/125169.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+4-1)
  • (modified) llvm/test/MC/AMDGPU/gfx950-unsupported.s (-4)
  • (modified) llvm/test/MC/AMDGPU/gfx950_asm_read_tr.s (+8)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_ds_read_tr.txt (+6)
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index d8f441d1ccfe44..ed922245b3e2ba 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -4853,7 +4853,10 @@ bool AMDGPUAsmParser::validateAGPRLdSt(const MCInst &Inst) const {
 
 bool AMDGPUAsmParser::validateVGPRAlign(const MCInst &Inst) const {
   auto FB = getFeatureBits();
-  if (!FB[AMDGPU::FeatureGFX90AInsts])
+  unsigned Opc = Inst.getOpcode();
+  // DS_READ_B96_TR_B6 is the only DS instruction in GFX950, that allows
+  // unaligned VGPR. All others only allow even aligned VGPRs.
+  if (!(FB[AMDGPU::FeatureGFX90AInsts]) || Opc == AMDGPU::DS_READ_B96_TR_B6_vi)
     return true;
 
   const MCRegisterInfo *MRI = getMRI();
diff --git a/llvm/test/MC/AMDGPU/gfx950-unsupported.s b/llvm/test/MC/AMDGPU/gfx950-unsupported.s
index 225784177ae185..8bdab2da2394cc 100644
--- a/llvm/test/MC/AMDGPU/gfx950-unsupported.s
+++ b/llvm/test/MC/AMDGPU/gfx950-unsupported.s
@@ -239,10 +239,6 @@ ds_read_b64_tr_b16 v[2:3], v2 offset:-64
 //===----------------------------------------------------------------------===//
 // ds_read_b96_tr_b6
 //===----------------------------------------------------------------------===//
-ds_read_b96_tr_b6 v[1:3], v0
-// ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid register class: vgpr tuples must be 64 bit aligned
-// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: instruction not supported on this GPU
-
 ds_read_b96_tr_b6 v1, v0
 // ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
 // W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: instruction not supported on this GPU
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_read_tr.s b/llvm/test/MC/AMDGPU/gfx950_asm_read_tr.s
index a6907caafcbb61..b9ceb143ecf8e8 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_read_tr.s
+++ b/llvm/test/MC/AMDGPU/gfx950_asm_read_tr.s
@@ -32,3 +32,11 @@ ds_read_b96_tr_b6 v[0:2], v0
 ds_read_b96_tr_b6 v[2:4], v2 offset:64
 // GFX940-ERR: [[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
 // GFX950: encoding: [0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x02]
+
+ds_read_b96_tr_b6 v[1:3], v0
+// GFX940-ERR: [[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
+// GFX950: encoding: [0x00,0x00,0xc2,0xd9,0x00,0x00,0x00,0x01]
+
+ds_read_b96_tr_b6 v[1:3], v2 offset:64
+// GFX940-ERR: [[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
+// GFX950: encoding: [0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x01]
\ No newline at end of file
diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_ds_read_tr.txt b/llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_ds_read_tr.txt
index 10310f7ad1f3de..0b5dba24612ca5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_ds_read_tr.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_ds_read_tr.txt
@@ -35,3 +35,9 @@
 
 # GFX950: ds_read_b96_tr_b6 v[2:4], v2 offset:64  ; encoding: [0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x02]
 0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x02
+
+# GFX950: ds_read_b96_tr_b6 v[1:3], v0            ; encoding: [0x00,0x00,0xc2,0xd9,0x00,0x00,0x00,0x01]
+0x00,0x00,0xc2,0xd9,0x00,0x00,0x00,0x01
+
+# GFX950: ds_read_b96_tr_b6 v[1:3], v2 offset:64  ; encoding: [0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x01]
+0x40,0x00,0xc2,0xd9,0x02,0x00,0x00,0x01

All load transpose instructions follow gfx950 standard of
even aligned VGPR except ds_read_b96_tr_b6, which allows
unaligned VGPR.
@pravinjagtap pravinjagtap force-pushed the pravinjagtap/vgpr-align-requirements branch from 035fa4f to 6c10533 Compare January 31, 2025 05:09
@pravinjagtap pravinjagtap requested a review from cdevadas January 31, 2025 06:41
@pravinjagtap pravinjagtap merged commit e6d16f9 into llvm:main Jan 31, 2025
8 checks passed
unsigned Opc = Inst.getOpcode();
// DS_READ_B96_TR_B6 is the only DS instruction in GFX950, that allows
// unaligned VGPR. All others only allow even aligned VGPRs.
if (!(FB[AMDGPU::FeatureGFX90AInsts]) || Opc == AMDGPU::DS_READ_B96_TR_B6_vi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra parentheses. Also we probably should not directly special case the opcode here. This should come from the register class constraint of the operand

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 10, 2025
…vm#400)

All load transpose instructions follow gfx950 standard of even aligned
VGPR except ds_read_b96_tr_b6, which allows unaligned VGPR.

Co-authored-by: Sirish Pande
[[email protected]](mailto:[email protected])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants