Open
Labels: bug
Description
Questionnaire
- Does ROCm work for you outside of Julia, e.g. C/C++/Python? Yes
- Post output of `rocminfo`.
```
$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: NO
*******
Agent 3
*******
Name: gfx908
Uuid: GPU-7f99cc8d20f3c038
Marketing Name: AMD Instinct MI100
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29580(0x738c)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1502
BDFID: 10496
Internal Node ID: 2
Compute Unit: 120
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 67
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 33538048(0x1ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
```
- Post output of `AMDGPU.versioninfo()` if possible.
```
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬────────────────────────────────────
│ Available │ Name │ Version │ Path ⋯
├───────────┼──────────────────┼───────────┼────────────────────────────────────
│ + │ LLD │ - │ /opt/rocm/llvm/bin/ld.lld ⋯
│ + │ Device Libraries │ - │ /home/wfg/.julia-cousteau/artifac ⋯
│ + │ HIP │ 6.3.42134 │ /opt/rocm/lib/libamdhip64.so ⋯
│ + │ rocBLAS │ 4.3.0 │ /opt/rocm/lib/librocblas.so ⋯
│ + │ rocSOLVER │ 3.27.0 │ /opt/rocm/lib/librocsolver.so ⋯
│ + │ rocSPARSE │ 3.3.0 │ /opt/rocm/lib/librocsparse.so ⋯
│ + │ rocRAND │ 2.10.5 │ /opt/rocm/lib/librocrand.so ⋯
│ + │ rocFFT │ 1.0.31 │ /opt/rocm/lib/librocfft.so ⋯
│ + │ MIOpen │ 3.3.0 │ /opt/rocm/lib/libMIOpen.so ⋯
└───────────┴──────────────────┴───────────┴────────────────────────────────────
1 column omitted
[ Info: AMDGPU devices
┌────┬────────────────────┬────────────────────────┬───────────┬────────────┬───
│ Id │ Name │ GCN arch │ Wavefront │ Memory │ ⋯
├────┼────────────────────┼────────────────────────┼───────────┼────────────┼───
│ 1 │ AMD Instinct MI100 │ gfx908:sramecc+:xnack- │ 64 │ 31.984 GiB │ ⋯
│ 2 │ AMD Instinct MI100 │ gfx908:sramecc+:xnack- │ 64 │ 31.984 GiB │ ⋯
└────┴────────────────────┴────────────────────────┴───────────┴────────────┴───
1 column omitted
```
Reproducing the bug
- Describe what's not working.
JACC's `parallel_reduce` regresses with AMDGPU v1.3.4 on the MI100; the same code works with AMDGPU v1.3.3. The CI logs have the full details. I still need to investigate further inside `parallel_reduce`, but the failure only reproduces with the new AMDGPU version.

The errors are of this kind:

```
reduce: Test Failed at /home/wfg/github-runners/cousteau-JACC/ci/_work/JACC.jl/JACC.jl/test/unittests.jl:166
Expression: mxd == maximum(ah2)
Evaluated: 3.3822674352068316 == 5.468821633677877
```
See JACC.jl issue
- Provide MWE to reproduce it (if possible).
Please see the JACC.jl CI logs mentioned above.
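As a hypothetical starting point for a standalone MWE (not confirmed to trigger this regression), the sketch below checks a device-side maximum against the host result using AMDGPU alone. It exercises AMDGPU's built-in `maximum` on a `ROCArray`, not JACC's `parallel_reduce` kernel, so it may well pass on v1.3.4; the array length is an arbitrary assumption.

```julia
using AMDGPU
using Test

# Array length is an arbitrary choice for illustration (assumption)
n = 1_000_000
x = rand(Float64, n)

# Copy to the default GPU and reduce on the device
dx = ROCArray(x)
mx_device = maximum(dx)

# max involves no rounding, so exact equality with the host result is expected
@test mx_device == maximum(x)
```

Running it once with AMDGPU v1.3.3 and once with v1.3.4 (e.g. via `using Pkg; Pkg.add(name="AMDGPU", version="1.3.3")`) would indicate whether the wrong maximum is reachable without JACC.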