[Issue]: [Bug] GetSegmentId fails for type 1 (MEMORY) on discrete RX 9060 XT (gfx1200) in WSL2 — VramAvail returns HSA_STATUS_ERROR, PyTorch sees 0 bytes free

### Problem Description

## Environment
- **GPU:** AMD Radeon RX 9060 XT (gfx1200 / RDNA 4 / Navi 44)
- **VRAM:** 16 GB GDDR6
- **OS (Windows):** Windows 10 22H2 (Build 19045)
- **AMD Driver:** Adrenalin Edition 26.2.2
- **WSL2:** Ubuntu 24.04 LTS
- **ROCm:** 7.2.1
- **librocdxg:** built from main branch (v1.1.1)
- **PyTorch:** 2.9.1+rocm7.2.1

## Problem Description
On a discrete RX 9060 XT GPU, `GetSegmentId` consistently fails for segment type `1`
(`D3DKMT_QUERYSTATISTICS_SEGMENT_TYPE_MEMORY`), which is the local/dedicated VRAM segment.
This causes `VramAvail()` to return `HSA_STATUS_ERROR`. PyTorch then reports almost
no free memory (4096 bytes), and all model loading fails with `loaded partially; 0.00 MB usable`.

The GPU is correctly detected by `rocminfo`, `HSA_ENABLE_DXG_DETECTION=1` is set,
`/dev/dxg` is present, and `torch.cuda.is_available()` returns `True`.
However, `torch.cuda.mem_get_info()` returns `(4096, 16987488256)` — only 4 KB free
despite 16 GB total VRAM, making it impossible to load any model.
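
To put the reported pair in perspective, here is a minimal sketch of a sanity check on a `(free, total)` result. The `MemInfo` struct, the helper names, and the 1% threshold are all hypothetical and chosen only for illustration; none of them belong to PyTorch, HIP, or librocdxg.

```cpp
#include <cstdint>

// Hypothetical container for a (free, total) pair such as the one
// returned by torch.cuda.mem_get_info(). Not a real PyTorch/ROCm type.
struct MemInfo {
    std::uint64_t free_bytes;
    std::uint64_t total_bytes;
};

// Fraction of device memory reported as free (0.0 if total is zero).
constexpr double FreeFraction(MemInfo m) {
    return m.total_bytes == 0
               ? 0.0
               : static_cast<double>(m.free_bytes) /
                     static_cast<double>(m.total_bytes);
}

// Flags a result whose free fraction is implausibly small for an idle
// device. The 1% default is an arbitrary assumption, not a known limit.
constexpr bool LooksBogus(MemInfo m, double min_fraction = 0.01) {
    return FreeFraction(m) < min_fraction;
}

// Values reported in this issue: 4096 bytes free of 16987488256 total.
constexpr MemInfo kReported{4096, 16987488256ULL};
```

With the values from this issue, the free fraction is far below one millionth of the total, which is not a plausible reading for a GPU with nothing loaded.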


### Operating System

WSL2 Ubuntu 24.04.4

### CPU

Ryzen 9 3900

### GPU

RX 9060 XT

### ROCm Version

7.2.1.70201-81~24.04

### ROCm Component

_No response_

### Steps to Reproduce

## Reproduction Steps
1. Install ROCm 7.2.1 on WSL2 Ubuntu 24.04 with `amdgpu-install --usecase=rocm --no-dkms`
2. Build and install librocdxg from source with Windows SDK path
3. Set `HSA_ENABLE_DXG_DETECTION=1`
4. Run `rocminfo` — GPU is detected correctly (Agent 2, gfx1200)
5. Run `python -c "import torch; print(torch.cuda.mem_get_info())"` → `(4096, 16987488256)`
6. Try loading any model in PyTorch or ComfyUI → `loaded partially; 0.00 MB usable, 0.00 MB loaded`

## Debug Output
Running with `HSAKMT_DEBUG_LEVEL=7` prints the two log lines quoted under "Additional Information" below.

### (Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

WSL environment detected.
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.18
Runtime Ext Version:     1.15
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 3900 12-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 3900 12-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    28739804(0x1b688dc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    28739804(0x1b688dc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    28739804(0x1b688dc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    28739804(0x1b688dc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1200                            
  Uuid:                    GPU-aa3d0bc8364af119               
  Marketing Name:          AMD Radeon RX 9060 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L3:                      32768(0x8000) KB                   
  Chip ID:                 30096(0x7590)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   3072                               
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        2147483647(0x7fffffff)             
    y                        65535(0xffff)                      
    z                        65535(0xffff)                      
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 108                                
  SDMA engine uCode::      0                                  
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16589344(0xfd2220) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-am

### Additional Information

[QuerySegmentInfo] Total Segments: 4
[GetSegmentId] Failed to get segment id for type 1

The GPU has 4 segments, but none of them matches
`D3DKMT_QUERYSTATISTICS_SEGMENT_TYPE_MEMORY` (type 1).
The `GetSegmentId` call in `VramAvail()` therefore always fails and returns
`HSA_STATUS_ERROR`.

`rocminfo` correctly reports 16 GB VRAM in Pool Info:

Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16589344(0xfd2220) KB
Allocatable: TRUE
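
As a quick cross-check, the pool size printed by `rocminfo` and the total reported by `torch.cuda.mem_get_info()` are the same number in different units. The sketch below assumes that `rocminfo`'s "KB" means 1024 bytes; the constant and function names are made up for this example.

```cpp
#include <cstdint>

// Converts a size in units of 1024 bytes to bytes.
constexpr std::uint64_t KiBToBytes(std::uint64_t kib) {
    return kib * 1024ULL;
}

// Pool 1 size as printed by rocminfo (assumed to be in 1024-byte units).
constexpr std::uint64_t kPoolSizeKiB = 16589344ULL;

// Total bytes from torch.cuda.mem_get_info() in this issue.
constexpr std::uint64_t kTorchTotalBytes = 16987488256ULL;

// Compile-time check: both tools agree on the total VRAM size.
static_assert(KiBToBytes(kPoolSizeKiB) == kTorchTotalBytes,
              "rocminfo pool size and PyTorch total should match");
```

So the total is reported consistently on both paths, and only the free value is wrong.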

---

## Root Cause Analysis

The issue is in `WDDMDevice::VramAvail()` in `src/wddm/device.cpp`:

```cpp
if (!GetSegmentId(D3DKMT_QUERYSTATISTICS_SEGMENT_TYPE_MEMORY, segmentId))
    return HSA_STATUS_ERROR;
```

`GetSegmentId` iterates over `segment_infos_` (populated by `QuerySegmentInfo`)
and looks for a segment whose `segment_type` matches
`D3DKMT_QUERYSTATISTICS_SEGMENT_TYPE_MEMORY`. However, on the RX 9060 XT (gfx1200),
none of the 4 reported segments has this type — suggesting that `ParseAdapterInfo`
(in the closed-source `libthunk_proxy.a`) is not correctly classifying the dedicated
VRAM segment as type `MEMORY` for this discrete RDNA 4 GPU in WSL2.

As a result:
- `VramAvail()` always returns `HSA_STATUS_ERROR`
- PyTorch's `mem_get_info()` reports only 4096 bytes free, which looks like a fallback value rather than a real measurement
- Model loading fails: GPU compute works, but large memory allocations do not
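
The lookup described above can be modeled roughly as follows. This is a simplified sketch and not the librocdxg source: `SegmentInfo`, `FindSegmentId`, `Status`, and `VramAvailSketch` are hypothetical stand-ins. Only `MEMORY == 1` comes from this issue; the other two enum values are assumed from the public `d3dkmthk.h` header. It requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Local mirror of the segment type enum. Memory == 1 is from the issue;
// Aperture and SysMem values are assumptions based on d3dkmthk.h.
enum class SegmentType : std::uint32_t { Aperture = 0, Memory = 1, SysMem = 2 };

// Hypothetical stand-in for one entry of `segment_infos_`.
struct SegmentInfo {
    std::uint32_t segment_id;
    SegmentType segment_type;
};

// Returns the id of the first segment of the wanted type, or nullopt
// when no segment in the list has that type.
std::optional<std::uint32_t> FindSegmentId(
    const std::vector<SegmentInfo>& segments, SegmentType wanted) {
    for (const auto& s : segments) {
        if (s.segment_type == wanted) return s.segment_id;
    }
    return std::nullopt;
}

// Hypothetical status type standing in for the HSA status codes.
enum class Status { Success, Error };

// Simplified caller: returns an error as soon as the lookup fails,
// mirroring the early return quoted from VramAvail().
Status VramAvailSketch(const std::vector<SegmentInfo>& segments,
                       std::uint32_t* segment_id_out) {
    const auto id = FindSegmentId(segments, SegmentType::Memory);
    if (!id) return Status::Error;
    *segment_id_out = *id;
    return Status::Success;
}
```

If all four reported segments carry a type other than `Memory`, this caller fails every time regardless of how much VRAM is physically present, which matches the behavior seen in the debug log. The actual types of the four segments on this GPU are not known from the log.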

---

## Additional Notes

- Small allocations work: `torch.zeros(100).cuda()` succeeds
- Large allocations fail silently (process hangs with no output)
- `IsDgpu()` returns `true` correctly (the GPU is recognized as discrete)
- `LocalHeapSize()` returns the correct 16 GB value (used by `rocminfo`)
- The problem is isolated to the segment type classification in `ParseAdapterInfo`
  inside `libthunk_proxy.a`, which cannot be patched without access to its source

This issue appears specific to the **discrete** RX 9060 XT (gfx1200) in WSL2.
APU-based setups (Strix Halo, etc.) seem to have a different but related memory
mapping issue (#6022 on ROCm/ROCm).

---

## Expected Behavior

`torch.cuda.mem_get_info()` should return approximately `(16_000_000_000, 16_987_488_256)`,
and model loading should use GPU VRAM as on native Linux or Windows.
