@slabasan
Collaborator

Tuo CPU

$ ./examples/variorum-print-power-example 
_AMDPOWER Host Socket Power_W Timestamp_sec
_AMDPOWER tuolumne1001 0 71.620000 0.000004
_AMDPOWER tuolumne1001 1 127.140000 0.000373
_AMDPOWER tuolumne1001 2 101.421000 0.000601
_AMDPOWER tuolumne1001 3 123.770000 0.002773
Final result: inf
_AMDPOWER tuolumne1001 0 71.595000 0.182775
_AMDPOWER tuolumne1001 1 127.059000 0.183002
_AMDPOWER tuolumne1001 2 121.979000 0.183122
_AMDPOWER tuolumne1001 3 124.778000 0.183241

Tuo CPU+GPU

$ ./examples/variorum-print-power-example 
_AMDPOWER Host Socket Power_W Timestamp_sec
_AMDPOWER tuolumne1001 0 114.257000 0.000007
_AMDPOWER tuolumne1001 1 131.400000 0.000150
_AMDPOWER tuolumne1001 2 72.747000 0.000479
_AMDPOWER tuolumne1001 3 123.682000 0.000908
_AMD_GPU_POWER_USAGE Host Socket DeviceID Power Timestamp_sec
_AMD_GPU_POWER_USAGE tuolumne1001 0 0 118.00 0.000007
_AMD_GPU_POWER_USAGE tuolumne1001 1 1 132.00 0.036275
_AMD_GPU_POWER_USAGE tuolumne1001 2 2 72.00 0.072830
_AMD_GPU_POWER_USAGE tuolumne1001 3 3 126.00 0.132013
Final result: inf
_AMDPOWER tuolumne1001 0 123.609000 0.338298
_AMDPOWER tuolumne1001 1 128.384000 0.338421
_AMDPOWER tuolumne1001 2 117.978000 0.338852
_AMDPOWER tuolumne1001 3 125.189000 0.338971
_AMD_GPU_POWER_USAGE tuolumne1001 0 0 115.00 0.346322
_AMD_GPU_POWER_USAGE tuolumne1001 1 1 130.00 -0.618076
_AMD_GPU_POWER_USAGE tuolumne1001 2 2 99.00 -0.582453
_AMD_GPU_POWER_USAGE tuolumne1001 3 3 126.00 -0.546474

@tpatki
Member

tpatki commented Oct 29, 2025

@slabasan and I are in the process of testing ESMI+HSMP to obtain CCD-only (CPU) power on the APU.

We noticed that the existing API esmi_socket_power_get was giving us APU-level power, which is the same value that we get out of rocm-smi or amd-smi.
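The overlap is visible in the "Tuo CPU+GPU" output above. As a rough check, the sketch below parses the first sample from that run and compares each `_AMDPOWER` socket value with the `_AMD_GPU_POWER_USAGE` value for the same socket. The column positions are taken from the header lines in the output; the helper name `parse_first_sample` is made up for illustration. The two readings are taken at slightly different times, so a small difference is expected.

```python
# First sample from the "Tuo CPU+GPU" run, copied verbatim from the output above.
SAMPLE = """\
_AMDPOWER tuolumne1001 0 114.257000 0.000007
_AMDPOWER tuolumne1001 1 131.400000 0.000150
_AMDPOWER tuolumne1001 2 72.747000 0.000479
_AMDPOWER tuolumne1001 3 123.682000 0.000908
_AMD_GPU_POWER_USAGE tuolumne1001 0 0 118.00 0.000007
_AMD_GPU_POWER_USAGE tuolumne1001 1 1 132.00 0.036275
_AMD_GPU_POWER_USAGE tuolumne1001 2 2 72.00 0.072830
_AMD_GPU_POWER_USAGE tuolumne1001 3 3 126.00 0.132013
"""


def parse_first_sample(text):
    """Return ({socket: cpu_watts}, {socket: gpu_watts}) from the printed lines."""
    cpu, gpu = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "_AMDPOWER":
            # Columns: tag, Host, Socket, Power_W, Timestamp_sec
            cpu[int(fields[2])] = float(fields[3])
        elif fields[0] == "_AMD_GPU_POWER_USAGE":
            # Columns: tag, Host, Socket, DeviceID, Power, Timestamp_sec
            gpu[int(fields[2])] = float(fields[4])
    return cpu, gpu


cpu, gpu = parse_first_sample(SAMPLE)
diffs = {sock: cpu[sock] - gpu[sock] for sock in sorted(cpu)}
for sock, diff in diffs.items():
    print(f"socket {sock}: esmi={cpu[sock]:.3f} W  gpu={gpu[sock]:.2f} W  diff={diff:+.3f} W")
```

In this sample every socket's two readings are within 4 W of each other, which is consistent with both interfaces reporting the same APU-level quantity.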

There's a new API, esmi_read_ccd_power, which is per-core as opposed to per-socket, and it seems like it would give us the CPU-level values we're looking for. We need to either change the socket-level API to core-level and print accordingly, or report an aggregated value across the three CCDs, especially for the JSON interface, variorum_get_power_json. We'll post more updates here as we learn more about the new ESMI APIs.
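The aggregation option could look roughly like the sketch below. It is only an illustration of the idea, not the planned Variorum change: `fake_read_ccd_power_mw` stands in for a binding to esmi_read_ccd_power, the core-to-CCD map (8 cores per CCD), the milliwatt unit, and the JSON key names are all assumptions, and the numbers are placeholders rather than measurements.

```python
import json

# Assumed layout: three CCDs (per the comment above), 8 cores each (assumption).
CORES_BY_CCD = {0: list(range(0, 8)), 1: list(range(8, 16)), 2: list(range(16, 24))}


def fake_read_ccd_power_mw(core_id):
    """Hypothetical stand-in for a per-core CCD power query.

    Returns placeholder values in milliwatts (unit is an assumption).
    """
    placeholder_mw = {0: 21000, 8: 19500, 16: 20500}
    return placeholder_mw[core_id]


def aggregate_ccd_power(read_ccd_power_mw, cores_by_ccd):
    """Query one core per CCD and sum the results into a CPU-level value in watts."""
    per_ccd_w = {}
    for ccd, cores in cores_by_ccd.items():
        # Assumes any core in a CCD reports that CCD's power, so the first core suffices.
        per_ccd_w[ccd] = read_ccd_power_mw(cores[0]) / 1000.0
    return per_ccd_w, sum(per_ccd_w.values())


per_ccd_w, total_w = aggregate_ccd_power(fake_read_ccd_power_mw, CORES_BY_CCD)

# Key names here are illustrative, not the confirmed variorum_get_power_json schema.
payload = json.dumps({
    "socket": 0,
    "power_cpu_watts": total_w,
    "power_ccd_watts": {str(ccd): watts for ccd, watts in per_ccd_w.items()},
})
print(payload)
```

Reporting both the per-CCD values and the sum keeps the JSON consumer's socket-level view intact while still exposing the finer-grained readings.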

(This PR currently breaks the Tioga CPU build; we will fix that once we know how to handle the CCD API.)

- needs testing with a built hsmp driver
- new host config points at driver that didn't build successfully