| 1 | +# EP-002: MSR Fallback for CPU Power Meter |
| 2 | + |
| 3 | +* **Status**: Draft |
| 4 | +* **Author**: Sunil Thaha |
| 5 | +* **Created**: 2025-08-13 |
| 6 | + |
| 7 | +## Problem |
| 8 | + |
| 9 | +Kepler fails when `/sys/class/powercap/intel-rapl` is unavailable, for example on |
| 10 | +kernels with powercap disabled or in restricted containers. Users have reported cases where MSR |
| 11 | +access is available but powercap is disabled or inaccessible (see issue [#2262](https://github.com/sustainable-computing-io/kepler/issues/2262)). |
| 12 | + |
| 13 | +## Goals |
| 14 | + |
| 15 | +* Implement MSR-based RAPL reading as an opt-in fallback that is used automatically when |
| 16 | +  powercap is unavailable |
| 17 | +* Maintain existing CPUPowerMeter interface compatibility |
| 18 | +* Provide configurable control over fallback behavior for security-conscious |
| 19 | + deployments |
| 20 | + |
| 21 | +## Solution |
| 22 | + |
| 23 | +Add an MSR-based fallback that reads RAPL energy counters directly from CPU registers. The fallback supports multi-socket systems by reading energy values for each socket. |
| 24 | + |
| 25 | +## Proposed Solution |
| 26 | + |
| 27 | +```mermaid |
| 28 | +graph TB |
| 29 | + CPUPowerMeter[CPUPowerMeter Interface] |
| 30 | + raplPowerMeter[raplPowerMeter<br>Enhanced with MSR] |
| 31 | + powercapReader[powercapReader<br>Primary] |
| 32 | + msrReader[msrReader<br>Fallback] |
| 33 | + powercap[/sys/class/<br>powercap/] |
| 34 | + msrdev["/dev/cpu/*/msr"] |
| 35 | + |
| 36 | + CPUPowerMeter --> raplPowerMeter |
| 37 | + raplPowerMeter --> powercapReader |
| 38 | + raplPowerMeter --> msrReader |
| 39 | + powercapReader --> powercap |
| 40 | + msrReader --> msrdev |
| 41 | + |
| 42 | + style CPUPowerMeter fill:#1e88e5,color:#fff |
| 43 | + style raplPowerMeter fill:#43a047,color:#fff |
| 44 | + style powercapReader fill:#fb8c00,color:#fff |
| 45 | + style msrReader fill:#e53935,color:#fff |
| 46 | + style powercap fill:#fdd835,color:#000 |
| 47 | + style msrdev fill:#8e24aa,color:#fff |
| 48 | +``` |
| 49 | + |
| 50 | +## Implementation |
| 51 | + |
| 52 | +### 1. Create abstraction |
| 53 | + |
| 54 | +```go |
| 55 | +type raplReader interface { |
| 56 | + // Zones returns the list of energy zones available from this reader |
| 57 | + Zones() ([]EnergyZone, error) |
| 58 | + |
| 59 | + // Available checks if the reader can be used on the current system |
| 60 | + Available() bool |
| 61 | + |
| 62 | + // Init initializes the reader and verifies it can read energy values |
| 63 | + Init() error |
| 64 | + |
| 65 | + // Close releases any resources held by the reader |
| 66 | + Close() error |
| 67 | + |
| 68 | + // Name returns a human-readable name for the reader implementation |
| 69 | + Name() string |
| 70 | +} |
| 71 | +``` |
| 72 | + |
| 73 | +### 2. Refactor existing code |
| 74 | + |
| 75 | +* Extract current powercap logic → `powercapReader` |
| 76 | +* Add `msrReader` for `/dev/cpu/*/msr` access |
| 77 | +* Auto-detect in `raplPowerMeter.Init()` with configurable fallback behavior |
| 78 | + |
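The auto-detection step could be sketched as follows. This is only an illustration under stated assumptions: `reader` is a trimmed-down stand-in for `raplReader`, `msrConfig`, `selectReader` and `fakeReader` are hypothetical names, and the sketch assumes that `force` only takes effect when `enabled` is also true.

```go
package main

import (
	"errors"
	"fmt"
)

// reader is a reduced stand-in for raplReader; only the methods needed
// for backend selection are included in this sketch.
type reader interface {
	Available() bool
	Init() error
	Name() string
}

// msrConfig mirrors the proposed device.msr settings (hypothetical type).
type msrConfig struct {
	Enabled bool
	Force   bool
}

var errNoReader = errors.New("no usable RAPL reader")

// selectReader returns the first candidate that is available and
// initializes cleanly. Powercap is tried first unless MSR is forced.
func selectReader(powercap, msr reader, cfg msrConfig) (reader, error) {
	candidates := []reader{powercap}
	switch {
	case cfg.Enabled && cfg.Force:
		candidates = []reader{msr} // assumption: force skips powercap entirely
	case cfg.Enabled:
		candidates = append(candidates, msr)
	}
	for _, r := range candidates {
		if !r.Available() {
			continue
		}
		if err := r.Init(); err != nil {
			continue
		}
		return r, nil
	}
	return nil, errNoReader
}

// fakeReader is a test double used only for this example.
type fakeReader struct {
	name      string
	available bool
}

func (f fakeReader) Available() bool { return f.available }
func (f fakeReader) Init() error     { return nil }
func (f fakeReader) Name() string    { return f.name }

func main() {
	powercap := fakeReader{name: "rapl-powercap", available: false}
	msr := fakeReader{name: "rapl-msr", available: true}

	r, err := selectReader(powercap, msr, msrConfig{Enabled: true})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected backend:", r.Name())
}
```

With MSR left at its default (disabled), the same call returns an error instead of silently falling back, which matches the opt-in intent of the proposal.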
| 79 | +### 3. MSR registers |
| 80 | + |
| 81 | +* UNIT: 0x606 (MSR_RAPL_POWER_UNIT - contains energy unit scaling factor) |
| 82 | +* PKG: 0x611 (MSR_PKG_ENERGY_STATUS - package energy counter) |
| 83 | +* PP0: 0x639 (MSR_PP0_ENERGY_STATUS - core/Power Plane 0 energy counter) |
| 84 | +* DRAM: 0x619 (MSR_DRAM_ENERGY_STATUS - memory energy counter) |
| 85 | +* UNCORE: 0x641 (MSR_PP1_ENERGY_STATUS - uncore/Power Plane 1 energy counter) |
| 86 | + |
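Reading one of the registers listed above might look like the sketch below. The Linux `msr` driver exposes each register at a file offset equal to its address, returned as 8 little-endian bytes. The names `readMSR`, `readEnergyCounter` and `newFakeMSR` are hypothetical, and an in-memory buffer stands in for `/dev/cpu/0/msr` so the example runs without special privileges.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Address of the package energy counter, as listed above.
const msrPkgEnergyStatus = 0x611

// readMSR reads one 64-bit register from an MSR device. The register
// address is used directly as the read offset.
func readMSR(dev io.ReaderAt, reg int64) (uint64, error) {
	var buf [8]byte
	if _, err := dev.ReadAt(buf[:], reg); err != nil {
		return 0, fmt.Errorf("read MSR %#x: %w", reg, err)
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// readEnergyCounter returns the low 32 bits of an energy status register,
// which hold the raw energy counter.
func readEnergyCounter(dev io.ReaderAt, reg int64) (uint32, error) {
	v, err := readMSR(dev, reg)
	if err != nil {
		return 0, err
	}
	return uint32(v), nil
}

// newFakeMSR builds an in-memory device holding the given register values.
// It exists only to make the example runnable.
func newFakeMSR(values map[int64]uint64) io.ReaderAt {
	var size int64
	for reg := range values {
		if reg+8 > size {
			size = reg + 8
		}
	}
	data := make([]byte, size)
	for reg, v := range values {
		binary.LittleEndian.PutUint64(data[reg:], v)
	}
	return bytes.NewReader(data)
}

func main() {
	dev := newFakeMSR(map[int64]uint64{msrPkgEnergyStatus: 123456})
	raw, err := readEnergyCounter(dev, msrPkgEnergyStatus)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("raw package counter:", raw)
}
```

An `*os.File` opened on a real MSR device also satisfies `io.ReaderAt`, so the same function could be reused unchanged.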
| 87 | +Note: Energy counters are 32-bit values that wrap around after 2^32 (~4.29 billion) counts. |
| 88 | +The energy status unit (ESU) in bits 12:8 of register 0x606 defines one count as 1/2^ESU joules |
| 89 | +and is used to convert raw counter values to microjoules. |
| 90 | + |
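The conversion and wraparound handling described in the note can be sketched as below. The function names are hypothetical, and the raw register value in `main` is only an example chosen so that ESU equals 14; real values depend on the CPU.

```go
package main

import "fmt"

// energyUnitMicroJoules decodes the energy status unit (bits 12:8) of
// register 0x606. One counter increment equals 1/2^ESU joules.
func energyUnitMicroJoules(powerUnit uint64) float64 {
	esu := (powerUnit >> 8) & 0x1F
	return 1e6 / float64(uint64(1)<<esu)
}

// counterDelta returns the number of increments between two samples.
// Unsigned 32-bit subtraction wraps modulo 2^32, so a single counter
// wraparound between samples is handled without special cases.
func counterDelta(prev, curr uint32) uint32 {
	return curr - prev
}

// toMicroJoules scales a raw count by the decoded energy unit.
func toMicroJoules(count uint32, unitMicroJoules float64) uint64 {
	return uint64(float64(count) * unitMicroJoules)
}

func main() {
	const powerUnit = 0x000A0E03 // example raw value with ESU = 14

	unit := energyUnitMicroJoules(powerUnit)
	delta := counterDelta(0xFFFFFFF0, 0x00000010) // counter wrapped once

	fmt.Printf("unit: %.8f uJ per count\n", unit)
	fmt.Printf("delta: %d counts = %d uJ\n", delta, toMicroJoules(delta, unit))
}
```

The delta is only correct if the counter wraps at most once between samples, so the sampling interval has to stay shorter than the time the counter needs to wrap at peak power.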
| 91 | +## Configuration |
| 92 | + |
| 93 | +```yaml |
| 94 | +device: |
| 95 | + msr: |
| 96 | + enabled: false # opt-in due to security (defaults to false) |
| 97 | + force: false # force MSR even if powercap available (testing only) |
| 98 | + devicePath: /dev/cpu # MSR base device path (mounted as host/dev/cpu in containers) |
| 99 | +``` |
| 100 | + |
| 101 | +CLI flags will not be exposed for MSR settings to avoid accidental enabling |
| 102 | +of this security-sensitive feature. |
| 103 | + |
| 104 | +## Security |
| 105 | + |
| 106 | +MSR access exposes the system to PLATYPUS attacks (CVE-2020-8694, CVE-2020-8695). The MSR |
| 107 | +fallback will be disabled by default and will require the CAP_SYS_RAWIO capability. A security |
| 108 | +warning will be logged when MSR fallback is activated. |
| 109 | + |
| 110 | +## Testing |
| 111 | + |
| 112 | +* Mock MSR files for unit testing |
| 113 | +* Integration tests for fallback behavior |
| 114 | +* Maintain existing test coverage |
| 115 | + |
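One possible way to mock MSR files for unit tests is to build a directory tree that mimics `/dev/cpu/<n>/msr` and point the configurable `devicePath` at it. The sketch below is an assumption about how such a fixture could look; `writeFakeMSR`, `readFakeMSR` and `roundTrip` are hypothetical helpers, not part of the existing code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"path/filepath"
)

// writeFakeMSR creates <base>/<cpu>/msr and stores value at the offset
// equal to the register address, mirroring the real device layout.
func writeFakeMSR(base string, cpu int, reg int64, value uint64) (string, error) {
	dir := filepath.Join(base, fmt.Sprint(cpu))
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(dir, "msr")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], value)
	_, err = f.WriteAt(buf[:], reg)
	return path, err
}

// readFakeMSR reads the 8-byte register value back from the fixture.
func readFakeMSR(path string, reg int64) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var buf [8]byte
	if _, err := f.ReadAt(buf[:], reg); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// roundTrip writes a value into a temporary fixture and reads it back.
func roundTrip(reg int64, value uint64) (uint64, error) {
	base, err := os.MkdirTemp("", "fake-msr")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	path, err := writeFakeMSR(base, 0, reg, value)
	if err != nil {
		return 0, err
	}
	return readFakeMSR(path, reg)
}

func main() {
	got, err := roundTrip(0x611, 42)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("package counter read from fixture:", got)
}
```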
| 116 | +## Metrics |
| 117 | + |
| 118 | +```prometheus |
| 119 | +# New metric indicating active power meter backend |
| 120 | +kepler_node_cpu_power_meter{source="rapl-powercap|rapl-msr"} 1 |
| 121 | + |
| 122 | +# Existing metrics remain unchanged and won't have "source" label |
| 123 | +kepler_node_cpu_joules_total{zone="package|core|dram"} |
| 124 | +kepler_node_cpu_watts{zone="package|core|dram"} |
| 125 | +kepler_node_cpu_active_joules_total{zone="package|core|dram"} |
| 126 | +kepler_node_cpu_idle_joules_total{zone="package|core|dram"} |
| 127 | +``` |
| 128 | + |
| 129 | +## Compatibility |
| 130 | + |
| 131 | +* No API changes |
| 132 | +* Powercap remains primary |
| 133 | +* Existing deployments unaffected |