Add zeus sample energy #701
Open
This pull request introduces energy profiling support using ZeusProfiler throughout the training and inference codebase. The main goal is to enable measurement and reporting of energy consumption (in joules) during model inference and sampling, and to surface this information in experiment results and monitoring tools. The changes span configuration, metrics tracking, sampling, and training scripts.
Energy Profiling Integration
Added Zeus energy profiling support to sample.py and train.py, including new command-line arguments for enabling Zeus profiling and specifying which devices (CPU/GPU) to profile. Energy consumption during inference is now measured and reported per sample and as an average.
Updated the training workflow to collect and log average inference energy after sampling, including TensorBoard support for tracking this new metric.
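As a rough illustration of the per-sample and average reporting described above, here is a minimal sketch. It does not use the Zeus API: `FakeCounter`, `profile_samples`, and `EnergyReport` are hypothetical names invented for this example, and the cumulative energy counter is a deterministic stand-in for whatever the real profiler reads from the hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EnergyReport:
    """Energy used by each generated sample, in joules (hypothetical type)."""
    per_sample_joules: List[float]

    @property
    def avg_joules_inf(self) -> float:
        # Average energy per sample; 0.0 when nothing was measured.
        if not self.per_sample_joules:
            return 0.0
        return sum(self.per_sample_joules) / len(self.per_sample_joules)


def profile_samples(
    generate: Callable[[int], None],
    read_joules: Callable[[], float],
    num_samples: int,
) -> EnergyReport:
    """Run `generate` once per sample and record the energy delta around it.

    `read_joules` is assumed to return a cumulative energy reading, so the
    cost of one sample is the difference between two consecutive readings.
    """
    per_sample = []
    for i in range(num_samples):
        start = read_joules()
        generate(i)
        per_sample.append(read_joules() - start)
    return EnergyReport(per_sample)


class FakeCounter:
    """Deterministic stand-in for a hardware energy counter (not real data)."""

    def __init__(self) -> None:
        self.joules = 0.0

    def read(self) -> float:
        return self.joules

    def burn(self, joules: float) -> None:
        self.joules += joules


counter = FakeCounter()
# Pretend sample i costs 2 * (i + 1) joules; these values are made up.
report = profile_samples(lambda i: counter.burn(2.0 * (i + 1)), counter.read, 3)
print(report.per_sample_joules, report.avg_joules_inf)
```

In the actual scripts the counter would be replaced by the profiler's own measurement window around the generation call; the averaging step is the part that feeds the logged inference-energy metric.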
Metrics and Monitoring Updates
Added avg_joules_inf as a tracked metric in experiment results, the monitoring UI, and the metrics parsing logic, so that energy consumption is visible in experiment summaries and dashboards.
Configuration Improvements
Updated the energy_efficiency_zeus.yaml exploration configuration to include new static and variation groups relevant for energy profiling experiments, such as precision modes and profiling flags.
These changes lay the groundwork for systematic energy efficiency experiments and monitoring, making it easier to evaluate and compare the energy consumption of different model configurations.
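To make the comparison use case concrete, the sketch below ranks experiment results by the avg_joules_inf metric. The result-row layout, the `rank_by_energy` helper, the run names, and the numbers are all assumptions made for illustration; they are not taken from this PR's metrics parsing code or from real measurements.

```python
from typing import Dict, List, Tuple


def rank_by_energy(results: List[Dict[str, object]]) -> List[Tuple[str, float]]:
    """Return (run name, avg joules per inference) pairs, lowest energy first.

    Runs without an avg_joules_inf value (for example, runs made with
    profiling disabled) are skipped rather than treated as zero.
    """
    ranked = []
    for row in results:
        joules = row.get("avg_joules_inf")
        if joules is None:
            continue
        ranked.append((str(row["name"]), float(joules)))
    return sorted(ranked, key=lambda item: item[1])


# Made-up rows standing in for parsed experiment summaries.
results = [
    {"name": "fp32", "avg_joules_inf": 12.5},
    {"name": "bf16", "avg_joules_inf": 7.0},
    {"name": "no_zeus"},  # profiling was off, so the metric is absent
]
ranking = rank_by_energy(results)
print(ranking)
```

Skipping rows that lack the metric keeps unprofiled runs from appearing as the most efficient ones, which matters once profiled and unprofiled runs share a results file.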