@klei22 (Collaborator) commented Dec 24, 2025

This pull request introduces energy profiling support using ZeusProfiler throughout the training and inference codebase. The main goal is to enable measurement and reporting of energy consumption (in joules) during model inference and sampling, and to surface this information in experiment results and monitoring tools. The changes span configuration, metrics tracking, sampling, and training scripts.

Energy Profiling Integration

  • Added Zeus energy profiling support to sample.py and train.py, including new command-line arguments for enabling Zeus profiling and specifying which devices (CPU/GPU) to profile. Energy consumption during inference is now measured and reported per sample and as an average.

  • Updated the training workflow to collect and log average inference energy after sampling, including TensorBoard support for tracking this new metric.
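
As a rough illustration of the command-line surface described above, the following sketch registers an enable switch, CPU/GPU selection, and device indices with `argparse`. Every flag name here (`--zeus_profiling`, `--zeus_profile_gpu`, `--zeus_profile_cpu`, `--zeus_gpu_indices`, `--zeus_cpu_indices`) is hypothetical and chosen only for the example; the actual names added to train_args.py and sample.py may differ.

```python
import argparse
from typing import List, Optional


def add_zeus_args(parser: argparse.ArgumentParser) -> None:
    """Register illustrative Zeus profiling flags (all names are assumptions)."""
    group = parser.add_argument_group("zeus energy profiling")
    # Master switch: profiling stays off unless explicitly requested.
    group.add_argument("--zeus_profiling", action=argparse.BooleanOptionalAction,
                       default=False, help="enable energy measurement")
    # Device selection: which kinds of devices to include in the measurement.
    group.add_argument("--zeus_profile_gpu", action=argparse.BooleanOptionalAction,
                       default=True, help="include GPU energy")
    group.add_argument("--zeus_profile_cpu", action=argparse.BooleanOptionalAction,
                       default=False, help="include CPU energy")
    # Optional device indices; None is read here as "all available devices".
    group.add_argument("--zeus_gpu_indices", type=int, nargs="*", default=None)
    group.add_argument("--zeus_cpu_indices", type=int, nargs="*", default=None)


def parse_zeus_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Build a parser containing only the illustrative Zeus flags and parse argv."""
    parser = argparse.ArgumentParser()
    add_zeus_args(parser)
    return parser.parse_args(argv)
```

With these assumed names, a call such as `parse_zeus_args(["--zeus_profiling", "--zeus_gpu_indices", "0", "1"])` would enable profiling on GPUs 0 and 1 while leaving CPU profiling off.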

Metrics and Monitoring Updates

  • Added avg_joules_inf as a tracked metric in experiment results, monitoring UI, and metrics parsing logic, ensuring energy consumption is visible in experiment summaries and dashboards.
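
To make the metrics change concrete, here is a minimal sketch of parsing a results line that carries avg_joules_inf next to other metrics. The comma-separated layout, the other key names (`best_val_loss`, `best_iter`), and the `METRIC_CASTS` list are assumptions for illustration; only `METRIC_KEYS` and `avg_joules_inf` are named in this pull request, and the real format in run_experiments.py may differ.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical key order and casts; the real lists in run_experiments.py may differ.
METRIC_KEYS: List[str] = ["best_val_loss", "best_iter", "avg_joules_inf"]
METRIC_CASTS: List[Callable[[str], object]] = [float, int, float]


def parse_metrics_line(line: str) -> Dict[str, Optional[object]]:
    """Map a comma-separated metrics line onto METRIC_KEYS with per-key casts.

    Missing or unparsable fields become None, so result files written
    before the energy column existed can still be read.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    metrics: Dict[str, Optional[object]] = {}
    for index, (key, cast) in enumerate(zip(METRIC_KEYS, METRIC_CASTS)):
        raw = fields[index] if index < len(fields) else ""
        try:
            metrics[key] = cast(raw)
        except ValueError:
            metrics[key] = None
    return metrics
```

Treating an absent energy field as `None` rather than an error is one possible design choice; it would let a dashboard show an empty cell for runs that were not profiled.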

Configuration Improvements

  • Expanded the energy_efficiency_zeus.yaml exploration configuration to include new static and variation groups relevant for energy profiling experiments, such as precision modes and profiling flags.

These changes lay the groundwork for systematic energy efficiency experiments and monitoring, making it easier to evaluate and compare the energy consumption of different model configurations.

Copilot AI left a comment

Pull request overview

This pull request adds Zeus energy profiling to measure and track inference energy consumption, both in standalone sampling and in the sampling step of the training workflow. The implementation enables systematic energy efficiency experiments across different model configurations.

Key changes:

  • Introduced a reusable ZeusProfiler wrapper that gracefully handles optional Zeus installation and provides context managers for energy measurement windows
  • Integrated energy profiling into both standalone sampling (sample.py) and training workflows (train.py), measuring per-sample energy and computing averages
  • Extended metrics tracking infrastructure to capture avg_joules_inf alongside existing performance metrics
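
The wrapper pattern in the first two bullets can be sketched roughly as follows. This is not the code from utils/energy_profiling/zeus_profiler.py: the names `OptionalZeusProfiler`, `EnergyWindow`, and `average_joules` are invented for the example. It assumes only the public `ZeusMonitor` interface from `zeus.monitor` (`begin_window`, `end_window`, and the `total_energy` field of the returned measurement), and it falls back to a no-op when Zeus cannot be imported or initialized.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional

try:
    from zeus.monitor import ZeusMonitor  # optional dependency
except Exception:  # Zeus is not installed or failed to import
    ZeusMonitor = None


class EnergyWindow:
    """Result holder for one measurement window; joules stays None if unmeasured."""

    def __init__(self) -> None:
        self.joules: Optional[float] = None


class OptionalZeusProfiler:
    """Illustrative wrapper that degrades to a no-op when Zeus is unavailable."""

    def __init__(self, enabled: bool = True,
                 gpu_indices: Optional[List[int]] = None) -> None:
        self._monitor = None
        if enabled and ZeusMonitor is not None:
            try:
                self._monitor = ZeusMonitor(gpu_indices=gpu_indices)
            except Exception:
                # For example, no supported device is visible: keep running unprofiled.
                self._monitor = None

    @property
    def active(self) -> bool:
        return self._monitor is not None

    @contextmanager
    def window(self, name: str) -> Iterator[EnergyWindow]:
        """Measure the energy consumed by the enclosed block, if profiling is active."""
        result = EnergyWindow()
        if self._monitor is None:
            yield result
            return
        self._monitor.begin_window(name)
        try:
            yield result
        finally:
            measurement = self._monitor.end_window(name)
            result.joules = float(measurement.total_energy)


def average_joules(per_sample: List[Optional[float]]) -> Optional[float]:
    """Average the measured samples, ignoring windows that produced no reading."""
    values = [joules for joules in per_sample if joules is not None]
    return sum(values) / len(values) if values else None
```

In a sampling loop, each generation call would sit inside `with profiler.window(f"sample_{i}") as w:`, with `w.joules` appended to a list that is later reduced by `average_joules` to obtain a value like avg_joules_inf.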

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| utils/energy_profiling/zeus_profiler.py | New module providing ZeusProfiler and ZeusWindow classes for optional energy profiling with clean API and graceful degradation when Zeus is unavailable |
| utils/energy_profiling/__init__.py | Package initialization exposing ZeusProfiler |
| train_args.py | Added command-line arguments for Zeus profiling configuration (enable/disable, GPU/CPU selection, device indices) |
| train.py | Integrated Zeus profiler into training workflow, collecting energy metrics during sampling and logging to TensorBoard and metrics files |
| sample.py | Added energy measurement to sample generation using Zeus context managers, returning per-sample and average energy consumption |
| run_exploration_monitor.py | Added avg_joules_inf column to monitoring UI for experiment tracking |
| optimization_and_search/run_experiments.py | Extended METRIC_KEYS and cast array to include avg_joules_inf metric |
| explorations/energy_efficiency_zeus.yaml | New exploration configuration for energy efficiency experiments with various model configurations and profiling enabled |
