DRS improvements #10397

DaanHoogland · 2025-02-14T10:36:41Z

The required feature described as a wish

As an operator, I would like the loads on my systems to be more evenly/centrally distributed. At the moment there is a simple DRS for cluster-wide distribution of loads; however, it does not apply zone-wide distribution, nor is it based on automated queries/improvements.

In addition, we should use historic data for the VM when planning possible migrations.
At the moment, allocated metrics are used. A first improvement would be to use actual metrics.
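
To illustrate why the choice of metric matters, here is a minimal sketch, not CloudStack code. The `Host` class, its field names, and the coefficient-of-variation imbalance measure are all assumptions made for this example. It shows two hosts that look perfectly balanced by allocation while their actual usage is heavily skewed.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Host:
    # Hypothetical structure for illustration; not a CloudStack type.
    name: str
    capacity_mhz: float
    allocated_mhz: float  # what the VMs on this host were provisioned with
    used_mhz: float       # what the VMs on this host actually consume


def imbalance(ratios):
    """Coefficient of variation of per-host load ratios (0 means perfectly even)."""
    avg = mean(ratios)
    return pstdev(ratios) / avg if avg else 0.0


hosts = [
    Host("host-a", capacity_mhz=10000, allocated_mhz=6000, used_mhz=9000),
    Host("host-b", capacity_mhz=10000, allocated_mhz=6000, used_mhz=1000),
]

# Same cluster, two views: one from allocation, one from measured usage.
allocated_view = imbalance([h.allocated_mhz / h.capacity_mhz for h in hosts])
actual_view = imbalance([h.used_mhz / h.capacity_mhz for h in hosts])

print(f"imbalance by allocated metrics: {allocated_view:.2f}")
print(f"imbalance by actual metrics:    {actual_view:.2f}")
```

With allocated metrics, a balancer would see nothing to do here; with actual metrics, host-a is an obvious migration source.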

iishitahere · 2025-02-14T15:49:46Z

Thanks, @DaanHoogland, for opening this issue and outlining the scope!

I agree that focusing on actual metrics instead of allocated metrics is a solid first step. It will provide a more accurate understanding of system load and help improve DRS efficiency.

I'll start working on refining this further and share updates soon. Also, incorporating historical data for VM migration planning sounds like a great enhancement; I'll explore potential approaches for this.

Looking forward to your feedback as I progress!

iishitahere · 2025-02-20T16:06:00Z

Hi @DaanHoogland,

I have reviewed the current DRS implementation and would like to propose a few improvements based on our discussions:

1️⃣ Enhanced Load Distribution

- Implement zone-wide DRS balancing, ensuring that loads are distributed not just within a cluster but across multiple zones when necessary.
- Introduce automated distribution policies based on real-time system load, avoiding static allocations.
- Consider predictive balancing using historical resource consumption patterns.

2️⃣ Historic Data for VM Migration Planning

- Instead of using only allocated metrics, we can introduce time-based usage trends (e.g., CPU/memory/disk utilization over the last X hours/days).
- This allows DRS to make informed migration decisions based on actual workload fluctuations.
- Possible implementation:
  - Store historical resource usage in a lightweight time-series database (e.g., Prometheus).
  - Apply machine learning or heuristic-based models to predict VM load spikes.

3️⃣ Metrics-Driven Decision Making

- Move from allocated resource metrics to real usage statistics for migration and load balancing.
- Use real-time monitoring tools (e.g., Grafana, Prometheus, or CloudStack metrics) to track actual VM resource consumption.
- Implement a threshold-based alert system to trigger proactive migrations before overload situations arise.
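
As a rough illustration of how the usage trend and the threshold trigger could fit together, here is a minimal heuristic sketch. It is not an existing CloudStack or Prometheus API: the function names `ewma` and `should_migrate`, the smoothing factor, and the 0.8 threshold are all assumptions chosen for the example. It smooths a series of utilization samples with an exponentially weighted moving average, so that a single spike does not trigger a migration but sustained load does.

```python
from typing import List, Sequence


def ewma(samples: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average; a higher alpha reacts faster."""
    smoothed: List[float] = []
    current = None
    for sample in samples:
        # The first sample seeds the average; later ones are blended in.
        current = sample if current is None else alpha * sample + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def should_migrate(samples: Sequence[float], threshold: float = 0.8,
                   alpha: float = 0.3) -> bool:
    """True when the smoothed utilization trend has reached the threshold."""
    trend = ewma(samples, alpha)
    return bool(trend) and trend[-1] >= threshold


# Utilization as a fraction of host capacity (hypothetical sample data).
brief_spike = [0.2, 0.2, 0.95, 0.2]
sustained_load = [0.9, 0.9, 0.9, 0.9, 0.9]

print(should_migrate(brief_spike))     # the spike is smoothed away
print(should_migrate(sustained_load))  # sustained load crosses the threshold
```

In a real setup the samples would come from whichever metrics source is chosen, and the threshold and smoothing factor would need tuning against observed workloads.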

Thanks and regards,
Ishita Jaiswal

DaanHoogland · 2025-02-21T13:00:28Z

@iishitahere, I discussed this with a colleague and we think this will be a six-month project. This is fine in itself. But to reduce the risk, could you define phases here? Or do you think creating sub-issues for the three items you mentioned is a good idea? These can be addressed as separate projects, don't you think?

iishitahere · 2025-02-21T16:54:30Z

@DaanHoogland, that makes sense. To manage risk effectively, I propose breaking this into phases:

1️⃣ Phase 1 – Metrics-Driven Decision Making (real-time tracking & threshold-based alerts).
2️⃣ Phase 2 – Historic Data Integration (trend analysis & predictive balancing).
3️⃣ Phase 3 – Enhanced Load Distribution (zone-wide balancing & automated policies).

Alternatively, we can create sub-issues for each, treating them as separate projects. Let me know your preference.

Best,
Ishita Jaiswal

DaanHoogland added the gsoc label Feb 14, 2025

DaanHoogland changed the title ~~[SHORT FUNCTIONAL DESCRIPTION]~~ DRS improvements Feb 14, 2025

DaanHoogland added the gsoc2025 label Feb 17, 2025

DaanHoogland added this to the unplanned milestone Feb 20, 2025
