DRS improvements #10397

Open
DaanHoogland opened this issue Feb 14, 2025 · 4 comments
@DaanHoogland
Contributor

DaanHoogland commented Feb 14, 2025

The required feature described as a wish

As an operator, I would like the load on my systems to be more evenly/centrally distributed. At the moment there is a simple DRS (Distributed Resource Scheduler) for cluster-wide distribution of load. However, it does not distribute load zone-wide, and it is not driven by automated queries/improvements.
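
To illustrate why cluster-wide balancing alone can miss an uneven zone, here is a minimal Java sketch. It assumes imbalance is measured as the coefficient of variation (population standard deviation divided by the mean) of host load. The cluster names and load values are hypothetical, and this is not CloudStack's actual DRS code.

```java
import java.util.List;
import java.util.Map;

public class ZoneImbalanceSketch {

    // Imbalance of a set of host loads, taken here as the coefficient of
    // variation (population standard deviation divided by the mean).
    static double imbalance(List<Double> loads) {
        double mean = loads.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        if (mean == 0.0) {
            return 0.0;
        }
        double variance = loads.stream()
                .mapToDouble(l -> (l - mean) * (l - mean))
                .average()
                .orElse(0.0);
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        // Hypothetical zone with two clusters; values are host CPU load fractions.
        Map<String, List<Double>> zone = Map.of(
                "cluster-a", List.of(0.80, 0.80),
                "cluster-b", List.of(0.20, 0.20));

        // Cluster-wide view: each cluster on its own looks perfectly balanced.
        zone.forEach((cluster, loads) ->
                System.out.printf("%s imbalance: %.2f%n", cluster, imbalance(loads)));

        // Zone-wide view: pool every host in the zone before measuring.
        List<Double> allHosts = zone.values().stream().flatMap(List::stream).toList();
        System.out.printf("zone imbalance: %.2f%n", imbalance(allHosts));
    }
}
```

Each cluster reports an imbalance of 0.00, while the pooled zone view reports 0.60. That is the kind of situation a zone-wide DRS would need to detect.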

In addition, we should use historical data for the VM when planning possible migrations.
At the moment, allocated metrics are used. A first improvement would be to use actual metrics.
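
A minimal Java sketch of why the choice between allocated and actual metrics matters. It assumes imbalance is measured as the coefficient of variation of a metric across hosts. The `HostLoad` record and its fields are hypothetical and are not CloudStack types.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

public class MetricChoiceSketch {

    // Hypothetical per-host figures, as fractions of host CPU capacity.
    record HostLoad(String name, double allocatedCpu, double usedCpu) {}

    // Coefficient of variation of the chosen metric across hosts.
    static double imbalance(List<HostLoad> hosts, ToDoubleFunction<HostLoad> metric) {
        double mean = hosts.stream().mapToDouble(metric).average().orElse(0.0);
        if (mean == 0.0) {
            return 0.0;
        }
        double variance = hosts.stream()
                .mapToDouble(h -> Math.pow(metric.applyAsDouble(h) - mean, 2))
                .average()
                .orElse(0.0);
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        // Both hosts have the same CPU allocated, but the VMs on host-1 are busy
        // while the VMs on host-2 are mostly idle.
        List<HostLoad> hosts = List.of(
                new HostLoad("host-1", 0.60, 0.90),
                new HostLoad("host-2", 0.60, 0.10));

        System.out.printf("allocated imbalance: %.2f%n", imbalance(hosts, HostLoad::allocatedCpu));
        System.out.printf("actual imbalance:    %.2f%n", imbalance(hosts, HostLoad::usedCpu));
    }
}
```

Measured on allocated CPU, the two hosts look identical (imbalance 0.00). Measured on actual usage, the imbalance is 0.80, so only the actual metric would suggest a migration.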

@DaanHoogland DaanHoogland changed the title [SHORT FUNCTIONAL DESCRIPTION] DRS improvements Feb 14, 2025
@iishitahere

Thanks, @DaanHoogland, for opening this issue and outlining the scope!

I agree that focusing on actual metrics instead of allocated metrics is a solid first step. It will provide a more accurate understanding of system load and help improve DRS efficiency.

I'll start working on refining this further and share updates soon. Also, incorporating historical data for VM migration planning sounds like a great enhancement—I'll explore potential approaches for this.

Looking forward to your feedback as I progress!

@DaanHoogland DaanHoogland added this to the unplanned milestone Feb 20, 2025
@iishitahere

Hi @DaanHoogland,

I have reviewed the current DRS implementation and would like to propose a few improvements based on our discussions:

1️⃣ Enhanced Load Distribution

- Implement zone-wide DRS balancing, so that load is distributed not just within a cluster but across the clusters of a zone when necessary.
- Introduce automated distribution policies based on real-time system load instead of static allocations.
- Consider predictive balancing using historical resource consumption patterns.

2️⃣ Historic Data for VM Migration Planning

- Instead of using only allocated metrics, introduce time-based usage trends (e.g., CPU/memory/disk utilization over the last X hours/days).
- This lets DRS make informed migration decisions based on actual workload fluctuations.
- Possible implementation:
  - Store historical resource usage in a lightweight time-series database (e.g., Prometheus).
  - Apply machine-learning or heuristic models to predict VM load spikes.

3️⃣ Metrics-Driven Decision Making

- Move from allocated resource metrics to real usage statistics for migration and load balancing.
- Use real-time monitoring tools (e.g., Grafana, Prometheus, or CloudStack metrics) to track actual VM resource consumption.
- Implement a threshold-based alert system to trigger proactive migrations before overload situations arise.
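
The historic-trend and threshold ideas above can be combined in a small Java sketch. It smooths each host's recent usage samples with an exponentially weighted moving average (EWMA) and flags the host as a migration source only when the smoothed value crosses a threshold. EWMA is a swapped-in heuristic chosen for illustration. The smoothing factor, threshold, and method names are hypothetical.

```java
import java.util.List;

public class TrendThresholdSketch {

    // Exponentially weighted moving average over usage samples (oldest first).
    // An alpha close to 1 reacts quickly; close to 0 it smooths out short spikes.
    static double ewma(List<Double> samples, double alpha) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        double avg = samples.get(0);
        for (int i = 1; i < samples.size(); i++) {
            avg = alpha * samples.get(i) + (1 - alpha) * avg;
        }
        return avg;
    }

    // Flags a host as a migration source when its smoothed usage exceeds the threshold.
    static boolean shouldMigrateFrom(List<Double> samples, double alpha, double threshold) {
        return ewma(samples, alpha) > threshold;
    }

    public static void main(String[] args) {
        double alpha = 0.3;      // hypothetical smoothing factor
        double threshold = 0.75; // hypothetical CPU usage threshold

        List<Double> shortSpike = List.of(0.40, 0.40, 0.40, 0.95);
        List<Double> sustained = List.of(0.80, 0.85, 0.90, 0.90);

        System.out.println("short spike -> migrate? " + shouldMigrateFrom(shortSpike, alpha, threshold));
        System.out.println("sustained   -> migrate? " + shouldMigrateFrom(sustained, alpha, threshold));
    }
}
```

Here the short spike smooths to about 0.57 and is ignored, while the sustained load smooths to about 0.86 and is flagged. A lower alpha gives more weight to history and makes the check less sensitive to single spikes.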

Thanks and regards,
Ishita Jaiswal

@DaanHoogland
Copy link
Contributor Author

@iishitahere, I discussed this with a colleague and we think this will be a six-month project. That is fine in itself, but to reduce the risk, could you define phases here? Or do you think creating sub-issues for the three items you mentioned is a good idea? These could be addressed as separate projects, don't you think?

@iishitahere
Copy link

@DaanHoogland, that makes sense. To manage risk effectively, I propose breaking this into phases:

1️⃣ Phase 1 – Metrics-Driven Decision Making (real-time tracking & threshold-based alerts).
2️⃣ Phase 2 – Historic Data Integration (trend analysis & predictive balancing).
3️⃣ Phase 3 – Enhanced Load Distribution (zone-wide balancing & automated policies).

Alternatively, we can create sub-issues for each, treating them as separate projects. Let me know your preference.

Best,
Ishita Jaiswal
