[Enhancement] Alleviate optimistic concurrent scheduling conflict rates. #50

Open
NickrenREN opened this issue May 21, 2024 · 4 comments


@NickrenREN
Collaborator

Currently, when we run several scheduler components, they work concurrently. In some scenarios the probability of conflicts can be relatively high, for example when cluster resource utilization (the "water level") is high, or during batch scheduling.
We need to optimize Godel Scheduler to reduce these optimistic-concurrency conflicts.

@binacs
Member

binacs commented Jun 3, 2024

1. Description

Godel Scheduler (https://github.com/kubewharf/godel-scheduler) is a distributed parallel scheduler built on shared state architecture and optimistic concurrency ideas. When multiple Scheduler shards work in parallel, each Scheduler shard has a complete view of cluster resources.

Scheduling decisions made by different Scheduler shards at the same moment can naturally conflict with each other (resource conflicts on a single node, topology-domain affinity conflicts, etc.). This is why a centralized Binder was introduced to resolve conflicts through a serial verification process. As the number of scheduler shards grows, as cluster resource utilization becomes extremely high, or as the number of Unit Pods becomes extremely large, the probability of conflict increases significantly. The result is a large number of wasted scheduling attempts and even ping-pong between components.
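To make the conflict scenario concrete, here is a minimal Go sketch of optimistic scheduling followed by serial validation. It is not Godel's Binder code: the `Proposal` and `Binder` types, the single CPU dimension, and the first-come-first-served validation order are simplifying assumptions for illustration only. Two shards both see 4000m of free CPU on `node-a` and each place a 3000m pod there; the Binder accepts the first proposal and rejects the second, which would then need another scheduling attempt.

```go
package main

import "fmt"

// Proposal is a hypothetical scheduling decision made by one shard
// from its own (possibly stale) snapshot of cluster state.
type Proposal struct {
	Shard string
	Pod   string
	Node  string
	CPU   int64 // requested CPU in millicores
}

// Binder is a hypothetical, simplified stand-in for a centralized binder.
// It holds the authoritative free CPU per node.
type Binder struct {
	free map[string]int64
}

// NewBinder copies the initial capacities so callers cannot mutate Binder state.
func NewBinder(capacity map[string]int64) *Binder {
	free := make(map[string]int64, len(capacity))
	for node, c := range capacity {
		free[node] = c
	}
	return &Binder{free: free}
}

// Bind validates proposals one at a time, in arrival order. A proposal is
// accepted only if the node still has enough free CPU; otherwise it is
// rejected and the pod would have to be scheduled again (a wasted attempt).
func (b *Binder) Bind(proposals []Proposal) (accepted, rejected []Proposal) {
	for _, p := range proposals {
		if b.free[p.Node] >= p.CPU {
			b.free[p.Node] -= p.CPU
			accepted = append(accepted, p)
		} else {
			rejected = append(rejected, p)
		}
	}
	return accepted, rejected
}

func main() {
	// Both shards saw node-a with 4000m free and independently chose it.
	binder := NewBinder(map[string]int64{"node-a": 4000, "node-b": 4000})
	proposals := []Proposal{
		{Shard: "shard-1", Pod: "pod-x", Node: "node-a", CPU: 3000},
		{Shard: "shard-2", Pod: "pod-y", Node: "node-a", CPU: 3000},
	}
	acc, rej := binder.Bind(proposals)
	fmt.Println(len(acc), len(rej)) // prints "1 1"
}
```

The more shards pick the same scarce nodes from similar snapshots, the more proposals end up in the rejected list, which is the wasted work this issue aims to reduce.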

Previously, we introduced the Node Partition mechanism, which reduces conflicts by constraining the resource view of each scheduler shard (preferring to schedule onto In Partition Nodes), but this comes at some cost in scheduling quality. We hope to explore other approaches that handle such conflicts better.
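The trade-off behind Node Partition can be illustrated with a similarly simplified Go sketch. The `Node` type, its `Owner` field, and `PickNode` are hypothetical and do not reflect Godel's actual partitioning logic. A shard picks the best-scoring node inside its own partition and only falls back to other nodes when none is feasible, so it may pass over a higher-scoring node that belongs to another shard's partition.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified view of a feasible node.
type Node struct {
	Name  string
	Owner string // shard whose partition the node belongs to (hypothetical field)
	Score int    // aggregated plugin score; higher is better
}

// PickNode prefers feasible nodes inside the shard's own partition and
// considers out-of-partition nodes only when no in-partition node exists.
// This lowers cross-shard conflicts but can skip a better node elsewhere.
func PickNode(shard string, feasible []Node) (Node, bool) {
	var in, out []Node
	for _, n := range feasible {
		if n.Owner == shard {
			in = append(in, n)
		} else {
			out = append(out, n)
		}
	}
	for _, group := range [][]Node{in, out} {
		if len(group) == 0 {
			continue
		}
		// Highest score first within the current candidate group.
		sort.SliceStable(group, func(i, j int) bool { return group[i].Score > group[j].Score })
		return group[0], true
	}
	return Node{}, false
}

func main() {
	nodes := []Node{
		{Name: "n1", Owner: "shard-1", Score: 60},
		{Name: "n2", Owner: "shard-2", Score: 90},
	}
	n, _ := PickNode("shard-1", nodes)
	fmt.Println(n.Name) // prints "n1", even though n2 scores higher
}
```

The gap between the chosen node's score and the best overall score is one way to think about the scheduling quality loss that partitioning trades for fewer conflicts.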

This task requires a deep understanding of the scheduler's multi-shard architecture in order to further reduce the probability of scheduler conflicts across various scenarios, ultimately improving the efficiency of the whole system.

2. Tasks

  • Devise a design that is both theoretically sound and feasible to implement, and write it up as a design doc

  • Implement the design in code and achieve practical results

3. Skill requirements and programming languages

  • Familiar with Golang programming

  • Familiar with the working mechanism of the Godel Scheduler components

  • A passion for open source and willingness to keep participating in subsequent iterations of this topic

  • Preferred: innovative thinking

4. Expected results

A completed solution design and the corresponding code implementation


@ipsum-0320

Hello, I'd like to give this a try @binacs

@ipsum-0320

I would be very grateful if you could share more information about this project. Alternatively, is there an online meeting where we could discuss it? @binacs @NickrenREN

@NickrenREN
Collaborator Author

@ipsum-0320 Thanks for your interest. This will actually be a task in GLCC 2024; see https://www.gitlink.org.cn/glcc for more information.
