Add readme #1
## Node Taint Controller (NTC)
- NTC control loop will continuously look for `node conditions` of all nodes and will apply taint based on the `condition` type.

## Node Auto Healer Operator (NAHO)
I think taints applied by NTC should be distinguishable from taints applied by other processes or by humans.
Taints for the default node conditions have keys in the format `node.kubernetes.io/{{condition}}={{value}}:{{effect}}`. For all other conditions we can use the same format with our own prefix, either `node.stakater.com/{{condition}}={{value}}:{{effect}}` (e.g. ref) or `NTC/{{condition}}={{value}}:{{effect}}`.
Please suggest any other option.
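As a rough illustration of the prefix idea, the sketch below builds a taint under a controller-owned prefix and checks ownership by that prefix. The `node.stakater.com` prefix is only one of the two options proposed above, and `Taint`, `ntc_taint`, `is_ntc_taint`, and the `kernel-deadlock` condition name are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Assumption: one of the prefixes proposed in this thread, not a settled choice.
NTC_PREFIX = "node.stakater.com"


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str

    def __str__(self) -> str:
        # Same key=value:effect layout as the format discussed above.
        return f"{self.key}={self.value}:{self.effect}"


def ntc_taint(condition: str, value: str, effect: str) -> Taint:
    """Build a taint whose key carries the NTC prefix."""
    return Taint(key=f"{NTC_PREFIX}/{condition}", value=value, effect=effect)


def is_ntc_taint(taint: Taint) -> bool:
    """Return True if the taint key sits under the NTC prefix."""
    return taint.key.startswith(NTC_PREFIX + "/")


if __name__ == "__main__":
    # "kernel-deadlock" is a made-up custom condition name.
    print(ntc_taint("kernel-deadlock", "True", "NoSchedule"))
```

With a prefix check like this, the controller could leave taints set by other processes or by humans untouched when it removes its own.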
|Ready |True | - | | |
| |False | NoExecute | node.kubernetes.io/not-ready | |
| |Unknown | NoExecute | node.kubernetes.io/unreachable | |
|OutOfDisk |True | NoSchedule | node.kubernetes.io/out-of-disk | |
How should a `NoSchedule`-tainted node be processed?
Maybe set a threshold on how long the status lasts and drain the node as well?
We can set a threshold on how long a node may stay in the `NoSchedule` state. If the node exceeds the threshold, we apply a `NoExecute` taint, which evicts all pods, and the node is drained afterwards.
Using the `LastTransitionTime` of the node condition that caused the `NoSchedule` taint, we can tell whether that condition's state has changed. If there is no change and `CurrentTime - LastTransitionTime` exceeds the threshold, the node's pods are evicted and the node is drained.
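The threshold check described here could look roughly like the sketch below. It only covers the time comparison; `should_escalate` and its parameters are hypothetical names, and applying the `NoExecute` taint and draining the node are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_escalate(last_transition_time: datetime,
                    threshold: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True once CurrentTime - LastTransitionTime exceeds the threshold.

    Assumes both timestamps are timezone-aware and that the condition's
    LastTransitionTime is reset whenever the condition changes state.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_transition_time > threshold


if __name__ == "__main__":
    changed_at = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    checked_at = datetime(2020, 1, 1, 12, 45, tzinfo=timezone.utc)
    print(should_escalate(changed_at, timedelta(minutes=30), checked_at))
```

Passing `now` explicitly keeps the check easy to test; in a control loop it would default to the current UTC time.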
|DiskPressure |True | NoSchedule | node.kubernetes.io/disk-pressure | |
| |False | - | | |
| |Unknown | - | | |
|NetworkUnavailable |True | NoSchedule | node.kubernetes.io/network-unavailable | |
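For reference, the condition-to-taint rows quoted in this diff can be read as a lookup table. The sketch below is one possible encoding, not the controller's actual code; `CONDITION_TAINTS` and `taints_for` are hypothetical names, and the input is assumed to be a list of dicts with the `type` and `status` fields that node conditions carry.

```python
# (condition type, status) -> (taint effect, taint key), copied from the table rows in this diff.
CONDITION_TAINTS = {
    ("Ready", "False"): ("NoExecute", "node.kubernetes.io/not-ready"),
    ("Ready", "Unknown"): ("NoExecute", "node.kubernetes.io/unreachable"),
    ("OutOfDisk", "True"): ("NoSchedule", "node.kubernetes.io/out-of-disk"),
    ("DiskPressure", "True"): ("NoSchedule", "node.kubernetes.io/disk-pressure"),
    ("NetworkUnavailable", "True"): ("NoSchedule", "node.kubernetes.io/network-unavailable"),
}


def taints_for(conditions):
    """Return the (effect, key) pair for every condition that maps to a taint.

    Rows marked "-" in the table have no entry, so they yield no taint.
    """
    return [
        CONDITION_TAINTS[(c["type"], c["status"])]
        for c in conditions
        if (c["type"], c["status"]) in CONDITION_TAINTS
    ]


if __name__ == "__main__":
    node_conditions = [
        {"type": "Ready", "status": "True"},
        {"type": "DiskPressure", "status": "True"},
    ]
    print(taints_for(node_conditions))
```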
A node can have several bad conditions at the same time.
Maybe we need new condition types that combine the basic condition types?
We can have either a `ConfigMap` or a new resource type, `ConditionSet`, which would be a collection of one or more conditions.
I have mentioned 2 approaches here; please suggest the preferred one.
I suggest a ConfigMap, since conditions are not objects.
Yes, conditions are not objects, but `ConditionSet` can be an object with conditions, taint effect, taint key, etc. as properties. Or should we rename it to `NodeConditionSet`?
Please suggest. If `ConditionSet` doesn't make sense as an object, I will revert and use a ConfigMap instead.
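To make the object option concrete, here is a minimal sketch of what a `ConditionSet` with those properties might hold. The field names, the `matches` helper, and the example taint key are assumptions for discussion, not a proposed schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConditionSet:
    """Hypothetical shape: a named group of conditions mapped to one taint."""
    name: str
    conditions: Dict[str, str]  # condition type -> status that must match
    taint_key: str
    taint_effect: str

    def matches(self, node_conditions: Dict[str, str]) -> bool:
        """Return True when every condition in the set has the required status."""
        return all(
            node_conditions.get(cond_type) == status
            for cond_type, status in self.conditions.items()
        )


if __name__ == "__main__":
    # Example values are made up for illustration.
    disk_and_network = ConditionSet(
        name="disk-and-network",
        conditions={"DiskPressure": "True", "NetworkUnavailable": "True"},
        taint_key="node.stakater.com/disk-and-network",
        taint_effect="NoExecute",
    )
    print(disk_and_network.matches({"DiskPressure": "True", "NetworkUnavailable": "True"}))
```

The same fields would fit in a ConfigMap entry as well, so the sketch does not favor either approach.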
@tanalam2411 can you please resolve the comments you have already taken care of?
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Updated the README.md file with the Machine auto healer project's purpose and details.