Add readme #1

Open: wants to merge 9 commits into base: main

tanalam2411
Collaborator

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Updates README.md with the purpose and details of the Machine Auto Healer project.

## Node Taint Controller (NTC)
- The NTC control loop continuously watches the `node conditions` of all nodes and applies a taint based on the `condition` type.

## Node Auto Healer Operator (NAHO)

I think taints applied by NTC should be distinguishable from taints applied by other processes or by humans.

Collaborator Author

Taints for the default node conditions use keys of the form `node.kubernetes.io/{{condition}}={{value}}:{{effect}}`. For all other conditions we can use the same format with our own prefix: either `node.stakater.com/{{condition}}={{value}}:{{effect}}` (e.g., ref) or `NTC/{{condition}}={{value}}:{{effect}}`. Please suggest another format if you prefer.
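
To make the proposal concrete, here is a minimal Python sketch of how NTC could build taints under its own prefix and later recognise them. It assumes the `node.stakater.com` option from the comment above; the `Taint` type, the helper names `ntc_taint` and `is_ntc_taint`, and the `KernelDeadlock` condition in the usage note are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple

# Assumption: the first prefix option proposed in this thread.
NTC_PREFIX = "node.stakater.com"


class Taint(NamedTuple):
    """Hypothetical in-memory form of a node taint."""
    key: str
    value: str
    effect: str

    def __str__(self) -> str:
        # Same textual form used in the discussion: key=value:effect
        return f"{self.key}={self.value}:{self.effect}"


def ntc_taint(condition: str, value: str, effect: str,
              prefix: str = NTC_PREFIX) -> Taint:
    """Build a taint whose key is namespaced under the NTC prefix."""
    return Taint(key=f"{prefix}/{condition}", value=value, effect=effect)


def is_ntc_taint(taint: Taint, prefix: str = NTC_PREFIX) -> bool:
    """Return True if the taint key lives under the NTC prefix."""
    return taint.key.startswith(prefix + "/")
```

For example, `ntc_taint("KernelDeadlock", "True", "NoSchedule")` has a key under `node.stakater.com/`, so `is_ntc_taint` accepts it, while a taint keyed `node.kubernetes.io/not-ready` is left alone. A dedicated prefix is what lets NTC tell its own taints apart from those set by other processes or humans.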

README.md
|Ready |True | - | |
| |False | NoExecute | node.kubernetes.io/not-ready |
| |Unknown | NoExecute | node.kubernetes.io/unreachable |
|OutOfDisk |True | NoSchedule | node.kubernetes.io/out-of-disk |

How should a node with a NoSchedule taint be processed?
Maybe set a threshold on how long the condition may last, and drain the node once it is exceeded?

Collaborator Author

tanalam2411, Apr 22, 2021

We can set a threshold on how long a node may stay in the NoSchedule state. If the node exceeds that threshold, we apply a NoExecute taint, which evicts all pods, and the node is then drained.
The LastTransitionTime of the node condition that caused the NoSchedule taint tells us whether that condition has changed since the taint was applied. If it has not changed and CurrentTime - LastTransitionTime exceeds the threshold, the pods on the node are evicted and the node is drained.
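
The check described above could look roughly like the following Python sketch. The function name `should_escalate` and its parameters are hypothetical; it only compares the current time with the condition's LastTransitionTime and leaves applying the NoExecute taint and draining to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_escalate(last_transition_time: datetime,
                    threshold: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True when a condition has been unchanged for longer than threshold.

    last_transition_time is the LastTransitionTime of the node condition
    that caused the NoSchedule taint. A True result means the caller may
    upgrade the taint to NoExecute and drain the node.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_transition_time > threshold
```

Because LastTransitionTime is reset whenever the condition's status changes, a node that recovers and later degrades again starts a fresh countdown instead of being drained immediately.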

|DiskPressure |True | NoSchedule | node.kubernetes.io/disk-pressure |
| |False | - | |
| |Unknown | - | |
|NetworkUnavailable |True | NoSchedule | node.kubernetes.io/network-unavailable |

A node can have several unhealthy conditions at the same time.
Maybe we need new condition types that combine the basic condition types?

Collaborator Author

We can use either a ConfigMap or a new resource type, ConditionSet, which would be a collection of one or more conditions.
I have described both approaches here; please suggest the preferred one.

I suggest a ConfigMap, because conditions are not objects anyway.

Collaborator Author

Yes, conditions are not objects, but a ConditionSet can be an object with conditions, taint effect, taint key, etc. as properties. Or should we rename it to NodeConditionSet?
Please suggest. If ConditionSet doesn't make sense as an object, I will revert and use a ConfigMap instead.
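
As a rough illustration of the ConditionSet idea, the Python sketch below models it as a plain object holding the conditions, taint key, and taint effect mentioned above. The field names, the `matches` method, and the `first_match` helper are assumptions made for this example, not an agreed schema; the same data could equally be stored as entries in a ConfigMap.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class ConditionSet:
    """Hypothetical grouping of node conditions that map to one taint."""
    name: str
    conditions: Dict[str, str]  # condition type -> required status
    taint_key: str
    taint_effect: str

    def matches(self, node_conditions: Dict[str, str]) -> bool:
        """True when every condition in the set has the required status."""
        return all(node_conditions.get(ctype) == status
                   for ctype, status in self.conditions.items())


def first_match(sets: Sequence[ConditionSet],
                node_conditions: Dict[str, str]) -> Optional[ConditionSet]:
    """Return the first set satisfied by the node's conditions, or None."""
    for condition_set in sets:
        if condition_set.matches(node_conditions):
            return condition_set
    return None
```

Listing the combined sets before the single-condition ones lets a node with several unhealthy conditions receive the taint of the most specific set first.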

@rasheedamir
Member

@tanalam2411 can you please resolve the comments you have already taken care of?
