Enabling kernel monitor for kind #362

Closed
mitar opened this issue Mar 7, 2019 · 6 comments

Comments

@mitar
Contributor

mitar commented Mar 7, 2019

I am looking into how to enable the kernel monitor for a kind cluster so that I can collect and log events from the kernel. I suspect this could help debug stability issues. But I have problems finding a standard way to do this.

https://kubernetes.io/docs/tasks/debug-application-cluster/monitor-node-health/#kernel-monitor

@BenTheElder
Member

We don't run NPD at all. I'm not sure that this actually makes sense given the shared kernel. You'd basically just want to run the NPD on the host.

@neolit123
Member

> But I have problems finding a standard way to do this

this is the daemon-set you want to deploy for node-problem-detector
https://github.com/kubernetes/node-problem-detector/blob/v0.1/node-problem-detector.yaml

it loads this kernel monitor config:
https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json

i haven't tested this, but -1 on bundling this (or any similar app) with kind.
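The deployment step mentioned above could be sketched roughly as follows. This is an untested sketch, not an official procedure: it assumes `kubectl` is already pointed at the kind cluster and that the linked manifest can be fetched in raw form from `raw.githubusercontent.com`. The helper names `blob_to_raw` and `kubectl_apply_command` are hypothetical and only build the command; they do not run it.

```python
from typing import List
from urllib.parse import urlparse


def blob_to_raw(blob_url: str) -> str:
    """Turn a github.com '/blob/' URL into its raw.githubusercontent.com form."""
    parts = urlparse(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a blob URL: {blob_url}")
    owner, repo, _, ref, *path = segments
    file_path = "/".join(path)
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{file_path}"


def kubectl_apply_command(manifest_url: str) -> List[str]:
    """Build (but do not execute) the kubectl command for a manifest URL."""
    return ["kubectl", "apply", "-f", manifest_url]


# Manifest link taken from the comment above.
MANIFEST = (
    "https://github.com/kubernetes/node-problem-detector"
    "/blob/v0.1/node-problem-detector.yaml"
)

if __name__ == "__main__":
    print(" ".join(kubectl_apply_command(blob_to_raw(MANIFEST))))
```

The command is only printed so that it can be reviewed before being run against a cluster.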

@mitar
Contributor Author

mitar commented Mar 7, 2019

Why not? It could help debug kind, especially if its logs were stored together with the exported kind logs.

@BenTheElder
Member

The NPD isn't really designed for a shared kernel, I don't think it will help. It's meant for stuff like catching disk corruption IIRC.

@mitar
Contributor Author

mitar commented Mar 8, 2019

OK, I think I see that this does not need to be bundled with kind because it is really easy to deploy it afterwards. I was just not familiar enough with how to do this when I opened this issue.

But I do still think it is useful to have it with kind. I am hoping to catch OOM killings of processes, and I hope that by having the events visible in Kubernetes logs I can more easily correlate any issues during the CI process with OOM killings. I understand that I might get false positives (OOM events for unrelated processes), but on the other hand it might also provide useful information.
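To illustrate the kind of matching involved in catching OOM killings, here is a minimal sketch that scans kernel log lines for "Killed process" messages. It is not NPD's implementation. The regular expression is an assumption about the message shape, which varies between kernel versions, and the sample lines are made up for illustration, not taken from a real log.

```python
import re
from typing import Iterable, List, NamedTuple


class OomKill(NamedTuple):
    pid: int
    command: str
    line: str


# Assumed message shape; the exact wording differs between kernel versions.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return one record per log line that reports a killed process."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append(
                OomKill(int(match.group(1)), match.group(2), line.rstrip("\n"))
            )
    return kills


# Made-up sample lines, for illustration only.
SAMPLE = [
    "[ 1532.101] Out of memory: Kill process 4242 (kubelet) score 912 or sacrifice child",
    "[ 1532.102] Killed process 4242 (kubelet) total-vm:1048576kB, anon-rss:524288kB, file-rss:0kB",
    "[ 1533.000] eth0: link up",
]

if __name__ == "__main__":
    for kill in find_oom_kills(SAMPLE):
        print(kill.pid, kill.command)
```

Because kind nodes share the host kernel, a match may belong to a process outside the cluster, which is the false-positive case mentioned above.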

@mitar mitar closed this as completed Mar 8, 2019
@mitar
Contributor Author

mitar commented Mar 8, 2019

Moreover, I think that such checks for kubelet restarts could also work well with kind, since they are not bound to the whole host. It could help get more useful log messages to debug the issues I have in #329, where it seems like kubelet gets killed and restarted, but while that is happening cluster control is not responding.

So while monitoring of kernel messages may have issues in a shared-kernel scenario, monitoring kubelet logs and converting some of them to events might be useful. (Of course, I am not sure how an event can be submitted if the API daemon itself is down.)

stg-0 added a commit to stg-0/kind that referenced this issue Nov 20, 2023
…ubernetes-sigs#362)

* chore: increase replicas to 2 for capi controller manager services

* refactor

* review CHANGELOG

* refactor

---------

Co-authored-by: stg <[email protected]>