Log errors from APIAvailability probes #3437

nojnhuh · 2025-07-08T21:32:58Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR adds a new log when requests against the API server's /readyz endpoint fail. Currently, the only log emitted when that happens looks like this:

cluster not available; HTTP status code: 0

This change logs any error that occurred for that request, such as a TCP connection or DNS error.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

k8s-ci-robot · 2025-07-08T21:33:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nojnhuh
Once this PR has been reviewed and has the lgtm label, please assign marseel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

clusterloader2/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

nojnhuh · 2025-07-14T17:54:15Z

clusterloader2/pkg/measurement/common/api_availability_measurement.go

 func (a *apiAvailabilityMeasurement) updateClusterAvailabilityMetrics(c clientset.Interface) {
 	result := c.CoreV1().RESTClient().Get().AbsPath("/readyz").Do(context.Background())
+	if err := result.Error(); err != nil {
+		klog.Warningf("failed to reach cluster API server: %v", err)


Here's a concrete example of the error that gets logged:

W0710 03:20:23.091522 37631 api_availability_measurement.go:114] failed to reach cluster API server: an error on the server ("[+]ping ok\n[+]log ok\n[-]etcd failed: reason withheld\n[+]etcd-readiness ok\n[+]informer-sync ok\n[+]poststarthook/start-apiserver-admission-initializer ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/priority-and-fairness-config-consumer ok\n[+]poststarthook/priority-and-fairness-filter ok\n[+]poststarthook/storage-object-count-tracker-hook ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[+]poststarthook/crd-informer-synced ok\n[+]poststarthook/start-system-namespaces-controller ok\n[+]poststarthook/start-cluster-authentication-info-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok\n[+]poststarthook/start-legacy-token-tracking-controller ok\n[+]poststarthook/start-service-ip-repair-controllers ok\n[+]poststarthook/rbac/bootstrap-roles ok\n[+]poststarthook/scheduling/bootstrap-system-priority-classes ok\n[+]poststarthook/priority-and-fairness-config-producer ok\n[+]poststarthook/bootstrap-controller ok\n[+]poststarthook/start-kubernetes-service-cidr-controller ok\n[+]poststarthook/aggregator-reload-proxy-client-cert ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-status-local-available-controller ok\n[+]poststarthook/apiservice-status-remote-available-controller ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-discovery-controller ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[+]autoregister-completion ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/apiservice-openapiv3-controller ok\n[+]shutdown ok\nreadyz check failed") has prevented the request from succeeding

Prettier:

[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]etcd-readiness ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-kubernetes-service-cidr-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-status-local-available-controller ok
[+]poststarthook/apiservice-status-remote-available-controller ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-discovery-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check failed

k8s-triage-robot · 2025-10-12T18:38:44Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

nojnhuh · 2025-10-12T18:53:57Z

/remove-lifecycle stale

Log errors from APIAvailability probes

7b431ad

k8s-ci-robot added the kind/cleanup and cncf-cla: yes labels Jul 8, 2025

k8s-ci-robot requested review from mborsz and wojtek-t July 8, 2025 21:33

k8s-ci-robot added the size/XS label Jul 8, 2025

nojnhuh mentioned this pull request Jul 8, 2025

Update DRA scalability job with CL2 with more logs kubernetes/test-infra#35105

Merged


k8s-ci-robot added the lifecycle/stale label Oct 12, 2025

k8s-ci-robot removed the lifecycle/stale label Oct 12, 2025
