Timeout: Waiting for a default service account to be provisioned in namespace #17325

bparees opened this issue Nov 15, 2017 · 12 comments

bparees commented Nov 15, 2017

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17092/test_pull_request_origin_extended_conformance_install/2451/

/tmp/openshift/build-rpm-release/rpm/BUILD/origin-3.8.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:130
Expected error:
    <*errors.errorString | 0xc42027d1c0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred
/tmp/openshift/build-rpm-release/rpm/BUILD/origin-3.8.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:208
		
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:53
[BeforeEach] [Feature:ImageLookup][registry] Image policy
  /tmp/openshift/build-rpm-release/rpm/BUILD/origin-3.8.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:130
STEP: Creating a kubernetes client
Nov 15 12:21:26.205: INFO: >>> kubeConfig: /etc/origin/master/admin.kubeconfig
STEP: Building a namespace api object
Nov 15 12:21:26.259: INFO: configPath is now "/tmp/extended-test-resolve-local-names-42xrk-2hjn6-user.kubeconfig"
Nov 15 12:21:26.259: INFO: The user is now "extended-test-resolve-local-names-42xrk-2hjn6-user"
Nov 15 12:21:26.259: INFO: Creating project "extended-test-resolve-local-names-42xrk-2hjn6"
Nov 15 12:21:26.363: INFO: Waiting on permissions in project "extended-test-resolve-local-names-42xrk-2hjn6" ...
STEP: Waiting for a default service account to be provisioned in namespace

The default timeout for this is 2 minutes.
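
For context, the failing BeforeEach is the e2e framework polling for the "default" service account to show up in the freshly created namespace; when the service account controller doesn't provision it within that window, all you get is the generic "timed out waiting for the condition" error above. A minimal sketch of that kind of wait with a recent client-go (the function name, poll interval, and the explicit 2-minute timeout here are illustrative, not the framework's exact code):

```go
// Sketch of the kind of poll behind "Waiting for a default service account to
// be provisioned in namespace". Illustrative only; the real helper lives in
// the vendored k8s.io/kubernetes/test/e2e/framework package.
package e2ehelpers

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForDefaultServiceAccount polls until the "default" service account
// exists in the namespace, or the 2-minute timeout expires.
func waitForDefaultServiceAccount(c kubernetes.Interface, ns string) error {
	return wait.Poll(2*time.Second, 2*time.Minute, func() (bool, error) {
		_, err := c.CoreV1().ServiceAccounts(ns).Get(context.TODO(), "default", metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			// Controller hasn't provisioned the service account yet; keep polling.
			return false, nil
		}
		if err != nil {
			// Transient API errors could also be tolerated here; fail fast for simplicity.
			return false, err
		}
		return true, nil
	})
}
```

If the condition never becomes true, the poll returns the "timed out waiting for the condition" error seen in the failure output, which is why the discussion below centers on the service account controller lagging rather than on the individual tests.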

bparees added the kind/test-flake and priority/P1 labels Nov 15, 2017
bparees commented Jan 23, 2018

bparees commented Jan 24, 2018

@mfojtik we're still seeing this a lot in our extended test runs. Any suggestions?

/cc @derekwaynecarr

https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/359/#showFailuresLink

bparees commented Jan 24, 2018

/cc @liggitt @deads2k

deads2k commented Jan 24, 2018

> @mfojtik we're still seeing this a lot in our extended test runs. Any suggestions?

"Still" or "just started again". What percentage of jobs are failing on it?

@stevekuznetsov I've got master and node metrics, but not controller metrics. Where is the script that describes what to gather?

bparees commented Jan 24, 2018

@deads2k I've only started looking into this again, but the last 3 times I ran our extended tests, I saw this in several test failures for each run. So, 100% over the last few days that I've looked at.

As to whether there was ever a period in recent history where it wasn't happening, I'm not sure.

bparees commented Jan 24, 2018

(I'm also not sure why our extended jobs would be particularly vulnerable to it, vs. conformance jobs.)

deads2k commented Jan 24, 2018

> @deads2k I've only started looking into this again, but the last 3 times I ran our extended tests, I saw this in several test failures for each run. So, 100% over the last few days that I've looked at.

Is there any sane way for you to see if it spiked about two weeks ago? I re-sliced some startup code that seemed to significantly improve our normal CI, but if it suddenly started spiking, that's where I'd be starting my search.

stevekuznetsov commented

What do you mean by "controller metrics"? We dump pprof output but I'm not sure we do anything from Prometheus?

bparees commented Jan 24, 2018

> Is there any sane way for you to see if it spiked about two weeks ago?

Not really. Our extended test jobs have been a mess for a month and a half due to storage issues and devmapper issues, and I don't really want to try to weed through that.

openshift-bot commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Apr 24, 2018
openshift-bot commented

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 25, 2018
bparees commented May 25, 2018

@smarterclayton heh......

/remove-lifecycle rotten
/lifecycle frozen

@deads2k @mfojtik: @smarterclayton indicated he has a bug open for this (issues with the service account controller getting bogged down).

openshift-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label May 25, 2018