-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/machine/e2e: fix broken cleanup #23154
pkg/machine/e2e: fix broken cleanup #23154
Conversation
On linux and macos the connections are stored under the home dir by default so it is not a problem there but on windows we first check the APPDATA env and use this dir as config storage. This has the problem that it is not cleaned up after each test as such connections might leak into the following test causing failues there. Fixes containers#22844 Signed-off-by: Paul Holzinger <[email protected]>
Currently all podman machine rm errors in AfterEach were ignored. This means some leaked and caused issues later on, see containers#22844. To fix it first rework the logic to only remove machines when needed at the place were they are created using DeferCleanup(), however DeferCleanup() does not work well together with AfterEach() as it always run AfterEach() before DeferCleanup(). As AfterEach() deletes the dir the podman machine rm call can not be done afterwards. As such migrate all cleanup to use DeferCleanup() and while I have to touch this fix the code to remove the per file duplciation and define the setup/cleanup once in the global scope. Signed-off-by: Paul Holzinger <[email protected]>
@edsantiago @baude @ashley-cui PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Two questions inline.
@@ -111,6 +111,9 @@ func setup() (string, *machineTestBuilder) { | |||
if err := os.Unsetenv("SSH_AUTH_SOCK"); err != nil { | |||
Fail("unable to unset SSH_AUTH_SOCK") | |||
} | |||
if err := os.Setenv("PODMAN_CONNECTIONS_CONF", filepath.Join(homeDir, "connections.json")); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of allowing CI tests to run locally, would it make sense to set this to a tempdir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is already a tmpdir, see above. The entire home is overwritten but as windows is using APPDATA it does not help there. Of course I could overwrite APPDATA for windows but this seems simpler and more consistent to me
// Some test create a invalid VM so the VM does not exists in this case we have to ignore the error. | ||
// It would be much better if rm -f would behave like other commands and ignore not exists errors. | ||
if session.ExitCode() == 125 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of those tests! Would it make sense for those tests to set a SkipMachineCleanup
or InvalidVM
state flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not trusting test writers to remove this flag when it is no longer needed and it doesn't help if a bug causes the machine to be created all of the sudden then we leak machines.
IMO the reasonable fix to to make rm -f not error on non existing machine like our other commands do, i.e. podman rm -f blah
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
LGTM |
/lgtm |
I think I noticed one weird failure pattern:
I see this pattern in basically all my failed runs here. My best guess is that was caused by #23068. I know we had the flake before but the fact that it got that bad all of the sudden suggest to me that something must have changed that causes this. I also pushed #23162 that should hopefully add useful debug output to find the root cause. |
f5d50a6
into
containers:main
It took 13 tries to get the mac machine test to pass |
pkg/machine/e2e: use tmp file for connections
On linux and macos the connections are stored under the home dir by
default so it is not a problem there but on windows we first check
the APPDATA env and use this dir as config storage. This has the problem
that it is not cleaned up after each test as such connections might leak
into the following test causing failues there.
Fixes #22844
pkg/machine/e2e: fix broken cleanup
Currently all podman machine rm errors in AfterEach were ignored.
This means some leaked and caused issues later on, see #22844.
To fix it first rework the logic to only remove machines when needed at
the place were they are created using DeferCleanup(), however
DeferCleanup() does not work well together with AfterEach() as it always
run AfterEach() before DeferCleanup(). As AfterEach() deletes the dir
the podman machine rm call can not be done afterwards.
As such migrate all cleanup to use DeferCleanup() and while I have to
touch this fix the code to remove the per file duplciation and define
the setup/cleanup once in the global scope.
Does this PR introduce a user-facing change?