tests: Migrate watch test to common framework #14345
Left some small comments, but overall looks great! Good job.
ce9eab1 to 69d854b
Codecov Report
@@            Coverage Diff             @@
##             main   #14345      +/-   ##
==========================================
- Coverage   75.60%   75.24%   -0.37%
==========================================
  Files         457      457
  Lines       37084    37116      +32
==========================================
- Hits        28038    27927     -111
- Misses       7312     7424     +112
- Partials     1734     1765      +31
Signed-off-by: nic-chen <[email protected]>
}

ch := make(chan clientv3.WatchResponse)
go func() {
Have you considered waiting for the goroutine to exit when the test case finishes?
Hi @ahrtr,
the goroutine exits through context cancellation when the test finishes; see: https://github.com/etcd-io/etcd/pull/14345/files#diff-70546f01c62cf8d453ef453d51b64bbdc664e39613453b572120148a77600a08R76
https://github.com/etcd-io/etcd/pull/14345/files#diff-263c581c78a943b3816973b74a3401e058a64364407518b53bdca51f5546d1feR589
It's possible (although unlikely) that the test case finishes before the goroutine exits, in which case the test might eventually report goroutine leak errors. Our pipeline fails due to this kind of error from time to time. So, for safety, we need to make sure the goroutine exits before the test case returns.
I don't think this is a problem. We should be good as long as we propagate a signal to the goroutine that causes it to exit. It's up to the leak detection to wait for this signal to propagate. If you read the etcd test goroutine leak implementation, it gives the test up to 5 seconds to clean up its goroutines and uses runtime.Gosched() to give the goroutines time to execute and exit.
I think we could merge it first. If there is a real problem, I will fix it. What do you think? @serathius @ahrtr
I think @ahrtr is worried that the problem will not show up immediately after merging. Goroutine leak issues like this one are very subtle (for example, I cannot reproduce this issue locally at all), so I understand his position.
I still think my argument holds and this should not be a problem, but I would prefer to wait for confirmation from @ahrtr
"If you read the etcd test goroutine leak implementation, it gives test up to 5 seconds to cleanup its goroutines and uses runtime.Gosched() to give goroutine time to execute and exit."
The goroutine leak checker indeed waits up to 1 second in the afterTest check.
To me, this is just a mitigation rather than a best practice. We should make sure all threads/goroutines shut down gracefully in almost all cases, in both production code and test code, except where we intentionally do otherwise.
Previously we also saw a situation where a test case (in test code) or the main goroutine (in production code) had already finished, but a goroutine it created was still printing logs, which Go does not allow. Of course, I do not see such an issue in this PR.
I just searched the repo, and there are lots of similar cases that create goroutines but do not shut them down gracefully. It's OK to merge this PR for now, but I'd like to revisit all such cases afterwards in separate PR(s). WDYT? @serathius @spzala
I agree this is a problem worth looking into.
Merging, as we agreed to follow up with separate PRs.
Part of #13637