Replace fixed sleep with flush in user_drv shutdown #10808
jtdowney wants to merge 2 commits into erlang:maint
CT Test Results: 2 files, 72 suites, 1h 4m 22s. Results for commit d59b46f. This comment has been updated with the latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
I ran this locally, and `run_erl_SUITE:heavy` fails just like it does in the other PR.
I've had some time to dig into this and figure out what I think is going on. I got suspicious that …
I was surprised that the tests started failing when the sleep was removed, since output should be synchronous. So I think the change made in #10776, together with the test-case fix you made here, is enough.
de6f52a to 72d6da9
@garazdawi I agree. I pushed an updated tree with the commit from @tsloughter in #10776 cherry-picked along with the test update I made. |
When `init:stop()` is called, `user_sup:terminate/2` needs to give pending I/O a chance to reach the terminal before it kills the user process. The current implementation does this with `receive after 1000 -> ok end`, an unconditional one-second sleep. This works, but it means every shutdown pays a full second of latency, whether or not there is actually output in flight. On an idle node, the delay is pure waste.

This change introduces `user_drv:flush/0`, which enqueues a synchronous `put_chars_sync` request carrying an empty binary into the existing I/O queue. Because the queue is processed in order, the call returns only after every preceding write has been acknowledged by the writer process. `user_sup:terminate/2` now calls `flush()` instead of sleeping, so shutdown completes as soon as the queue drains rather than always waiting the full second. The `flush` call uses a one-second timeout and catches all failures, so it degrades gracefully if `user_drv` is already gone or unresponsive. The worst case is the same one-second delay as before.
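The ordering argument behind `flush/0` can be illustrated with a small, self-contained sketch. This is not the `user_drv` code: the module `flush_sketch` and its functions are hypothetical, and a plain process mailbox stands in for the `user_drv` I/O queue. Erlang only guarantees message ordering between a single sender and receiver, so in the sketch the writes and the flush come from the same process; `user_drv` gets the equivalent guarantee from its own internal queue. The sync request carries an empty binary, so its only effect is the acknowledgement, which cannot arrive before the earlier writes have been handled. `safe_flush/2` mirrors the described shutdown behaviour: a bounded wait whose failures are swallowed.

```erlang
%% Hypothetical sketch; NOT the actual user_drv/user_sup implementation.
-module(flush_sketch).
-export([start/0, write/2, flush/1, flush/2, safe_flush/2, demo/0]).

%% Start a writer process that handles requests strictly in arrival order.
start() ->
    spawn(fun() -> loop([]) end).

%% Asynchronous write: the caller does not wait for it to be processed.
write(Writer, Chars) ->
    Writer ! {put_chars, Chars},
    ok.

%% Synchronous empty write with a default one-second timeout.
flush(Writer) ->
    flush(Writer, 1000).

%% The reply can only come back after every earlier message from this
%% caller has been processed, because the writer handles them in order.
flush(Writer, Timeout) ->
    Ref = monitor(process, Writer),
    Writer ! {put_chars_sync, self(), Ref, <<>>},
    receive
        {Ref, ok} ->
            demonitor(Ref, [flush]),
            ok;
        {'DOWN', Ref, process, Writer, Reason} ->
            %% Writer already gone (noproc) or died while we waited.
            {error, Reason}
    after Timeout ->
        demonitor(Ref, [flush]),
        {error, timeout}
    end.

%% Shutdown-style wrapper: never blocks longer than Timeout, never fails.
safe_flush(Writer, Timeout) ->
    _ = (catch flush(Writer, Timeout)),
    ok.

loop(Written) ->
    receive
        {put_chars, Chars} ->
            loop([Chars | Written]);
        {put_chars_sync, From, Ref, Chars} ->
            %% Record the (empty) write, then acknowledge it.
            NewWritten = [Chars | Written],
            From ! {Ref, ok},
            loop(NewWritten);
        {get_written, From, Ref} ->
            From ! {Ref, lists:reverse(Written)},
            loop(Written)
    end.

%% Two buffered writes, then a flush; returns everything the writer saw.
demo() ->
    W = start(),
    ok = write(W, <<"hello ">>),
    ok = write(W, <<"world">>),
    ok = flush(W),
    Ref = make_ref(),
    W ! {get_written, self(), Ref},
    receive
        {Ref, Written} ->
            exit(W, kill),
            Written
    end.
```

Calling `flush_sketch:demo()` should return both writes followed by the empty flush marker, and flushing a dead writer returns `{error, _}` almost immediately instead of waiting out the timeout. One simplification to note: if a reply arrives after the timeout has fired, the stale `{Ref, ok}` message stays in the caller's mailbox, which a production version would need to handle.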