async: thread-safe schedule() #218
base: main
Conversation
I see it now.
If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon for libraries to start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for I/O completion handlers, while still using our executor for the tasks. I can also imagine situations where you'd want to run non-I/O compute in a thread pool to avoid blocking nginx - in our case, for example, crypto. You'd want to be able to notify the request-handler async task of completion by writing to a channel or a similar mechanism. This, in turn, would call the waker from that thread (AFAIK), which calls schedule for the task from that thread, but the woken task would be scheduled to run on the main thread via ngx_notify.
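Concretely, the pattern I have in mind looks roughly like this (a sketch with hypothetical names, using futures-channel; `expensive_crypto` is a placeholder for the offloaded work):

```rust
use futures_channel::oneshot;

fn expensive_crypto() -> Vec<u8> {
    Vec::new() // placeholder for the real compute
}

async fn handle() {
    let (tx, rx) = oneshot::channel();
    std::thread::spawn(move || {
        let digest = expensive_crypto(); // runs off the nginx event loop
        // Sending wakes the awaiting task: the Waker, and therefore
        // schedule(), is invoked from this helper thread, and the woken
        // task must be handed back to the main thread.
        let _ = tx.send(digest);
    });
    let _digest = rx.await; // resumed on the main thread
}
```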
We need to work with the request heavily (mutate headers_in and headers_out, read client bodies, produce response bodies) in response to I/O (external requests, database queries, custom crypto/tunneling), which can only be done safely on the main thread. If all our code is running in a completely separate engine, it all becomes extremely hard. In addition, we need a way to interrupt nginx's epoll to react to I/O events, not all of which are bound to a request (e.g. OpenID shared signals).
I don't think it would do that. If the waker is invoked from the main thread, schedule in my branch would simply .run() the runnable, and everything stays on the main thread. ngx_notify would not be called (except once during the lifetime of a worker process because it's not known which tid is main). I have to admit I didn't test with nginx-acme yet though. To recap, I'd still like the following:
Given ngx_epoll_module.c:769, ngx_notify from other threads is indeed inherently unsafe. However, what if we do this:
Would this work for you?
schedule() can now be called from any thread, but will move tasks to the event loop thread. pthread_kill(main_thread, SIGIO) is used to ensure prompt response if needed. This enables receiving I/O notifications from "sidecar runtimes" like async-compat, for instance. The async example has been rewritten to use async_::spawn, demonstrating usage of reqwest and hyper clients wrapped in Compat to provide a tokio runtime environment while using the async_ Scheduler as executor.
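In outline, the mechanism is something like the following (a sketch, not the actual PR code; MAIN_THREAD and QUEUE stand in for the Scheduler's internals):

```rust
use std::collections::VecDeque;
use std::sync::{Mutex, OnceLock};

static MAIN_THREAD: OnceLock<libc::pthread_t> = OnceLock::new();
static QUEUE: Mutex<VecDeque<async_task::Runnable>> = Mutex::new(VecDeque::new());

fn schedule(runnable: async_task::Runnable) {
    let main = *MAIN_THREAD.get().expect("initialized in the worker");
    if unsafe { libc::pthread_equal(libc::pthread_self(), main) } != 0 {
        // Already on the event-loop thread: poll the task right away.
        runnable.run();
    } else {
        // Foreign thread: move the task over and interrupt epoll_wait()
        // so it is picked up promptly.
        QUEUE.lock().unwrap().push_back(runnable);
        unsafe { libc::pthread_kill(main, libc::SIGIO) };
    }
}
```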
@bavshin-f5 I've rewritten the code to not rely on ngx_notify. Instead, I'm using ngx_post_event, followed by pthread_kill(main_thread, SIGIO) as I had a hard time getting the notify_fd from within ngx-rust. Does that address your concern?
Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. The only approach I would consider safe is one where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume a single-threaded environment.
```rust
event.log = ngx_cycle_log().as_ptr();

unsafe {
    ngx_post_event(&mut *event, ptr::addr_of_mut!(ngx_posted_events));
```
Posting to ngx_posted_events can easily lead to an infinite loop. If the current task is already running from a posted event handler, no I/O could happen before the next wakeup.
If the current task is running on the event thread, there is actually no need to post the event, as the handler is necessarily still reading from the channel here. Therefore we can just skip it, like the SIGIO.
I tried to remove the ngx_post_event when on the event thread, but the example started deadlocking. I'm not sure why, but it seems we can't skip it (we should still skip the SIGIO, though). Can you elaborate on the deadlock that you suspect can happen now? Doesn't ngx_post_event just add the event to the queue? When we are on the event thread, we know nginx is currently spinning, so the posted event should get picked up on the next turn.
Why is "no I/O could happen before the next wakeup" relevant here?
src/async_/spawn.rs
```rust
/// Initialize async by storing MAIN_THREAD
pub fn initialize_async() {
    MAIN_THREAD
        .set(unsafe { pthread_self() })
```
You call this from the master process, but POSIX does not specify whether the thread ID remains the same after fork().
It's better to initialize this in spawn, because spawn is the entry point of the async runtime and is supposed to be called from a worker process.
Raw pthread use is also non-portable; we have that one platform without pthread.h that we pretend to support. ngx_thread_tid's presence depends on the nginx build options, and Rust's std::thread::ThreadId is very expensive to obtain.
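Roughly what I mean, with current_thread_id() standing in for whichever thread-id primitive ends up being used (a sketch, not a prescription):

```rust
use std::sync::OnceLock;

static EVENT_THREAD: OnceLock<u64> = OnceLock::new();

fn current_thread_id() -> u64 {
    // placeholder; the real choice (ngx_thread_tid, pthread_self, ...) is
    // constrained by portability and nginx build options as noted above
    0
}

pub fn spawn_prologue() {
    // spawn() is the first async entry point in the worker process, so this
    // runs on the event-loop thread and after fork().
    EVENT_THREAD.get_or_init(current_thread_id);
}
```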
Hmm, I'm running initialize_async in init_process. The dev guide reads:
The master process creates one or more worker processes and the init_process handler is called in each of them.
(emphasis mine)
I read this as: "called once per worker", and I think I'm seeing that happening right now. Am I mistaken?
Fair point on pthreads, what would you recommend?
I'd have used ngx_thread_tid (potentially requiring the corresponding build options for the "async" feature), and I had it working for just the "on event thread" detection, but then I don't have anything to pass to pthread_kill...
Would a normal kill(getpid(), SIGIO) be ok, too? It did seem to work fine when I tested it a few weeks back while working on the initial version, and it would actually enable me to remove the required init from init_process again.
On init during spawn - I considered it, but I wasn't sure I can rely on it happening on the event thread. Couldn't the user have set up an ngx_thread_task and called the first spawn in its handler, shooting themselves in the foot?
I don't claim to fully understand it, but they state: "Otherwise, a new single-threaded runtime will be created on demand. That does not mean the future is polled by the tokio runtime." Tokio could spawn its own tasks into that runtime, sure, e.g. some kind of helper task. But I don't see how my task could end up there. If my task's Runnable::schedule() arranges for it to be scheduled on the event thread, which is precisely what my PR does, that is exactly where it will run. I'm not an expert, but I think what happens is this:
This is what I see right now, using the code from the PR. This is also what I'd expect to happen with a "sidecar" tokio runtime that I started myself (no async-compat).
I just pushed an experiment with a sidecar tokio runtime and added tid debug logging here: https://github.com/pschyska/ngx-rust/blob/a5ff1bb0cc3e6d5bb15f46e24348a1d2fa694f18/examples/async.rs#L115 This supports my theory: my task is never moved to the tokio runtime. It does call schedule from foreign threads, though: from the thread of the runtime (494047) when using tokio::spawn, and when awaiting … I've also pushed a change to main to switch to kill and ngx_thread_tid. It works fine as well.
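Boiled down, the experiment is roughly this (a sketch; the real code is in the linked examples/async.rs):

```rust
use std::sync::OnceLock;

static SIDECAR: OnceLock<tokio::runtime::Runtime> = OnceLock::new();

fn sidecar() -> &'static tokio::runtime::Runtime {
    SIDECAR.get_or_init(|| {
        tokio::runtime::Builder::new_multi_thread()
            .worker_threads(1)
            .enable_all()
            .build()
            .expect("sidecar tokio runtime")
    })
}

async fn task_on_our_executor() {
    // JoinHandle is a plain Future: awaiting it invokes our Waker from a
    // tokio worker thread, but this task is polled wherever our schedule()
    // runs it, i.e. on the nginx event-loop thread.
    let joined = sidecar().spawn(async { 2 + 2 }).await;
    assert_eq!(joined.unwrap(), 4);
}
```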
I just had another idea that helped me visualize this: if !Send futures could move executors at will, they could end up in an executor that requires Send (and/or Sync). E.g.: if the "part 2" future of my task, after awaiting a future from a tokio runtime, somehow magically ran in a threaded tokio executor, it would have to be Send. But if I used e.g. async_task::spawn_local, it could be just 'static. The compiler would not compile that code. (Of course, crucial parts of an executor are unsafe, but this would still make such behaviour wildly illegal in Rust.)
I don't know of any method of making a task move executors. If I wanted to connect futures of different executors beyond their output for some reason (e.g. to be able to cancel the other task), I would use a remote_handle. But AFAIK this doesn't change the Context (which ties back to schedule() and the task); it establishes a oneshot channel between the tasks.
We could use spawn_local instead of spawn_unchecked (which would store Rust's thread ID and check that it is the same on .run()), but this is unnecessary overhead in this case; it simply can't happen. The example code I wrote, which leads to waking from other threads all the time, still runs fine with spawn_unchecked. Another angle on this - the spawn_unchecked docs state, under "Safety": …
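For concreteness, here is roughly what the two variants look like with the async-task API (a sketch; `schedule` stands in for our thread-aware scheduler):

```rust
fn spawn_demo(schedule: impl Fn(async_task::Runnable) + Send + Sync + Clone + 'static) {
    // spawn_local records the spawning thread and panics if the Runnable is
    // polled from a different one.
    let (runnable, task) = async_task::spawn_local(async { 1 }, schedule.clone());
    runnable.schedule();
    task.detach();

    // spawn_unchecked performs no such check; upholding the invariant is the
    // caller's responsibility (which is what this PR relies on).
    let (runnable, task) = unsafe { async_task::spawn_unchecked(async { 2 }, schedule) };
    runnable.schedule();
    task.detach();
}
```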
I think I have now fully convinced myself, let me know if this helps to convince you as well 🙂
Proposed changes
As mentioned in #110, this is my work on making Scheduler::schedule() thread-safe.
This would enable schedule() to be called from other threads, e.g. by async-compat or other "sidecar runtime" setups. It also makes sure epoll is interrupted when I/O completion notifications come in from outside the event loop, leading to prompt continuation.
While this doesn't provide a native hyper/client as @bavshin-f5 wanted, it makes the default tokio implementation work via Compat. This would be a viable stopgap solution for us. I've added some examples, including hyper and reqwest. In the future, one could implement a "sidecar runtime" approach as in async-compat natively, using a separate epoll loop in a thread, or inject additional fds from the Rust side into nginx's epoll instance (if possible).
Some notes:
… std as a dependency for async to reflect that (this would be a breaking change, but async Rust probably implies std anyway).
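For illustration, the Compat pattern from the rewritten example looks roughly like this (a sketch; async_::spawn is this PR's spawn API, while the URL and the .detach() handling are placeholders):

```rust
use async_compat::Compat;

fn kick_off_background_request() {
    // Compat::new provides the tokio reactor/timer context that reqwest
    // expects, while the future itself is driven by the async_ Scheduler
    // on the nginx event-loop thread.
    async_::spawn(Compat::new(async {
        match reqwest::get("https://example.org/").await {
            Ok(resp) => println!("status: {}", resp.status()),
            Err(e) => eprintln!("request failed: {e}"),
        }
    }))
    .detach();
}
```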