futex+io_uring: futex_q::task can maybe be dangling (but is not actually accessed, so it's fine)
From: Jann Horn
Date: Fri Jan 10 2025 - 17:36:10 EST
Hi!
I think there is some brittle interaction between futex and io_uring;
but to be clear, I don't think that there is actually a bug here.
In io_uring, when a IORING_OP_FUTEX_WAIT SQE is submitted with
IOSQE_ASYNC, an io_uring worker thread can queue up futex waiters via
the following path:
ret_from_fork -> io_wq_worker -> io_worker_handle_work ->
io_wq_submit_work[called as ->do_work] -> io_issue_sqe ->
io_futex_wait[called as .issue] -> futex_queue -> __futex_queue
futex_q instances normally describe synchronously waiting tasks, and
__futex_queue() records the identity of the calling task (which is
normally the waiter) in futex_q::task. But io_uring waits on futexes
asynchronously instead; from io_uring's perspective, a pending futex
wait is not tied to the task that called into futex_queue(), it is
just tied to the userspace task on behalf of which the io_uring worker
is acting (I think). So when a futex wait operation is started by an
io_uring worker task, I think that worker task could go away while the
futex_q is still queued up on the futex, and so I think we can end up
with a futex_q whose "task" member points to a freed task_struct.
The good part is that (from what I understand) that "task" member is
only used for two purposes:
1. futexes that are either created through the normal futex syscalls
use futex_wake_mark as their .wake callback, which needs the task
pointer to know which task should be woken.
2. PI futexes use it for priority inheritance magic (and AFAICS there
is no way for io_uring to interface with PI futexes)
I'm not sure what is the best thing to do is here - maybe the current
situation is fine, and I should just send a patch that adds a comment
describing this to the definition of the "task" member? Or maybe it
would be better for robustness to ensure that the "task" member is
NULLed out in those cases, though that would probably make the
generated machine code a little bit more ugly? (Or maybe I totally
misunderstand what's going on and there isn't actually a dangling
pointer...)