Re: futex+io_uring: futex_q::task can maybe be dangling (but is not actually accessed, so it's fine)
From: Jann Horn
Date: Mon Jan 13 2025 - 08:54:41 EST
On Sat, Jan 11, 2025 at 4:33 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 1/10/25 3:26 PM, Jann Horn wrote:
> > Hi!
> >
> > I think there is some brittle interaction between futex and io_uring;
> > but to be clear, I don't think that there is actually a bug here.
> >
> > In io_uring, when a IORING_OP_FUTEX_WAIT SQE is submitted with
> > IOSQE_ASYNC, an io_uring worker thread can queue up futex waiters via
> > the following path:
> >
> > ret_from_fork -> io_wq_worker -> io_worker_handle_work ->
> > io_wq_submit_work[called as ->do_work] -> io_issue_sqe ->
> > io_futex_wait[called as .issue] -> futex_queue -> __futex_queue
> >
> > futex_q instances normally describe synchronously waiting tasks, and
> > __futex_queue() records the identity of the calling task (which is
> > normally the waiter) in futex_q::task. But io_uring waits on futexes
> > asynchronously instead; from io_uring's perspective, a pending futex
> > wait is not tied to the task that called into futex_queue(), it is
> > just tied to the userspace task on behalf of which the io_uring worker
> > is acting (I think). So when a futex wait operation is started by an
> > io_uring worker task, I think that worker task could go away while the
> > futex_q is still queued up on the futex, and so I think we can end up
> > with a futex_q whose "task" member points to a freed task_struct.
> >
> > The good part is that (from what I understand) that "task" member is
> > only used for two purposes:
> >
> > 1. futexes that are either created through the normal futex syscalls
> > use futex_wake_mark as their .wake callback, which needs the task
> > pointer to know which task should be woken.
> > 2. PI futexes use it for priority inheritance magic (and AFAICS there
> > is no way for io_uring to interface with PI futexes)
> >
> > I'm not sure what is the best thing to do is here - maybe the current
> > situation is fine, and I should just send a patch that adds a comment
> > describing this to the definition of the "task" member? Or maybe it
> > would be better for robustness to ensure that the "task" member is
> > NULLed out in those cases, though that would probably make the
> > generated machine code a little bit more ugly? (Or maybe I totally
> > misunderstand what's going on and there isn't actually a dangling
> > pointer...)
>
> Good find. And yes the io-wq task could go away, if there's more of
> them.
>
> While it isn't an issue, dangling pointers make me nervous. We could do
> something like the (totally untested) below patch, where io_uring just
> uses __futex_queue() and make __futex_queue() take a task_struct as
> well. Other callers pass in 'current'.
>
> It's quite possible that it'd be safe to just use __futex_queue() and
> clear ->task after the queue, but if we pass in NULL from the get-go,
> then there's definitely not a risk of anything being dangling.
Yeah, this looks like a nice cleanup that makes this safer, thanks!