RE: [PATCH] io_uring: simplify the SQPOLL thread check when cancelling requests
From: lizetao
Date: Sun Jan 12 2025 - 23:41:03 EST
Hi,
> -----Original Message-----
> From: Pavel Begunkov <asml.silence@xxxxxxxxx>
> Sent: Monday, January 13, 2025 5:16 AM
> To: Bui Quang Minh <minhquangbui99@xxxxxxxxx>; lizetao
> <lizetao1@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> Cc: Jens Axboe <axboe@xxxxxxxxx>; io-uring@xxxxxxxxxxxxxxx;
> syzbot+3c750be01dab672c513d@xxxxxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] io_uring: simplify the SQPOLL thread check when
> cancelling requests
>
> On 1/12/25 16:14, Bui Quang Minh wrote:
> ...
> >>> @@ -2898,7 +2899,12 @@ static __cold void io_ring_exit_work(struct work_struct *work)
> >>> if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
> >>> io_move_task_work_from_local(ctx);
> >>>
> >>> - while (io_uring_try_cancel_requests(ctx, NULL, true))
> >>> + /*
> >>> + * Even if SQPOLL thread reaches this path, don't force
> >>> + * iopoll here, let the io_uring_cancel_generic handle
> >>> + * it.
> >>
> >> Just curious, will sq_thread enter this io_ring_exit_work path?
> >
> > AFAIK, yes. The SQPOLL thread is created with create_io_thread, which
> > creates a new task with CLONE_FILES, so all open files are shared. There can
> > be a case where the parent closes its io_uring file and the SQPOLL thread
> > becomes the only owner of that file. So it can reach this path when terminating.
>
> The function is run by a separate kthread, the sqpoll task doesn't call it directly.
I think so too; the sqpoll task probably never calls io_ring_exit_work() itself.
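
To make sure I read this right, here is a rough userspace model of the point above: whoever drops the last reference only queues the exit work, and a separate worker runs it later. All names here (struct task, struct ring_ctx, ring_put, worker_run) are made up for illustration and are not the kernel's types or helpers; the real code uses a work_struct and a workqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct task { const char *name; };

struct work {
	void (*fn)(struct work *w, struct task *runner);
	int pending;
};

struct ring_ctx {
	int refs;                 /* references held on the ring file */
	struct work exit_work;    /* deferred teardown */
	struct task *exit_runner; /* task that actually ran the teardown */
};

/* Stand-in for io_ring_exit_work(): records which task executed it. */
static void ring_exit_work(struct work *w, struct task *runner)
{
	struct ring_ctx *ctx = (struct ring_ctx *)
		((char *)w - offsetof(struct ring_ctx, exit_work));

	ctx->exit_runner = runner;
}

/* Dropping the last reference only queues the work, it does not run it. */
static void ring_put(struct ring_ctx *ctx, struct task *caller)
{
	(void)caller; /* the caller never executes the exit work itself */
	if (--ctx->refs == 0) {
		ctx->exit_work.fn = ring_exit_work;
		ctx->exit_work.pending = 1;
	}
}

/* A separate worker picks up the pending work and runs it. */
static void worker_run(struct work *w, struct task *worker)
{
	if (w->pending) {
		w->pending = 0;
		w->fn(w, worker);
	}
}

static void exit_work_demo(void)
{
	struct task parent = { "parent" };
	struct task sqpoll = { "sqpoll" };
	struct task kworker = { "kworker" };
	struct ring_ctx ctx = { .refs = 2 };

	ring_put(&ctx, &parent);          /* parent closes its file */
	assert(!ctx.exit_work.pending);
	ring_put(&ctx, &sqpoll);          /* sqpoll drops the last reference */
	assert(ctx.exit_work.pending);
	assert(ctx.exit_runner == NULL);  /* nothing has run yet */

	worker_run(&ctx.exit_work, &kworker);
	assert(ctx.exit_runner == &kworker);
}
```

In this model the exit work is run by "kworker" even when "sqpoll" drops the last reference, which is how I understand the remark that the sqpoll task doesn't call the function directly.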
>
> [...]
> >>>> io_uring,
> >>> - cancel_all);
> >>> + cancel_all,
> >>> + true);
> >>> }
> >>>
> >>> if (loop) {
> >>> --
> >>> 2.43.0
> >>>
> >>
> >> Maybe you missed something, as Begunkov mentioned on the previous
> >> version of your patch:
> >>
> >> io_uring_cancel_generic
> >> WARN_ON_ONCE(sqd && sqd->thread != current);
> >>
> >> This WARN_ON_ONCE will never be triggered, so you could remove it.
> >
> > He meant that we don't need to annotate the sqd->thread access in this debug
> > check. io_uring_cancel_generic assumes that sqd is non-NULL only when it is
> > called by the SQPOLL thread, and the check exists to enforce this assumption.
> > A data race can happen only when the function is called by a task other than
> > the SQPOLL thread, since it can then race with SQPOLL termination. However,
> > sqd is non-NULL only when the function is called by the SQPOLL thread. In the
> > normal case, where the assumption holds, the data race cannot happen. And if
> > the assumption is broken, the warning is almost always triggered even if a
> > data race happens. So we can ignore the race here.
>
> Right. And that's the point of warnings, they're supposed to be untriggerable,
> otherwise there is a problem with the code that needs to be fixed.
Okay, now I understand the purpose of this WARN.
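
For my own notes, a small userspace sketch of the invariant as I understand it: the check stays silent when sqd is NULL or when the caller is the SQPOLL thread, and fires once otherwise. The types and helpers below (struct task, struct sq_data, warn_on_once, cancel_generic_check) are simplified stand-ins I made up, not the kernel definitions; the only thing taken from the thread is the condition sqd && sqd->thread != current.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct task { int id; };
struct sq_data { struct task *thread; };

static int warn_count; /* number of warnings actually reported */

/*
 * Simplified model of WARN_ON_ONCE(): returns the condition every time,
 * but reports a true condition only on its first occurrence.
 */
static bool warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
	}
	return cond;
}

/* Model of the debug check quoted from io_uring_cancel_generic(). */
static bool cancel_generic_check(struct sq_data *sqd, struct task *current_task)
{
	static bool warned;

	return warn_on_once(sqd && sqd->thread != current_task, &warned);
}

static void warn_check_demo(void)
{
	struct task sqpoll = { 1 };
	struct task other = { 2 };
	struct sq_data sqd = { &sqpoll };

	/* Assumption holds: a non-SQPOLL caller passes sqd == NULL. */
	assert(!cancel_generic_check(NULL, &other));
	/* Assumption holds: the SQPOLL thread passes its own sqd. */
	assert(!cancel_generic_check(&sqd, &sqpoll));
	assert(warn_count == 0);

	/* Assumption broken: the warning is reported, but only once. */
	assert(cancel_generic_check(&sqd, &other));
	assert(cancel_generic_check(&sqd, &other));
	assert(warn_count == 1);
}
```

So as long as callers follow the assumption, the warning count stays at zero, which matches the point that the warning is supposed to be untriggerable.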
>
> --
> Pavel Begunkov
---
Li Zetao