Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work
From: Nadav Amit
Date: Tue Aug 10 2021 - 22:33:40 EST
> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 8/10/21 9:28 AM, Nadav Amit wrote:
>>
>> Unfortunately, there seems to be yet another issue (unless my code
>> somehow caused it). It seems that when SQPOLL is used, there are cases
>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>> never goes down to zero.
>>
>> Debugging... (while also trying to make some progress with my code)
>
> It's most likely because a request has been lost (mis-refcounted).
> Let us know if you need any help. Would be great to solve it for 5.14.
> quick tips:
>
> 1) if not already, try out Jens' 5.14 branch
> git://git.kernel.dk/linux-block io_uring-5.14
>
> 2) try to characterise the io_uring use pattern. Poll requests?
> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
>
> If easily reproducible, you can match io_alloc_req() with it
> getting into io_dismantle_req();
So actually the problem is more of a missing io_uring feature that I need. When an I/O is queued for asynchronous completion (i.e., after the handler returns -EIOCBQUEUED), there should be a way for io_uring to cancel it if needed. Otherwise it may never complete, which is what happens in my use case.
AIO has the ki_cancel() callback for this purpose. So I presume the proper solution would be to move ki_cancel() from aio_kiocb to kiocb, so that both io_uring and AIO can use it, and then to use this infrastructure from io_uring.
But it is messy. There is already a bug: the (few) callers of kiocb_set_cancel_fn() blindly assume that the kiocb was submitted through AIO and not through io_uring. On top of that, I am not sure about some things in the AIO code. Oh boy. I'll work on an RFC.