Re: [PATCH v2 0/2] Optimise io_uring completion waiting

From: Jens Axboe
Date: Tue Sep 24 2019 - 06:11:55 EST


On 9/24/19 3:33 AM, Pavel Begunkov wrote:
>
>
> On 24/09/2019 11:36, Jens Axboe wrote:
>> On 9/24/19 2:27 AM, Jens Axboe wrote:
>>> On 9/24/19 2:02 AM, Jens Axboe wrote:
>>>> On 9/24/19 1:06 AM, Pavel Begunkov wrote:
>>>>> On 24/09/2019 02:00, Jens Axboe wrote:
>>>>>>> I think we can do the same thing, just wrapping the waitqueue in a
>>>>>>> structure with a count in it, on the stack. Got some flight time
>>>>>>> coming up later today, let me try and cook up a patch.
>>>>>>
>>>>>> Totally untested, and sent out 5 min before departure... But something
>>>>>> like this.
>>>>> Hmm, this reminds me of my first version. Basically it's the same
>>>>> thing, but with the macros inlined. I wanted to make it reusable and
>>>>> self-contained, though.
>>>>>
>>>>> If you don't think it could be useful in other places, sure, we could do
>>>>> something like that. Is that the case?
>>>>
>>>> I totally agree it could be useful in other places. Maybe it could be
>>>> formalized and used with wake_up_nr() instead of adding a new primitive?
>>>> I haven't looked into that, so I may be talking nonsense.
>>>>
>>>> In any case, I did get a chance to test it and it works for me. Here's
>>>> the "finished" version, slightly cleaned up and with a comment added
>>>> for good measure.
>>>
>>> Notes:
>>>
>>> This version gets the ordering right: you need exclusive waits to get
>>> FIFO ordering on the waitqueue, since exclusive waiters are queued at
>>> the tail rather than the head.
>>>
>>> Both versions (yours and mine) suffer from the problem of potentially
>>> waking too many tasks. I don't think this is a real issue, as we don't
>>> generally do threaded access to the io_urings. But if you had the
>>> following tasks wait on the cqring:
>>>
>>> [min_events = 32], [min_events = 8], [min_events = 8]
>>>
>>> and io_cqring_events() reaches the threshold, we'll wake all three.
>>> I don't see a good solution to this, so I suspect we just live with it
>>> until it's proven to be an issue. Both versions are much better than
>>> what we have now.
>>
>> I forgot an issue around signal handling; the version below adds the
>> right check for that too.
>
> That seems like a good reason not to keep reimplementing the
> "prepare_to_wait*() + wait loop" pattern every time, but to keep it in
> sched instead :)

I think if we do the ->private cleanup that Peter mentioned, then
there's not much left in terms of consolidation. Not convinced the case
is interesting enough to warrant a special helper. If others show up,
it's easy enough to consolidate the use cases and unify them.
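For anyone following along, the shape being discussed is roughly the
following. This is an untested sketch, not the actual patch: the
io_wait_queue/io_wake_function names are illustrative, and io_cqring_events()
is assumed to be the existing helper returning the number of posted CQEs.
The waitqueue entry is wrapped in an on-stack structure that carries the
wakeup threshold, and a custom wake function only passes the wakeup through
once that threshold has been met:

struct io_wait_queue {
	struct wait_queue_entry wq;
	struct io_ring_ctx *ctx;
	unsigned to_wait;	/* don't wake before this many CQEs are posted */
};

static int io_wake_function(struct wait_queue_entry *curr, unsigned mode,
			    int wake_flags, void *key)
{
	struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
						  wq);

	/*
	 * Not enough completions for this waiter yet. Returning < 0 stops
	 * the wakeup walk without waking anybody.
	 */
	if (io_cqring_events(iowq->ctx) < iowq->to_wait)
		return -1;

	return autoremove_wake_function(curr, mode, wake_flags, key);
}

Since exclusive waiters are queued at the tail and woken from the head, this
preserves the FIFO behaviour mentioned above.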

If you look at wake_up_nr(), I would have thought its use would be more
widespread. But it really isn't.
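The wait side then becomes an open-coded prepare_to_wait_exclusive() loop
with the signal check folded in, something like the below (again a sketch of
the io_cqring_wait() body, with error paths and other details elided; ret is
what would be handed back to userspace):

	struct io_wait_queue iowq = {
		.wq = {
			.private	= current,
			.func		= io_wake_function,
			.entry		= LIST_HEAD_INIT(iowq.wq.entry),
		},
		.ctx		= ctx,
		.to_wait	= min_events,
	};
	int ret = 0;

	do {
		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
					  TASK_INTERRUPTIBLE);
		if (io_cqring_events(ctx) >= iowq.to_wait)
			break;
		schedule();
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}
	} while (1);
	finish_wait(&ctx->wait, &iowq.wq);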

>> Curious what your test case was for this?
> You mean a performance test case? It's briefly described in a comment
> on the second patch. It's just a rewritten io_uring-bench, with
> 1. a thread generating one request per call in a loop, and
> 2. a second thread waiting for ~128 events.
> Both are pinned to the same core.

Gotcha, thanks.
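
For the curious, the shape of that benchmark in userspace terms is roughly
the below. This is a hypothetical sketch using current liburing helpers, not
Pavel's actual rewritten io_uring-bench:

#define _GNU_SOURCE
#include <liburing.h>
#include <pthread.h>
#include <sched.h>

static struct io_uring ring;

static void pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* thread 1: submit one NOP request per io_uring_enter() call */
static void *submitter(void *arg)
{
	pin_to_cpu0();
	for (;;) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		if (sqe) {
			io_uring_prep_nop(sqe);
			io_uring_submit(&ring);
		}
	}
	return NULL;
}

/* thread 2: wait for ~128 completions at a time, then reap what's there */
static void *waiter(void *arg)
{
	pin_to_cpu0();
	for (;;) {
		struct io_uring_cqe *cqe;
		unsigned head, nr = 0;

		/* may return earlier if completions are already pending */
		if (io_uring_wait_cqe_nr(&ring, &cqe, 128))
			break;
		io_uring_for_each_cqe(&ring, head, cqe)
			nr++;
		io_uring_cq_advance(&ring, nr);
	}
	return NULL;
}

int main(void)
{
	pthread_t sub, wait;

	io_uring_queue_init(256, &ring, 0);
	pthread_create(&sub, NULL, submitter, NULL);
	pthread_create(&wait, NULL, waiter, NULL);
	pthread_join(wait, NULL);
	io_uring_queue_exit(&ring);
	return 0;
}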

--
Jens Axboe