Re: [PATCH 0/2][for-next] cleanup submission path

From: Pavel Begunkov
Date: Mon Oct 28 2019 - 07:12:40 EST


On 28/10/2019 06:38, Jens Axboe wrote:
> On 10/27/19 1:59 PM, Pavel Begunkov wrote:
>> On 27/10/2019 22:51, Jens Axboe wrote:
>>> On 10/27/19 1:17 PM, Pavel Begunkov wrote:
>>>> On 27/10/2019 22:02, Jens Axboe wrote:
>>>>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>>>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>>>>> io_ring_submit()
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>>>> io_uring: handle mm_fault outside of submission
>>>>>>>>>>>>> io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>>>>
>>>>>>>>>>>>> fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>>>> 1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>>>>> to grab the mm.
>>>>>>>>>>>>
>>>>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>>>>
>>>>>>>>>>> Do you prefer to return it back first?
>>>>>>>>>>
>>>>>>>>>> Ah I see, no I don't care about that.
>>>>>>>>>
>>>>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>>>>
>>>>>>>> That's what my question to the fix was about :)
>>>>>>>> 1. Then, in what case could it fail?
>>>>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>>>>
>>>>>>>> Anyway, I will add it back and resend the patchset.
>>>>>>>
>>>>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>>>>> then queue that up for 5.4 since we now lost that optimization. Then
>>>>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>>>>> on top of that.
>>>>>>>
>>>>>>> If not trivially possible for 5.4, then we'll just have to live with it
>>>>>>> in that release. For that case, you can fold the change in with these
>>>>>>> two patches.
>>>>>>>
>>>>>> Hmm, what's the semantics? I think we should fail only those who need
>>>>>> mm, but can't get it. The alternative is to fail all subsequent after
>>>>>> the first mm_fault.
>>>>>
>>>>> For the sqthread setup, there's no notion of "do this many". It just
>>>>> grabs whatever it can and issues it. This means that the mm assign
>>>>> is really per-sqe. What we did before, with the batching, just optimized
>>>>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>>>>> needed the mm.
>>>>>
>>>>> Since you've killed the batching, I think the logic should be something
>>>>> ala:
>>>>>
>>>>> if (io_sqe_needs_user(sqe) && !cur_mm) {
>>>>>         if (already_attempted_mmget_and_failed) {
>>>>>                 -EFAULT end sqe
>>>>>         } else {
>>>>>                 do mm_get and mmuse dance
>>>>>         }
>>>>> }
>>>>>
>>>>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>>>>> failed. If we need the mm and previously failed, -EFAULT.
>>>>>
>>>> That makes sense, but it is a bit hard to implement while honoring links
>>>> and drains.
>>>
>>> If it becomes too complicated or convoluted, just drop it. It's not
>>> worth spending that much time on.
>>>
>> I've already done it more or less elegantly, just prefer to test commits
>> before sending.
>
> That's always appreciated!
>
> It struck me that while I've added quite a few regression tests, we don't
> have any that just do basic read/write using the variety of settings we
> have for that. So I added that to liburing.
>
Great, thanks!
I think I'll postpone patches including these until the start of 5.5.

--
Yours sincerely,
Pavel Begunkov
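
For reference, the per-sqe rule Jens outlines in the quoted pseudocode (grab
the mm lazily, remember a failed attempt, and fail only the sqes that
actually need the mm) can be sketched as a small user-space model. This is
not the fs/io_uring.c code: every name here (sketch_sqe, sketch_ctx,
sketch_grab_mm, sketch_prepare_sqe, sketch_submit) is hypothetical, the mm
grab is simulated with a flag, and links and drains are deliberately left
out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an sqe; not the kernel's struct. */
struct sketch_sqe {
	bool needs_user;	/* false for e.g. READ_FIXED/WRITE_FIXED */
	int result;		/* 0 on success, -EFAULT if the mm was needed but missing */
};

/* Hypothetical per-thread submission state. */
struct sketch_ctx {
	bool have_mm;		/* mm already grabbed for this round */
	bool mm_failed;		/* an earlier grab attempt failed */
	bool mm_available;	/* simulation knob: would a grab succeed? */
	int grab_attempts;	/* counts how often a grab was attempted */
};

/* Simulated "mm_get and mmuse dance"; assumption, not a kernel API. */
static bool sketch_grab_mm(struct sketch_ctx *ctx)
{
	ctx->grab_attempts++;
	return ctx->mm_available;
}

/* Decide, for one sqe, whether it may proceed. */
static int sketch_prepare_sqe(struct sketch_ctx *ctx,
			      const struct sketch_sqe *sqe)
{
	if (sqe->needs_user && !ctx->have_mm) {
		/* Do not retry a grab that has already failed. */
		if (ctx->mm_failed)
			return -EFAULT;
		if (!sketch_grab_mm(ctx)) {
			ctx->mm_failed = true;
			return -EFAULT;
		}
		ctx->have_mm = true;
	}
	/* sqes that do not need the mm pass regardless of earlier failures. */
	return 0;
}

/* Walk all sqes; returns how many were failed with -EFAULT. */
static int sketch_submit(struct sketch_ctx *ctx,
			 struct sketch_sqe *sqes, size_t n)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++) {
		sqes[i].result = sketch_prepare_sqe(ctx, &sqes[i]);
		if (sqes[i].result < 0)
			failed++;
	}
	return failed;
}
```

In this model a batch made up only of sqes with needs_user unset never
attempts a grab at all, and a failed grab is attempted once and then
remembered, so later sqes that need the mm fail without retrying.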
