Re: [git pull] iov_iter fixes
From: Pavel Begunkov
Date: Thu Sep 09 2021 - 18:58:22 EST
On 9/9/21 11:54 PM, Pavel Begunkov wrote:
> On 9/9/21 8:37 PM, Linus Torvalds wrote:
>> On Wed, Sep 8, 2021 at 9:24 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Fixes for io-uring handling of iov_iter reexpands
>>
>> Ugh.
>>
>> I have pulled this, because I understand what it does and I agree it
>> fixes a bug, but it really feels very very hacky and wrong to me.
>
> Maybe was worded not too clearly, my apologies.
>
>
>> It really smells like io-uring is doing a "iov_iter_revert()" using a
>> number that it pulls incorrectly out of its arse.
>
> It's not invented by io_uring,
>
> filemap.c : generic_file_direct_[write,read]()
>
> do the same thing. Also, the block layer was not re-expanding before
> ~5.12, so it looks it was possible to trigger a similar thing without
> io_uring, but I haven't tried to reproduce. Was mentioned in the
> cover-letter.
>
>> So when io-uring does that
>>
>> iov_iter_revert(iter, io_size - iov_iter_count(iter));
>>
>> what it *really* wants to do is just basically "iov_iter_reset(iter)".
>>
>> And that's basically what that addition of that "iov_iter_reexpand()"
>> tries to effectively do.
>>
>> Wouldn't it be better to have a function that does exactly that?
>>
>> Alternatively (and I'm cc'ing Jens) is is not possible for the
>> io-uring code to know how many bytes it *actually* used, rather than
>> saying that "ok, the iter originally had X bytes, now it has Y bytes,
>> so it must have used X-Y bytes" which was actively wrong for the case
>> where something ended up truncating the IO for some reason.
>>
>> Because I note that io-uring does that
>>
>> /* may have left rw->iter inconsistent on -EIOCBQUEUED */
>> iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
>>
>> in io_resubmit_prep() too, and that you guys missed that it's the
>> exact same issue, and needs that exact same iov_iter_reexpand().
>
> Right. It was covered by v1-v2, which were failing requests with
> additional fallback in v2 [1], but I dropped in v3 [2] because there
> is a difference. Namely io_resubmit_prep() might be called deeply down
> the stack, e.g. in the block layer.
>
> It was intended to get fixed once the first part is merged, and I do
> believe that was the right approach, because there were certain
> communication delays. The first version was posted a month ago, but
> we missed the merged window. It appeared to me that if we get anything
> more complex
Dammit, apologies for the teared email.
... It was intended to get fixed once the first part is merged, and I do
believe that was the right approach, because there were certain
communication delays. The first version was posted a month ago, but
we missed the merged window. It appeared to me that if anything
more complex is posted, it would take another window to get it done.
> [1] https://lkml.org/lkml/2021/8/12/620
> [2] https://lkml.org/lkml/2021/8/23/285
>
>>
>> That "req->result" is once again the *original* length, and the above
>> code once again mis-handles the case of "oh, the iov got truncated
>> because of some IO limit".
>>
>> So I've pulled this, but I think it is
>>
>> (a) ugly nasty
>>
>> (b) incomplete and misses a case
>>
>> and needs more thought. At the VERY least it needs that
>> iov_iter_reexpand() in io_resubmit_prep() too, I think.
>>
>> I'd like the comments expanded too. In particular that
>>
>> /* some cases will consume bytes even on error returns */
>>
>> really should expand on the "some cases" thing, and why such an error
>> isn't fatal buye should be retried asynchronously blindly like this?
>>
>> Because I think _that_ is part of the fundamental issue here - the
>> io_uring code tries to just blindly re-submit the whole thing, and it
>> does it very badly and actually incorrectly.
>>
>> Or am I missing something?
>>
>> Linus
>>
>
--
Pavel Begunkov