Re: [git pull] iov_iter fixes

From: Jens Axboe
Date: Fri Sep 10 2021 - 11:04:31 EST


On 9/10/21 7:57 AM, Jens Axboe wrote:
> On 9/9/21 9:36 PM, Al Viro wrote:
>> On Thu, Sep 09, 2021 at 09:30:03PM -0600, Jens Axboe wrote:
>>
>>>> Again, we should never, ever modify the iovec (or bvec, etc.) array in
>>>> ->read_iter()/->write_iter()/->sendmsg()/etc. instances. If you see
>>>> such behaviour anywhere, report it immediately. Any such is a blatant
>>>> bug.
>>>
>>> Yes that was wrong, the iovec is obviously const. But that really
>>> doesn't change the original point, which was that copying the iov_iter
>>> itself unconditionally would be miserable.
>>
>> Might very well be true, but... won't your patch hit the reimport on
>> every short read? And the cost of uaccess in there is *much* higher
>> than copying of 48 bytes into local variable...
>>
>> Or am I misreading your patch? Note that short reads on reaching
>> EOF are obviously normal - it's not a rare case at all.
>
> It was just a quick hack, might very well be too eager to go through
> those motions. But pondering this instead of sleeping, we don't need to
> copy all of iov_iter in order to restore the state, and we can use the
> same advance after restoring. So something like this may be more
> palatable. Caveat - again untested, and I haven't tested the performance
> impact of this at all.

Passes basic testing for me. I added a sysctl switch for this so I can
compare performance, running my usual peak-perf-single-core benchmark.
That one does ~3.5M IOPS, using polled IO. There's always a slight
variability between boots and builds, hence the sysctl so I could
toggle this behavior on the fly.

Did a few runs, and the differences are very stable. With this enabled,
we spend about 0.15% more time in io_read(). That's only worth about
5K IOPS at 3.5M, not enough to notice as the variation for the 1 second
reporting window usually swings more than that:

Old behavior:
IOPS=3536512, IOS/call=32/31, inflight=(75)
IOPS=3541888, IOS/call=32/32, inflight=(64)
IOPS=3529056, IOS/call=32/31, inflight=(119)
IOPS=3521184, IOS/call=32/32, inflight=(96)
IOPS=3527456, IOS/call=32/31, inflight=(128)
IOPS=3525504, IOS/call=32/32, inflight=(128)
IOPS=3524288, IOS/call=32/32, inflight=(128)
IOPS=3536192, IOS/call=32/32, inflight=(96)
IOPS=3535840, IOS/call=32/32, inflight=(96)
IOPS=3533728, IOS/call=32/31, inflight=(128)
IOPS=3528384, IOS/call=32/32, inflight=(128)
IOPS=3518400, IOS/call=32/32, inflight=(64)

Turning it on:
IOPS=3533824, IOS/call=32/31, inflight=(64)
IOPS=3541408, IOS/call=32/32, inflight=(32)
IOPS=3533024, IOS/call=32/31, inflight=(64)
IOPS=3528672, IOS/call=32/32, inflight=(35)
IOPS=3522272, IOS/call=32/31, inflight=(107)
IOPS=3517632, IOS/call=32/32, inflight=(57)
IOPS=3516000, IOS/call=32/31, inflight=(96)
IOPS=3513568, IOS/call=32/32, inflight=(34)
IOPS=3525600, IOS/call=32/31, inflight=(96)
IOPS=3527136, IOS/call=32/31, inflight=(101)

I think that's tolerable, it was never going to be absolutely free.

What do you think of this approach? Parts of iov_iter are going to
remain constant, like iter_type and data_source. io_uring already copies
iter->count, so that just leaves restoring iov (and unionized friends),
nr_segs union, and iov_offset;

We could pretty this up and have the state part be explicit in iov_iter,
and have the store/restore parts end up in uio.h. That'd tie them closer
together, though I don't expect iov_iter changes to be an issue. It
would make it more maintainable, though. I'll try and hack up this
generic solution, see if that looks any better.

--
Jens Axboe