RE: [PATCH] eventfd: convert to using ->write_iter()
From: David Laight
Date: Tue May 04 2021 - 04:07:41 EST
From: Jens Axboe
> Sent: 03 May 2021 19:05
>
> On 5/3/21 12:02 PM, Matthew Wilcox wrote:
> > On Mon, May 03, 2021 at 11:57:08AM -0600, Jens Axboe wrote:
> >> On 5/3/21 10:12 AM, David Laight wrote:
> >>> From: Jens Axboe
> >>>> Sent: 03 May 2021 15:58
> >>>>
> >>>> Had a report on writing to eventfd with io_uring is slower than it
> >>>> should be, and it's the usual case of if a file type doesn't support
> >>>> ->write_iter(), then io_uring cannot rely on IOCB_NOWAIT being honored
> >>>> alongside O_NONBLOCK for whether or not this is a non-blocking write
> >>>> attempt. That means io_uring will punt the operation to an io thread,
> >>>> which will slow us down unnecessarily.
> >>>>
> >>>> Convert eventfd to using fops->write_iter() instead of fops->write().
> >>>
> >>> Won't this have a measurable performance degradation on normal
> >>> code that does write(event_fd, &one, 4);
> >>
> >> If ->write_iter() or ->read_iter() is much slower than the non-iov
> >> versions, then I think we have generic issues that should be solved.
> >
> > We do!
> >
> > https://lore.kernel.org/linux-fsdevel/20210107151125.GB5270@xxxxxxxxxxxxxxxxxxxx/
> > is one thread on it. There have been others.
>
> But then we really must get that fixed, imho ->read() and ->write()
> should go away, and if the iter variants are 10% slower, then that should
> get fixed up.

I think there are two separate issues.
(Although I haven't looked at the really bad cases in detail.)

1) I suspect some of the fs code is using entirely different paths for the
   'single fragment' and 'iter' variants.
2) For trivial drivers the cost of setting up the iov_iter and then
   iterating it becomes significant (or at least measurable).

I haven't tried to untangle the morass of #defines in the iter code,
but I suspect they could be optimised for the common case of
copying an entire single fragment to/from userspace in one call.
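
For reference, the 'single fragment' case as seen from userspace looks
something like the sketch below. It is illustrative only and not part of
the patch; the efd_* helper names are invented here. A plain write() and a
one-element writev() request the same 8-byte counter update, but the second
enters the kernel through the iovec interface. As far as I understand it,
on a kernel where eventfd only provides ->write() the vfs emulates writev()
by looping over the segments, so both forms should work either way.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/uio.h>
#include <unistd.h>

/* Add 'val' to the eventfd counter with a plain write(). */
static int efd_add_write(int efd, uint64_t val)
{
	/* eventfd wants an 8-byte buffer; a shorter write fails with EINVAL. */
	return write(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Same update, expressed as a single-fragment iovec via writev(). */
static int efd_add_writev(int efd, uint64_t val)
{
	struct iovec iov = { .iov_base = &val, .iov_len = sizeof(val) };

	return writev(efd, &iov, 1) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Read the counter (this resets it for a non-semaphore eventfd). */
static int efd_read(int efd, uint64_t *out)
{
	assert(out != NULL);
	return read(efd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Timing each helper in a tight loop on the same eventfd would be one way
to see whether the iovec setup cost is measurable on a given kernel.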

Not related to this code path, but I have some patches that give a
few % speedup for writev() to /dev/null.
That is all about copying the iov[] from userspace; it never gets 'iterated'.
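
The sort of call I mean looks roughly like this from userspace. This is
only a sketch of the test shape, not the patches themselves: NR_FRAGS and
devnull_writev() are names made up for illustration. The kernel has to
copy the iov[] array in from userspace, but /dev/null discards the data
without looking at it, so the array copy dominates the cost.

```c
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

#define NR_FRAGS 8	/* hypothetical fragment count, chosen for illustration */

/*
 * Submit NR_FRAGS fragments of 'frag_len' bytes each to /dev/null.
 * Returns the byte count reported by writev(), or -1 on error.
 */
static ssize_t devnull_writev(size_t frag_len)
{
	static char buf[4096];
	struct iovec iov[NR_FRAGS];
	ssize_t ret;
	int fd, i;

	if (frag_len > sizeof(buf))
		return -1;

	fd = open("/dev/null", O_WRONLY);
	if (fd < 0)
		return -1;

	/* Every fragment points at the same buffer; the data is discarded. */
	for (i = 0; i < NR_FRAGS; i++) {
		iov[i].iov_base = buf;
		iov[i].iov_len = frag_len;
	}

	ret = writev(fd, iov, NR_FRAGS);
	close(fd);
	return ret;
}
```

For a real measurement the open() would be hoisted out and the writev()
repeated many times; the version above just shows the call being timed.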

David