Re: [PATCH] eventfd: convert to ->write_iter()
From: Michal Kubecek
Date: Wed Nov 18 2020 - 18:35:04 EST
On Wed, Nov 18, 2020 at 04:25:37PM -0700, Jens Axboe wrote:
> On 11/18/20 4:18 PM, Michal Kubecek wrote:
> > On Wed, Nov 18, 2020 at 02:27:08PM -0700, Jens Axboe wrote:
> >> On 11/18/20 12:59 PM, Michal Kubecek wrote:
> >>> On Wed, Nov 18, 2020 at 03:18:06PM +0000, Christoph Hellwig wrote:
> >>>> On Wed, Nov 18, 2020 at 10:19:17AM +0100, Michal Kubecek wrote:
> >>>>> While eventfd ->read() callback was replaced by ->read_iter() recently,
> >>>>> it still provides ->write() for writes. Since commit 4d03e3cc5982 ("fs:
> >>>>> don't allow kernel reads and writes without iter ops"), this prevents
> >>>>> kernel_write() from being used for eventfd, and with set_fs() removal,
> >>>>> ->write() cannot be easily called directly with a kernel buffer.
> >>>>>
> >>>>> According to eventfd(2), eventfd descriptors are supposed to be (also)
> >>>>> used by the kernel to notify userspace applications of events, which now
> >>>>> requires ->write_iter() op to be available (and ->write() not to be).
> >>>>> Therefore convert eventfd_write() to ->write_iter() semantics. This
> >>>>> patch also cleans up the code in a similar way as commit 12aceb89b0bc
> >>>>> ("eventfd: convert to f_op->read_iter()") did in read_iter().
> >>>>
> >>>> As far as I can tell we don't have an in-tree user that writes to an
> >>>> eventfd. We can merge something like this once there is a user.
> >>>
> >>> As far as I can tell, we don't have an in-tree user that reads from
> >>> sysctl. But you not only did not object to commit 4bd6a7353ee1 ("sysctl:
> >>> Convert to iter interfaces") which adds ->read_iter() for sysctl, that
> >>> commit even bears your Signed-off-by. There may be other examples like
> >>> that.
> >>
> >> A better justification for this patch is that users like io_uring can
> >> potentially write non-blocking to the file if ->write_iter() is
> >> supported.
> >
> > So you think the patch could be accepted with a modified commit message?
> > (As long as there are no technical issues, of course.) I did not really
> > expect there would be so much focus on a justification for a patch which
> > (1) converts f_ops to a more advanced (and apparently preferred)
> > interface and (2) makes eventfd f_ops more consistent.
> >
> > For the record, my original motivation for this patch was indeed an out
> > of tree module (not mine) using kernel write to eventfd. But that module
> > can be patched to use eventfd_signal() instead and it will have to be
> > patched anyway unless eventfd allows kernel_write() in 5.10 (which
> > doesn't seem likely). So if improving the code is not considered
> > sufficient to justify the patch, I can live with that easily.
>
> My point is that improving eventfd writes from io_uring is a win with
> this patch, whereas enabling kernel_write() makes people more nervous,
> and justifiably so as your stated use case is some out of tree module.
>
> So yeah, I'd focus on the former and not the latter, as it is actually
> something I'd personally like to see...
OK, I'll send v2 with different reasoning in the commit message.
Michal Kubecek