Re: [PATCH] eventfd: convert to ->write_iter()
From: Jens Axboe
Date: Wed Nov 18 2020 - 18:25:41 EST
On 11/18/20 4:18 PM, Michal Kubecek wrote:
> On Wed, Nov 18, 2020 at 02:27:08PM -0700, Jens Axboe wrote:
>> On 11/18/20 12:59 PM, Michal Kubecek wrote:
>>> On Wed, Nov 18, 2020 at 03:18:06PM +0000, Christoph Hellwig wrote:
>>>> On Wed, Nov 18, 2020 at 10:19:17AM +0100, Michal Kubecek wrote:
>>>>> While eventfd ->read() callback was replaced by ->read_iter() recently,
>>>>> it still provides ->write() for writes. Since commit 4d03e3cc5982 ("fs:
>>>>> don't allow kernel reads and writes without iter ops"), this prevents
>>>>> kernel_write() from being used for eventfd, and with set_fs() removal,
>>>>> ->write() cannot easily be called directly with a kernel buffer.
>>>>>
>>>>> According to eventfd(2), eventfd descriptors are supposed to be (also)
>>>>> used by kernel to notify userspace applications of events which now
>>>>> requires ->write_iter() op to be available (and ->write() not to be).
>>>>> Therefore convert eventfd_write() to ->write_iter() semantics. This
>>>>> patch also cleans up the code in a similar way as commit 12aceb89b0bc
>>>>> ("eventfd: convert to f_op->read_iter()") did in read_iter().
>>>>
>>>> As far as I can tell we don't have an in-tree user that writes to an
>>>> eventfd. We can merge something like this once there is a user.
>>>
>>> As far as I can say, we don't have an in-tree user that reads from
>>> sysctl. But you not only did not object to commit 4bd6a7353ee1 ("sysctl:
>>> Convert to iter interfaces") which adds ->read_iter() for sysctl, that
>>> commit even bears your Signed-off-by. There may be other examples like
>>> that.
>>
>> A better justification for this patch is that users like io_uring can
>> potentially write non-blocking to the file if ->write_iter() is
>> supported.
>
> So you think the patch could be accepted with a modified commit message?
> (As long as there are no technical issues, of course.) I did not really
> expect there would be so much focus on a justification for a patch which
> (1) converts f_ops to a more advanced (and apparently preferred)
> interface and (2) makes eventfd f_ops more consistent.
>
> For the record, my original motivation for this patch was indeed an out
> of tree module (not mine) using kernel write to eventfd. But that module
> can be patched to use eventfd_signal() instead and it will have to be
> patched anyway unless eventfd allows kernel_write() in 5.10 (which
> doesn't seem likely). So if improving the code is not considered
> sufficient to justify the patch, I can live with that easily.

My point is that this patch is a win for eventfd writes from io_uring,
whereas enabling kernel_write() makes people more nervous, and
justifiably so, since your stated use case is an out-of-tree module.
So yeah, I'd focus on the former and not the latter, as the former is
something I'd personally like to see...
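
For reference, whatever a ->write_iter() conversion looks like, it has to
keep the userspace-visible eventfd(2) write semantics intact: each write
adds an 8-byte unsigned value to the counter, a buffer shorter than 8
bytes or the value 0xffffffffffffffff fails with EINVAL, and a read
(without EFD_SEMAPHORE) returns the whole count and resets it. A minimal
userspace sketch, assuming Linux with glibc's <sys/eventfd.h>; efd_add()
and efd_drain() are hypothetical helper names, not kernel or libc APIs:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: add `value` to the eventfd counter with a single
 * 8-byte write(2). Returns 0 on success, -1 on failure (errno set by write). */
static int efd_add(int efd, uint64_t value)
{
	ssize_t n = write(efd, &value, sizeof(value));
	return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Hypothetical helper: read the counter with a single 8-byte read(2).
 * Without EFD_SEMAPHORE this returns the whole count and resets it to 0.
 * Returns 0 on success, -1 on failure (errno set by read). */
static int efd_drain(int efd, uint64_t *out)
{
	ssize_t n = read(efd, out, sizeof(*out));
	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

With a descriptor from eventfd(0, EFD_NONBLOCK), efd_add(efd, 3) followed
by efd_add(efd, 4) drains as 7, a second efd_drain() then fails with
EAGAIN, and efd_add(efd, UINT64_MAX) fails with EINVAL, per eventfd(2).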
--
Jens Axboe