Re: [RFC PATCH 0/7] Inotify support in FUSE and virtiofs

From: Amir Goldstein
Date: Tue Oct 26 2021 - 11:24:13 EST


On Mon, Oct 25, 2021 at 11:47 PM Ioannis Angelakopoulos
<iangelak@xxxxxxxxxx> wrote:
>
> Hello,
>
> I am a PhD student currently interning at Red Hat and working on the
> virtiofs file system. I am trying to add support for the Inotify
> notification subsystem in virtiofs. I seek your feedback and
> suggestions on what is the right direction to take.
>

Hi Ioannis!

I am very happy that you have taken on this task.
People have requested this functionality in the past [1],
not specifically for FUSE, but FUSE is a very good place to start.

[1] https://lore.kernel.org/linux-fsdevel/CAH2r5mt1Fy6hR+Rdig0sHsOS8fVQDsKf9HqZjvjORS3R-7=RFw@xxxxxxxxxxxxxx/

> Currently, virtiofs does not support the Inotify API and there are
> applications which look for the Inotify support in virtiofs (e.g., Kata
> containers).
>
> However, all the event notification subsystems (Dnotify/Inotify/Fanotify)
> are supported only by local kernel file systems.
>
> --Proposed solution
>
> With this RFC patch we add the inotify support to the FUSE kernel module
> so that the remote virtiofs file system (based on FUSE) used by a QEMU
> guest VM can make use of this feature.
>
> Specifically, we enhance FUSE to add/modify/delete watches on the FUSE
> server and also receive remote inotify events. To achieve this we modify
> the fsnotify subsystem so that it calls specific hooks in FUSE when a
> remote watch is added/modified/deleted and FUSE calls into fsnotify when
> a remote event is received to send the event to user space.
>
> In our case the FUSE server is virtiofsd.
>
> We also considered an out of band approach for implementing the remote
> notifications (e.g., FAM, Gamin), however this approach would break
> applications that are already compatible with inotify, and thus would
> require an update.
>
> These kernel patches depend on the patch series posted by Vivek Goyal:
> https://lore.kernel.org/linux-fsdevel/20210930143850.1188628-1-vgoyal@xxxxxxxxxx/

It would be a shame if remote fsnotify were added only for virtiofs
and not as a generic capability for all FUSE filesystems.
Is there a way to get rid of this dependency?

>
> My PoC Linux kernel patches are here:
> https://github.com/iangelak/linux/commits/inotify_v1
>
> My PoC virtiofsd corresponding patches are here:
> https://github.com/iangelak/qemu/commits/inotify_v1
>
> --Advantages
>
> 1) Our approach is compatible with existing applications that rely on
> Inotify, thus improves portability.
>
> 2) Everything is implemented in one place (virtiofs and virtiofsd) and
> there is no need to run additional processes (daemons) specifically to
> handle the remote notifications.
>
> --Weaknesses
>
> 1) Both a local (QEMU guest) and a remote (Host/Virtiofsd) watch on the
> target inode have to be active at the same time. The local watch
> guarantees that events are going to be sent to the guest user space while
> the remote watch captures events occurring on the host (and will be sent
> to the guest).
>
> As a result, when an event occurs on an inode within the directory
> exported by virtiofs, two events will be generated at the same time: a
> local event (generated by the guest kernel) and a remote event (generated
> by the host), thus the guest will receive duplicate events.
>
> To account for this issue we implemented two modes; one where local events
> function as expected (when virtiofsd does not support the remote
> inotify) and one where the local events are suppressed and only the
> remote events originating from the host side are let through (when
> virtiofsd supports the remote inotify).

Dropping events from the local side would be weird.
Avoiding duplicate events is not a good enough reason IMO
compared to the problems this could cause.
I am not convinced this is worth it.

>
> 3) The lifetime of the local watch in the guest kernel is very
> important. Specifically, there is a possibility that the guest does not
> receive remote events on time, if it removes its local watch on the
> target or deletes the inode (and thus the guest kernel removes the watch).
> In these cases the guest kernel removes the local watch before the
> remote events arrive from the host (virtiofsd) and as such the guest
> kernel drops all the remote events for the target inode (since the
> corresponding local watch does not exist anymore). On top of that,
> virtiofsd keeps an open proc file descriptor for each inode that is not
> immediately closed on an inode deletion request by the guest. As a result
> no IN_DELETE_SELF is generated by virtiofsd and sent to the guest kernel
> in this case.
>
> 4) Because virtiofsd implements additional operations during the
> servicing of a request from the guest, additional inotify events might
> be generated and sent to the guest other than the ones the guest
> expects. However, this is not technically a limitation and it is dependent
> on the implementation of the remote file system server (in this case
> virtiofsd).
>
> 5) The current implementation supports only Inotify, due to its
> simplicity, and not Fanotify. Fanotify's complexity requires support from
> virtiofsd that is not currently available. One such example is
> Fsnotify's access permission decision capabilities, which could
> conflict with virtiofsd's current access permission implementation.

Good example, bad decision.
It is perfectly fine for a remote server to provide a "supported event mask"
and leave permission events out of the game.

Imagine a remote SMB server: it also does not support all of the events
that the local application would like to set.

That should not be a reason to rule out fanotify, only specific
fanotify events.

The same goes for FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM:
a remote server may or may not support anything other than watching
inode objects, but that should not be a limitation of the "remote fsnotify" API.
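
To make the "supported event mask" idea concrete, here is a rough
userspace sketch. It is not taken from the posted patches: struct
remote_fsnotify_caps and remote_mask_check() are hypothetical names,
and only the FAN_* constants are real (from <sys/fanotify.h>). The
server advertises the events it can deliver, and a request for
anything outside that set is rejected instead of ruling out fanotify
as a whole.

```c
#include <stdint.h>
#include <sys/fanotify.h>

/* Hypothetical: capabilities a remote server reports at init time. */
struct remote_fsnotify_caps {
	uint64_t supported_events;	/* FAN_* bits the server can deliver */
};

/*
 * Hypothetical helper: check a requested event mask against what the
 * remote server advertises.  Returns 0 if every requested event is
 * supported, -1 otherwise.  The unsupported bits are stored in
 * *unsupported so the caller can report them.
 */
static int remote_mask_check(const struct remote_fsnotify_caps *caps,
			     uint64_t requested, uint64_t *unsupported)
{
	*unsupported = requested & ~caps->supported_events;
	return *unsupported ? -1 : 0;
}
```

A server that leaves FAN_OPEN_PERM and FAN_ACCESS_PERM out of
supported_events would still accept FAN_MODIFY or FAN_CLOSE_WRITE
watches. The same pattern could presumably cover mark types (inode
only vs. mount/filesystem), but that is an assumption, not something
the patches define.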

Thanks,
Amir.