Re: [PATCH] inotify: support returning file_handles

From: Amir Goldstein
Date: Tue Sep 19 2023 - 23:54:52 EST


On Tue, Sep 19, 2023 at 11:23 PM Max Kellermann
<max.kellermann@xxxxxxxxx> wrote:
>
> This patch adds the watch mask bit "IN_FID". It implements semantics
> similar to fanotify's "FAN_REPORT_FID": if the bit is set in a "struct
> inotify_event", the event (and the name) are followed by a "struct
> inotify_extra_fid" which contains the filesystem id (not yet
> implemented) and the file_handle.
>
> It is debatable whether this is a useful feature; not useful for me,
> but Jan Kara cited this feature as an advantage of fanotify over
> inotify:
> https://lore.kernel.org/linux-fsdevel/20230919100112.nlb2t4nm46wmugc2@quack3/
>
> In a follow-up, Amir Goldstein did not want to accept that this
> feature could be added easily to inotify, telling me that I "do not
> understand the complexity involved":
> https://lore.kernel.org/linux-fsdevel/CAOQ4uxgG6ync6dSBJiGW98docJGnajALiV+9tuwGiRt8NE8F+w@xxxxxxxxxxxxxx/
>
> .. which Jan Kara agreed with: "I'll really find out it isn't so easy"
> https://lore.kernel.org/linux-fsdevel/20230919132818.4s3n5bsqmokof6n2@quack3/
>
> So here it is, an easy implementation in less than 90 lines of code
> (which is slightly incomplete; grep TODO). This is a per-watch flag
> and there is no backwards compatibility breakage because the extra
> struct is only added if explicitly requested by user space.
>
> This is just a proof-of-concept, to demonstrate that adding such a
> feature to inotify is not only possible, but also easy. Even though
> this part of the kernel is new to me, apparently I do "understand the
> complexity involved", after all.
>

A bit more humility couldn't hurt. From both you and I.
Obviously, reporting the fhandle in not the complexity we were
talking about. Looks like you are a fast learner, so maybe one day
you wili dive into fanotify code and see the improvements over inotify.

But your basic point that inotify could have been extended is true.
We wanted to extend functionality in the kernel for filesystem watches [1].
We could have done inotify 2.0 (extend inotify).
We could have done fsnotify 1.0 (i.e. new syscalls).
We decided to go with fanotify 2.0, because the fanotify
syscalls and event format already had all the fields needed to
provide the extended API.

It was an arbitrary design choice. It is futile to argue about it.

The main difference between inotify and fanotify APIs which concerns
your complaint is the watch descriptor - a kernel map of unique inode
objects.

Filesystem watches drove the need for reporting fhandles.
I don't have the time to explain why reporting wd/fd in this case
is not a good idea.

It was a design decision to keep the API consistent and uniform
and report the same information for inode watches as well.

Ignoring the functionality gap that needs to be closed between
fanotify -> inotify for the discussion, the remaining part that you like
about inotify is the integer wd descriptor, which makes your life easier.

I am not saying that wd is not useful.
I am not saying that it will not be considered useful enough to keep it
around when the time comes to consider deprecating inotify.

What I am saying is that the inode already has a unique identifier -
more than one even.
For the context of pinned inodes (as in inotify watches), st_dev/st_ino
is unique and fixed size.
fsid/fhandle is variable size (practically fixes size per fs type), but has
the advantage that the identity is persistent so applications can use it to
store change tracking information in databases and to resolve uptodate
path of objects.

Again, it was a design decision to use the stronger fid always.
I have probably offered more than once to implement
FAN_REPORT_DEV_INO, which simplifies things a bit
(maybe not enough for you ;)) and closes the gap of supporting
all filesystems.
For now, we decided to try and stick with FAN_REPORT_FID
for uniformity and add support for fs that do not currently report
fsid/fhandle.

> I don't expect this to be merged. As much as I'd like to see inotify
> improved because fanotify is too complex and cumbersome for my taste,
> I'm not deluded enough to believe this PoC will convince anybody. But
> hacking it was fun and helped me learn about the inotify code which I
> may continue to improve in our private kernel fork.
>

Learning is always good.

Thanks,
Amir.

[1] https://github.com/amir73il/fsnotify-utils/wiki/filesystem-mark