Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

From: Jan Kara
Date: Sun Mar 19 2017 - 06:20:01 EST


On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> On Tue, Mar 14, 2017 at 1:03 AM, Filip ÅtÄdronskà <r.lklm@xxxxxxxxxx> wrote:
> > Besause fanotify requires `struct path`, the event cannot be generated
> > directly in `fsnotify_move` and friends because they only get the inode
> > (and their callers, `vfs_rename`&co. cannot supply any better info).
> > So instead it needs to be generated higher in the call chain, i.e. in
> > the callers of functions like `vfs_rename`.
> >
> > This leads to some code duplication. Currently, there are several places
> > whence functions like `vfs_rename` or `vfs_unlink` are called:
> >
> > * syscall handlers (done)
> > * NFS server (done)
> > * stacked filesystems
> > - ecryptfs (done)
> > - overlayfs
> > (Currently doesn't report even ordinary fanotify events, because
> > it internally clones the upper mount; not sure about the
> > rationale. One can always watch the overlay mount instead.)
> > * few rather minor things
> > - devtmpfs
> > (its internal changes are not tied to any vfsmount so it cannot
> > emit mount-scoped events)
> > - cachefiles (done)
> > - ipc/mqueue.c (done)
> > - fs/nfsd/nfs4recover.c (done)
> > - kernel/bpf/inode.c (done)
> > net/unix/af_unix.c (done)
> >
> > (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
> >
> > Signed-off-by: Filip ÅtÄdronskà <r.lkml@xxxxxxxxxx>
> >
> > ---
> >
> > An alternative might be to create wrapper functions like
> > vfs_path_(rename|unlink|...). They could also take care of calling
> > security_path_(rename|unlink|...), which is currently also up to
> > the indvidual callers (possibly with a flag because it might not
> > be always desired).
>
> That's an interesting idea. There is some duplicity between security/audit
> hook and fsnotify hooks. It should be interesting to try and deduplicate
> some of this code.

Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.
And at least for NFSD it seems correct they don't call them since you are
running in a context of an NFSD server process and don't have security
context of the process actually issuing the IO. So I'm not sure
"deduplication" is really possible.

However if you can really call fsnotify hooks with 'path' available in all
the places, it should be equally hard to just pass 'path' to
vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
calls into several call sites but keep them local to vfs_(create|mkdir|...)
helpers. Hmm?

>
> > ---
> > fs/cachefiles/namei.c | 9 +++++++
> > fs/ecryptfs/inode.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/namei.c | 23 +++++++++++++++++-
> > fs/nfsd/nfs4recover.c | 7 ++++++
> > fs/nfsd/vfs.c | 24 ++++++++++++++++--
> > ipc/mqueue.c | 9 +++++++
> > kernel/bpf/inode.c | 3 +++
> > net/unix/af_unix.c | 2 ++
> > 8 files changed, 141 insertions(+), 3 deletions(-)
> >
>
> OK, just for comparison, I am going to put here the diff of the sub set of
> my patches that are needed to support fanotify filename events.
>
>
> $ git diff --stat fsnotify_sb..fanotify_dentry
> fs/notify/fanotify/fanotify.c | 94
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
> fs/notify/fanotify/fanotify.h | 25 ++++++++++++++++++++++++-
> fs/notify/fanotify/fanotify_user.c | 92
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
> fs/notify/fdinfo.c | 25 +++----------------------
> fs/notify/inode_mark.c | 1 +
> fs/notify/mark.c | 15 ++++++++++++---
> include/linux/fsnotify_backend.h | 21 ++++++++++++++++-----
> include/uapi/linux/fanotify.h | 41
> +++++++++++++++++++++++++++++++++++------
> 8 files changed, 255 insertions(+), 59 deletions(-)
>
> Yes, it is a bit more code, much mostly because it adds more functionality
> (optionally reporting the filename).
> But most of the code is contained within the fsnotify/fanotify subsystem.

So I'm not that concerned about actual line numbers. They play some role
but they don't tell much about how maintainable the result is in the end.

> The altenative to sprinkle fsnotify_modify_dir() hooks is much less
> maintainable IMO.

Agreed on this.

> I managed to stay a way from cross subsystems changes,
> by allowing to loose some information about the event.
>
> When a filename event is generated (rename|delete|create)
> the path of the parent fd that will be reported to user is NOT
> the actual path that the process executing the operation used,
> but the path from which the watching process has added the mark.

Yeah, and frankly I'm not yet convinced this is a sane thing to do. I'd
much rather propagate path to vfs_(create|mkdir|...) helpers.

> So for example, if you have a bind mount:
> mount -o bind /a/b/c/d/e/f/g /tmp/g
>
> And you add a watch on mount /a
> you *will* get events for create,delete,rename in directory /tmp/g
> but that path in the associated fd with point to /a/b/c/d/e/f/g
>
> Some would claim that it is wrong to report the events
> if they were not originated from the mount were the
> watch was added.
>
> I claim that fanotify filters event by mount not because it
> was a requirement, but because it was an implementation challenge
> to do otherwise.
> And I claim that what mount watchers are really interested in is
> "all the changes that happen in the file system in the area
> that is visible to me through this mount point".
>
> In other words, an indexer needs to know if files were modified\
> create/deleted if that indexer sits in container host namespace
> regardless if those files were modified from within a container
> namespace.
>
> It's not a matter of security/isolation. It's a matter of functionality.
> I agree that for some event (e.g. permission events) it is possible
> to argue both ways (i.e. that the namespace context should be used
> as a filter for events).
> But for the new proposed events (FS_MODIFY_DIR), I really don't
> see the point in isolation by mount/namespace.

I would be *very* careful with such assumptions. People are very inventive
in ways how they can abuse stuff. What I'm most nervous about is that the
(mnt, dentry) pair you create may not be reachable path. When you combine
that with xyz_at(2) syscalls allowing you to traverse directory hierarchy
from fd which you got from fanotify event, things can get really
interesting. So unless there are very clear rules proving this cannot be
misused, I'm against hand-crafting of path structures inside fsnotify.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR