Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

From: J. Bruce Fields
Date: Tue Mar 21 2017 - 11:40:25 EST


On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> > On Tue, Mar 14, 2017 at 1:03 AM, Filip ÅtÄdronskà <r.lklm@xxxxxxxxxx> wrote:
> > > Besause fanotify requires `struct path`, the event cannot be generated
> > > directly in `fsnotify_move` and friends because they only get the inode
> > > (and their callers, `vfs_rename`&co. cannot supply any better info).
> > > So instead it needs to be generated higher in the call chain, i.e. in
> > > the callers of functions like `vfs_rename`.
> > >
> > > This leads to some code duplication. Currently, there are several places
> > > whence functions like `vfs_rename` or `vfs_unlink` are called:
> > >
> > > * syscall handlers (done)
> > > * NFS server (done)
> > > * stacked filesystems
> > > - ecryptfs (done)
> > > - overlayfs
> > > (Currently doesn't report even ordinary fanotify events, because
> > > it internally clones the upper mount; not sure about the
> > > rationale. One can always watch the overlay mount instead.)
> > > * few rather minor things
> > > - devtmpfs
> > > (its internal changes are not tied to any vfsmount so it cannot
> > > emit mount-scoped events)
> > > - cachefiles (done)
> > > - ipc/mqueue.c (done)
> > > - fs/nfsd/nfs4recover.c (done)
> > > - kernel/bpf/inode.c (done)
> > > net/unix/af_unix.c (done)
> > >
> > > (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
> > >
> > > Signed-off-by: Filip ÅtÄdronskà <r.lkml@xxxxxxxxxx>
> > >
> > > ---
> > >
> > > An alternative might be to create wrapper functions like
> > > vfs_path_(rename|unlink|...). They could also take care of calling
> > > security_path_(rename|unlink|...), which is currently also up to
> > > the indvidual callers (possibly with a flag because it might not
> > > be always desired).
> >
> > That's an interesting idea. There is some duplicity between security/audit
> > hook and fsnotify hooks. It should be interesting to try and deduplicate
> > some of this code.
>
> Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.

We don't? E.g. nfsd_unlink calls vfs_unlink which calls
security_inode_unlink().

> And at least for NFSD it seems correct they don't call them since you are
> running in a context of an NFSD server process and don't have security
> context of the process actually issuing the IO.

# ps -eo comm,label|grep nfsd
nfsd system_u:system_r:kernel_t:s0
...

So I guess that's the process context they see.

There is actually a way for the protocol to pass the security context,
in the case it's running over kerberos. So some day this might
optionally start using the context of the client process, for what it's
worth.

--b.

> So I'm not sure
> "deduplication" is really possible.
>
> However if you can really call fsnotify hooks with 'path' available in all
> the places, it should be equally hard to just pass 'path' to
> vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
> calls into several call sites but keep them local to vfs_(create|mkdir|...)
> helpers. Hmm?
>
> >
> > > ---
> > > fs/cachefiles/namei.c | 9 +++++++
> > > fs/ecryptfs/inode.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/namei.c | 23 +++++++++++++++++-
> > > fs/nfsd/nfs4recover.c | 7 ++++++
> > > fs/nfsd/vfs.c | 24 ++++++++++++++++--
> > > ipc/mqueue.c | 9 +++++++
> > > kernel/bpf/inode.c | 3 +++
> > > net/unix/af_unix.c | 2 ++
> > > 8 files changed, 141 insertions(+), 3 deletions(-)
> > >
> >
> > OK, just for comparison, I am going to put here the diff of the sub set of
> > my patches that are needed to support fanotify filename events.
> >
> >
> > $ git diff --stat fsnotify_sb..fanotify_dentry
> > fs/notify/fanotify/fanotify.c | 94
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
> > fs/notify/fanotify/fanotify.h | 25 ++++++++++++++++++++++++-
> > fs/notify/fanotify/fanotify_user.c | 92
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
> > fs/notify/fdinfo.c | 25 +++----------------------
> > fs/notify/inode_mark.c | 1 +
> > fs/notify/mark.c | 15 ++++++++++++---
> > include/linux/fsnotify_backend.h | 21 ++++++++++++++++-----
> > include/uapi/linux/fanotify.h | 41
> > +++++++++++++++++++++++++++++++++++------
> > 8 files changed, 255 insertions(+), 59 deletions(-)
> >
> > Yes, it is a bit more code, much mostly because it adds more functionality
> > (optionally reporting the filename).
> > But most of the code is contained within the fsnotify/fanotify subsystem.
>
> So I'm not that concerned about actual line numbers. They play some role
> but they don't tell much about how maintainable the result is in the end.
>
> > The altenative to sprinkle fsnotify_modify_dir() hooks is much less
> > maintainable IMO.
>
> Agreed on this.
>
> > I managed to stay a way from cross subsystems changes,
> > by allowing to loose some information about the event.
> >
> > When a filename event is generated (rename|delete|create)
> > the path of the parent fd that will be reported to user is NOT
> > the actual path that the process executing the operation used,
> > but the path from which the watching process has added the mark.
>
> Yeah, and frankly I'm not yet convinced this is a sane thing to do. I'd
> much rather propagate path to vfs_(create|mkdir|...) helpers.
>
> > So for example, if you have a bind mount:
> > mount -o bind /a/b/c/d/e/f/g /tmp/g
> >
> > And you add a watch on mount /a
> > you *will* get events for create,delete,rename in directory /tmp/g
> > but that path in the associated fd with point to /a/b/c/d/e/f/g
> >
> > Some would claim that it is wrong to report the events
> > if they were not originated from the mount were the
> > watch was added.
> >
> > I claim that fanotify filters event by mount not because it
> > was a requirement, but because it was an implementation challenge
> > to do otherwise.
> > And I claim that what mount watchers are really interested in is
> > "all the changes that happen in the file system in the area
> > that is visible to me through this mount point".
> >
> > In other words, an indexer needs to know if files were modified\
> > create/deleted if that indexer sits in container host namespace
> > regardless if those files were modified from within a container
> > namespace.
> >
> > It's not a matter of security/isolation. It's a matter of functionality.
> > I agree that for some event (e.g. permission events) it is possible
> > to argue both ways (i.e. that the namespace context should be used
> > as a filter for events).
> > But for the new proposed events (FS_MODIFY_DIR), I really don't
> > see the point in isolation by mount/namespace.
>
> I would be *very* careful with such assumptions. People are very inventive
> in ways how they can abuse stuff. What I'm most nervous about is that the
> (mnt, dentry) pair you create may not be reachable path. When you combine
> that with xyz_at(2) syscalls allowing you to traverse directory hierarchy
> from fd which you got from fanotify event, things can get really
> interesting. So unless there are very clear rules proving this cannot be
> misused, I'm against hand-crafting of path structures inside fsnotify.
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR