Re: [RFC][PATCH 0/8] Mount, FS, Block and Keyrings notifications [ver #2]

From: David Howells
Date: Wed Jun 05 2019 - 13:25:37 EST

Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:

> > But there are problems with not sending the event:
> >
> > (1) B's internal state is then corrupt (or, at least, unknowingly invalid).
> Then B is a badly written program.

No. B may reasonably expect to receive those events, but it is then denied
them and doesn't even know that they happened.

> > (2) B can potentially figure out that the event happened by other means.
> Then why does it need the event mechanism in the first place?

Why does a CPU have interrupt lines? It can always continuously poll the
hardware. Why do poll() and select() exist?

> > I've implemented four event sources so far:
> >
> > (1) Keys/keyrings. You can only get events on a key you have View permission
> > on and the other process has to have write access to it, so I think this
> > is good enough.
> Sounds fine.
> > (2) Block layer. Currently this will only get you hardware error events,
> > which is probably safe. I'm not sure you can manipulate those without
> > permission to directly access the device files.
> There's an argument to be made that this should require CAP_SYS_ADMIN,
> or that an LSM like SELinux might include hardware error events in
> policy, but generally I agree that system generated events like this
> are both harmless and pointless for the general public to watch.

CAP_SYS_ADMIN is probably too broad a hammer - this is something you might
want to let a file manager or desktop environment use. I wonder if we could
add a CAP_SYS_NOTIFY - or is it too late for adding new caps?

> > (3) Superblock. This is trickier since it can see events that can be
> > manufactured (R/W <-> R/O remounting, EDQUOT) as well as events that
> > can't without hardware control (EIO, network link loss, RF kill).
> The events generated by processes (the 1st set) need controls
> like keys. The events generated by the system (the 2nd set) may
> need controls like the block layer.
> > (4) Mount topology. This is the trickiest since it allows you to see
> > events beyond the point at which you placed your watch (in essence,
> > you place a subtree watch).
> Like keys.
> > The question is what permission checking should I do? Ideally, I'd
> > emulate a pathwalk between the watchpoint and the eventing object to
> > see if the owner of the watchpoint could reach it.
> That will depend, as I've been saying, on what causes
> the event to be generated. If it's from a process, the
> question is "can the active process, the one that generated
> the event, write to the passive, watching process?"
> If it's the system on a hardware event, you may want the watcher
> to have CAP_SYS_ADMIN.
> > I'd need to do a reverse walk, calling
> > inode_permission(MAY_NOT_BLOCK) for each directory between the
> > eventing object and the watchpoint to see if one rejects it - but
> > some filesystems have a permission check that can't be called in this
> > state.
> This is for setting the watch, right?

No. Setting the watch requires execute permission on the directory on which
you're setting the watch, but there's no way to know what permissions will be
required for an event at that point.

I'm talking about when an event is generated (hence "eventing object").
Imagine you have a subpath:

dirA/dirB/dirC/dirD/dirE

where dir* are directories. If you place a watch on dirA and then an event
occurs on dirE (such as someone mounting on it), I do a walk back up the
parental tree, in the order:

dirE, dirD, dirC, dirB, dirA

If I need to check permissions on all the directories, I would find the
watchpoint on dirA, then I would have to repeat the walk to find out whether
the owner of the watchpoint can access all of those directories (perhaps
skipping dirA, since the owner needed execute permission on it to place the
watchpoint there in the first place).

Note that this can go awry if the walk races with rename().

> > It would also be necessary to do this separately for each watchpoint in
> > the parental chain.
> >
> > Further, each permissions check would generate an audit event and
> > could generate FAN_ACCESS and/or FAN_ACCESS_PERM fanotify events -
> > which could be a problem if fanotify is also trying to post those
> > events to the same watch queue.
> If you required that the watching process open(dir) what
> you want to watch you'd get this for free. Or did I miss
> something obvious?

A subtree watch, such as the mount topology watch, watches not only the
directory and mount object you pointed directly at, but the entire subtree
rooted there.

Take the sample program in the last patch. It places a watch on "/" with no
filter against WATCH_INFO_RECURSIVE, so it sees all mount topology events that
happen under the VFS path subtree rooted at "/" - whether or not it can
actually pathwalk to those mounts.