Re: fsnotify_mark_srcu wtf?

From: Amir Goldstein
Date: Fri Dec 02 2016 - 06:57:20 EST


On Fri, Dec 2, 2016 at 1:41 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Fri, Dec 2, 2016 at 12:48 PM, Jan Kara <jack@xxxxxxx> wrote:
>> On Fri 02-12-16 09:26:51, Miklos Szeredi wrote:
> ...
>>>
>>> Hmm, how about this: when removing mark from inode, drop refcount. If
>>> refcount is zero can remove from list. Otherwise mark the mark "dead"
>>> and leave it on the list.
>>>
>>> And fsnotify can just skip dead marks.
>>
>> I had this idea as well and when trying to implement this, I've stumbled
>> over some problems. I think the biggest problem was that destruction of a
>> notification mark is relatively complex operation (doing iput() for
>> example) and quite a few places dropping mark references are in a context
>> where this can cause problems. Also I don't want to defer iput() to a
>> workqueue as that will have unexpected consequences such as unlinked
>> watched inode lingering in the system (possibly colliding with umount
>> etc.).
>>
>
> I am wondering out loud if we are trying to solve a real problem or a made
> up test case. I wonder if Miklos' test program truly represents the original
> bug report. I am asking because fanotify permission events are usually
> associated with system security software and it usually makes sense on
> a vfsmount_mark and not an inode_mark.
>
> Maybe the break even solution is not to split destroy lists per group priority,
> but to split destroy lists by inode marks and vfsmount marks
> and also keep 2 separate lists per group.
>
> I am only asking this because you mentioned iput as a thorn in the solution.
> Since vfsmount mark does not pin the mount, nor hold an elevated reference,
> perhaps dealing with simpler destruction of vfsmount marks can solve the
> problem for "rogue fanotify permission mount watch" and maybe that is
> enough for all practical matters?
>

And before you comment about the need to merge the inode and vfsmount
lists by priority, I'll suggest:

- Check whether the head of the inode mark list is priority 0.
  If it is, there is no need to merge the lists:
-- first iterate the vfsmount list under the vfsmount mark srcu
-- then iterate the inode list under the inode mark srcu
--- if a high priority inode mark is found on the list, we can either skip it
    or process it out of priority order, because it was just added, so we
    could have missed it anyway

If the inode list head is high priority, then we are back to the old problem.
As I said, this is supposed to be a break even solution for practical use
cases.
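
To make the proposed walk order concrete, here is a rough userspace sketch.
It is not kernel code: struct mark and walk_marks() are made-up names, the
two srcu read sections are only indicated by comments, and I assume both
lists are sorted by descending priority. Ties in the merge path go to the
inode mark, which is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mark; NOT the kernel's struct fsnotify_mark. */
struct mark {
    int prio;                 /* group priority, 0 = lowest */
    int id;                   /* identifies the mark in the output */
    const struct mark *next;  /* list assumed sorted by descending prio */
};

/*
 * Record in out[] the order in which marks would be visited.
 * Returns the number of marks visited (at most cap).
 */
static size_t walk_marks(const struct mark *inode, const struct mark *mnt,
                         int *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);

    if (inode == NULL || inode->prio == 0) {
        /*
         * Fast path: the inode list head is priority 0, so no merge.
         * In the proposal each loop would run under its own srcu.
         */
        for (; mnt != NULL && n < cap; mnt = mnt->next)
            out[n++] = mnt->id;
        /*
         * A high priority inode mark showing up here would have been
         * added concurrently; this sketch processes it out of order.
         */
        for (; inode != NULL && n < cap; inode = inode->next)
            out[n++] = inode->id;
        return n;
    }

    /* Slow path: merge both lists by descending priority. */
    while ((inode != NULL || mnt != NULL) && n < cap) {
        if (mnt == NULL || (inode != NULL && inode->prio >= mnt->prio)) {
            out[n++] = inode->id;
            inode = inode->next;
        } else {
            out[n++] = mnt->id;
            mnt = mnt->next;
        }
    }
    return n;
}
```

With only priority 0 inode marks, all vfsmount marks are visited first and
then the inode marks. With a high priority inode mark at the head, the walk
falls back to the merged order.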

Amir.